[WIP] Matching content in our doctests #197

Open
wants to merge 16 commits into
base: master
1 change: 1 addition & 0 deletions CONTRIBUTING.md
@@ -75,6 +75,7 @@ Texthero is there for the NLP-community. If you have an idea on how we can impro
1. Before writing a new function or making any changes, look at similar code for inspiration and to learn about the code format and style.
1. The maximal docstring line length should be 75 characters. This should be manually done as `black` formatting does not enforce limits on docstring line length.
1. Use American English instead of British English (e.g. categorize instead of categorise) when writing comments and documenting docstrings.
+1. Use as many quotes or sentences from superhero comics or movies as possible, such as "HULK SMASH!", "I am Groot!", "I am the vengeance, I am the night, I am BATMAN!", and "With great power comes great responsibility."
1. For default argument values, use the defaults from the underlying library if applicable (e.g. the default arguments
from sklearn if using a sklearn algorithm). If other values are used, add a small comment explaining why. Additionally, look for similar functions and use their default values.
1. Default values are defined as follows: `x : int, optional, default=2`
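
The guideline above says the 75-character docstring limit has to be checked by hand because `black` does not enforce it. A minimal sketch of how such a check could be automated with the standard-library `ast` module follows. The function name `long_docstring_lines` is hypothetical and not part of Texthero, and counting each line's leading indentation toward the limit is an assumption, since the guideline does not say whether indentation is included.

```python
import ast

# Limit taken from the CONTRIBUTING.md guideline quoted above.
MAX_DOCSTRING_LINE_LENGTH = 75


def long_docstring_lines(source, limit=MAX_DOCSTRING_LINE_LENGTH):
    """Return (name, line) pairs for docstring lines over `limit`.

    Hypothetical helper, not part of Texthero. Indentation is counted
    toward the length, which is an assumption.
    """
    offenders = []
    tree = ast.parse(source)
    documented = (
        ast.Module,
        ast.ClassDef,
        ast.FunctionDef,
        ast.AsyncFunctionDef,
    )
    for node in ast.walk(tree):
        if not isinstance(node, documented):
            continue
        # clean=False keeps the docstring exactly as written in the
        # source, including the indentation of continuation lines.
        doc = ast.get_docstring(node, clean=False)
        if doc is None:
            continue
        name = getattr(node, "name", "<module>")
        for line in doc.splitlines():
            if len(line) > limit:
                offenders.append((name, line))
    return offenders
```

Calling `long_docstring_lines(open("texthero/nlp.py").read())` would list the function name and text of every docstring line over the limit, which could help catch long doctest lines before review.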
29 changes: 15 additions & 14 deletions texthero/nlp.py
@@ -55,10 +55,11 @@ def named_entities(s: TextSeries, package="spacy") -> pd.Series:
--------
>>> import texthero as hero
>>> import pandas as pd
->>> s = pd.Series("Yesterday I was in NY with Bill de Blasio")
+>>> s = pd.Series("Yesterday, Spider-Man met Daredevil in Queens, New-York.")
>>> hero.named_entities(s)[0] # doctest: +NORMALIZE_WHITESPACE
-[('Yesterday', 'DATE', 0, 9), ('NY', 'GPE', 19, 21),
-('Bill de Blasio', 'PERSON', 27, 41)]
+[('Yesterday', 'DATE', 0, 9), ('Spider-Man', 'PERSON', 11, 21),
+('Daredevil', 'GPE', 26, 35), ('Queens', 'GPE', 39, 45),
+('New-York', 'GPE', 47, 55)]
"""
entities = []

@@ -93,9 +94,9 @@ def noun_chunks(s: TextSeries) -> pd.Series:
--------
>>> import texthero as hero
>>> import pandas as pd
->>> s = pd.Series("The spotted puppy is sleeping.")
+>>> s = pd.Series("A little spider just bite me!")
>>> hero.noun_chunks(s)
-0 [(The spotted puppy, NP, 0, 17)]
+0 [(A little spider, NP, 0, 15), (me, NP, 26, 28)]
dtype: object
"""

@@ -130,8 +131,8 @@ def count_sentences(s: TextSeries) -> pd.Series:
>>> import texthero as hero
>>> import pandas as pd
>>> s = pd.Series(
-... ["Yesterday I was in NY with Bill de Blasio. Great story...",
-... "This is the F.B.I.! What? Open up!"])
+... ["Yesterday, Spider-Man met Daredevil in Queens, New-York. Great story...",
+... "This is the S.H.I.E.L.D! What? Open up!"])
>>> hero.count_sentences(s)
0 2
1 3
@@ -166,7 +167,7 @@ def pos_tag(s: TextSeries) -> pd.Series:
coarse-grained POS has a NOUN value, then the refined POS will give more
details about the type of the noun, whether it is singular, plural and/or
proper.

You can use the spacy `explain` function to find out which fine-grained
POS it is.

@@ -204,11 +205,11 @@ def pos_tag(s: TextSeries) -> pd.Series:
--------
>>> import texthero as hero
>>> import pandas as pd
->>> s = pd.Series("Today is such a beautiful day")
+>>> s = pd.Series("Today is such a marvelous day")
>>> print(hero.pos_tag(s)[0]) # doctest: +NORMALIZE_WHITESPACE
-[('Today', 'NOUN', 'NN', 0, 5), ('is', 'AUX', 'VBZ', 6, 8), ('such', 'DET',
-'PDT', 9, 13), ('a', 'DET', 'DT', 14, 15), ('beautiful', 'ADJ', 'JJ', 16,
-25), ('day', 'NOUN', 'NN', 26, 29)]
+[('Today', 'NOUN', 'NN', 0, 5), ('is', 'AUX', 'VBZ', 6, 8),
+('such', 'DET', 'PDT', 9, 13), ('a', 'DET', 'DT', 14, 15),
+('marvelous', 'ADJ', 'JJ', 16, 25), ('day', 'NOUN', 'NN', 26, 29)]
"""

pos_tags = []
@@ -264,9 +265,9 @@ def stem(s: TextSeries, stem="snowball", language="english") -> TextSeries:
--------
>>> import texthero as hero
>>> import pandas as pd
->>> s = pd.Series("I used to go \t\n running.")
+>>> s = pd.Series("I used to go \t\n flying.")
>>> hero.stem(s)
-0 i use to go running.
+0 i use to go flying.
dtype: object
"""
