-
Notifications
You must be signed in to change notification settings - Fork 239
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add count_sentences function to nlp.py #51
Conversation
Also add tests for the function to test_nlp.py
It looks super nice! (and cool!) This is definitely useful (Texthero's users will like it) Some small changes / comments:
|
Thanks, I've implemented the suggestions. I opened the issue mentioned in 3 at #54 . About 5, I thought about it, and python already supports unicode in normal strings, so we actually do not need this. It rather is dangerous (it's also used in some other functions like noun_chunks) as e.g. we have
so it converts the np.nan object that's used internally for missing values to the literal string "nan". This is certainly not what the user wants/expects. Using pd.api.types.is_string_dtype(s) will not work, as it shows the following behaviour:
Something like this
should work and is very fast. So in total,
|
Add more tests, change style (docstring, tests naming). Remove unicode-casting to avoid unexpected behaviour.
Very good job and a very nice catch! By the way, what's your name? I suggest to add a photo profile on your Github profile and also have a more readable name ... it might be good pub for you!
How Texthero's should throw errors I opened an issue regarding that, please have a look at #60. We can either deal with that afterward this PR and merge it or again put it in |
I couldn't yet get the website to work locally with the updated documentation (running the website works, but the script update_documentation.sh gives me errors - maybe that's not the correct way to update the documentation or I need to do something else first?), so I added a direct link to the spacy sentencizer documentation that will render as hyperlink in sphinx (should have the same effect).
I'd suggest not putting this PR on hold as #60 could take quite some time until everything surrounding that is finished. When implementing #60 and related issues like #55 most functions will need to be overhauled anyway.
I've changed it to "cell", that should be intuitive and I saw it's already used in some other locations in texthero. Also, I implemented this 🤓 :
|
Additionally update index tests, they're cleaner now.
Thank you! All right, we will change it later on all functions. Regarding spaCy, we should do a comparison study and see if effectively using You can now add another PR where you update the README by adding your name under contributors! 🎉🎉 congrats and welcome! |
* Add count_sentences function to nlp.py Also add tests for the function to test_nlp.py * Implement suggestions from pull request. Add more tests, change style (docstring, tests naming). Remove unicode-casting to avoid unexpected behaviour. * Add link to spacy documentation. Additionally update index tests, they're cleaner now. Co-authored-by: Henri Froese <[email protected]>
Also add tests for the function to test_nlp.py
The function is implemented with spaCy, similarly to the other functions in the nlp module.
In response to #46