Support Unicode #185

Open
sbadithe opened this issue Jul 26, 2022 · 1 comment
Labels
enhancement New feature or request

Comments

@sbadithe
Contributor

As a user, I wish NLP Primitives had the ability to handle Unicode text.

Currently, Unicode text is not correctly handled by regexes in nlp_primitives.

For example, Àbc is not recognized as a title word by TitleWordCount, whereas Abc is.
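
To illustrate the kind of mismatch described above, here is a minimal sketch. The ASCII-only pattern and both helper functions are assumptions made for illustration; they are not taken from the nlp_primitives source and may not match how TitleWordCount is actually implemented.

```python
import re

# Assumed ASCII-only pattern, for illustration only. It accepts a word that
# starts with A-Z and continues with a-z, so accented capitals are rejected.
ASCII_TITLE_WORD = re.compile(r"[A-Z][a-z]*")


def count_title_words_ascii(text):
    """Count whitespace-separated words that fully match the ASCII pattern."""
    return sum(1 for word in text.split() if ASCII_TITLE_WORD.fullmatch(word))


def count_title_words_unicode(text):
    """Count title-cased words using str.istitle(), which follows Unicode case rules."""
    return sum(1 for word in text.split() if word.istitle())


print(count_title_words_ascii("Abc"))    # ASCII word is counted
print(count_title_words_ascii("Àbc"))    # accented capital is missed
print(count_title_words_unicode("Àbc"))  # Unicode-aware check counts it
```

One possible direction, under these assumptions, is to rely on Unicode-aware string methods or character classes instead of explicit A-Z ranges.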

@gsheni
Contributor

gsheni commented Jul 26, 2022

@sbadithe Is it possible to make a pytest fixture and have it be used by all the NL primitives? That way, if we add more NL primitives in the future, we can make sure they support Unicode.
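
A shared fixture along these lines could look roughly like the sketch below. The sample strings, the fixture name `unicode_text`, and the stand-in function `title_word_count` are all hypothetical; a real test would call the actual primitives instead of the stand-in.

```python
import pytest

# Hypothetical sample strings covering accented Latin, Cyrillic, and CJK text.
UNICODE_SAMPLES = [
    "Àbc déf",
    "Ünïcödé Text",
    "Привет мир",
    "日本語 テキスト",
]


@pytest.fixture(params=UNICODE_SAMPLES)
def unicode_text(request):
    """Yield each sample in turn, so a test using this fixture runs once per string."""
    return request.param


def title_word_count(text):
    """Stand-in for a primitive under test (an assumption, not the library's code)."""
    return sum(1 for word in text.split() if word.istitle())


def test_primitive_handles_unicode(unicode_text):
    # The stand-in should return a non-negative integer without raising.
    result = title_word_count(unicode_text)
    assert isinstance(result, int)
    assert result >= 0
```

Placing such a fixture in a shared `conftest.py` would make it available to every test module, so a newly added primitive could be checked against the same Unicode samples.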

@sbadithe sbadithe added the enhancement New feature or request label Jul 26, 2022