nlputils

Utilities for processing various types of data for machine learning models.

Functionality

nlputils helps build and manage vocabularies from corpora. It includes the following functionality:

tokenise (useful regex: r"([\w]+(?:(?!\s)\W?[\w]+)*)")
stem or unstem
filters: ability to define filters that will accept or reject vocabulary entries (e.g. stopwords)
token-level cleanups
merging of multiple vocabularies
replace character(s) in all tokens
save vocabulary
load vocabulary
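
The features above can be sketched in plain Python. This is only an illustration under stated assumptions, not the actual nlputils API: every function name here (`tokenise`, `build_vocabulary`, `merge_vocabularies`, `replace_chars`, `save_vocabulary`, `load_vocabulary`) is hypothetical, as is the choice of a `Counter` as the vocabulary type and JSON as the storage format. Only the regex is taken from the list above.

```python
import json
import re
from collections import Counter

# Regex quoted in the feature list; it keeps intra-word punctuation
# (e.g. "don't", "state-of-the-art") inside a single token.
TOKEN_RE = re.compile(r"([\w]+(?:(?!\s)\W?[\w]+)*)")


def tokenise(text):
    """Return the tokens matched by TOKEN_RE, lower-cased (hypothetical helper)."""
    return [token.lower() for token in TOKEN_RE.findall(text)]


def build_vocabulary(corpus, filters=()):
    """Count tokens over all documents, keeping tokens every filter accepts."""
    vocab = Counter()
    for document in corpus:
        for token in tokenise(document):
            if all(accept(token) for accept in filters):
                vocab[token] += 1
    return vocab


def merge_vocabularies(*vocabs):
    """Sum the counts of several vocabularies into a new one."""
    merged = Counter()
    for vocab in vocabs:
        merged.update(vocab)
    return merged


def replace_chars(vocab, old, new):
    """Replace a substring in every token, summing counts if tokens collide."""
    replaced = Counter()
    for token, count in vocab.items():
        replaced[token.replace(old, new)] += count
    return replaced


def save_vocabulary(vocab, path):
    """Write the vocabulary to a JSON file (assumed storage format)."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(dict(vocab), handle)


def load_vocabulary(path):
    """Read a vocabulary previously written by save_vocabulary."""
    with open(path, encoding="utf-8") as handle:
        return Counter(json.load(handle))


# Usage: a stopword filter is just a callable that accepts or rejects a token.
stopwords = {"the", "a"}


def not_stopword(token):
    return token not in stopwords


vocab_a = build_vocabulary(["Don't stop the state-of-the-art model."], [not_stopword])
vocab_b = build_vocabulary(["The model works."], [not_stopword])
merged = merge_vocabularies(vocab_a, vocab_b)
```

Modelling filters as plain callables keeps stopword lists, length limits, and similar rules interchangeable; whether nlputils uses the same design is not confirmed by this README.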

Utility to build and manage frequency matrices from corpora with the following functionality:
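
As a rough illustration of the idea, a document-term frequency matrix can be sketched in plain Python. The function name `build_frequency_matrix` and its return shape are assumptions made for this example and are not taken from nlputils; the tokenising regex is the one quoted earlier in this README.

```python
import re
from collections import Counter

# Same tokenising regex as quoted earlier in this README.
TOKEN_RE = re.compile(r"([\w]+(?:(?!\s)\W?[\w]+)*)")


def build_frequency_matrix(corpus):
    """Return (terms, matrix); matrix[i][j] counts terms[j] in document i.

    Hypothetical helper: one row per document, one column per distinct
    lower-cased token, columns sorted alphabetically.
    """
    counts = [
        Counter(token.lower() for token in TOKEN_RE.findall(document))
        for document in corpus
    ]
    terms = sorted(set().union(*counts))
    matrix = [[doc_counts[term] for term in terms] for doc_counts in counts]
    return terms, matrix


terms, matrix = build_frequency_matrix(["the cat sat", "the cat and the dog"])
```

A dense list of lists is enough for small corpora; a sparse representation would be the more likely choice for large vocabularies.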
