API Compatibility with NumPy Arrays and SciPy Matrices for features #16

Open
uwaisiqbal opened this issue Jun 28, 2017 · 3 comments

@uwaisiqbal

At the moment, the library only accepts a list of feature dictionaries, which for our purposes can consume an enormous amount of memory, even when using generators. Would it be possible to extend the API to accept NumPy arrays or SciPy sparse matrices generated by sklearn's DictVectorizer?

@kmike
Contributor

kmike commented Jun 28, 2017

@Oasis789 crfsuite implements vectorization itself; that's why dicts are currently exposed. I wonder why you prefer DictVectorizer: the sklearn-crfsuite data format is largely compatible, with a few extra features usable for sequential models.

It could be possible to implement what you're suggesting using the crfsuite C API (https://github.com/jakevdp/pyCRFsuite did that), but it requires work.

See also: scrapinghub/python-crfsuite#38
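
For readers unfamiliar with the dict-based input discussed here: it is a list of sequences, where each sequence is a list of per-token feature dicts. Below is a minimal sketch of that shape. The feature names and the `token_features` / `sentence_features` helpers are made up for illustration and do not come from this thread or from the library.

```python
def token_features(tokens, i):
    """Build one feature dict for the token at position i (hypothetical helper)."""
    word = tokens[i]
    features = {
        "bias": 1.0,
        "word.lower": word.lower(),        # string-valued feature
        "word.istitle": word.istitle(),    # boolean-valued feature
        "word.length": float(len(word)),   # numeric-valued feature
    }
    # Context features from neighbouring tokens, typical for sequence models.
    if i > 0:
        features["prev.lower"] = tokens[i - 1].lower()
    else:
        features["BOS"] = True  # beginning of sequence
    if i == len(tokens) - 1:
        features["EOS"] = True  # end of sequence
    return features


def sentence_features(tokens):
    """Turn one tokenized sentence into a list of feature dicts."""
    return [token_features(tokens, i) for i in range(len(tokens))]


# X is a list of sequences; each sequence is a list of dicts.
X = [sentence_features(["Alice", "visited", "Paris"])]
```

Values can mix strings, booleans and numbers in a single dict, which is roughly the same shape of input that DictVectorizer consumes, except that here the dicts are grouped per sequence.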

@uwaisiqbal
Author

I wanted to put together a feature-generation pipeline that would include the CRF model and make use of sklearn feature unions. Feature unions concatenate the outputs of their transformers as sparse matrices, and I wanted to be able to feed these directly to the CRF model within the pipeline.
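
Until the API accepts matrices directly, one possible workaround is to convert sparse rows back into feature dicts lazily, one row at a time. The sketch below works on the three CSR arrays (`indptr`, `indices`, `data`), which a SciPy `csr_matrix` exposes as attributes; plain lists are used here so the example has no dependencies. The `csr_rows_to_dicts` helper and the feature names are hypothetical, and how rows are grouped into sequences is left open because the thread does not cover it.

```python
def csr_rows_to_dicts(indptr, indices, data, feature_names):
    """Yield one {feature_name: value} dict per row of a CSR-style matrix.

    Hypothetical helper: only the non-zero entries of each row are emitted,
    and rows are produced one at a time instead of all at once.
    """
    for row in range(len(indptr) - 1):
        start, end = indptr[row], indptr[row + 1]
        yield {
            feature_names[indices[k]]: float(data[k])
            for k in range(start, end)
        }


# Toy 3x4 matrix in CSR form (values chosen only for illustration).
indptr = [0, 2, 3, 5]
indices = [0, 2, 1, 0, 3]
data = [1.0, 0.5, 1.0, 2.0, 1.0]
feature_names = ["f0", "f1", "f2", "f3"]

# Treat the three rows as the tokens of a single sequence (an assumption).
sequence = list(csr_rows_to_dicts(indptr, indices, data, feature_names))
```

This still creates a dict per token, so it only reduces peak memory if the dicts are consumed as they are generated. It does not remove the cost of the dict representation itself.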

@albertoandreottiATgmail

albertoandreottiATgmail commented Jul 11, 2018

Hi @kmike, are floats used as feature values in dictionaries taken as they are, or do they undergo any transformation? I'm asking because I'm concerned about data sparsity: for example, if I encode my feature in a [-1, 1] range, I wouldn't want the vectorizer to create a separate feature for each possible value.
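
The question is not answered in this thread. According to the python-crfsuite documentation, a numeric value in a feature dict is kept as the weight of a single feature named by its key, while a string value is folded into the feature name with weight 1.0. The pure-Python sketch below only mimics that mapping for illustration. `flatten_features` is a hypothetical name, not a library function, and the `=` separator is an assumption. It is worth verifying the behaviour against the installed version.

```python
def flatten_features(features):
    """Mimic how a feature dict might map to (name, weight) pairs.

    Illustration only: this is not library code, and the "=" separator
    used for string values is an assumption.
    """
    flat = {}
    for key, value in features.items():
        if isinstance(value, bool):
            # Booleans become a 1.0 / 0.0 weight on the same feature name.
            flat[key] = 1.0 if value else 0.0
        elif isinstance(value, (int, float)):
            # Numbers stay on ONE feature; the value is its weight.
            flat[key] = float(value)
        elif isinstance(value, str):
            # Strings create one feature per distinct value.
            flat[f"{key}={value}"] = 1.0
        else:
            raise TypeError(f"unsupported value type for {key!r}")
    return flat


a = flatten_features({"score": -0.25, "pos": "NN", "is_title": True})
b = flatten_features({"score": 0.75, "pos": "VB", "is_title": False})
```

Under this reading, `score` stays a single feature across both examples regardless of its value, whereas `pos` yields a distinct feature for each distinct string.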
