At the moment the library only accepts a list of feature dictionaries, which for our purposes can consume an enormous amount of memory, even when using generators. Would it be possible to extend the API to accept numpy arrays or scipy sparse matrices generated by the sklearn DictVectorizer?
@Oasis789 crfsuite implements vectorization itself, which is why dicts are currently exposed. I wonder why you prefer DictVectorizer: the sklearn-crfsuite data format is largely compatible with it, and adds a few extra features that are useful for sequential models.
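For reference, here is a minimal sketch of what the dict-based input looks like, assuming the usual layout of one list of feature dicts per sequence. The feature names and the `token_features` / `sentence_features` helpers are hypothetical and only illustrate the shape of the data; they are not part of the library.

```python
def token_features(tokens, i):
    """Build a feature dict for the token at position i (hypothetical feature set)."""
    word = tokens[i]
    features = {
        "bias": 1.0,                       # numeric value
        "word.lower": word.lower(),        # string value
        "word.istitle": word.istitle(),    # boolean value
        "word.length": float(len(word)),   # numeric value
    }
    # Mark sequence boundaries, which a plain per-row vectorizer would not know about.
    if i == 0:
        features["BOS"] = True
    if i == len(tokens) - 1:
        features["EOS"] = True
    return features


def sentence_features(tokens):
    """One feature dict per token, in order."""
    return [token_features(tokens, i) for i in range(len(tokens))]


sentences = [["Alice", "visited", "Paris"], ["Hello"]]
# X is a list of sequences; each sequence is a list of feature dicts.
X = [sentence_features(s) for s in sentences]
```

No separate vectorization step is applied here: the dicts themselves are what gets handed to the model.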
I wanted to put together a pipeline for feature generation that would include the CRF model and make use of sklearn feature unions. Feature unions concatenate the outputs of their transformers in the form of sparse matrices. I wanted to be able to feed this directly to the CRF model within the pipeline.
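Until something like that is supported, one possible stopgap is to turn the rows of the sparse matrix back into feature dicts lazily, one sequence at a time. The sketch below is only an illustration: `csr_rows_to_dicts` and `group_rows` are hypothetical helpers, the feature names are made up, and it assumes the matrix is in CSR layout (the `data`, `indices` and `indptr` arrays of a scipy `csr_matrix`) and that the length of each sequence is known separately. It does not remove the dict overhead; it only keeps a single sequence in memory at once.

```python
from itertools import islice


def csr_rows_to_dicts(data, indices, indptr, feature_names):
    """Yield one {feature_name: value} dict per row of a CSR matrix.

    data, indices and indptr are the three standard CSR arrays;
    feature_names maps a column index to a (hypothetical) feature name.
    """
    for row in range(len(indptr) - 1):
        start, end = indptr[row], indptr[row + 1]
        # Only the non-zero entries of the row end up in the dict.
        yield {feature_names[indices[k]]: float(data[k]) for k in range(start, end)}


def group_rows(row_dicts, sequence_lengths):
    """Group a flat stream of row dicts into sequences of the given lengths."""
    it = iter(row_dicts)
    for n in sequence_lengths:
        yield list(islice(it, n))


# CSR encoding of [[1, 0, 2], [0, 0, 0], [0, 3, 0]]
data = [1, 2, 3]
indices = [0, 2, 1]
indptr = [0, 2, 2, 3]
feature_names = ["f0", "f1", "f2"]

# Two sequences: the first has two tokens, the second has one.
sequences = list(group_rows(csr_rows_to_dicts(data, indices, indptr, feature_names), [2, 1]))
```

With a real scipy matrix the same three arrays would come from `X.data`, `X.indices` and `X.indptr` after converting to CSR.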
Hi @kmike, are floats used as feature values in the dictionaries taken as they are, or do they undergo some transformation? I'm asking because I'm concerned about data sparsity: for example, if I encode a feature in the [-1, 1] range, I wouldn't want the vectorizer to create a separate feature for every possible value.
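To make the concern concrete: independently of how the library treats floats, the number of distinct values can be bounded on the caller's side by bucketing the float before it goes into the dict. This is just a sketch; `bucketize`, the bin count and the `bin_N` labels are hypothetical names chosen for illustration.

```python
def bucketize(value, n_bins=4, lo=-1.0, hi=1.0):
    """Map a float in [lo, hi] to one of n_bins string labels.

    Values outside the range are clamped to the first or last bin.
    """
    idx = int((value - lo) / (hi - lo) * n_bins)
    idx = min(max(idx, 0), n_bins - 1)
    return "bin_%d" % idx


# At most n_bins distinct values can ever appear for this feature.
features = {"score.bucket": bucketize(0.37)}
```

This caps the feature at `n_bins` possible values, at the cost of losing resolution inside each bin.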