Handling text features #127

Open
bjschoenfeld opened this issue Aug 9, 2018 · 0 comments

bjschoenfeld (Member) commented Aug 9, 2018

Text columns are currently treated as categorical columns. When these columns are one-hot encoded, every distinct string becomes its own feature, so computing some metafeatures (e.g. the PCA and k-NN metafeatures) becomes intractable. There are several easy solutions: disallow text columns by raising an error, ignore text columns, or skip the expensive metafeatures that depend on categorical columns.
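To make the three options concrete, here is a minimal sketch in plain Python. It is not this project's API: the function names, the `policy` values, and the unique-value-ratio heuristic for spotting text columns (with its 0.5 threshold) are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def looks_like_text(values: Sequence[object], max_unique_ratio: float = 0.5) -> bool:
    """Hypothetical heuristic: a string column whose values are mostly
    distinct is treated as free text rather than as a categorical."""
    strings = [v for v in values if isinstance(v, str)]
    if not strings:
        return False
    return len(set(strings)) / len(strings) > max_unique_ratio


def handle_text_columns(
    columns: Dict[str, List[object]],
    policy: str = "ignore",
    max_unique_ratio: float = 0.5,
) -> Tuple[Dict[str, List[object]], bool]:
    """Apply one of the three options from the issue.

    Returns the columns to use and a flag saying whether the expensive
    metafeatures (e.g. PCA, k-NN) should still be computed.
    """
    text_cols = [
        name for name, vals in columns.items()
        if looks_like_text(vals, max_unique_ratio)
    ]
    if policy == "error":
        # Option 1: disallow text columns outright.
        if text_cols:
            raise ValueError(f"text columns are not supported: {text_cols}")
        return dict(columns), True
    if policy == "ignore":
        # Option 2: drop text columns and proceed as usual.
        kept = {n: v for n, v in columns.items() if n not in text_cols}
        return kept, True
    if policy == "skip_expensive":
        # Option 3: keep every column, but skip the costly metafeatures
        # whenever a text column is present.
        return dict(columns), not text_cols
    raise ValueError(f"unknown policy: {policy!r}")


# Toy data: "color" is a low-cardinality categorical, "review" is free text.
data = {
    "age": [31, 45, 27, 52],
    "color": ["red", "blue", "red", "red"],
    "review": ["great product", "arrived late", "would buy again", "not as described"],
}
kept, compute_expensive = handle_text_columns(data, policy="ignore")
```

With the toy data above, the `"ignore"` policy keeps `age` and `color` and drops `review`. A cardinality ratio is only one possible signal; average token count or string length could serve as well.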

What metafeatures exist for datasets which contain text? Do we implement those here?

bjschoenfeld changed the title from "Text features cause intractable compute times" to "Handling text features" on Aug 13, 2018