Combining features from bulk-load and import-file #180

Open
AADeLucia opened this issue Jun 10, 2020 · 1 comment

Comments

@AADeLucia

There are features that are available in bulk-load that are not in import-file and vice versa:

  • bulk-load allows pruning (very handy)
  • import-file allows custom regex patterns
  • import-file allows extra stopwords
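For readers comparing the two commands, here is a minimal Python sketch (not MALLET code) of the preprocessing that the import-file options describe. Each document is tokenized with a user-supplied regular expression, and tokens found in a default stoplist extended with extra words are dropped. The function name `tokenize`, the token pattern, and both word lists are illustrative assumptions.

```python
import re
from typing import Iterable, List, Set

# Hypothetical stand-in for a default stoplist (MALLET ships its own, much longer list).
DEFAULT_STOPWORDS: Set[str] = {"the", "a", "of", "and", "to"}


def tokenize(text: str, token_regex: str, extra_stopwords: Iterable[str] = ()) -> List[str]:
    """Lowercase, split with a custom regex, and drop default plus extra stopwords."""
    stopwords = DEFAULT_STOPWORDS | {w.lower() for w in extra_stopwords}
    tokens = re.findall(token_regex, text.lower())
    return [t for t in tokens if t not in stopwords]


if __name__ == "__main__":
    doc = "The state-of-the-art model of topics, and the data."
    # Custom pattern (an assumption): runs of letters, optionally joined by hyphens.
    print(tokenize(doc, r"[a-z]+(?:-[a-z]+)*", extra_stopwords=["data"]))
    # prints ['state-of-the-art', 'model', 'topics']
```

The regex controls what counts as a token (here hyphenated compounds stay whole), while the stoplist is a plain set union, so extra stopwords simply extend the defaults.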

I find all of these very useful. Are there any plans to combine some of these features?

@mimno
Owner

mimno commented Jun 10, 2020

Good question. Adding a vocabulary builder step that doesn't write instance files might make pruning easier for very large data sets. Not allowing regexes is a big part of what made bulk-loader fast, but this may have changed. For stopwords you can always start with the default English list and add to that for bulk-load.
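The vocabulary-builder idea can be sketched as two streaming passes. The first pass only counts how many documents each word appears in and writes no instance file. The second pass keeps only the tokens whose document frequency meets a threshold. This is a hedged Python illustration, not how MALLET implements bulk-load pruning. `build_vocabulary`, `prune_tokens`, and `min_doc_freq` are hypothetical names.

```python
from collections import Counter
from typing import Iterable, List, Set


def build_vocabulary(docs: Iterable[List[str]], min_doc_freq: int) -> Set[str]:
    """First pass: count document frequencies only; nothing is written to disk."""
    doc_freq: Counter = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each word at most once per document
    return {word for word, df in doc_freq.items() if df >= min_doc_freq}


def prune_tokens(tokens: List[str], vocab: Set[str]) -> List[str]:
    """Second pass: drop tokens that fall outside the pruned vocabulary."""
    return [t for t in tokens if t in vocab]


if __name__ == "__main__":
    corpus = [
        ["topic", "model", "gibbs"],
        ["topic", "model", "corpus"],
        ["topic", "rare"],
    ]
    vocab = build_vocabulary(corpus, min_doc_freq=2)
    print(sorted(vocab))  # prints ['model', 'topic']
    print([prune_tokens(doc, vocab) for doc in corpus])
```

For a very large corpus, `docs` would be a generator that re-reads the input on each pass, so only the word counts need to stay in memory.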
