Handling too few classes in landmarker cross validation #170
Comments
Running on LL0_488_colleges_aaup dataset
Can we compare with OpenML on this?
Similar to this, datasets with fewer than 4 instances per class fail. Should we handle something like this?
import pandas as pd
Traceback (most recent call last):
I believe you, but why is it 4, not 2? We only do 2-fold CV.
I think it's because with 2-fold CV the training set has half as many instances, so it needs at least 4.
I would think that if there were only two instances and two folds, one instance would go to each fold. The folds would take turns being the train and test sets...
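The fold arithmetic discussed in the two comments above can be checked with a small sketch. It is not sklearn's splitter and not this project's code: `stratified_fold_indices` and `train_counts_per_class` are hypothetical helpers that assign instances to folds round-robin within each class, assuming that is a fair stand-in for a stratified 2-fold split. It only shows how many instances of each class end up in the training set; it does not by itself explain why fewer than 4 instances per class was observed to fail.

```python
from collections import defaultdict


def stratified_fold_indices(labels, n_folds=2):
    """Assign each instance index to a fold, round-robin within each class.

    Hypothetical helper: a simplified stand-in for a stratified splitter.
    """
    folds = [[] for _ in range(n_folds)]
    per_class = defaultdict(list)
    for idx, label in enumerate(labels):
        per_class[label].append(idx)
    for indices in per_class.values():
        for position, idx in enumerate(indices):
            folds[position % n_folds].append(idx)
    return folds


def train_counts_per_class(labels, n_folds=2):
    """For each fold used as the test set, count training instances per class."""
    folds = stratified_fold_indices(labels, n_folds)
    results = []
    for test_fold in range(n_folds):
        counts = defaultdict(int)
        for fold_id, fold in enumerate(folds):
            if fold_id == test_fold:
                continue  # this fold is held out for testing
            for idx in fold:
                counts[labels[idx]] += 1
        results.append(dict(counts))
    return results


# Two instances per class: each training set holds one instance per class.
two_each = train_counts_per_class(["a", "a", "b", "b"])
# Four instances per class: each training set holds two instances per class.
four_each = train_counts_per_class(["a"] * 4 + ["b"] * 4)
# A class with a single instance is absent from one of the training sets.
singleton = train_counts_per_class(["a", "a", "c"])
```

Under this assumption, two instances per class do split one per fold, as suggested above. Whether a landmarker can then train on a single instance of a class is a separate question that depends on the learner.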
Our landmarkers perform cross validation with 2 folds. Some datasets may have only 1 instance of a particular target class. In this case, the validation step in sklearn's cross validation throws an error, because it requires at least n_folds (2 in our case) instances of each class. Letting such an error propagate to the user is not ideal. How should we handle this?
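One possible way to handle this, sketched under assumptions, is to check class counts before running cross validation. The helpers below (`classes_below_threshold`, `drop_rare_classes`) are hypothetical names, not part of this project or of sklearn, and dropping rare classes is only one option; returning a missing value for the landmarker or raising a clearer error are others.

```python
from collections import Counter


def classes_below_threshold(targets, min_count=2):
    """Return {class_label: count} for classes with fewer than min_count instances."""
    counts = Counter(targets)
    return {label: n for label, n in counts.items() if n < min_count}


def drop_rare_classes(features, targets, min_count=2):
    """Remove instances whose class has fewer than min_count members.

    Hypothetical pre-processing step; min_count would typically be set to
    the number of folds (2 here), or higher if the learner needs more.
    """
    rare = classes_below_threshold(targets, min_count)
    kept = [(x, y) for x, y in zip(features, targets) if y not in rare]
    kept_features = [x for x, _ in kept]
    kept_targets = [y for _, y in kept]
    return kept_features, kept_targets, rare


# Toy data: class "c" has a single instance and would trip a 2-fold check.
X = [[0.1], [0.2], [0.3], [0.4], [0.5]]
y = ["a", "a", "b", "b", "c"]

X_kept, y_kept, rare = drop_rare_classes(X, y, min_count=2)
```

Here `rare` reports which classes were removed, so the caller can log them or decide to skip the landmarker instead of silently changing the dataset.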