Use Feature Selection with Successive Halving and progressive_val_score #1202
IndeedPete asked this question in Q&A
Hi @IndeedPete. Without delving further into your problem, I cannot say for sure what's happening, but the error might be related to the random forest. The forest samples subsets of the features to build its trees, so a decreasing number of features might be affecting the execution. On the other hand, the root of the error could also lie in progressive validation. (PS: opening an issue is the preferred way to report this kind of problem.) Could you share some data to help reproduce your problem?
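To make the random-forest hypothesis concrete, here is a pure-Python sketch of per-tree feature subsampling. It is not the library's forest code: it assumes, hypothetically, that the number of features to sample is fixed up front as an absolute count, so if a selector upstream passes fewer features than that, sampling fails. `sample_features` and `sample_features_safe` are made-up helper names.

```python
import random


def sample_features(x: dict, max_features: int, rng: random.Random) -> list:
    """Pick a random subset of feature names for one tree (hypothetical helper).

    random.sample raises ValueError when asked for more items than exist,
    which is one way a shrinking feature set could break a forest.
    """
    names = sorted(x)  # deterministic order before sampling
    return rng.sample(names, max_features)


def sample_features_safe(x: dict, max_features: int, rng: random.Random) -> list:
    """Defensive variant: never request more features than are available."""
    names = sorted(x)
    return rng.sample(names, min(max_features, len(names)))


rng = random.Random(42)
full = {f"f{i}": 0.0 for i in range(62)}     # all 62 columns
reduced = {f"f{i}": 0.0 for i in range(15)}  # roughly 25% kept by a selector

print(len(sample_features(full, 20, rng)))   # 20
try:
    sample_features(reduced, 20, rng)
except ValueError as err:
    print("sampling failed:", err)
print(len(sample_features_safe(reduced, 20, rng)))  # 15
```

Whether the real forest sizes its feature subsets this way is an assumption; the sketch only shows why a fixed subset size and a shrinking input can collide.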
-
Hi, I'm a bit confused about how the selectors from feature_selection are supposed to be combined with the progressive_val_score function and successive halving. I can't just iterate over the data manually before passing it to progressive_val_score, and transforming it somehow and then putting it back into a NumPy array or pandas DataFrame seems hacky as well. I tried adding the selector to the model pipelines under evaluation, but that gives me an error, either right at the start or after a few halvings (see the last code block at the bottom). I'm pasting my code below so you can get an idea of what I'm trying to do.
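As a generic illustration (not the poster's code) of why the selector can live inside the model instead of being applied to the data beforehand: progressive validation is a test-then-train loop over the stream, so every step of a pipeline, selector included, is updated one sample at a time. The pure-Python sketch below mimics that loop; `KeepFirstK`, `MeanRegressor`, `Pipeline` and `progressive_mae` are hypothetical stand-ins, not the library's classes.

```python
class KeepFirstK:
    """Toy selector: keeps the first k feature names it sees (hypothetical)."""

    def __init__(self, k):
        self.k = k
        self.kept = []

    def learn_one(self, x, y):
        for name in x:
            if len(self.kept) < self.k and name not in self.kept:
                self.kept.append(name)

    def transform_one(self, x):
        return {name: x[name] for name in self.kept if name in x}


class MeanRegressor:
    """Toy model: predicts the running mean of the target, ignoring x."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def learn_one(self, x, y):
        self.n += 1
        self.mean += (y - self.mean) / self.n

    def predict_one(self, x):
        return self.mean


class Pipeline:
    """Chains a selector and a model so both learn from the same stream."""

    def __init__(self, selector, model):
        self.selector = selector
        self.model = model

    def learn_one(self, x, y):
        self.selector.learn_one(x, y)
        self.model.learn_one(self.selector.transform_one(x), y)

    def predict_one(self, x):
        return self.model.predict_one(self.selector.transform_one(x))


def progressive_mae(stream, model):
    """Test-then-train: predict each sample before learning from it."""
    total_error, n = 0.0, 0
    for x, y in stream:
        y_pred = model.predict_one(x)  # test first...
        total_error += abs(y - y_pred)
        n += 1
        model.learn_one(x, y)          # ...then train on the same sample
    return total_error / n if n else 0.0


stream = [({"a": i, "b": -i, "c": 1.0}, 2.0) for i in range(10)]
print(progressive_mae(stream, Pipeline(KeepFirstK(k=2), MeanRegressor())))  # 0.2
```

The data is passed through untouched; only the pipeline decides which features the model sees, which is the arrangement the post describes trying.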
I have an encoded pandas dataset with 61,000 rows and 62 columns. I train my selector to keep around 25% of the features:
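For reference, keeping around 25% of 62 features means k is about 15. As a generic illustration (not the poster's code and not the library's implementation), the sketch below is an online "keep the k best features" step: it tracks a running Pearson correlation between each feature and the target and keeps the k features with the largest absolute correlation. `RunningPearson` and `OnlineTopKCorrelation` are hypothetical names.

```python
import math


class RunningPearson:
    """Welford-style running Pearson correlation between one feature and y."""

    def __init__(self):
        self.n = 0
        self.mean_x = self.mean_y = 0.0
        self.m2_x = self.m2_y = self.c_xy = 0.0

    def update(self, x, y):
        self.n += 1
        dx = x - self.mean_x  # deviation from the *old* x mean
        self.mean_x += dx / self.n
        dy = y - self.mean_y
        self.mean_y += dy / self.n
        self.m2_x += dx * (x - self.mean_x)
        self.m2_y += dy * (y - self.mean_y)
        self.c_xy += dx * (y - self.mean_y)  # co-moment update

    def get(self):
        if self.m2_x <= 0 or self.m2_y <= 0:
            return 0.0  # undefined for constant inputs, treated as 0
        return self.c_xy / math.sqrt(self.m2_x * self.m2_y)


class OnlineTopKCorrelation:
    """Keeps the k features with the largest |correlation| to the target."""

    def __init__(self, k):
        self.k = k
        self.stats = {}

    def learn_one(self, x, y):
        for name, value in x.items():
            self.stats.setdefault(name, RunningPearson()).update(value, y)

    def selected(self):
        ranked = sorted(self.stats, key=lambda n: abs(self.stats[n].get()), reverse=True)
        return ranked[: self.k]

    def transform_one(self, x):
        keep = set(self.selected())
        return {name: value for name, value in x.items() if name in keep}


n_features = 62
k = int(0.25 * n_features)  # 15.5 truncated to 15
print(k)  # 15

selector = OnlineTopKCorrelation(k=2)
for i in range(100):
    x = {"x0": 2.0 * i, "x1": (-1.0) ** i, "x2": -1.0 * i}
    selector.learn_one(x, float(i))
print(sorted(selector.selected()))  # ['x0', 'x2']
```

Because the ranking is recomputed as data arrives, the set of features passed downstream can change from one sample to the next, especially early in the stream, so a downstream model has to tolerate a varying input schema.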
Then I create some model configurations, with the selector included in the pipelines:
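One thing worth ruling out when the selector appears in several candidate pipelines: each pipeline should get its own selector instance. A single selector object shared by all candidates would be updated once per candidate for every sample, mixing their states. A generic pure-Python sketch (not the poster's code), with hypothetical `Selector`/`Model` placeholders and a hypothetical `make_pipeline` factory, that builds every configuration from scratch:

```python
import itertools


class Selector:
    """Hypothetical stateful selector: counts the samples it has learned from."""

    def __init__(self, k):
        self.k = k
        self.n_seen = 0

    def learn_one(self, x, y):
        self.n_seen += 1


class Model:
    """Hypothetical model with one hyperparameter."""

    def __init__(self, learning_rate):
        self.learning_rate = learning_rate


def make_pipeline(k, learning_rate):
    # A fresh selector and model per configuration: no shared state.
    return {"selector": Selector(k), "model": Model(learning_rate)}


grid = {"k": [10, 15, 20], "learning_rate": [0.01, 0.1]}
configs = [
    make_pipeline(k, lr)
    for k, lr in itertools.product(grid["k"], grid["learning_rate"])
]
print(len(configs))  # 6

# Feeding one sample to every candidate updates each selector exactly once.
for pipeline in configs:
    pipeline["selector"].learn_one({"a": 1.0}, 0.0)
print([p["selector"].n_seen for p in configs])  # [1, 1, 1, 1, 1, 1]

# Anti-pattern for contrast: one selector object reused by every candidate.
shared = Selector(15)
bad_configs = [{"selector": shared, "model": Model(lr)} for lr in (0.01, 0.1)]
for pipeline in bad_configs:
    pipeline["selector"].learn_one({"a": 1.0}, 0.0)
print(shared.n_seen)  # 2: the single sample was counted twice
```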
And here are the model selector and the evaluation:
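For context on what happens between halvings, a generic illustration (not the poster's code and not the library's implementation): successive halving runs all candidates on a slice of the budget, ranks them on a metric, keeps the best 1/eta of them, and repeats with the survivors until one remains. The pure-Python sketch assumes a lower-is-better absolute error and hypothetical constant-prediction candidates; `ConstantModel` and `successive_halving` are made-up names.

```python
import math


class ConstantModel:
    """Hypothetical candidate that always predicts the same value."""

    def __init__(self, value):
        self.value = value

    def predict_one(self, x):
        return self.value

    def learn_one(self, x, y):
        pass  # nothing to learn; keeps the sketch short


def successive_halving(models, stream, budget, eta=2):
    """Return the surviving model after repeated rounds of pruning."""
    if eta < 2:
        raise ValueError("eta must be at least 2")
    alive = list(models)
    n_rounds = max(1, math.ceil(math.log(len(models), eta)))
    steps_per_round = max(1, budget // n_rounds)
    it = iter(stream)
    while len(alive) > 1:
        errors = [0.0] * len(alive)
        for _ in range(steps_per_round):
            try:
                x, y = next(it)
            except StopIteration:
                break
            for i, model in enumerate(alive):
                errors[i] += abs(y - model.predict_one(x))  # test...
                model.learn_one(x, y)                       # ...then train
        # Keep the best len(alive) // eta models (at least one).
        n_keep = max(1, len(alive) // eta)
        ranked = sorted(range(len(alive)), key=lambda i: errors[i])
        alive = [alive[i] for i in ranked[:n_keep]]
    return alive[0]


candidates = [ConstantModel(float(v)) for v in range(8)]
stream = [({}, 3.0)] * 100
best = successive_halving(candidates, stream, budget=30, eta=2)
print(best.value)  # 3.0
```

In this sketch the candidates that survive later rounds process more samples than the ones pruned early, so a pipeline that only fails on certain inputs may not fail until a later round.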
This gives me an error after a while:
However, when I remove the selector from the pipelines, it works as intended. Am I doing something wrong? Could it be a bug? What would be a better approach?
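Since removing the selector makes the error go away, one way to collect the reproduction data requested in the reply is to run the same test-then-train loop by hand, for debugging only, and record the first sample on which a pipeline raises. A hedged pure-Python sketch: `find_first_failure` and the toy `FragileModel` are hypothetical, and any object with `predict_one`/`learn_one` methods could be passed in.

```python
def find_first_failure(stream, model):
    """Run test-then-train and return (index, x, exception) for the first error.

    Returns None if the whole stream is processed without an exception.
    """
    for i, (x, y) in enumerate(stream):
        try:
            model.predict_one(x)
            model.learn_one(x, y)
        except Exception as exc:  # capture anything for the bug report
            return i, dict(x), exc
    return None


class FragileModel:
    """Toy model that fails when it receives fewer than 2 features."""

    def predict_one(self, x):
        if len(x) < 2:
            raise ValueError(f"expected >= 2 features, got {len(x)}")
        return 0.0

    def learn_one(self, x, y):
        pass


good = ({"a": 1.0, "b": 2.0}, 0.0)
stream = [good] * 5 + [({"a": 1.0}, 0.0)] + [good] * 3
failure = find_first_failure(stream, FragileModel())
if failure is not None:
    index, x, exc = failure
    print(index, x, type(exc).__name__, exc)  # 5 {'a': 1.0} ValueError expected >= 2 features, got 1
```

Saving the feature dict at the failing index, plus a few rows around it, gives a small dataset that can be attached to an issue.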
Thank you for your input!