Explore optimization of machine learning pipeline #7
The pipeline used to be much faster when we (incorrectly) did feature selection and standardization prior to cross-validation (grid search). The issue is that sklearn re-runs these transformations verbatim for each grid-search candidate when they could be memoized: see scikit-learn/scikit-learn#7536 (comment). There are two potential solutions:
Pipeline memoising has been implemented in:
My more generic model memoiser requires scikit-learn/scikit-learn#5080, which I may push for again at some point :)
I think there's still potential for something like scikit-learn/scikit-learn#3951 to be merged.
The current classifier pipeline takes a long time to fit and may be fitting the same model multiple times. This should be looked at in hopes of finding some low-hanging fruit in performance gains.
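For anyone landing here later, a minimal sketch of the memoization idea discussed in this thread, assuming a scikit-learn release that supports the `memory` argument on `Pipeline` (0.19 or newer). The synthetic data, the step names, the choice of `SelectKBest`, `StandardScaler`, and `SGDClassifier`, and the `alpha` grid are all illustrative assumptions, not this project's actual pipeline.

```python
from shutil import rmtree
from tempfile import mkdtemp

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: the real project uses its own matrix).
X, y = make_classification(
    n_samples=200, n_features=50, n_informative=10, random_state=0
)

# Temporary directory where fitted transformers are cached on disk.
cache_dir = mkdtemp()

# Feature selection and standardization stay inside the pipeline, so they
# are fit on the training folds only. With `memory` set, a transformer that
# has already been fit with the same parameters on the same data is loaded
# from the cache instead of being refit.
pipeline = Pipeline(
    steps=[
        ("select", SelectKBest(f_classif, k=20)),
        ("standardize", StandardScaler()),
        ("classify", SGDClassifier(random_state=0)),
    ],
    memory=cache_dir,
)

# Only the classifier is tuned, so within each fold the two transformers
# are identical across all candidates and can be reused from the cache.
param_grid = {"classify__alpha": [1e-4, 1e-3, 1e-2, 1e-1]}
search = GridSearchCV(pipeline, param_grid, cv=3)
search.fit(X, y)

best_alpha = search.best_params_["classify__alpha"]
print("best alpha:", best_alpha)

# Remove the on-disk cache once the search is done.
rmtree(cache_dir)
```

Caching only helps when the transformer parameters and the training data repeat across candidates, as they do here. If the grid also varies a transformer parameter such as `select__k`, each distinct value is fit and cached separately per fold.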