Explore optimization of machine learning pipeline #7

jessept · 2016-11-02T00:36:02Z

The current classifier pipeline takes a long time to fit and may be fitting the same model multiple times. This should be looked at in hopes of finding some low-hanging fruit in performance gains.

dhimmel · 2016-11-02T14:11:32Z

The pipeline used to be much faster when we (incorrectly) did feature selection and standardization prior to cross validation (grid search). The issue is that sklearn repeats these transformations verbatim when they could be memoized: see scikit-learn/scikit-learn#7536 (comment).
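To make the repeated work concrete, here is a minimal sketch that counts how often a preprocessing step is refit during a grid search. `CountingScaler`, the synthetic dataset, and the parameter grid are all hypothetical and exist only for illustration; this is not the project's actual pipeline.

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline


class CountingScaler(BaseEstimator, TransformerMixin):
    """Hypothetical standardizer that counts every call to fit."""

    fit_calls = 0  # class-level counter, shared by all clones made by the search

    def fit(self, X, y=None):
        CountingScaler.fit_calls += 1
        self.mean_ = X.mean(axis=0)
        self.scale_ = X.std(axis=0) + 1e-12  # avoid division by zero
        return self

    def transform(self, X):
        return (X - self.mean_) / self.scale_


# Synthetic data standing in for the real feature matrix (assumption).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

pipeline = Pipeline([
    ("scale", CountingScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Only the classifier is tuned, so the scaler's output per fold never changes.
c_values = [0.1, 1.0, 10.0]
n_folds = 3
search = GridSearchCV(pipeline, {"clf__C": c_values}, cv=n_folds)
search.fit(X, y)

# One scaler fit per (candidate, fold) pair, plus one for the final refit.
expected_fits = len(c_values) * n_folds + 1
print(CountingScaler.fit_calls, expected_fits)
```

Even though only `clf__C` varies, the scaler is refit for every candidate on every fold. With memoization, one fit per fold (plus the refit) would be enough.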

There are two potential solutions:

1. Use grid search from dask-learn (dklearn). dask-learn uses dask in the background. I'm excited about dask, but dask-learn development appears to have stalled and the pull request to include dask-learn in dask petered out. However, it may still be functional.
2. @jnothman may know of a solution, based on his comment in "Finding which features are passed to the final estimator of an sklearn pipeline" (scikit-learn/scikit-learn#7536 (comment)), where he said:

we've seen a couple of attempted contributions, as well as my generic remember_model wrapper which I've never formally submitted as a PR

jnothman · 2016-11-07T10:56:55Z

Pipeline memoising has been implemented in:

My more generic model memoiser requires scikit-learn/scikit-learn#5080, which I may push for again at some point :)

jnothman · 2016-11-07T10:58:32Z

I think there's still potential for something like scikit-learn/scikit-learn#3951 to be merged.
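For readers arriving later: scikit-learn releases from 0.19 onward expose a `memory` argument on `Pipeline` that caches fitted transformers. The sketch below shows how it might be used for the feature selection and standardization steps discussed in this thread. The dataset, step names, and grid are assumptions for illustration, and whether this corresponds to any of the pull requests mentioned above is not something this example claims.

```python
import shutil
import tempfile

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data (assumption).
X, y = make_classification(n_samples=200, n_features=20, random_state=0)


def run_search(memory):
    """Fit the same grid search with or without transformer caching."""
    pipeline = Pipeline(
        [
            ("select", SelectKBest(f_classif, k=5)),
            ("scale", StandardScaler()),
            ("clf", LogisticRegression(max_iter=1000)),
        ],
        memory=memory,  # None disables caching; a directory path enables it
    )
    search = GridSearchCV(pipeline, {"clf__C": [0.1, 1.0, 10.0]}, cv=3)
    return search.fit(X, y)


cache_dir = tempfile.mkdtemp()
try:
    plain = run_search(memory=None)
    # With caching, fitted transformers are reused when a later candidate
    # sees the same training fold with the same transformer parameters.
    cached = run_search(memory=cache_dir)
finally:
    shutil.rmtree(cache_dir, ignore_errors=True)

print(plain.best_params_, cached.best_params_)
```

Caching should not change the selected model, only how often the transformers are fit. Any speedup depends on how expensive the transformers are relative to the cost of hashing the inputs and reading the cache from disk.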
