Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modify logic for final model training #273

Merged
merged 5 commits into from
Sep 26, 2024
Merged

Conversation

fraimondo
Copy link
Contributor

So far, when the user requested the final model, after calling scikit-learn's cross_validate, julearn was fiting the model again, on the full training data.

The main issue is when using joblib to parallelize, there was a call for each outer CV fold and once it was done, the main process will fit the final model. With enough resources, this is suboptimal, as one might want to fit the final model at the same time of the individual folds.

This PR changes the internal logic so the effect is the same, but the fiting happens at a different time. The idea is to add an "extra" fold in the CV object which includes the whole dataset. After the call to cross_validate is done, we remove the last entry and use this as the final model, obtaining the same output, but allowing the user to use joblib to parallelise together across CV folds and the final model.

Copy link

codecov bot commented Sep 26, 2024

Codecov Report

Attention: Patch coverage is 95.34884% with 2 lines in your changes missing coverage. Please review.

Project coverage is 89.89%. Comparing base (eb7207f) to head (cfb4936).
Report is 6 commits behind head on main.

Files with missing lines Patch % Lines
julearn/utils/_cv.py 60.00% 1 Missing and 1 partial ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #273      +/-   ##
==========================================
+ Coverage   89.83%   89.89%   +0.05%     
==========================================
  Files          54       56       +2     
  Lines        2449     2483      +34     
  Branches      497      504       +7     
==========================================
+ Hits         2200     2232      +32     
- Misses        163      164       +1     
- Partials       86       87       +1     
Flag Coverage Δ
docs 100.00% <ø> (ø)
julearn 89.88% <95.34%> (+0.05%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
julearn/api.py 92.89% <100.00%> (+0.26%) ⬆️
julearn/model_selection/final_model_cv.py 100.00% <100.00%> (ø)
julearn/model_selection/utils.py 100.00% <100.00%> (ø)
julearn/utils/_cv.py 90.24% <60.00%> (-4.21%) ⬇️

@fraimondo fraimondo marked this pull request as ready for review September 26, 2024 12:51
julearn/model_selection/utils.py Outdated Show resolved Hide resolved
julearn/model_selection/utils.py Show resolved Hide resolved
julearn/model_selection/final_model_cv.py Show resolved Hide resolved
Copy link

github-actions bot commented Sep 26, 2024

PR Preview Action v1.4.8
Preview removed because the pull request was closed.
2024-09-26 13:51 UTC

@fraimondo fraimondo merged commit eac0462 into main Sep 26, 2024
20 checks passed
@fraimondo fraimondo deleted the enh/merge_cv_final_model branch September 26, 2024 13:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants