Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support AUC on CV for classification problem #477

Merged
merged 12 commits into from
Jan 31, 2023
Merged

Conversation

oooo26
Copy link
Collaborator

@oooo26 oooo26 commented Jan 29, 2023

  • Use (negative) AUC as CV score for LogisticRegression and MultinomialRegression;
  • Both OvO and OvR Algorithm are implemented for multinomial case;
  • Speed and performance have been tested.

@codecov
Copy link

codecov bot commented Jan 29, 2023

Codecov Report

Base: 96.27% // Head: 97.72% // Increases project coverage by +1.45% 🎉

Coverage data is based on head (c903187) compared to base (994fe73).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #477      +/-   ##
==========================================
+ Coverage   96.27%   97.72%   +1.45%     
==========================================
  Files          27        7      -20     
  Lines        2977      968    -2009     
==========================================
- Hits         2866      946    -1920     
+ Misses        111       22      -89     
Flag Coverage Δ
Python 97.72% <100.00%> (+0.14%) ⬆️
rpackage ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
python/abess/linear.py 99.11% <ø> (ø)
python/abess/pca.py 100.00% <ø> (ø)
python/abess/utilities.py 91.11% <ø> (ø)
python/abess/bess_base.py 98.87% <100.00%> (+0.05%) ⬆️
python/abess/decomposition.py 94.56% <100.00%> (+0.51%) ⬆️
R-package/R/abesspca.R
R-package/R/utility.R
R-package/R/print.abessrpca.R
R-package/R/deviance.abess.R
R-package/R/initialization.R
... and 15 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@oooo26
Copy link
Collaborator Author

oooo26 commented Jan 29, 2023

Besides, I think ic_type can be renamed as eval_type because this argument is not only for IC now.
And shall we add an argument to specify OvO/OvR for MultinomialRegression?

@Mamba413
Copy link
Collaborator

@oooo26 ,shall we change ic_type to eval_type?

@Mamba413 Mamba413 linked an issue Jan 29, 2023 that may be closed by this pull request
@Mamba413 Mamba413 added the enhancement New feature or request label Jan 29, 2023
@Mamba413
Copy link
Collaborator

@oooo26 ,shall we change ic_type to eval_type?

may be other. you can search scikit-learn for reference

@oooo26
Copy link
Collaborator Author

oooo26 commented Jan 29, 2023

may be other. you can search scikit-learn for reference

Oh Yes. They have a scoring argument in LogisticRegressionCV to control it, but not in the non-CV one. They don't seem to offer different IC type for classification too, only for Lasso.

Shall we add a cv_score or something? (Only scoring might be misunderstood since our regressor combines CV and non-CV problem.)

@oooo26
Copy link
Collaborator Author

oooo26 commented Jan 29, 2023

And I think ic_type/cv_score can share the same API in Cpp (that's what I have done so far), since only one of the criteria will be used.

@Mamba413
Copy link
Collaborator

two excellent comments. just do it.

@Mamba413
Copy link
Collaborator

Mamba413 commented Jan 29, 2023

@EQUIWDH you can inspect cpp code and program for cindex similarly.

@oooo26
Copy link
Collaborator Author

oooo26 commented Jan 29, 2023

two excellent comments. just do it.

OK! I will update soon.

And @EQUIWDH, if you would like to add other loss like c-index for CV, please see the test_loss() function in Metric.h. (Other files changed are just trivia.)

@oooo26
Copy link
Collaborator Author

oooo26 commented Jan 30, 2023

Add cv_score:

  • cv_score='test_loss': default;
  • cv_score='roc_auc': additional for logistic;
  • cv_score='roc_auc_ovo' or 'roc_auc_ovr': additional for multinomial;

Since AUC score seems to perform better on Logistic (both speed and accuracy), maybe we can set LogisticRegression(cv_score='roc_auc') as default?

@oooo26
Copy link
Collaborator Author

oooo26 commented Jan 30, 2023

Oh there is a small conflict on R: when using CV, the ic_type should be set as 0 so that it can use test loss as before.

But currently, it is 1. We may need to change it in R-package/R/utility.R. (Otherwise, there will be some annoying warnings...)

@Mamba413
Copy link
Collaborator

I skim the R code in R-package now, and I think @bbayukari can quickly address this remaining warning in abesspca (see https://github.com/abess-team/abess/actions/runs/4044489654/jobs/6954695218).

@oooo26
Copy link
Collaborator Author

oooo26 commented Jan 31, 2023

I skim the R code in R-package now, and I think @bbayukari can quickly address this remaining warning in abesspca (see https://github.com/abess-team/abess/actions/runs/4044489654/jobs/6954695218).

Yep, just setting ic_type=0 under CV will fix it. Because ic_type=1 will represent AUC score for logistic now.

@Mamba413
Copy link
Collaborator

I skim the R code in R-package now, and I think @bbayukari can quickly address this remaining warning in abesspca (see https://github.com/abess-team/abess/actions/runs/4044489654/jobs/6954695218).

Yep, just setting ic_type=0 under CV will fix it. Because ic_type=1 will represent AUC score for logistic now.

Seems everything is fine now! You can merge it if you also believe this is fine.

@oooo26 oooo26 merged commit 7d2bd24 into abess-team:master Jan 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Development

Successfully merging this pull request may close these issues.

Cox model: c-index
2 participants