Please restore the "weights" property for the "classif.log_reg" learner! #265

Closed
FlorianPargent opened this issue Mar 7, 2023 · 1 comment · Fixed by #305

FlorianPargent commented Mar 7, 2023

In version 0.5.4 of mlr3learners, you removed the "weights" property of the "classif.log_reg" learner based on the reasoning in this thread: mlr-org/mlr3pipelines#177.
I would strongly argue that the reasoning in that thread is wrong: the "weights" argument of the glm function works exactly as expected, and the "weights" property should be restored for the "classif.log_reg" learner in mlr3learners as soon as possible!

One way to see that the glm weights work as expected is this blog article.
As shown in the blog, estimating a logistic regression model on unaggregated data (the usual case) gives exactly the same model (see the coefficient estimates and p-values) as estimating it on aggregated data while using the number of observations per condition as weights.
Note that in glm there is an important difference between "prior weights" (the weights argument of glm) and working weights (the weights value of a fitted glm object), as explained in the documentation of the glm function. In the case of mlr3learners, we are only concerned with the prior weights, which work exactly as expected.
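To make this concrete, here is a minimal self-contained R sketch (simulated data, not taken from the blog article) of the equivalence described above: fitting glm() on one row per observation and fitting it on aggregated counts passed through the weights argument yield the same coefficient estimates, standard errors, and p-values.

```r
# Minimal sketch with simulated data: glm() prior weights on aggregated
# counts reproduce the fit on the unaggregated data.
set.seed(1)

# Unaggregated data: one row per observation
x <- factor(sample(c("a", "b", "c"), 1000, replace = TRUE))
p <- c(a = 0.2, b = 0.5, c = 0.8)[as.character(x)]
y <- rbinom(1000, size = 1, prob = p)
fit_long <- glm(y ~ x, family = binomial())

# Aggregated data: one row per (x, y) combination, counts as prior weights
agg <- aggregate(list(n = y), by = list(x = x, y = y), FUN = length)
fit_agg <- glm(y ~ x, family = binomial(), data = agg, weights = n)

# The coefficient tables (estimates, standard errors, p-values) should agree
# up to numerical tolerance
all.equal(coef(summary(fit_long)), coef(summary(fit_agg)), tolerance = 1e-6)
```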

Another example that shows why restoring the weights functionality is important is my own preprint on using cost-sensitive learning with mlr3.
The benchmark in Table 3 was computed with an older version of mlr3learners in which the weights property of classif.log_reg was still available. We can clearly see that logistic regression with theoretical weighting (TW in Table 3: weights chosen theoretically based on the cost matrix) gives about the same predictive performance as with empirical weighting (EW in Table 3: weights tuned empirically). This similarity in predictive performance again demonstrates that the weights work as expected: I computed the theoretical weights in the usual way, and empirical weighting leads to almost the same weights.
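For readers who want to try this, here is a hypothetical sketch of deriving "theoretical" observation weights from a cost matrix; the exact convention used in the preprint may differ, and the class labels and costs below are made up.

```r
# Hypothetical illustration: with asymmetric misclassification costs,
# "theoretical" observation weights can be set to the cost of misclassifying
# each observation's true class.
costs <- matrix(c(0, 1,   # truth = "neg": cost of TN, cost of FP
                  5, 0),  # truth = "pos": cost of FN, cost of TP
                nrow = 2, byrow = TRUE,
                dimnames = list(truth = c("neg", "pos"),
                                response = c("neg", "pos")))

cost_fp <- costs["neg", "pos"]  # cost of a false positive
cost_fn <- costs["pos", "neg"]  # cost of a false negative

# Example outcome vector; each observation gets the cost of misclassifying it
y <- factor(c("neg", "pos", "pos", "neg"), levels = c("neg", "pos"))
w <- ifelse(y == "pos", cost_fn, cost_fp)
w  # 1 5 5 1
```

In mlr3, such a weight column would then be attached to the task as observation weights so that classif.log_reg can pass it on to glm() as prior weights; this is exactly the functionality that was removed.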

As demonstrated by our article (which has now been accepted by a journal), the weights property for classif.log_reg is extremely valuable and should be restored immediately! We noticed the removal of the weights property because we are currently preparing our manuscript for publication. At the moment we are forced to use an older version of mlr3learners, which is extremely inconvenient for any readers of our tutorial-style paper. We would greatly appreciate it if you could change this as soon as possible. If anybody ran our tutorial code with the current versions of mlr3learners and mlr3pipelines, they would get highly misleading results! See also our related issue: mlr-org/mlr3#931.

ck37 commented Apr 8, 2024

It seems that #177 is the correct link to where this was discussed and changed; it references https://stats.stackexchange.com/a/386913.

I agree with the OP - observation weights are very commonly used with logistic regression, including when downsampling a large EHR dataset for ML.
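As a hedged illustration of that downsampling use case (all names and rates below are made up): after keeping every case but only a fraction of the controls, inverse-probability-of-sampling weights passed to glm() recover approximately the same coefficients as the fit on the full data.

```r
# Sketch of downsampling a large rare-outcome dataset and correcting with
# prior weights (illustrative only; the sampling rate is arbitrary).
set.seed(42)
n <- 1e5
dat <- data.frame(x = rnorm(n))
dat$y <- rbinom(n, size = 1, prob = plogis(-4 + 0.5 * dat$x))

# Keep all cases, but only 10% of the (much more numerous) controls
keep_prob <- ifelse(dat$y == 1, 1, 0.1)
sub <- dat[runif(n) < keep_prob, ]

# Weight each retained row by the inverse of its sampling probability
sub$w <- ifelse(sub$y == 1, 1, 1 / 0.1)

fit_full <- glm(y ~ x, family = binomial(), data = dat)
fit_sub  <- glm(y ~ x, family = binomial(), data = sub, weights = w)

# The weighted fit on the subsample approximates the full-data fit
# (note: valid standard errors would require e.g. a sandwich estimator)
rbind(full = coef(fit_full), downsampled_weighted = coef(fit_sub))
```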
