
Feature Request: Out-Of-Bag Error should not require storing the model. #1158

Open
sebffischer opened this issue Sep 3, 2024 · 7 comments

@sebffischer (Member)

It is currently necessary to store the $model of a learner in order to access its oob_error.
An example of this is the random forest:

library(mlr3verse)
rr = resample(tsk("iris"), lrn("classif.ranger"), rsmp("holdout"), store_models = TRUE)
rr$aggregate(msr("oob_error"))

The code above errs if we don't specify store_models = TRUE, because the out-of-bag error active binding extracts this information from learner$model, which fails if the model is not stored.

Instead of having the out-of-bag error active binding access the learner's $model, we could support a private method $.extract_oob_error() that adds the out-of-bag error to the learner's $state, which remains accessible even when store_models = FALSE.
This is already implemented for internal validation scores and internally tuned values (a sketch of what the OOB version could look like follows the excerpt below):

mlr3/R/worker.R, lines 94 to 100 at commit 5ffcfee:

if (!is.null(validate)) {
  learner$state$internal_valid_scores = get_private(learner)$.extract_internal_valid_scores()
  learner$state$internal_valid_task_hash = task$internal_valid_task$hash
}
if (exists(".extract_internal_tuned_values", get_private(learner))) {
  learner$state$internal_tuned_values = get_private(learner)$.extract_internal_tuned_values()
}
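
A minimal sketch of how this could look, assuming a private method .extract_oob_error() on the learner and a new oob_error slot in the learner's $state (both names are hypothetical, not an existing API):

# In the learner, e.g. classif.ranger (hypothetical private method;
# for ranger the OOB error is stored in model$prediction.error):
# .extract_oob_error = function() {
#   self$model$prediction.error
# }

# In mlr3/R/worker.R, next to the existing extraction hooks shown above:
if (exists(".extract_oob_error", get_private(learner))) {
  learner$state$oob_error = get_private(learner)$.extract_oob_error()
}

msr("oob_error") could then read learner$state$oob_error instead of going through learner$model.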

@be-marc (Member) commented Dec 19, 2024

We could solve this with callbacks.

library(mlr3learners)

task = tsk("pima")
learner = lrn("classif.ranger")
resampling = rsmp("cv", folds = 3)

callback = clbk("mlr3.score_measures", measures = msr("oob_error"))

rr = resample(task, learner, resampling = resampling, store_models = FALSE, callbacks = callback)
rr$data_extra

# [[1]]
# [[1]]$score_measures
# oob_error 
# 0.2597656 


# [[2]]
# [[2]]$score_measures
# oob_error 
# 0.2402344 


# [[3]]
# [[3]]$score_measures
# oob_error 
# 0.2480469 

# aggregate
mean(mlr3misc::map_dbl(rr$data_extra, function(x) {
  x$score_measures["oob_error"]
}))

@sebffischer (Member, Author)

This does not allow for tuning, I assume?

@be-marc (Member) commented Dec 19, 2024

Yes. It would be cool if $score() and $aggregate() could recognize the return value of the callback in $data_extra as performance scores. You could solve this with tuning callbacks, but I would prefer a more elegant solution.

@be-marc (Member) commented Dec 20, 2024

@berndbischl decided that the output of a callback is an end point, so $score() and $aggregate() cannot work with it. In the tuning case, I would use a CallbackTuning that temporarily saves the models of a tuning iteration, calls msr("oob_error") on them, and then throws the models away. Or you combine a CallbackResample with a CallbackTuning: the CallbackResample extracts the OOB error on the worker without storing the models, and the CallbackTuning passes the OOB error saved in data_extra to the archive. Close?
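
A rough sketch of the CallbackResample half, assuming callback_resample() with an on_resample_end stage and a writable context$data_extra, as the mlr3.score_measures example above suggests (the callback id "oob_to_data_extra" is made up for illustration):

library(mlr3learners)

# Grab the OOB error on the worker, while the fitted model is still in
# memory, and keep only the number in data_extra (store_models stays FALSE).
cb = callback_resample("oob_to_data_extra",
  on_resample_end = function(callback, context) {
    context$data_extra = list(oob_error = context$learner$oob_error())
  }
)

rr = resample(tsk("pima"), lrn("classif.ranger"), rsmp("cv", folds = 3),
  store_models = FALSE, callbacks = cb)

# aggregate over folds
mean(mlr3misc::map_dbl(rr$data_extra, function(x) x$oob_error))

The CallbackTuning counterpart would then copy the value from data_extra into the tuning archive.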

@sebffischer (Member, Author)

But we use the OOB error tuning example in the book, so we need to fix this.

@be-marc (Member) commented Dec 20, 2024

in "Predict Sets, Validation and Internal Tuning"?

@sebffischer (Member, Author)

Yes.
