
Feature Request: Out-Of-Bag Error should not require storing the model. #1158

Open
sebffischer opened this issue Sep 3, 2024 · 7 comments

@sebffischer (Member)

It is currently necessary to store the $model of a learner in order to access its oob_error.
An example of this is the random forest:

library(mlr3verse)
rr = resample(tsk("iris"), lrn("classif.ranger"), rsmp("holdout"), store_models = TRUE)
rr$aggregate(msr("oob_error"))

The code above errs if we don't specify store_models = TRUE, because the out-of-bag error active binding extracts this information from learner$model, which fails if the model is not stored.

Instead of having the out-of-bag error active binding access the learner's $model, we could support a private method $.extract_oob_error() that adds the out-of-bag error to the learner's $state, which remains accessible even when store_models = FALSE.
This is already implemented for internal validation scores and internally tuned values (a sketch of what the OOB version could look like follows the excerpt below):

mlr3/R/worker.R, lines 94 to 100 at commit 5ffcfee:

if (!is.null(validate)) {
  learner$state$internal_valid_scores = get_private(learner)$.extract_internal_valid_scores()
  learner$state$internal_valid_task_hash = task$internal_valid_task$hash
}
if (exists(".extract_internal_tuned_values", get_private(learner))) {
  learner$state$internal_tuned_values = get_private(learner)$.extract_internal_tuned_values()
}
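
A minimal sketch of how this could look, assuming a private method .extract_oob_error() on the learner and a new oob_error slot in the learner's $state (both names are hypothetical, not an existing API):

# In the learner, e.g. classif.ranger (hypothetical private method;
# for ranger the OOB error is stored in model$prediction.error):
# .extract_oob_error = function() {
#   self$model$prediction.error
# }

# In mlr3/R/worker.R, next to the existing extraction hooks shown above:
if (exists(".extract_oob_error", get_private(learner))) {
  learner$state$oob_error = get_private(learner)$.extract_oob_error()
}

msr("oob_error") could then read learner$state$oob_error instead of going through learner$model.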

@be-marc (Member) commented Dec 19, 2024

We could solve this with callbacks.

library(mlr3learners)

task = tsk("pima")
learner = lrn("classif.ranger")
resampling = rsmp("cv", folds = 3)

callback = clbk("mlr3.score_measures", measures = msr("oob_error"))

rr = resample(task, learner, resampling = resampling, store_models = FALSE, callbacks = callback)
rr$data_extra

# [[1]]
# [[1]]$score_measures
# oob_error 
# 0.2597656 


# [[2]]
# [[2]]$score_measures
# oob_error 
# 0.2402344 


# [[3]]
# [[3]]$score_measures
# oob_error 
# 0.2480469 

# aggregate
mean(mlr3misc::map_dbl(rr$data_extra, function(x) {
  x$score_measures["oob_error"]
}))

@sebffischer (Member, Author)

This does not allow for tuning, I assume?

@be-marc (Member) commented Dec 19, 2024

Yes. It would be cool if $score() and $aggregate() could recognize the return value of the callback in $data_extra as performance scores. You could solve this with tuning callbacks, but I would prefer a more elegant solution.

@be-marc (Member) commented Dec 20, 2024

@berndbischl decided that the output of a callback is an end point, so $score() and $aggregate() cannot work with it. In the tuning case, I would use a CallbackTuning that temporarily saves the models of a tuning iteration, calls msr("oob_error") on them, and then throws the models away. Or you combine a CallbackResample with a CallbackTuning: the CallbackResample extracts the OOB error on the worker without storing the models, and the CallbackTuning passes the OOB error saved in data_extra to the archive. Close?
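
A rough sketch of the CallbackResample half, assuming callback_resample() with an on_resample_end stage and a writable context$data_extra, as the mlr3.score_measures example above suggests (the callback id "oob_to_data_extra" is made up for illustration):

library(mlr3learners)

# Grab the OOB error on the worker, while the fitted model is still in
# memory, and keep only the number in data_extra (store_models stays FALSE).
cb = callback_resample("oob_to_data_extra",
  on_resample_end = function(callback, context) {
    context$data_extra = list(oob_error = context$learner$oob_error())
  }
)

rr = resample(tsk("pima"), lrn("classif.ranger"), rsmp("cv", folds = 3),
  store_models = FALSE, callbacks = cb)

# aggregate over folds
mean(mlr3misc::map_dbl(rr$data_extra, function(x) x$oob_error))

The CallbackTuning counterpart would then copy the value from data_extra into the tuning archive.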

@sebffischer (Member, Author)

But we use the OOB error tuning example in the book, so we need to fix this.

@be-marc (Member) commented Dec 20, 2024

in "Predict Sets, Validation and Internal Tuning"?

@sebffischer (Member, Author)

Yes.
