Almond Stöcker and Tobi Kühn have started to write a function to extract the amount of risk reduction contributed by each base-learner from a fitted mboost model.
We'd like to see this included in mboost, with visualisation options, if possible, as it seems to answer a question that comes up frequently in postprocessing and interpreting boosting fits: which base-learners are the most important for the fit?
See code below.
# Variable importance: risk reduction attributed to each base-learner
RelRisk <- function(model, ohne1 = FALSE)   # ohne1: if TRUE, ignore the first boosting step
{
    baselearner <- names(model$baselearner)
    learner.type <- sapply(strsplit(baselearner, "\\("), "[[", 1)
    learner.type.new <- as.vector(sapply(learner.type, FUN = function(x)
        switch(x, bols = "(linear)",
                  bbs = "(nonlinear)",
                  bspatial = "(spatial)")))
    var.names <- as.vector(variable.names(model))
    baselearner.short <- paste(var.names, learner.type.new)
    n <- length(model$response)
    # Which base-learners were selected during boosting:
    selected <- model$xselect()
    # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    # Empirical risk:
    # Initial risk of the intercept (offset) model:
    Risk0 <- with(model, family@risk(response, offset))
    # Risk after each boosting step:
    Risk <- model$risk()
    # Risk reduction per step:
    RiskDif <- c(Risk0, Risk[-length(Risk)]) - Risk
    # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    # If desired, drop the first step:
    if (ohne1) {
        RiskDif <- RiskDif[-1]
        selected <- selected[-1]
    }
    # Explained risk attributed to each base-learner:
    explained <- rep(0, length(baselearner))
    for (i in seq_along(baselearner)) explained[i] <- sum(RiskDif[selected == i])
    # Selection frequency of each base-learner:
    frequence <- rep(0, length(baselearner))
    for (i in seq_along(baselearner)) frequence[i] <- mean(selected == i)
    # Horizontal bar plot of explained risk per base-learner,
    # annotated with the selection frequencies:
    par(mar = c(5, 13, 4, 2) + 0.1)
    b <- barplot(height = explained / n, names.arg = baselearner.short,
                 las = 1, horiz = TRUE,
                 main = paste("Final risk =", Risk[length(Risk)] / n))
    text(x = max(explained / n) / 10, y = b, labels = frequence)
    par(mar = c(5, 4, 4, 2) + 0.1)
}
> cars.gb <- gamboost(dist ~ bols(speed) + bbs(speed, center=TRUE), data = cars,
+ control = boost_control(mstop = 50))
> RelRisk(cars.gb)
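As a minimal sanity check (just a sketch, built only from the quantities RelRisk itself uses), the per-step risk reductions summed over all steps should equal the total drop in empirical risk from the offset model to the final model:

Risk0 <- with(cars.gb, family@risk(response, offset))
Risk <- cars.gb$risk()
RiskDif <- c(Risk0, Risk[-length(Risk)]) - Risk
sum(RiskDif)                   # total risk reduction over all boosting steps
Risk0 - Risk[length(Risk)]     # should agree up to floating-point error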
Any input on what such a function should or should not do would be highly appreciated!
This does not work for a binary response because of the risk implementation in mboost (note that factors are encoded as (-1, 1)): Risk0 will be 0, and the first selected base-learner is assigned a negative importance.
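One possible workaround (an untested sketch, assuming the family's risk function expects a numeric response coded as -1/+1 and that model$response and model$family are available on the fitted object, as in the code above): recode a factor response before computing Risk0.

# Sketch of a guard for binary responses (assumption: risk expects -1/+1 coding)
y <- model$response
if (is.factor(y))
    y <- c(-1, 1)[as.integer(y)]
Risk0 <- model$family@risk(y, model$offset)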