When fitting models with a large number of variables, Lasso.jl and GLMNet return different paths, and the difference grows as the number of variables increases.
An example to illustrate this:
```julia
using Lasso, GLMNet, Statistics

# Fits identical models with Lasso and GLMNet on mock data
# and returns the mean absolute difference of the betas of both models.
function lasso_glmnet_dif(nrow, ncol, n_col_contributing)
    data = rand(nrow, ncol)
    # mean of the contributing columns per record, thresholded to a Bool outcome
    outcome = mean(data[:, 1:n_col_contributing], dims = 2)[:, 1] .> rand(nrow)
    presence_matrix = [1 .- outcome outcome]
    l = Lasso.fit(LassoPath, data, outcome, Binomial())
    g = GLMNet.glmnet(data, presence_matrix, Binomial())
    lcoefs = Vector(l.coefs[:, end])
    gcoefs = g.betas[:, end]
    mean(abs, lcoefs .- gcoefs)
end

# 1000 records, 5 variables that all contribute to the outcome
lasso_glmnet_dif(1000, 5, 5)     # order of magnitude 1e-9

# 1000 records, 1000 variables of which 5 contribute to the outcome
lasso_glmnet_dif(1000, 1000, 5)  # around 0.05
```
The context for this problem is that I'm working on a Julia implementation of maxnet, where a biggish model matrix is generated (hundreds of columns) and a lasso path is used to select the most important ones.
The packages may be generating different sequences of regularization lambdas.
You can get them from the Lasso.jl path as l.λ, and from the GLMNet path as g.lambda.
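A minimal way to check, assuming l and g are the fitted paths from the snippet above:

```julia
# Compare the regularization sequences generated by the two packages.
# Assumes `l` and `g` are the fitted paths from the example above.
println("Lasso.jl path length: ", length(l.λ))
println("GLMNet path length:   ", length(g.lambda))

# If the lengths match, look at the largest relative discrepancy between
# the two lambda sequences; if they don't match, the paths cannot be
# compared segment by segment in the first place.
if length(l.λ) == length(g.lambda)
    println(maximum(abs.(l.λ .- g.lambda) ./ g.lambda))
end
```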
Also, in your example it looks like you are picking the last coefs of the regularization path, but those are not necessarily the most interesting ones. Take a look at the docs here.
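For instance, instead of comparing the two paths by position, you could compare coefficients at the λ value closest to some target; a sketch (the target λ here is arbitrary, purely for illustration):

```julia
# Compare coefficients at (approximately) the same regularization strength
# rather than at the same index in each path.
target_λ = 0.01  # hypothetical target, pick whatever segment interests you
il = argmin(abs.(l.λ .- target_λ))       # closest segment in the Lasso.jl path
ig = argmin(abs.(g.lambda .- target_λ))  # closest segment in the GLMNet path
mean(abs, Vector(l.coefs[:, il]) .- g.betas[:, ig])
```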
I don't think the regularization lambdas are where the differences are coming from. In the maxnet algorithm the lambdas are generated inside the algorithm, not by the packages. Even if I force the lambdas to be identical, I see the same kind of behaviour.
The same goes for when I look at some other part of the regularization path (maxnet always takes the last one, but I see your point).
E.g. in this example I take coefficients halfway along the path and force the lambdas to be identical, and lasso_glmnet_dif(1000, 1000, 5) is still around 0.02.
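A sketch of what forcing identical lambdas can look like, assuming Lasso.jl's λ keyword and GLMNet.jl's lambda keyword (both accept a user-supplied decreasing sequence); whether the two packages interpret λ on exactly the same scale is part of what's being tested here:

```julia
# Fit both packages on a shared, user-supplied lambda sequence, so that any
# remaining difference cannot come from the generated paths themselves.
lambdas = exp.(range(log(0.1), log(1e-4), length = 100))  # arbitrary grid

l = Lasso.fit(LassoPath, data, outcome, Binomial(); λ = lambdas)
g = GLMNet.glmnet(data, presence_matrix, Binomial(), lambda = lambdas)

# Compare coefficients halfway along the (now identical) paths.
k = length(lambdas) ÷ 2
mean(abs, Vector(l.coefs[:, k]) .- g.betas[:, k])
```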
That seems like too big a difference to come from floating-point error, which leads me to think the algorithms are somehow different?