wip: recipes for regression models. should close #290 #293

Draft: wants to merge 3 commits into base: master

mkborregaard (Member) commented Jan 30, 2020

Here's a basic example usage:

using StatsPlots, GLM, RDatasets
iris = dataset("datasets", "iris")
mod = lm(@formula(PetalLength ~ PetalWidth), iris)
plot(mod)
@df iris scatter!(:PetalWidth, :PetalLength, ms = 2, c = :black, legend = :topleft)

[Figure: model_example]

This is really rough and only addresses the bivariate case. We're waiting for a general interface in StatsModels (I will open an issue there).
cc @Tokazama

daschw (Member) commented Jan 31, 2020

Nice.

@@ -0,0 +1,8 @@
# Note both predict and RegressionModel are defined in StatsBase
@recipe function f(mod::RegressionModel)
newx = [ones(200) range(extrema(mod.model.pp.X[:,2])..., length = 200)]

Can we be sure that the second column is what we want to plot?

Maybe this is a good hack for now?

@recipe function f(mod::RegressionModel, name)
    newx = [ones(200) range(extrema(mod.model.pp.X[:,findfirst(==(name), coefnames(mod))])..., length = 200)]
....

Member Author

So the interface would be that name can be a symbol, but also a Tuple or Vector of symbols, with the plot being bivariate or 3D depending on its length, and the regression keeping all the other input variables at their mean?
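
A minimal sketch of the "other inputs at their mean" idea, assuming the data table is still available next to the model. `effect_grid` is a hypothetical helper, not part of this PR or of StatsModels, and the data is synthetic:

```julia
using DataFrames, GLM, Statistics

# Synthetic data with two predictors (illustration only).
df = DataFrame(x1 = collect(range(0, 1; length = 50)),
               x2 = [sin(i) for i in 1:50])
df.y = 1 .+ 2 .* df.x1 .- 0.5 .* df.x2 .+ [0.01 * cos(3i) for i in 1:50]

model = lm(@formula(y ~ x1 + x2), df)

# Hypothetical helper: vary `var` over its observed range and hold every
# other predictor at its sample mean.
function effect_grid(df, predictors, var; n = 200)
    lo, hi = extrema(df[!, var])
    grid = DataFrame(var => collect(range(lo, hi; length = n)))
    for p in predictors
        p == var && continue
        grid[!, p] = fill(mean(df[!, p]), n)
    end
    return grid
end

grid = effect_grid(df, [:x1, :x2], :x1)
yhat = predict(model, grid)   # one fitted value per grid row
```

Passing a table to `predict` lets the formula rebuild the design matrix, so the helper never has to know which column of the model matrix belongs to which variable. A 3D variant would vary two variables on a grid instead of one.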

I just meant that the second column isn't always going to be what we want to plot.

For example...

mod = lm(@formula(PetalLength ~ 0 + PetalWidth), iris)

would get rid of the intercept so the 2nd column wouldn't even exist. I was just trying to find something that would ensure the right coefficient is grabbed from the model matrix.
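
To illustrate the point with a small sketch (using synthetic data rather than iris, an assumption made here to keep it self-contained): the position of a predictor in the model matrix depends on the formula, while a lookup through `coefnames` follows it.

```julia
using DataFrames, GLM

# Synthetic stand-in for the iris columns.
df = DataFrame(x = collect(1.0:10.0))
df.y = 2 .* df.x .+ 1 .+ sin.(df.x)

with_intercept = lm(@formula(y ~ x), df)
no_intercept   = lm(@formula(y ~ 0 + x), df)

# Number of model-matrix columns under each formula.
ncol_with    = size(modelmatrix(with_intercept), 2)   # intercept and x
ncol_without = size(modelmatrix(no_intercept), 2)     # x only

# Looking the column up by name gives the right index in both cases.
idx_with    = findfirst(==("x"), coefnames(with_intercept))
idx_without = findfirst(==("x"), coefnames(no_intercept))
```

With the intercept, `x` sits in column 2; without it, `x` is column 1 and a hard-coded `[:, 2]` would be out of bounds.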

Member Author

Yes, I know, and I agree. I was just trying to see if we could make it even more general.

It would actually be really convenient to be able to add multiple coefficients and see visually how they interact! I think this would also make a good long-term user interface once the stats ecosystem adopts a better way of extracting information from models using coefficient names.

Contributor

mod.model.pp.X is specific to GLM; you should use modelmatrix(mod) instead.
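
A sketch of how the range computation could look with the generic accessors only. `predictor_range` is a hypothetical helper name, not something defined in this PR; `modelmatrix` and `coefnames` are the StatsBase functions mentioned in this thread, and the data is synthetic:

```julia
using DataFrames, GLM

# Hypothetical helper: evenly spaced values spanning the observed range of the
# model-matrix column whose coefficient is called `name`.
function predictor_range(mod, name::AbstractString; n::Integer = 200)
    col = findfirst(==(name), coefnames(mod))
    col === nothing && throw(ArgumentError("no coefficient named $name"))
    lo, hi = extrema(view(modelmatrix(mod), :, col))
    return range(lo, hi; length = n)
end

# Synthetic data for illustration.
df = DataFrame(x = collect(1.0:10.0))
df.y = 2 .* df.x .+ 1 .+ sin.(df.x)
m = lm(@formula(y ~ x), df)

r = predictor_range(m, "x")
```

Because nothing here reaches into `mod.model.pp`, the same helper should work for any model type that implements `modelmatrix` and `coefnames`.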

@ffevotte

(X-ref : discourse)

Thanks for this feature!

I have been able to successfully use it on simple models. However, it still fails in more complex examples, such as the following:

using DataFrames
data = DataFrame(x = rand(100));
data.y = 1 .+ 2*data.x .+ rand(100);

using GLM
model = lm(@formula(y ~ x + x^2), data)  # @formula(y ~ x) works well

using Plots; gr()
using StatsPlots
plot(xlabel="x", ylabel="y", legend=:bottomright)
plot!(data.x, data.y, label="data", seriestype=:scatter)
plot!(model)

for which I get the following error / stack trace:

julia> plot!(model)
ERROR: DimensionMismatch("second dimension of A, 2, does not match length of x, 3")
Stacktrace:
 [1] gemv!(::Array{Float64,1}, ::Char, ::Array{Float64,2}, ::Array{Float64,1}, ::Bool, ::Bool) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/LinearAlgebra/src/matmul.jl:456
 [2] mul! at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/LinearAlgebra/src/matmul.jl:66 [inlined]
 [3] mul! at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/LinearAlgebra/src/matmul.jl:208 [inlined]
 [4] * at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/LinearAlgebra/src/matmul.jl:47 [inlined]
 [5] predict(::LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}}, ::Array{Float64,2}; interval::Symbol, level::Float64) at /home/francois/.julia/packages/GLM/6V3fS/src/lm.jl:201
 [6] #predict#79 at /home/francois/.julia/packages/StatsModels/pMxlJ/src/statsmodel.jl:125 [inlined]
 [7] macro expansion at /home/francois/.julia/packages/StatsPlots/0rYaj/src/statsmodels.jl:4 [inlined]
 [8] apply_recipe(::Dict{Symbol,Any}, ::StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}) at /home/francois/.julia/packages/RecipesBase/G4s6f/src/RecipesBase.jl:279
 [9] _process_userrecipes(::Plots.Plot{Plots.GRBackend}, ::Dict{Symbol,Any}, ::Tuple{StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}}) at /home/francois/.julia/packages/Plots/cc8wh/src/pipeline.jl:85
 [10] _plot!(::Plots.Plot{Plots.GRBackend}, ::Dict{Symbol,Any}, ::Tuple{StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}}) at /home/francois/.julia/packages/Plots/cc8wh/src/plot.jl:178
 [11] #plot!#138 at /home/francois/.julia/packages/Plots/cc8wh/src/plot.jl:158 [inlined]
 [12] plot!(::Plots.Plot{Plots.GRBackend}, ::StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}) at /home/francois/.julia/packages/Plots/cc8wh/src/plot.jl:155
 [13] plot!(::StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}; kw::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{,Tuple{}}}) at /home/francois/.julia/packages/Plots/cc8wh/src/plot.jl:150
 [14] plot!(::StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}) at /home/francois/.julia/packages/Plots/cc8wh/src/plot.jl:144
 [15] top-level scope at REPL[32]:1

Thanks anyway for StatsPlots, and please don't hesitate to tell me if I can help!
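
The trace is consistent with the recipe in this PR building a two-column design matrix (a column of ones plus one predictor), while `y ~ x + x^2` yields three coefficients. A possible workaround until the recipe handles this, sketched under the assumption that the original column names are available: predict from a table, so that the formula's transformations are applied to the new data. The data here is a deterministic stand-in for the `rand` data in the report.

```julia
using DataFrames, GLM

# Deterministic stand-in for the random data in the report.
data = DataFrame(x = collect(range(0, 1; length = 100)))
data.y = 1 .+ 2 .* data.x .+ 0.1 .* sin.(20 .* data.x)

model = lm(@formula(y ~ x + x^2), data)

# Three columns (intercept, x, x^2), hence the mismatch with a 2-column matrix.
ncoef = size(modelmatrix(model), 2)

# Predict from a table: StatsModels rebuilds all three columns from `x`.
lo, hi = extrema(data.x)
newdata = DataFrame(x = collect(range(lo, hi; length = 200)))
yhat = predict(model, newdata)
```

With Plots loaded, `plot!(newdata.x, yhat, label = "fit")` then adds the fitted curve to an existing scatter plot.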

@hafez-ahmad

I am getting the following error:
plot!(model)
ERROR: Cannot convert StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}} to series data for plotting
Stacktrace:
[1] error(::String) at .\error.jl:33
[2] _prepare_series_data(::StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}) at C:\Users\hafez.julia\packages\RecipesPipeline\5RD7m\src\series.jl:8
[3] _series_data_vector(::StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}, ::Dict{Symbol,Any}) at C:\Users\hafez.julia\packages\RecipesPipeline\5RD7m\src\series.jl:27
[4] macro expansion at C:\Users\hafez.julia\packages\RecipesPipeline\5RD7m\src\series.jl:139 [inlined]
[5] apply_recipe(::Dict{Symbol,Any}, ::Type{RecipesPipeline.SliceIt}, ::Nothing, ::StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}, ::Nothing) at C:\Users\hafez.julia\packages\RecipesBase\AN696\src\RecipesBase.jl:282
[6] _process_userrecipes!(::Plots.Plot{Plots.GRBackend}, ::Dict{Symbol,Any}, ::Tuple{StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}}) at C:\Users\hafez.julia\packages\RecipesPipeline\5RD7m\src\user_recipe.jl:35
[7] recipe_pipeline!(::Plots.Plot{Plots.GRBackend}, ::Dict{Symbol,Any}, ::Tuple{StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}}) at C:\Users\hafez.julia\packages\RecipesPipeline\5RD7m\src\RecipesPipeline.jl:68
[8] _plot!(::Plots.Plot{Plots.GRBackend}, ::Dict{Symbol,Any}, ::Tuple{StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}}) at C:\Users\hafez.julia\packages\Plots\ViMfq\src\plot.jl:167
[9] #plot!#127 at C:\Users\hafez.julia\packages\Plots\ViMfq\src\plot.jl:158 [inlined]
[10] plot!(::Plots.Plot{Plots.GRBackend}, ::StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}) at C:\Users\hafez.julia\packages\Plots\ViMfq\src\plot.jl:155
[11] plot!(::StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}; kw::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{,Tuple{}}}) at C:\Users\hafez.julia\packages\Plots\ViMfq\src\plot.jl:150
[12] plot!(::StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}) at C:\Users\hafez.julia\packages\Plots\ViMfq\src\plot.jl:144
[13] top-level scope at REPL[12]:1
[14] include_string(::Function, ::Module, ::String, ::String) at .\loading.jl:1088
