Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wip: recipes for regression models. should close #290 #293

Draft
wants to merge 3 commits into
base: master
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 12 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -372,6 +372,18 @@ plot(

![heatmap dendrogram optimal](https://user-images.githubusercontent.com/3502975/59949464-20778680-9441-11e9-8ed7-9a639b50dfb2.png)

## Plotting statistical model objects
So far there are only recipes for the bivariate case, but the plan is to make this work for all types of statistical models

```julia
using StatsPlots, GLM, RDatasets
iris = dataset("datasets", "iris")
mod = lm(@formula(PetalLength ~ PetalWidth), iris)
plot(mod)
@df iris scatter!(:PetalWidth, :PetalLength, ms = 2, c = :black, legend = :topleft)
```
![model_example](https://user-images.githubusercontent.com/8429802/73453356-1ff68500-436c-11ea-872c-03ae851be281.png)

## GroupedErrors.jl for population analysis

Population analysis on a table-like data structures can be done using the highly recommended [GroupedErrors](https://github.com/piever/GroupedErrors.jl) package.
Expand Down
1 change: 1 addition & 0 deletions src/StatsPlots.jl
Original file line number Diff line number Diff line change
Expand Up @@ -39,6 +39,7 @@ include("ecdf.jl")
include("hist.jl")
include("marginalhist.jl")
include("bar.jl")
include("statsmodels.jl")
include("dendrogram.jl")
include("andrews.jl")
include("ordinations.jl")
Expand Down
8 changes: 8 additions & 0 deletions src/statsmodels.jl
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
# Note both predict and RegressionModel are defined in StatsBase
@recipe function f(mod::RegressionModel)
newx = [ones(200) range(extrema(mod.model.pp.X[:,2])..., length = 200)]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we be sure that the second column is what we want to plot?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this is a good hack for now?

@recipe function f(mod::RegressionModel, name)
    newx = [ones(200) range(extrema(mod.model.pp.X[:,findfirst(==(name), coefnames(mod))])..., length = 200)]
....

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the interface being name can be a symbol, but also a Tuple or Vector of symbols, length, and then the plot being bivariate or 3d depending on that, and the regression keeping all the other input variables at their mean?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just meant that the second column isn't always going to be what we want to plot.

For example...

mod = lm(@formula(PetalLength ~ 0 + PetalWidth), iris)

would get rid of the intercept so the 2nd column wouldn't even exist. I was just trying to find something that would ensure the right coefficient is grabbed from the model matrix.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I know and I agree, I was just trying to see if we could make it even more general

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would actually be really convenient to be able to add multiple coefficients and see how they are interacting visually! I think this would also make for a good long term user interface when the stats ecosystem will adopt a better way of extracting information from models using coefficient names.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mod.model.pp.X is specific to GLM, you should rather use modelmatrix(mod) instead.

newy, l, u = predict(mod, newx, interval = :confidence)
ribbon := (vec(u)-newy, newy-vec(l))
label --> "model"
newx[:,2], newy
end