Vector Generalized Linear and Additive Models (Support for Multinomial) #206

Open
Nosferican opened this issue Dec 15, 2017 · 27 comments

Comments

@Nosferican
Contributor

Looking at the common generalized linear models, the binomial distribution is implemented, but the multinomial (its general form) is not. It also seems that GLM.LinPred is constrained to be a vector. The multinomial form for GLM is here. I would think it would be a good addition to the package.
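To illustrate why a vector-valued linear predictor is needed: with K categories, each observation has K-1 linear predictors (one per non-reference category), so the predictor is an n × (K-1) matrix and not a vector. The following is a minimal sketch in plain Julia, assuming the first category is the reference; `mlogit_probs` is a hypothetical helper and not part of GLM.jl.

```julia
# Hypothetical helper (not a GLM.jl function): category probabilities for a
# multinomial logit model with the first category as the reference.
# X is n × p (design matrix), B is p × (K-1) (one coefficient column per
# non-reference category).
function mlogit_probs(X::AbstractMatrix, B::AbstractMatrix)
    eta = X * B                                             # n × (K-1) linear predictors
    eta_full = hcat(zeros(eltype(eta), size(eta, 1)), eta)  # reference category has eta = 0
    eta_full = eta_full .- maximum(eta_full, dims = 2)      # subtract row maxima for stability
    expeta = exp.(eta_full)
    return expeta ./ sum(expeta, dims = 2)                  # each row sums to one
end

# Made-up inputs, for illustration only.
X = [1.0 0.5; 1.0 -1.2; 1.0 2.0]
B = [0.3 -0.4; 1.1 0.2]
P = mlogit_probs(X, B)          # 3 × 3 matrix of probabilities

# With K = 2 the model reduces to ordinary logistic regression.
B2 = reshape([0.3, 1.1], 2, 1)
P2 = mlogit_probs(X, B2)
```

With a single non-reference category, the second column of `P2` equals the usual logistic inverse link applied to `X * B2`, which is the sense in which the binomial case is a special case of the multinomial one.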

@Nosferican
Contributor Author

Nosferican commented Apr 29, 2018

Nosferican changed the title from "Support for Multinomial" to "Vector Generalized Linear and Additive Models (Support for Multinomial)" on Apr 29, 2018
@Nosferican
Contributor Author

Here is a draft that I am working on. If I can get the last kinks solved I could port it to GLM. Any help with the last kinks would be appreciated.
https://gist.github.com/Nosferican/54727b20f870894a15ecfb28e45cc4bc

@hung-q-ngo
Contributor

@Nosferican what are the last kinks? I'm learning Julia, and have great interest in GLMs and their generalization, so I'd love to help out if my speed doesn't interfere with your schedule

@Nosferican
Contributor Author

Nosferican commented May 7, 2018

I updated the gist... basically I need to check how the variance covariance is computed for the mlogit case.
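For reference, the textbook result for the multinomial logit variance-covariance is sketched below. It is the standard maximum-likelihood formula and not necessarily what the gist computes. Notation (all introduced here for illustration): category 0 is the reference, x_i is the covariate vector of observation i, β_k is the coefficient vector of category k, π_i collects the K-1 non-reference probabilities, and β stacks β_1, ..., β_{K-1}.

```latex
\begin{aligned}
\ell(\beta) &= \sum_{i=1}^{n} \left[ \sum_{k=1}^{K-1} y_{ik}\, x_i^\top \beta_k
  - \log\!\left( 1 + \sum_{k=1}^{K-1} e^{x_i^\top \beta_k} \right) \right], \\
\pi_{ik} &= \frac{e^{x_i^\top \beta_k}}{1 + \sum_{j=1}^{K-1} e^{x_i^\top \beta_j}},
  \qquad k = 1, \dots, K-1, \\
\mathcal{I}(\beta) &= \sum_{i=1}^{n}
  \left( \operatorname{diag}(\pi_i) - \pi_i \pi_i^\top \right) \otimes x_i x_i^\top, \\
\widehat{\operatorname{Var}}(\hat\beta) &= \mathcal{I}(\hat\beta)^{-1}.
\end{aligned}
```

The (j, k) block of the information matrix is the sum over i of π_ij (δ_jk − π_ik) x_i x_iᵀ, so the coefficients of different categories are correlated and the covariance cannot be computed one equation at a time.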

@hung-q-ngo
Contributor

@Nosferican judging from your comment here, ordinal LR is not yet supported, right? Because you mentioned something about isordered for LogitLink which I couldn't parse.

@Nosferican
Contributor Author

That's correct. I haven't gotten to ordinal logistic regression yet. I decided to bundle the ones I have and implement ologit later on (with GMM instrumental variables for non-linear models). Getting my third chapter stuff together for proposal defense (2SLS, absorb / within / fixed effects, between, first difference, random effects for panel data, etc.)

@Nosferican
Contributor Author

Fixed the vcov issue, so now I will be working on getting everything together. Hopefully I can get a beta out in the near future. I will update the gist with the latest code for multinomial, which can be used to port it to GLM.

@hung-q-ngo
Contributor

hung-q-ngo commented May 14, 2018

@Nosferican: I've read enough (about VGLM & Julia) to understand your code. Is the one in the gist the latest version? Do you need help with anything else? In addition to multinomial GLM, is multivariate Gaussian also a use-case / test-case?

@Nosferican
Contributor Author

I finished pinning down the last few things and finished writing the chapter for my dissertation on it, so now it is all about bundling everything together. For the multivariate Gaussian question: for vglm, basically one can specify a different distribution and link for each response / linear predictor in the most general case. I think it could be a sensible default to have it dispatch on the response type. The ones I did were the most used cases, but once it gets ported here one could make the general setup for other less common cases.
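A rough sketch of what dispatching on the response type could look like, in plain Julia. Every name here (`ModelFamily`, `default_family`, the family structs) is hypothetical and exists in neither GLM.jl nor Econometrics.jl, and a string vector stands in for a categorical response so that the example needs no packages.

```julia
# Hypothetical types and function, for illustration only.
abstract type ModelFamily end
struct GaussianIdentity <: ModelFamily end
struct PoissonLog <: ModelFamily end
struct BinomialLogit <: ModelFamily end
struct MultinomialLogit <: ModelFamily end

# Pick a default distribution/link pair from the element type of the response.
default_family(::AbstractVector{<:AbstractFloat}) = GaussianIdentity()
default_family(::AbstractVector{<:Integer}) = PoissonLog()
# Bool <: Integer, but this method is more specific, so it wins for Bool vectors.
default_family(::AbstractVector{Bool}) = BinomialLogit()
# Stand-in for a categorical response (assumption: labels stored as strings).
default_family(::AbstractVector{<:AbstractString}) = MultinomialLogit()

fam_continuous = default_family([1.2, 3.4, 0.5])
fam_count = default_family([0, 3, 1])
fam_binary = default_family([true, false, true])
fam_categorical = default_family(["a", "b", "c"])
```

The appeal of this design is that less common cases can be added later as extra methods, while users could still override the default by passing a distribution and link explicitly.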

@hung-q-ngo
Contributor

@Nosferican : any update on your vector GLM implementation? I am thinking of working on ordinal regression and that would benefit from VGLM tremendously

@Nosferican
Contributor Author

Nosferican commented Jun 22, 2018

Hey, so I have the code working on alpha. CategoricalArrays had an issue which was patched yesterday, and this morning StatsModels released a patch to support 0.7. I ran the code I had with the tagged versions and everything is working. I still need to clean up the API and add the other components, but hopefully I will get to it this weekend (depending on how much time I have). If you want to contribute an ologit implementation, that would be welcome. I can take the code and put it in the same framework. Will keep you updated.

@Nosferican
Contributor Author

I just committed a first draft of the components for Vectorized GLM that you can take a look at. Any comments are appreciated. I will keep bringing all the code I have developed together and making it compatible with the latest version. https://github.com/JuliaEconometrics/Econometrics.jl/blob/master/src/GeneralizedLinearModels.jl

@Nosferican
Contributor Author

Nosferican commented Jul 2, 2018

Just to keep up w/the updates: you should be able to start using it and identifying some issues for multinomial logistic regression now... documentation and tests coming soon... You can run it using nightly (you might need to check out a Distributions branch with ] add Distributions#aa/0.7)

] add https://github.com/JuliaEconometrics/Econometrics.jl
using CSV, DataFrames, StatsBase, StatsModels, Econometrics
data = CSV.read("filename_to_test.csv"); # outcome variable should be `AbstractCategoricalVector`
formula = @formula(outcome ~ exogenous_variables);
model = EconometricsModel(formula, data);
coeftable(model)

Linear and Poisson are working as well... The whole StatsBase API is using basic rules... I will re-work these after merging the panel data estimators.

@hung-q-ngo
Contributor

got it. I'm busy with things at work for a few days, will get back to reading/trying this out as soon as I can. Thanks for the update.

@JockLawrie
Contributor

Hi there, I'm interested in taking this for a spin in Julia 0.7.

I run this code:

using DataFrames
using StatsBase
using StatsModels
using Econometrics

And get this error:

[ Info: Precompiling Econometrics [3a2a89cb-daa6-4aaa-96ef-7853daeb1b7c]
┌ Warning: Package Econometrics does not have DataFrames in its dependencies:
│ - If you have Econometrics checked out for development and have
│   added DataFrames as a dependency but haven't updated your primary
│   environment's manifest file, try `Pkg.resolve()`.
│ - Otherwise you may need to report an issue with Econometrics
└ Loading DataFrames into Econometrics from project dependency, future warnings for Econometrics are suppressed.
WARNING: Method definition stderror(StatsBase.StatisticalModel) in module StatsBase at /home/jock/.julia/packages/StatsBase/NzjNi/src/statmodels.jl:125 overwritten in module Econometrics at /home/jock/.julia/packages/Econometrics/y4Nin/src/GeneralizedLinearModels.jl:603.
ERROR: LoadError: LoadError: syntax: invalid assignment location "model_distribution(model) <: Multinomial && varlist[:response]"
Stacktrace:
 [1] include at ./boot.jl:317 [inlined]
 [2] include_relative(::Module, ::String) at ./loading.jl:1038
 [3] _broadcast_getindex at ./sysimg.jl:29 [inlined]
 [4] #17 at ./broadcast.jl:922 [inlined]
 [5] ntuple at ./tuple.jl:158 [inlined]
 [6] tuplebroadcast at ./broadcast.jl:922 [inlined]
 [7] copy at ./broadcast.jl:920 [inlined]
 [8] materialize(::Base.Broadcast.Broadcasted{Base.Broadcast.Style{Tuple},Nothing,typeof(Econometrics.include),Tuple{Tuple{String,String,String}}}) at ./broadcast.jl:724
 [9] top-level scope at none:0
 [10] include at ./boot.jl:317 [inlined]
 [11] include_relative(::Module, ::String) at ./loading.jl:1038
 [12] include(::Module, ::String) at ./sysimg.jl:29
 [13] top-level scope at none:2
 [14] eval at ./boot.jl:319 [inlined]
 [15] eval(::Expr) at ./client.jl:399
 [16] top-level scope at ./none:3
in expression starting at /home/jock/.julia/packages/Econometrics/y4Nin/src/EconometricsModel.jl:1
in expression starting at /home/jock/.julia/packages/Econometrics/y4Nin/src/Econometrics.jl:33

Any ideas?

@Nosferican
Contributor Author

stderr is now the Base name for the standard error I/O stream (it used to be upper case, STDERR), so StatsBase renamed its function to stderror. I can probably fix it in a few minutes to bring it to 1.0 compatibility. Will ping you in a bit.
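Separately from the stderror rename, the "invalid assignment location" LoadError in the trace above looks like an operator-precedence issue: assignment binds more loosely than &&, so a statement of the form `cond && x[:key] = value` is parsed as `(cond && x[:key]) = value`. This is a guess from the error message alone, without having checked the offending line. A minimal sketch of the usual fix, with made-up variable names:

```julia
# Illustrative names only; this is not the actual Econometrics.jl code.
is_multinomial = true
varlist = Dict{Symbol,Any}()

# Written without parentheses, the next line would be rejected by the parser:
# is_multinomial && varlist[:response] = "outcome"

# Parenthesizing the assignment makes it the right-hand operand of &&.
is_multinomial && (varlist[:response] = "outcome")
```

An explicit `if` block is an equivalent alternative when the short-circuit form gets hard to read.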

@JockLawrie
Contributor

Great, thanks (and thanks for the speedy response!)

@Nosferican
Contributor Author

Nosferican commented Sep 13, 2018

Try it now (use v"1.0.0", but v"0.7" will work too)

]rm Econometrics
]add https://github.com/JuliaEconometrics/Econometrics.jl#master
using Econometrics

It is still WIP, so definitely not production ready, but feedback would be great. I am reexporting StatsBase, DataFrames, and StatsModels, so there is no need to load those with using anymore.
You can use it as follows:

model = EconometricsModel(::Formula, ::AbstractDataFrame;
                          contrasts::Dict{Symbol,AbstractContrasts} =
                              Dict{Symbol,AbstractContrasts}())
coeftable(model) # Most of the StatsBase API is implemented

For multinomial logistic regression, just make sure the response is an AbstractCategoricalVector (it can be a Union with Missing), e.g., by calling categorical!(data, response).

@JockLawrie
Contributor

Thanks, that works.
I've posted some findings over at Econometrics.jl#1

@RossBoylan

Where is the associated code? I don't see an Econometrics package under https://github.com/JuliaEconometrics.

@Nosferican
Contributor Author

For now, a draft is at https://github.com/Nosferican/Econometrics.jl. I am waiting for StatsModels to tag a release and should be releasing a beta soon after.

@Tokazama

Are there plans to move ordinal/multinomial functionality into GLM.jl or is it staying in Econometrics?

@Nosferican
Contributor Author

I don't know if there are plans to back-port those. For mlogit, GLM would have to refactor some of the code to allow VGLM. For ologit (polr), I would have to see if I can finally get the analytical solution to the Hessian, or it would need to introduce a dependency on some solver (e.g., Optim / NLSolver) for the Hessian.

@Tokazama

Is the analytic solution to the Hessian the internal issue or is it related to parsing the categorical variables in the formula? If it's a formula related thing this may be worth pursuing because we should really have reasonably consistent behavior from packages using StatsModels

@Nosferican
Contributor Author

It's just a beast of an analytical solution... I gave up last time after a month of trying to implement it every day. I found yet another dissertation that has the closed-form solution, so I will give it a shot again when I have time this month.

@Tokazama

I'm definitely not the one who should be implementing it but feel free to ping me when you need someone to test it or review code.

@frankier

frankier commented Nov 27, 2023

It's just a beast of an analytical solution... I gave up last time after a month of trying to implement it every day. I found yet another dissertation that has the closed-form solution, so I will give it a shot again when I have time this month.

@Nosferican Any chance you still have the reference handy? I'm not sure if I will take a crack at this, but I am interested in getting an idea of how difficult it might be.
