Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

QR decomposition method in GLM. #507

Merged
merged 142 commits into from
May 4, 2023
Merged

QR decomposition method in GLM. #507

merged 142 commits into from
May 4, 2023

Conversation

mousum-github
Copy link
Collaborator

@mousum-github mousum-github commented Oct 29, 2022

It is well known that the Cholesky decomposition is fast but numerically unstable for ill-conditioned design/model matrix. Also, some users prefer the QR decomposition method in the linear (or generalized linear) model.

In this PR, we have added a keyword argument method in the lm, glm and related functions to control which decomposition method will be used. The method argument takes two values :cholesky (the default value and it uses Cholesky) and :qr (it uses QR).

Using QR decomposition in lm:
Issue #426
lm(model, data; method=:qr)

Coefficients:

  • Intercept: -2.18759e6
  • Year: 1096.15
  • Mileage: -0.0238368

Loglikelihood: -917.7449640609644
r2: 0.46431799718947153
adjr2: 0.454016420212346

Issue #375
lm(@formula(y ~ x), df; method=:qr, dropcollinear=false)

Coefficients:

  • Intercept: 65.7485
  • x: 9.94455

Summary of additions/changes:

  • DensePredQR - now supports both pivoted and unpivoted QR decomposition
  • invqr - compute the inverse of X’WX from QR decomposition
  • inverse - a wrapper of invqr and invchol, used inside the vcov function.
  • delbeta!(p::DensePredQR{T,<:QRCompactWY}, r::Vector{T}, wt::Vector{T})
  • delbeta!(p::DensePredQR{T,<:QRPivoted}, r::Vector{T})
  • delbeta!(p::DensePredQR{T,<:QRPivoted}, r::Vector{T}, wt::Vector{T})
  • Added dropcollinear and method keyword arguments in negbin function
  • All test cases related to the linear model (lm) are replicated for Cholesky and QR decompositions
  • All test cases related to the glm are replicated for Cholesky and QR decompositions
  • Added one example of a linear model with the QR method.

To check and compare the performance of GLM with the QR method, we performed a logistic regression with 3,481,280 rows × 24 columns. Logistic regression took approximately (Intel i7 11th generation with 16GM RAM, Windows 11 OS),

  • 56 seconds in R GLM
  • 10 seconds in Julia GLM with Cholesky
  • 30 seconds in Julia GLM with QR

Finally, @harsharora21 is one of the main contributors to this PR.

Mousum and others added 30 commits April 12, 2022 21:34
Adding PowerLink link function in GLM
Test was failing in Julia Nightly as: 

1) GLM.linkinv(InverseLink(), 10) was 0.01 while GLM.linkinv(PowerLink(-1), 10) was 0.010000000000000002
2) GLM.linkinv(InverseSquareLink(), 10) was -10.01 while GLM.linkinv(PowerLink(-2), 10) was -0.010000000000000002

Rounding off to 2 digits should solve this. 

Note: These tests were passing in other versions of Julia.
Correcting order of PowerLink, ProbitLink, SqrtLink.

Co-authored-by: Milan Bouchet-Valat <[email protected]>
Using ≈ instead of isapprox.

Co-authored-by: Milan Bouchet-Valat <[email protected]>
Using alphabetical order in references.

Co-authored-by: Milan Bouchet-Valat <[email protected]>
Writing the example description in plain text.

Co-authored-by: Milan Bouchet-Valat <[email protected]>
Removing extra space to be consistent with the style.

Co-authored-by: Milan Bouchet-Valat <[email protected]>
Using a better variable name for 1 by lambda.

Co-authored-by: Milan Bouchet-Valat <[email protected]>
Using inline function.

Co-authored-by: Milan Bouchet-Valat <[email protected]>
Making the doctest code concise.

Co-authored-by: Milan Bouchet-Valat <[email protected]>
Removing the R code.

Co-authored-by: Milan Bouchet-Valat <[email protected]>
Removing test of hashes.

Co-authored-by: Milan Bouchet-Valat <[email protected]>
…few more metrics to test, also replaced all `=` by `≈` for real numbers
Since we are storing the link - the `GLM.Link` function can be defined uniformly for all link functions

Co-authored-by: Milan Bouchet-Valat <[email protected]>
Changing the sentence to make it compact

Co-authored-by: Milan Bouchet-Valat <[email protected]>
…er after d (distribution) and changed in corresponding constructor, remove updateμ! function for PowerLink and modified general updateμ! function.
@mousum-github mousum-github requested a review from bkamins April 21, 2023 09:48
@bkamins
Copy link
Contributor

bkamins commented Apr 21, 2023

@mousum-github - I can review this PR, but please note that I am not a maintainer of GLM.jl, so someone from the team working with this package should approve and merge this PR.

docs/src/examples.md Outdated Show resolved Hide resolved
docs/src/examples.md Outdated Show resolved Hide resolved
src/linpred.jl Outdated Show resolved Hide resolved
src/linpred.jl Outdated Show resolved Hide resolved
src/linpred.jl Outdated Show resolved Hide resolved
docs/src/examples.md Outdated Show resolved Hide resolved
docs/src/examples.md Outdated Show resolved Hide resolved
Co-authored-by: Bogumił Kamiński <[email protected]>
@andreasnoack andreasnoack merged commit a8016bd into JuliaStats:master May 4, 2023
@palday palday mentioned this pull request Jun 7, 2023
palday pushed a commit that referenced this pull request Jun 7, 2023
backport of #507

---------

Co-authored-by: Mousum <[email protected]>
Co-authored-by: ayushpatnaikgit <[email protected]>
Co-authored-by: Milan Bouchet-Valat <[email protected]>
Co-authored-by: Ayush Patnaik <[email protected]>
Co-authored-by: harsharora21 <[email protected]>
Co-authored-by: Bogumił Kamiński <[email protected]>
(cherry picked from commit a8016bd)
ararslan pushed a commit that referenced this pull request Jun 7, 2023
* PowerLink refers to a class of techniques that use a power function

* added a few more testcases

* Fixing doctests for PowerLink

* Making subheading smaller than heading.

* Rounding off a test case result to 2 digits

Test was failing in Julia Nightly as:

1) GLM.linkinv(InverseLink(), 10) was 0.01 while GLM.linkinv(PowerLink(-1), 10) was 0.010000000000000002
2) GLM.linkinv(InverseSquareLink(), 10) was -10.01 while GLM.linkinv(PowerLink(-2), 10) was -0.010000000000000002

Rounding off to 2 digits should solve this.

Note: These tests were passing in other versions of Julia.

* adding Optim dependancy to docs

* No need to have optim as a dependancy for the entire package. Only need it for the docs

* Rounding off GLM.linkinv(PowerLink(-1), 10) and GLM.linkinv(PowerLink(-2), 10 to 2 digits

* Rounding off GLM.mueta(PowerLink(-1), 10) to make the test work on Julia Nightly

* Using inexact equality comparison instead of rounding.

* Apply suggestions from code review

Correcting order of PowerLink, ProbitLink, SqrtLink.

Co-authored-by: Milan Bouchet-Valat <[email protected]>

* Apply suggestions from code review

Using ≈ instead of isapprox.

Co-authored-by: Milan Bouchet-Valat <[email protected]>

* Apply suggestions from code review

Using alphabetical order in references.

Co-authored-by: Milan Bouchet-Valat <[email protected]>

* Update docs/src/examples.md

Writing the example description in plain text.

Co-authored-by: Milan Bouchet-Valat <[email protected]>

* Update docs/src/examples.md

Removing extra space to be consistent with the style.

Co-authored-by: Milan Bouchet-Valat <[email protected]>

* Adding the missing comma.

* Update src/glmtools.jl

Using a better variable name for 1 by lambda.

Co-authored-by: Milan Bouchet-Valat <[email protected]>

* Update src/glmtools.jl

Using inline function.

Co-authored-by: Milan Bouchet-Valat <[email protected]>

* Apply suggestions from code review

Making the doctest code concise.

Co-authored-by: Milan Bouchet-Valat <[email protected]>

* Follwing alphabetical order in references

* Adding suggested changes by nalimilan

* Update test/runtests.jl

Removing the R code.

Co-authored-by: Milan Bouchet-Valat <[email protected]>

* Update test/runtests.jl

Removing test of hashes.

Co-authored-by: Milan Bouchet-Valat <[email protected]>

* replaced `isapprox` function by `≈` symbol whereever possible, added few more metrics to test, also replaced all `=` by `≈` for real numbers

* Update src/glmfit.jl

Since we are storing the link - the `GLM.Link` function can be defined uniformly for all link functions

Co-authored-by: Milan Bouchet-Valat <[email protected]>

* Update src/glmfit.jl

Changing the sentence to make it compact

Co-authored-by: Milan Bouchet-Valat <[email protected]>

* updated test cases, added a few more metrics to check, move link member after d (distribution) and changed in corresponding constructor, remove updateμ! function for PowerLink and modified general updateμ! function.

* Rephrased the description of PowerLink

* Update docs/src/examples.md

Adding break line and removing full stop

Co-authored-by: Milan Bouchet-Valat <[email protected]>

* Removed `GLM.Link(mm::AbstractGLM) = mm.l`, redefine defination of PowerLink and updated test cases

* update test case

* Update src/glmtools.jl

The suggestion is committed.

Co-authored-by: Milan Bouchet-Valat <[email protected]>

* changed the `inverselink` function for PowerLink as suggested in PR

* Add files via upload

NIST datasets for testing linear models

* added qrpred

* use qrpred instead of cholpred

* removed tests which use chol

* added qr decomp with weights

* added dropcollinear for qr decomp

* added pivot cholesky for pivot qr

* added dof for qr

* added chol for qr

* updated test cases related to QR method

* added float conversion

* fixed test

* updated QR decompositions

* removed redundant functions

* updating documention

* fixed bug

* refactored DensePredQR constructor

* changed DensePredQR constructor to take Abstract types

* removed unused functions

* removed unused functions

* updated doc string

* conditional pivoting in QR

* conditional QR pivoting

* updated example of Lineam model with QR decomposotion

* removed nasty.csv file and commented code

* Update src/GLM.jl

Co-authored-by: Milan Bouchet-Valat <[email protected]>

* Update src/linpred.jl

Co-authored-by: Milan Bouchet-Valat <[email protected]>

* Update src/linpred.jl

Co-authored-by: Milan Bouchet-Valat <[email protected]>

* Update src/lm.jl

Co-authored-by: Milan Bouchet-Valat <[email protected]>

* Update src/linpred.jl

Co-authored-by: Milan Bouchet-Valat <[email protected]>

* Update src/linpred.jl

Co-authored-by: Milan Bouchet-Valat <[email protected]>

* Update src/linpred.jl

Co-authored-by: Milan Bouchet-Valat <[email protected]>

* Update src/linpred.jl

Co-authored-by: Milan Bouchet-Valat <[email protected]>

* some changes suggested by @nalimilan

* Changed pivoted_qr to pivoted_qr!

* Changed :stable to :qr and :fast to  cholesky

* Changed in predict function in lm for qr method, also added some testcases

* re arranged all tests related to linear models (Cholesky and QR

* An intermidiate commit after merging with master branch

* without predict with qr test case. need to re-write predict function in lm

* updated example.md file

* Added test cases with predictions for QR method

* Removed some comments

* removed comments, replace Rsub\I by inv(Rsub)

* Update test/runtests.jl

Co-authored-by: Milan Bouchet-Valat <[email protected]>

* Update test/runtests.jl

Co-authored-by: Milan Bouchet-Valat <[email protected]>

* Update test/runtests.jl

Co-authored-by: Milan Bouchet-Valat <[email protected]>

* Update test/runtests.jl

Co-authored-by: Milan Bouchet-Valat <[email protected]>

* Update src/GLM.jl

Co-authored-by: Milan Bouchet-Valat <[email protected]>

* Update src/linpred.jl

Co-authored-by: Milan Bouchet-Valat <[email protected]>

* Update src/linpred.jl

Co-authored-by: Milan Bouchet-Valat <[email protected]>

* Update src/lm.jl

Co-authored-by: Milan Bouchet-Valat <[email protected]>

* Update src/lm.jl

Co-authored-by: Milan Bouchet-Valat <[email protected]>

* Update src/lm.jl

Co-authored-by: Milan Bouchet-Valat <[email protected]>

* Update src/linpred.jl

Co-authored-by: Milan Bouchet-Valat <[email protected]>

* Update src/lm.jl

Co-authored-by: Milan Bouchet-Valat <[email protected]>

* Update src/linpred.jl

Co-authored-by: Milan Bouchet-Valat <[email protected]>

* Update src/linpred.jl

Co-authored-by: Milan Bouchet-Valat <[email protected]>

* Update src/linpred.jl

Co-authored-by: Milan Bouchet-Valat <[email protected]>

* Update test/runtests.jl

Co-authored-by: Milan Bouchet-Valat <[email protected]>

* updated test cases, qrpred method

* removed extra lines in lm file

* updated predict! function in lm in compact form

* Optimizing the performnce for full rank matrix

* changed outer constructor to inner

* Incorporated QR decomposition method in glm related functions

* NegativeBinomial Parameter estimation test is failing in a few system. Trying a different minstepfac value

* NegativeBinomial Parameter estimation test is failing in a few system. Trying another smaller different minstepfac value

* NegativeBinomial Parameter estimation test is failing in a few systemse

* added one more example to show when QR method works better

* update for doc test failure

* Fixing doc test

* Fixing doc test failure

* Fixing doc test failure

* Fixing for doc test failure

* updated example for :qr method

* Updated example for QR decomposition

* Updated example for QR decomposition

* Updated example for QR decomposition

* Updated  instead of generic error

* Trying to resolve the conflict

* Updated example to show QR performs better

* Update src/GLM.jl

Co-authored-by: Milan Bouchet-Valat <[email protected]>

* Update src/linpred.jl

Co-authored-by: Milan Bouchet-Valat <[email protected]>

* Updated test cases. Test cases with cholesky and qr methods are written in compact form

* Updated test case due to different pivot in different systems

* replaced equality check to egality check in glmfit

* updated example and added doctest

* updated example without doctest

* Avoiding multiple copyies of design matrix

* added one more example in qr vs cholesky

* updated example, remove unused constructor and added one more test case

* updated example to pass doctest

* updated example to pass doctest

* updated example to pass doctest

* updated example to pass doctest

* example

* updated example with rdchem data

* Update docs/src/examples.md

Co-authored-by: Bogumił Kamiński <[email protected]>

---------

Co-authored-by: Mousum <[email protected]>
Co-authored-by: ayushpatnaikgit <[email protected]>
Co-authored-by: Milan Bouchet-Valat <[email protected]>
Co-authored-by: Ayush Patnaik <[email protected]>
Co-authored-by: harsharora21 <[email protected]>
Co-authored-by: Bogumił Kamiński <[email protected]>
(cherry picked from commit a8016bd)
@nalimilan nalimilan deleted the lm-qr branch June 7, 2023 21:09
@droodman
Copy link

droodman commented Nov 27, 2023

As a user who is getting wrong results where R and Stata get it right, I think this is a fantastic addition.
One suggestion: rename the method option to something less generic like invmethod (as in inversion method). Consider that FixedEffectModels.jl has a method option, also maybe too generically named, to control whether the CPU or GPU is used. I think that FixedEffectModels.jl also should gain the ability to request the QR; what should that option be called?

@ararslan
Copy link
Member

For whatever it's worth, I think the use of method to choose the solution method is more appropriate than using method to choose where a computation is performed. The method keyword here in GLM.jl is the same as in R's lm. I don't know whether that was the motivator behind the name but it is pleasantly discoverable for folks coming from R.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.