`QR` decomposition method in GLM. #507

mousum-github · 2022-10-29T10:40:33Z

It is well known that the Cholesky decomposition is fast but numerically unstable for ill-conditioned design/model matrix. Also, some users prefer the QR decomposition method in the linear (or generalized linear) model.

In this PR, we have added a keyword argument method in the lm, glm and related functions to control which decomposition method will be used. The method argument takes two values :cholesky (the default value and it uses Cholesky) and :qr (it uses QR).

Using QR decomposition in lm:
Issue #426
lm(model, data; method=:qr)

Coefficients:

Intercept: -2.18759e6
Year: 1096.15
Mileage: -0.0238368

Loglikelihood: -917.7449640609644
r2: 0.46431799718947153
adjr2: 0.454016420212346

Issue #375
lm(@formula(y ~ x), df; method=:qr, dropcollinear=false)

Coefficients:

Intercept: 65.7485
x: 9.94455

Summary of additions/changes:

DensePredQR - now supports both pivoted and unpivoted QR decomposition
invqr - compute the inverse of X’WX from QR decomposition
inverse - a wrapper of invqr and invchol, used inside the vcov function.
delbeta!(p::DensePredQR{T,<:QRCompactWY}, r::Vector{T}, wt::Vector{T})
delbeta!(p::DensePredQR{T,<:QRPivoted}, r::Vector{T})
delbeta!(p::DensePredQR{T,<:QRPivoted}, r::Vector{T}, wt::Vector{T})
Added dropcollinear and method keyword arguments in negbin function
All test cases related to the linear model (lm) are replicated for Cholesky and QR decompositions
All test cases related to the glm are replicated for Cholesky and QR decompositions
Added one example of a linear model with the QR method.

To check and compare the performance of GLM with the QR method, we performed a logistic regression with 3,481,280 rows × 24 columns. Logistic regression took approximately (Intel i7 11th generation with 16GM RAM, Windows 11 OS),

56 seconds in R GLM
10 seconds in Julia GLM with Cholesky
30 seconds in Julia GLM with QR

Finally, @harsharora21 is one of the main contributors to this PR.

Adding PowerLink link function in GLM

Test was failing in Julia Nightly as: 1) GLM.linkinv(InverseLink(), 10) was 0.01 while GLM.linkinv(PowerLink(-1), 10) was 0.010000000000000002 2) GLM.linkinv(InverseSquareLink(), 10) was -10.01 while GLM.linkinv(PowerLink(-2), 10) was -0.010000000000000002 Rounding off to 2 digits should solve this. Note: These tests were passing in other versions of Julia.

…ed it for the docs

…(-2), 10 to 2 digits

…lia Nightly

Correcting order of PowerLink, ProbitLink, SqrtLink. Co-authored-by: Milan Bouchet-Valat <[email protected]>

Using ≈ instead of isapprox. Co-authored-by: Milan Bouchet-Valat <[email protected]>

Using alphabetical order in references. Co-authored-by: Milan Bouchet-Valat <[email protected]>

Writing the example description in plain text. Co-authored-by: Milan Bouchet-Valat <[email protected]>

Removing extra space to be consistent with the style. Co-authored-by: Milan Bouchet-Valat <[email protected]>

Using a better variable name for 1 by lambda. Co-authored-by: Milan Bouchet-Valat <[email protected]>

Using inline function. Co-authored-by: Milan Bouchet-Valat <[email protected]>

Making the doctest code concise. Co-authored-by: Milan Bouchet-Valat <[email protected]>

Removing the R code. Co-authored-by: Milan Bouchet-Valat <[email protected]>

Removing test of hashes. Co-authored-by: Milan Bouchet-Valat <[email protected]>

…few more metrics to test, also replaced all `=` by `≈` for real numbers

Since we are storing the link - the `GLM.Link` function can be defined uniformly for all link functions Co-authored-by: Milan Bouchet-Valat <[email protected]>

Changing the sentence to make it compact Co-authored-by: Milan Bouchet-Valat <[email protected]>

…er after d (distribution) and changed in corresponding constructor, remove updateμ! function for PowerLink and modified general updateμ! function.

bkamins · 2023-04-21T18:02:44Z

@mousum-github - I can review this PR, but please note that I am not a maintainer of GLM.jl, so someone from the team working with this package should approve and merge this PR.

docs/src/examples.md

src/linpred.jl

docs/src/examples.md

Co-authored-by: Bogumił Kamiński <[email protected]>

backport of #507 --------- Co-authored-by: Mousum <[email protected]> Co-authored-by: ayushpatnaikgit <[email protected]> Co-authored-by: Milan Bouchet-Valat <[email protected]> Co-authored-by: Ayush Patnaik <[email protected]> Co-authored-by: harsharora21 <[email protected]> Co-authored-by: Bogumił Kamiński <[email protected]> (cherry picked from commit a8016bd)

@nalimilan

* PowerLink refers to a class of techniques that use a power function * added a few more testcases * Fixing doctests for PowerLink * Making subheading smaller than heading. * Rounding off a test case result to 2 digits Test was failing in Julia Nightly as: 1) GLM.linkinv(InverseLink(), 10) was 0.01 while GLM.linkinv(PowerLink(-1), 10) was 0.010000000000000002 2) GLM.linkinv(InverseSquareLink(), 10) was -10.01 while GLM.linkinv(PowerLink(-2), 10) was -0.010000000000000002 Rounding off to 2 digits should solve this. Note: These tests were passing in other versions of Julia. * adding Optim dependancy to docs * No need to have optim as a dependancy for the entire package. Only need it for the docs * Rounding off GLM.linkinv(PowerLink(-1), 10) and GLM.linkinv(PowerLink(-2), 10 to 2 digits * Rounding off GLM.mueta(PowerLink(-1), 10) to make the test work on Julia Nightly * Using inexact equality comparison instead of rounding. * Apply suggestions from code review Correcting order of PowerLink, ProbitLink, SqrtLink. Co-authored-by: Milan Bouchet-Valat <[email protected]> * Apply suggestions from code review Using ≈ instead of isapprox. Co-authored-by: Milan Bouchet-Valat <[email protected]> * Apply suggestions from code review Using alphabetical order in references. Co-authored-by: Milan Bouchet-Valat <[email protected]> * Update docs/src/examples.md Writing the example description in plain text. Co-authored-by: Milan Bouchet-Valat <[email protected]> * Update docs/src/examples.md Removing extra space to be consistent with the style. Co-authored-by: Milan Bouchet-Valat <[email protected]> * Adding the missing comma. * Update src/glmtools.jl Using a better variable name for 1 by lambda. Co-authored-by: Milan Bouchet-Valat <[email protected]> * Update src/glmtools.jl Using inline function. Co-authored-by: Milan Bouchet-Valat <[email protected]> * Apply suggestions from code review Making the doctest code concise. Co-authored-by: Milan Bouchet-Valat <[email protected]> * Follwing alphabetical order in references * Adding suggested changes by nalimilan * Update test/runtests.jl Removing the R code. Co-authored-by: Milan Bouchet-Valat <[email protected]> * Update test/runtests.jl Removing test of hashes. Co-authored-by: Milan Bouchet-Valat <[email protected]> * replaced `isapprox` function by `≈` symbol whereever possible, added few more metrics to test, also replaced all `=` by `≈` for real numbers * Update src/glmfit.jl Since we are storing the link - the `GLM.Link` function can be defined uniformly for all link functions Co-authored-by: Milan Bouchet-Valat <[email protected]> * Update src/glmfit.jl Changing the sentence to make it compact Co-authored-by: Milan Bouchet-Valat <[email protected]> * updated test cases, added a few more metrics to check, move link member after d (distribution) and changed in corresponding constructor, remove updateμ! function for PowerLink and modified general updateμ! function. * Rephrased the description of PowerLink * Update docs/src/examples.md Adding break line and removing full stop Co-authored-by: Milan Bouchet-Valat <[email protected]> * Removed `GLM.Link(mm::AbstractGLM) = mm.l`, redefine defination of PowerLink and updated test cases * update test case * Update src/glmtools.jl The suggestion is committed. Co-authored-by: Milan Bouchet-Valat <[email protected]> * changed the `inverselink` function for PowerLink as suggested in PR * Add files via upload NIST datasets for testing linear models * added qrpred * use qrpred instead of cholpred * removed tests which use chol * added qr decomp with weights * added dropcollinear for qr decomp * added pivot cholesky for pivot qr * added dof for qr * added chol for qr * updated test cases related to QR method * added float conversion * fixed test * updated QR decompositions * removed redundant functions * updating documention * fixed bug * refactored DensePredQR constructor * changed DensePredQR constructor to take Abstract types * removed unused functions * removed unused functions * updated doc string * conditional pivoting in QR * conditional QR pivoting * updated example of Lineam model with QR decomposotion * removed nasty.csv file and commented code * Update src/GLM.jl Co-authored-by: Milan Bouchet-Valat <[email protected]> * Update src/linpred.jl Co-authored-by: Milan Bouchet-Valat <[email protected]> * Update src/linpred.jl Co-authored-by: Milan Bouchet-Valat <[email protected]> * Update src/lm.jl Co-authored-by: Milan Bouchet-Valat <[email protected]> * Update src/linpred.jl Co-authored-by: Milan Bouchet-Valat <[email protected]> * Update src/linpred.jl Co-authored-by: Milan Bouchet-Valat <[email protected]> * Update src/linpred.jl Co-authored-by: Milan Bouchet-Valat <[email protected]> * Update src/linpred.jl Co-authored-by: Milan Bouchet-Valat <[email protected]> * some changes suggested by @nalimilan * Changed pivoted_qr to pivoted_qr! * Changed :stable to :qr and :fast to cholesky * Changed in predict function in lm for qr method, also added some testcases * re arranged all tests related to linear models (Cholesky and QR * An intermidiate commit after merging with master branch * without predict with qr test case. need to re-write predict function in lm * updated example.md file * Added test cases with predictions for QR method * Removed some comments * removed comments, replace Rsub\I by inv(Rsub) * Update test/runtests.jl Co-authored-by: Milan Bouchet-Valat <[email protected]> * Update test/runtests.jl Co-authored-by: Milan Bouchet-Valat <[email protected]> * Update test/runtests.jl Co-authored-by: Milan Bouchet-Valat <[email protected]> * Update test/runtests.jl Co-authored-by: Milan Bouchet-Valat <[email protected]> * Update src/GLM.jl Co-authored-by: Milan Bouchet-Valat <[email protected]> * Update src/linpred.jl Co-authored-by: Milan Bouchet-Valat <[email protected]> * Update src/linpred.jl Co-authored-by: Milan Bouchet-Valat <[email protected]> * Update src/lm.jl Co-authored-by: Milan Bouchet-Valat <[email protected]> * Update src/lm.jl Co-authored-by: Milan Bouchet-Valat <[email protected]> * Update src/lm.jl Co-authored-by: Milan Bouchet-Valat <[email protected]> * Update src/linpred.jl Co-authored-by: Milan Bouchet-Valat <[email protected]> * Update src/lm.jl Co-authored-by: Milan Bouchet-Valat <[email protected]> * Update src/linpred.jl Co-authored-by: Milan Bouchet-Valat <[email protected]> * Update src/linpred.jl Co-authored-by: Milan Bouchet-Valat <[email protected]> * Update src/linpred.jl Co-authored-by: Milan Bouchet-Valat <[email protected]> * Update test/runtests.jl Co-authored-by: Milan Bouchet-Valat <[email protected]> * updated test cases, qrpred method * removed extra lines in lm file * updated predict! function in lm in compact form * Optimizing the performnce for full rank matrix * changed outer constructor to inner * Incorporated QR decomposition method in glm related functions * NegativeBinomial Parameter estimation test is failing in a few system. Trying a different minstepfac value * NegativeBinomial Parameter estimation test is failing in a few system. Trying another smaller different minstepfac value * NegativeBinomial Parameter estimation test is failing in a few systemse * added one more example to show when QR method works better * update for doc test failure * Fixing doc test * Fixing doc test failure * Fixing doc test failure * Fixing for doc test failure * updated example for :qr method * Updated example for QR decomposition * Updated example for QR decomposition * Updated example for QR decomposition * Updated instead of generic error * Trying to resolve the conflict * Updated example to show QR performs better * Update src/GLM.jl Co-authored-by: Milan Bouchet-Valat <[email protected]> * Update src/linpred.jl Co-authored-by: Milan Bouchet-Valat <[email protected]> * Updated test cases. Test cases with cholesky and qr methods are written in compact form * Updated test case due to different pivot in different systems * replaced equality check to egality check in glmfit * updated example and added doctest * updated example without doctest * Avoiding multiple copyies of design matrix * added one more example in qr vs cholesky * updated example, remove unused constructor and added one more test case * updated example to pass doctest * updated example to pass doctest * updated example to pass doctest * updated example to pass doctest * example * updated example with rdchem data * Update docs/src/examples.md Co-authored-by: Bogumił Kamiński <[email protected]> --------- Co-authored-by: Mousum <[email protected]> Co-authored-by: ayushpatnaikgit <[email protected]> Co-authored-by: Milan Bouchet-Valat <[email protected]> Co-authored-by: Ayush Patnaik <[email protected]> Co-authored-by: harsharora21 <[email protected]> Co-authored-by: Bogumił Kamiński <[email protected]> (cherry picked from commit a8016bd)

droodman · 2023-11-27T20:22:54Z

As a user who is getting wrong results where R and Stata get it right, I think this is a fantastic addition.
One suggestion: rename the method option to something less generic like invmethod (as in inversion method). Consider that FixedEffectModels.jl has a method option, also maybe too generically named, to control whether the CPU or GPU is used. I think that FixedEffectModels.jl also should gain the ability to request the QR; what should that option be called?

ararslan · 2023-11-29T01:35:23Z

For whatever it's worth, I think the use of method to choose the solution method is more appropriate than using method to choose where a computation is performed. The method keyword here in GLM.jl is the same as in R's lm. I don't know whether that was the motivator behind the name but it is pleasantly discoverable for folks coming from R.

Mousum and others added 30 commits April 12, 2022 21:34

PowerLink refers to a class of techniques that use a power function

405fe93

added a few more testcases

742455f

Fixing doctests for PowerLink

f788854

Making subheading smaller than heading.

f65b842

Merge pull request #1 from xKDR/powerlink

6b80352

Adding PowerLink link function in GLM

adding Optim dependancy to docs

2e3b67a

No need to have optim as a dependancy for the entire package. Only ne…

5b09ae1

…ed it for the docs

Rounding off GLM.linkinv(PowerLink(-1), 10) and GLM.linkinv(PowerLink…

fddeebe

…(-2), 10 to 2 digits

Rounding off GLM.mueta(PowerLink(-1), 10) to make the test work on Ju…

e1e7631

…lia Nightly

Using inexact equality comparison instead of rounding.

b9ffe36

Apply suggestions from code review

ad4c986

Correcting order of PowerLink, ProbitLink, SqrtLink. Co-authored-by: Milan Bouchet-Valat <[email protected]>

Apply suggestions from code review

f9dcc4c

Using ≈ instead of isapprox. Co-authored-by: Milan Bouchet-Valat <[email protected]>

Apply suggestions from code review

9dae795

Using alphabetical order in references. Co-authored-by: Milan Bouchet-Valat <[email protected]>

Update docs/src/examples.md

83e7b12

Writing the example description in plain text. Co-authored-by: Milan Bouchet-Valat <[email protected]>

Update docs/src/examples.md

9d703a5

Removing extra space to be consistent with the style. Co-authored-by: Milan Bouchet-Valat <[email protected]>

Adding the missing comma.

a0571a6

Update src/glmtools.jl

99aa23f

Using a better variable name for 1 by lambda. Co-authored-by: Milan Bouchet-Valat <[email protected]>

Update src/glmtools.jl

60bcde3

Using inline function. Co-authored-by: Milan Bouchet-Valat <[email protected]>

Apply suggestions from code review

f8c7809

Making the doctest code concise. Co-authored-by: Milan Bouchet-Valat <[email protected]>

Follwing alphabetical order in references

aef3048

Adding suggested changes by nalimilan

93a607a

Update test/runtests.jl

9620cf9

Removing the R code. Co-authored-by: Milan Bouchet-Valat <[email protected]>

Update test/runtests.jl

0d94609

Removing test of hashes. Co-authored-by: Milan Bouchet-Valat <[email protected]>

replaced isapprox function by ≈ symbol whereever possible, added …

7943818

…few more metrics to test, also replaced all `=` by `≈` for real numbers

Update src/glmfit.jl

3cf14f4

Since we are storing the link - the `GLM.Link` function can be defined uniformly for all link functions Co-authored-by: Milan Bouchet-Valat <[email protected]>

Update src/glmfit.jl

12bdab6

Changing the sentence to make it compact Co-authored-by: Milan Bouchet-Valat <[email protected]>

updated test cases, added a few more metrics to check, move link memb…

bc80e7f

…er after d (distribution) and changed in corresponding constructor, remove updateμ! function for PowerLink and modified general updateμ! function.

Rephrased the description of PowerLink

daf91a0

Merge branch 'JuliaStats:master' into master

2138ece

mousum-github requested a review from bkamins April 21, 2023 09:48