Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

QR decomposition method in GLM. #507

Merged
merged 142 commits into from
May 4, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
142 commits
Select commit Hold shift + click to select a range
405fe93
PowerLink refers to a class of techniques that use a power function
Apr 12, 2022
742455f
added a few more testcases
Apr 13, 2022
f788854
Fixing doctests for PowerLink
ayushpatnaikgit Apr 15, 2022
f65b842
Making subheading smaller than heading.
ayushpatnaikgit Apr 15, 2022
6b80352
Merge pull request #1 from xKDR/powerlink
ayushpatnaikgit Apr 16, 2022
683d512
Rounding off a test case result to 2 digits
ayushpatnaikgit Apr 16, 2022
2e3b67a
adding Optim dependancy to docs
ayushpatnaikgit Apr 16, 2022
5b09ae1
No need to have optim as a dependancy for the entire package. Only ne…
ayushpatnaikgit Apr 16, 2022
fddeebe
Rounding off GLM.linkinv(PowerLink(-1), 10) and GLM.linkinv(PowerLink…
ayushpatnaikgit Apr 16, 2022
e1e7631
Rounding off GLM.mueta(PowerLink(-1), 10) to make the test work on Ju…
ayushpatnaikgit Apr 16, 2022
b9ffe36
Using inexact equality comparison instead of rounding.
ayushpatnaikgit Apr 17, 2022
ad4c986
Apply suggestions from code review
ayushpatnaikgit Apr 18, 2022
f9dcc4c
Apply suggestions from code review
ayushpatnaikgit Apr 18, 2022
9dae795
Apply suggestions from code review
ayushpatnaikgit Apr 18, 2022
83e7b12
Update docs/src/examples.md
ayushpatnaikgit Apr 18, 2022
9d703a5
Update docs/src/examples.md
ayushpatnaikgit Apr 18, 2022
a0571a6
Adding the missing comma.
ayushpatnaikgit Apr 18, 2022
99aa23f
Update src/glmtools.jl
ayushpatnaikgit Apr 18, 2022
60bcde3
Update src/glmtools.jl
ayushpatnaikgit Apr 18, 2022
f8c7809
Apply suggestions from code review
ayushpatnaikgit Apr 18, 2022
aef3048
Follwing alphabetical order in references
ayushpatnaikgit Apr 18, 2022
93a607a
Adding suggested changes by nalimilan
ayushpatnaikgit Apr 18, 2022
9620cf9
Update test/runtests.jl
ayushpatnaikgit Apr 18, 2022
0d94609
Update test/runtests.jl
ayushpatnaikgit Apr 18, 2022
7943818
replaced `isapprox` function by `≈` symbol whereever possible, added …
Apr 18, 2022
3cf14f4
Update src/glmfit.jl
mousum-github Apr 19, 2022
12bdab6
Update src/glmfit.jl
mousum-github Apr 19, 2022
bc80e7f
updated test cases, added a few more metrics to check, move link memb…
Apr 20, 2022
daf91a0
Rephrased the description of PowerLink
Apr 20, 2022
2138ece
Merge branch 'JuliaStats:master' into master
ayushpatnaikgit Apr 22, 2022
421eb3b
Update docs/src/examples.md
ayushpatnaikgit Apr 22, 2022
48983a6
Removed `GLM.Link(mm::AbstractGLM) = mm.l`, redefine defination of Po…
Apr 24, 2022
e32a2ef
Merge branch 'master' of https://github.com/xKDR/GLM.jl
Apr 24, 2022
634ece5
update test case
May 6, 2022
06d6a59
Update src/glmtools.jl
mousum-github May 20, 2022
6a3ceed
changed the `inverselink` function for PowerLink as suggested in PR
May 20, 2022
bda271a
Merge branch 'JuliaStats:master' into master
mousum-github Jun 28, 2022
2d7b8d4
Add files via upload
mousum-github Jul 14, 2022
bdcd53b
added qrpred
harsharora21 Aug 21, 2022
4c40d0a
use qrpred instead of cholpred
harsharora21 Aug 21, 2022
0b77b8f
removed tests which use chol
harsharora21 Aug 21, 2022
3789f93
added qr decomp with weights
harsharora21 Aug 23, 2022
4a1cce4
added dropcollinear for qr decomp
harsharora21 Aug 24, 2022
ec4e5e2
added pivot cholesky for pivot qr
harsharora21 Oct 3, 2022
6528746
added dof for qr
harsharora21 Oct 9, 2022
48e83c6
added chol for qr
harsharora21 Oct 9, 2022
d438cf8
updated test cases related to QR method
Oct 11, 2022
7c93a85
added float conversion
harsharora21 Oct 11, 2022
3774f22
fixed test
harsharora21 Oct 11, 2022
a3ef778
updated QR decompositions
Oct 14, 2022
c9dc567
removed redundant functions
Oct 18, 2022
035a5c6
Merge branch 'JuliaStats:master' into master
harsharora21 Oct 23, 2022
b60c98e
merge latest with lm-qr
harsharora21 Oct 23, 2022
49a5290
updating documention
Oct 27, 2022
f631dfe
fixed bug
harsharora21 Oct 28, 2022
56ab993
refactored DensePredQR constructor
harsharora21 Oct 28, 2022
5101b1a
changed DensePredQR constructor to take Abstract types
harsharora21 Oct 28, 2022
3cb08d4
removed unused functions
Oct 29, 2022
8bbc4ba
removed unused functions
Oct 29, 2022
875617a
updated doc string
Oct 29, 2022
6370373
conditional pivoting in QR
Oct 29, 2022
26cc5e4
conditional QR pivoting
Oct 29, 2022
9a1b09a
updated example of Lineam model with QR decomposotion
Oct 29, 2022
87a152b
removed nasty.csv file and commented code
Nov 4, 2022
391d7d1
Update src/GLM.jl
mousum-github Nov 14, 2022
8069931
Update src/linpred.jl
mousum-github Nov 14, 2022
7849e94
Update src/linpred.jl
mousum-github Nov 14, 2022
e6bd290
Update src/lm.jl
mousum-github Nov 14, 2022
0750e51
Update src/linpred.jl
mousum-github Nov 14, 2022
62772f9
Update src/linpred.jl
mousum-github Nov 14, 2022
5dea6ea
Update src/linpred.jl
mousum-github Nov 14, 2022
450c1e4
Update src/linpred.jl
mousum-github Nov 14, 2022
65abdb0
some changes suggested by @nalimilan
Nov 15, 2022
b51c637
Changed pivoted_qr to pivoted_qr!
Nov 15, 2022
b05e689
Changed :stable to :qr and :fast to cholesky
Nov 15, 2022
408e3e8
Changed in predict function in lm for qr method, also added some test…
Nov 22, 2022
67ac56a
re arranged all tests related to linear models (Cholesky and QR
Nov 24, 2022
d0e4ee9
Merge branch 'JuliaStats:master' into master
mousum-github Dec 8, 2022
a715b6a
Merge remote-tracking branch 'origin/master' into lm-qr
Dec 8, 2022
23a1a0a
An intermidiate commit after merging with master branch
Dec 8, 2022
e8d3fa2
An intermidiate commit after merging with master branch
Dec 8, 2022
15588be
without predict with qr test case. need to re-write predict function …
Dec 8, 2022
569a9e3
updated example.md file
Dec 8, 2022
6267b0c
Added test cases with predictions for QR method
Dec 9, 2022
9e664c7
Removed some comments
Dec 9, 2022
b902e3f
removed comments, replace Rsub\I by inv(Rsub)
Dec 12, 2022
b68a90f
Update test/runtests.jl
ayushpatnaikgit Dec 26, 2022
1972d72
Update test/runtests.jl
ayushpatnaikgit Dec 26, 2022
d0948dc
Update test/runtests.jl
ayushpatnaikgit Dec 26, 2022
6884081
Update test/runtests.jl
ayushpatnaikgit Dec 26, 2022
d485bdc
Update src/GLM.jl
mousum-github Jan 3, 2023
5faa21f
Update src/linpred.jl
mousum-github Jan 3, 2023
bbfc8f6
Update src/linpred.jl
mousum-github Jan 3, 2023
fbbca28
Update src/lm.jl
mousum-github Jan 3, 2023
20d9201
Update src/lm.jl
mousum-github Jan 3, 2023
689c988
Update src/lm.jl
mousum-github Jan 3, 2023
6cbbae8
Update src/linpred.jl
mousum-github Jan 3, 2023
c0223be
Update src/lm.jl
mousum-github Jan 3, 2023
b9db1e2
Update src/linpred.jl
mousum-github Jan 3, 2023
c050fea
Update src/linpred.jl
mousum-github Jan 3, 2023
b0896d6
Update src/linpred.jl
mousum-github Jan 3, 2023
2cac956
Update test/runtests.jl
mousum-github Jan 3, 2023
922f837
updated test cases, qrpred method
Jan 4, 2023
152e599
removed extra lines in lm file
Jan 4, 2023
68cb27c
updated predict! function in lm in compact form
Jan 4, 2023
438c3d1
Optimizing the performnce for full rank matrix
Jan 4, 2023
0f66978
changed outer constructor to inner
harsharora21 Jan 13, 2023
72599e4
Incorporated QR decomposition method in glm related functions
Jan 28, 2023
e1abfeb
NegativeBinomial Parameter estimation test is failing in a few system…
Jan 28, 2023
ab5198c
NegativeBinomial Parameter estimation test is failing in a few system…
Jan 28, 2023
7a26a06
NegativeBinomial Parameter estimation test is failing in a few systemse
Jan 28, 2023
727ef3d
added one more example to show when QR method works better
Jan 28, 2023
3db328e
update for doc test failure
Jan 29, 2023
d529789
Fixing doc test
mousum-github Jan 29, 2023
32880ab
Fixing doc test failure
mousum-github Jan 29, 2023
06af92d
Fixing doc test failure
mousum-github Jan 29, 2023
40c1105
Fixing for doc test failure
mousum-github Jan 29, 2023
3e29f79
updated example for :qr method
mousum-github Mar 27, 2023
973ebf8
Updated example for QR decomposition
mousum-github Apr 5, 2023
a25e188
Updated example for QR decomposition
mousum-github Apr 5, 2023
e321666
Updated example for QR decomposition
mousum-github Apr 5, 2023
837e9de
Updated instead of generic error
mousum-github Apr 5, 2023
68947c4
Trying to resolve the conflict
mousum-github Apr 5, 2023
3a3151f
Merge branch 'master' into lm-qr
mousum-github Apr 7, 2023
11ee8ba
Updated example to show QR performs better
mousum-github Apr 9, 2023
a7c4966
Update src/GLM.jl
mousum-github Apr 9, 2023
db22c76
Update src/linpred.jl
mousum-github Apr 9, 2023
c9b0b65
Updated test cases. Test cases with cholesky and qr methods are writt…
mousum-github Apr 13, 2023
0ad7ab8
Updated test case due to different pivot in different systems
mousum-github Apr 13, 2023
985dc7c
replaced equality check to egality check in glmfit
mousum-github Apr 14, 2023
1fc1086
updated example and added doctest
mousum-github Apr 20, 2023
ba9b1c3
updated example without doctest
mousum-github Apr 20, 2023
ae5fe2a
Avoiding multiple copyies of design matrix
mousum-github Apr 22, 2023
bf4d421
added one more example in qr vs cholesky
mousum-github Apr 24, 2023
135a02c
updated example, remove unused constructor and added one more test case
mousum-github May 2, 2023
3c87fe0
updated example to pass doctest
mousum-github May 2, 2023
f8f7337
updated example to pass doctest
mousum-github May 2, 2023
86d3c10
updated example to pass doctest
mousum-github May 2, 2023
941d54f
updated example to pass doctest
mousum-github May 2, 2023
60e9ec4
example
mousum-github May 2, 2023
8656471
updated example with rdchem data
mousum-github May 3, 2023
7ebdacd
Update docs/src/examples.md
mousum-github May 3, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
123 changes: 123 additions & 0 deletions docs/src/examples.md
Original file line number Diff line number Diff line change
Expand Up @@ -86,6 +86,129 @@ julia> round.(vcov(ols); digits=5)
0.38889 -0.16667
-0.16667 0.08333
```
By default, the `lm` method uses the Cholesky factorization which is known as fast but numerically unstable, especially for ill-conditioned design matrices. Also, the Cholesky method aggressively detects multicollinearity. You can use the `method` keyword argument to apply a more stable QR factorization method.

```jldoctest
julia> data = DataFrame(X=[1,2,3], Y=[2,4,7]);
mousum-github marked this conversation as resolved.
Show resolved Hide resolved

julia> ols = lm(@formula(Y ~ X), data; method=:qr)
LinearModel

Y ~ 1 + X

Coefficients:
─────────────────────────────────────────────────────────────────────────
Coef. Std. Error t Pr(>|t|) Lower 95% Upper 95%
─────────────────────────────────────────────────────────────────────────
(Intercept) -0.666667 0.62361 -1.07 0.4788 -8.59038 7.25704
X 2.5 0.288675 8.66 0.0732 -1.16797 6.16797
─────────────────────────────────────────────────────────────────────────
```
The following example shows that QR decomposition works better for an ill-conditioned design matrix. The linear model with the QR method is a better model than the linear model with Cholesky decomposition method since the estimated loglikelihood of previous model is higher.
Note that, the condition number of the design matrix is quite high (≈ 3.52e7).

```
julia> X = [-0.4011512997627107 0.6368622664511552;
-0.0808472925693535 0.12835204623364604;
-0.16931095045225217 0.2687956795496601;
-0.4110745650568839 0.6526163576003452;
-0.4035951747670475 0.6407421349445884;
-0.4649907741370211 0.7382129928076485;
-0.15772708898883683 0.25040532268222715;
-0.38144358562952446 0.6055745630707645;
-0.1012787681395544 0.16078875117643368;
-0.2741403589052255 0.4352214984054432];

julia> y = [4.362866166172215,
0.8792840060172619,
1.8414020451091684,
4.470790758717395,
4.3894454833815395,
5.0571760643993455,
1.7154177874916376,
4.148527704012107,
1.1014936742570425,
2.9815131910316097];

julia> modelqr = lm(X, y; method=:qr)
LinearModel

Coefficients:
────────────────────────────────────────────────────────────────
Coef. Std. Error t Pr(>|t|) Lower 95% Upper 95%
────────────────────────────────────────────────────────────────
x1 5.00389 0.0560164 89.33 <1e-12 4.87472 5.13307
x2 10.0025 0.035284 283.48 <1e-16 9.92109 10.0838
────────────────────────────────────────────────────────────────

julia> modelchol = lm(X, y; method=:cholesky)
LinearModel

Coefficients:
────────────────────────────────────────────────────────────────
Coef. Std. Error t Pr(>|t|) Lower 95% Upper 95%
────────────────────────────────────────────────────────────────
x1 5.17647 0.0849184 60.96 <1e-11 4.98065 5.37229
x2 10.1112 0.053489 189.03 <1e-15 9.98781 10.2345
────────────────────────────────────────────────────────────────

julia> loglikelihood(modelqr) > loglikelihood(modelchol)
true
```
Since the Cholesky method with `dropcollinear = true` aggressively detects multicollinearity,
if you ever encounter multicollinearity in any GLM model with Cholesky,
it is worth trying the same model with QR decomposition.
The following example is taken from `Introductory Econometrics: A Modern Approach, 7e" by Jeffrey M. Wooldridge`.
The dataset is used to study the relationship between firm size—often measured by annual sales—and spending on
research and development (R&D).
The following shows that for the given model,
the Cholesky method detects multicollinearity in the design matrix with `dropcollinear=true`
and hence does not estimate all parameters as opposed to QR.

```jldoctest
julia> y = [9.42190647125244, 2.084805727005, 3.9376676082611, 2.61976027488708, 4.04761934280395, 2.15384602546691,
2.66240668296813, 4.39475727081298, 5.74520826339721, 3.59616208076477, 1.54265284538269, 2.59368276596069,
1.80476510524749, 1.69270837306976, 3.04201245307922, 2.18389105796813, 2.73844122886657, 2.88134002685546,
2.46666669845581, 3.80616021156311, 5.12149810791015, 6.80378007888793, 3.73669862747192, 1.21332454681396,
2.54629635810852, 5.1612901687622, 1.86798071861267, 1.21465551853179, 6.31019830703735, 1.02669405937194,
2.50623273849487, 1.5936255455017];

julia> x = [4570.2001953125, 2830, 596.799987792968, 133.600006103515, 42, 390, 93.9000015258789, 907.900024414062,
19773, 39709, 2936.5, 2513.80004882812, 1124.80004882812, 921.599975585937, 2432.60009765625, 6754,
1066.30004882812, 3199.89990234375, 150, 509.700012207031, 1452.69995117187, 8995, 1212.30004882812,
906.599975585937, 2592, 201.5, 2617.80004882812, 502.200012207031, 2824, 292.200012207031, 7621, 1631.5];

julia> rdchem = DataFrame(rdintens=y, sales=x);

julia> mdl = lm(@formula(rdintens ~ sales + sales^2), rdchem; method=:cholesky)
LinearModel

rdintens ~ 1 + sales + :(sales ^ 2)

Coefficients:
───────────────────────────────────────────────────────────────────────────────────────
Coef. Std. Error t Pr(>|t|) Lower 95% Upper 95%
───────────────────────────────────────────────────────────────────────────────────────
(Intercept) 0.0 NaN NaN NaN NaN NaN
sales 0.000852509 0.000156784 5.44 <1e-05 0.000532313 0.00117271
sales ^ 2 -1.97385e-8 4.56287e-9 -4.33 0.0002 -2.90571e-8 -1.04199e-8
───────────────────────────────────────────────────────────────────────────────────────

julia> mdl = lm(@formula(rdintens ~ sales + sales^2), rdchem; method=:qr)
LinearModel

rdintens ~ 1 + sales + :(sales ^ 2)

Coefficients:
─────────────────────────────────────────────────────────────────────────────────
Coef. Std. Error t Pr(>|t|) Lower 95% Upper 95%
─────────────────────────────────────────────────────────────────────────────────
(Intercept) 2.61251 0.429442 6.08 <1e-05 1.73421 3.49082
sales 0.000300571 0.000139295 2.16 0.0394 1.56805e-5 0.000585462
sales ^ 2 -6.94594e-9 3.72614e-9 -1.86 0.0725 -1.45667e-8 6.7487e-10
─────────────────────────────────────────────────────────────────────────────────
```


## Probit regression
```jldoctest
Expand Down
10 changes: 10 additions & 0 deletions src/GLM.jl
Original file line number Diff line number Diff line change
Expand Up @@ -91,13 +91,23 @@ module GLM
pivoted_cholesky!(A; kwargs...) = cholesky!(A, RowMaximum(); kwargs...)
end

@static if VERSION < v"1.7.0"
pivoted_qr!(A; kwargs...) = qr!(A, Val(true); kwargs...)
else
pivoted_qr!(A; kwargs...) = qr!(A, ColumnNorm(); kwargs...)
end

const COMMON_FIT_KWARGS_DOCS = """
- `dropcollinear::Bool=true`: Controls whether or not a model matrix
less-than-full rank is accepted.
If `true` (the default) the coefficient for redundant linearly dependent columns is
`0.0` and all associated statistics are set to `NaN`.
Typically from a set of linearly-dependent columns the last ones are identified as redundant
(however, the exact selection of columns identified as redundant is not guaranteed).
- `method::Symbol=:cholesky`: Controls which decomposition method to use.
If `method=:cholesky` (the default), then the `Cholesky` decomposition method will be used.
If `method=:qr`, then the `QR` decomposition method (which is more stable
but slower) will be used.
- `wts::Vector=similar(y,0)`: Prior frequency (a.k.a. case) weights of observations.
Such weights are equivalent to repeating each observation a number of times equal
to its weight. Do note that this interpretation gives equal point estimates but
Expand Down
28 changes: 24 additions & 4 deletions src/glmfit.jl
Original file line number Diff line number Diff line change
Expand Up @@ -286,7 +286,8 @@ function nulldeviance(m::GeneralizedLinearModel)
X = fill(1.0, length(y), hasint ? 1 : 0)
nullm = fit(GeneralizedLinearModel,
X, y, d, r.link; wts=wts, offset=offset,
dropcollinear=isa(m.pp.chol, CholeskyPivoted),
dropcollinear=ispivoted(m.pp),
method=decomposition_method(m.pp),
maxiter=m.maxiter, minstepfac=m.minstepfac,
atol=m.atol, rtol=m.rtol)
dev = deviance(nullm)
Expand Down Expand Up @@ -341,7 +342,8 @@ function nullloglikelihood(m::GeneralizedLinearModel)
X = fill(1.0, length(y), hasint ? 1 : 0)
nullm = fit(GeneralizedLinearModel,
X, y, d, r.link; wts=wts, offset=offset,
dropcollinear=isa(m.pp.chol, CholeskyPivoted),
dropcollinear=ispivoted(m.pp),
method=decomposition_method(m.pp),
maxiter=m.maxiter, minstepfac=m.minstepfac,
atol=m.atol, rtol=m.rtol)
ll = loglikelihood(nullm)
Expand Down Expand Up @@ -559,6 +561,7 @@ function fit(::Type{M},
d::UnivariateDistribution,
l::Link = canonicallink(d);
dropcollinear::Bool = true,
method::Symbol = :cholesky,
dofit::Union{Bool, Nothing} = nothing,
wts::AbstractVector{<:Real} = similar(y, 0),
offset::AbstractVector{<:Real} = similar(y, 0),
Expand All @@ -575,7 +578,15 @@ function fit(::Type{M},
end

rr = GlmResp(y, d, l, offset, wts)
res = M(rr, cholpred(X, dropcollinear), nothing, false)

if method === :cholesky
res = M(rr, cholpred(X, dropcollinear), nothing, false)
elseif method === :qr
res = M(rr, qrpred(X, dropcollinear), nothing, false)
else
throw(ArgumentError("The only supported values for keyword argument `method` are `:cholesky` and `:qr`."))
bkamins marked this conversation as resolved.
Show resolved Hide resolved
end

return dofit ? fit!(res; fitargs...) : res
end

Expand All @@ -594,6 +605,7 @@ function fit(::Type{M},
offset::Union{AbstractVector, Nothing} = nothing,
wts::Union{AbstractVector, Nothing} = nothing,
dropcollinear::Bool = true,
method::Symbol = :cholesky,
dofit::Union{Bool, Nothing} = nothing,
contrasts::AbstractDict{Symbol}=Dict{Symbol,Any}(),
fitargs...) where {M<:AbstractGLM}
Expand All @@ -613,7 +625,15 @@ function fit(::Type{M},
off = offset === nothing ? similar(y, 0) : offset
wts = wts === nothing ? similar(y, 0) : wts
rr = GlmResp(y, d, l, off, wts)
res = M(rr, cholpred(X, dropcollinear), f, false)

if method === :cholesky
res = M(rr, cholpred(X, dropcollinear), f, false)
elseif method === :qr
res = M(rr, qrpred(X, dropcollinear), f, false)
else
throw(ArgumentError("The only supported values for keyword argument `method` are `:cholesky` and `:qr`."))
end

return dofit ? fit!(res; fitargs...) : res
end

Expand Down
Loading