-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add keyword arg to modelmatrix; define momentmatrix #16
base: main
Are you sure you want to change the base?
Changes from 14 commits
e04ad3f
6f6a160
0625c3c
e514db0
bfe8ac6
af65888
11505ff
de0c6ae
c36352b
14ccc70
9a7b2ab
48333e5
e927f72
ab928fa
2858ba0
06206ad
7fa73cb
c3460ae
93f8742
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -34,19 +34,25 @@ Return the mean of the response. | |||||||||||||||
function meanresponse end | ||||||||||||||||
|
||||||||||||||||
""" | ||||||||||||||||
modelmatrix(model::RegressionModel) | ||||||||||||||||
modelmatrix(model::RegressionModel; weighted::Bool=false) | ||||||||||||||||
|
||||||||||||||||
Return the model matrix (a.k.a. the design matrix). | ||||||||||||||||
Return the model matrix (design matrix) or, if `weighted=true` the weighted | ||||||||||||||||
model matrix, i.e. `X * sqrt.(W)`, where `X` is the model matrix and | ||||||||||||||||
`W` is the diagonal matrix whose elements are the [model weights](@ref weights(::StatisticalModel)). | ||||||||||||||||
""" | ||||||||||||||||
function modelmatrix end | ||||||||||||||||
|
||||||||||||||||
""" | ||||||||||||||||
crossmodelmatrix(model::RegressionModel) | ||||||||||||||||
crossmodelmatrix(model::RegressionModel; weighted::Bool=false) | ||||||||||||||||
|
||||||||||||||||
Return `X'X` where `X` is the model matrix of `model`. | ||||||||||||||||
Return `X'X` where `X` is the model/design matrix of `model` or, if `weighted=true`, `X'WX`, | ||||||||||||||||
where `W` is the diagonal matrix whose elements are the [model weights](@ref weights(::StatisticalModel)). | ||||||||||||||||
This function will return a pre-computed matrix stored in `model` if possible. | ||||||||||||||||
""" | ||||||||||||||||
crossmodelmatrix(model::RegressionModel) = (x = modelmatrix(model); Symmetric(x' * x)) | ||||||||||||||||
function crossmodelmatrix(model::RegressionModel; weighted::Bool=false) | ||||||||||||||||
x = weighted ? modelmatrix(model; weighted=weighted) : modelmatrix(model) | ||||||||||||||||
return Symmetric(x' * x) | ||||||||||||||||
end | ||||||||||||||||
|
||||||||||||||||
""" | ||||||||||||||||
leverage(model::RegressionModel) | ||||||||||||||||
|
@@ -65,11 +71,13 @@ of each data point. | |||||||||||||||
function cooksdistance end | ||||||||||||||||
|
||||||||||||||||
""" | ||||||||||||||||
residuals(model::RegressionModel) | ||||||||||||||||
residuals(model::RegressionModel; weighted::Bool=false) | ||||||||||||||||
|
||||||||||||||||
Return the residuals of the model or, if `weighted=true`, the residuals multiplied by | ||||||||||||||||
the square root of the [model weights](@ref weights(::StatisticalModel)). | ||||||||||||||||
Comment on lines
+76
to
+77
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||||||||||
|
||||||||||||||||
Return the residuals of the model. | ||||||||||||||||
""" | ||||||||||||||||
function residuals end | ||||||||||||||||
function residuals(model::RegressionModel; weighted::Bool=false) end | ||||||||||||||||
|
||||||||||||||||
""" | ||||||||||||||||
predict(model::RegressionModel, [newX]) | ||||||||||||||||
|
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -314,3 +314,24 @@ function adjr2(model::StatisticalModel, variant::Symbol) | |||||
end | ||||||
|
||||||
const adjr² = adjr2 | ||||||
|
||||||
""" | ||||||
momentmatrix(model::StatisticalModel) | ||||||
nalimilan marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
|
||||||
Return the matrix containing the empirical moments defining the estimated parameters. | ||||||
|
||||||
For likelihood-based models, `momentmatrix` returns the log-score, i.e. the gradient | ||||||
of the log-likelihood evaluated at each observation. For semiparametric models, each row | ||||||
of the ``(n \\times k)`` matrix returned by `momentmatrix` is ``m(Z_i, b)``, where `m` | ||||||
is a vector-valued function evaluated at the estimated parameter `b` and ``Z_i`` is the | ||||||
vector of data for entity `i`. The vector-valued function satisfies ``\\sum_{i=1}^n m(Z_i, b) = 0``. | ||||||
|
||||||
For linear and generalized linear models, the parameters of interest are the coefficients | ||||||
of the linear predictor. The moment matrix of a linear model is given by `u.*X`, | ||||||
where `u` is the vector of residuals and `X` is the model matrix. The moment matrix of | ||||||
a a generalized linear model with link function `g` is `X'e`, where `e` | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
is given by ``Y-g^{-1}(X'b)`` where `X` is the model matrix, `Y` is the model response, and | ||||||
`b` is the vector of estimated coefficients. | ||||||
""" | ||||||
function momentmatrix end | ||||||
|
Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -6,13 +6,31 @@ using StatsAPI: RegressionModel, crossmodelmatrix | |||||||||||||||||||||||||||||||||||
struct MyRegressionModel <: RegressionModel | ||||||||||||||||||||||||||||||||||||
end | ||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||
struct MyWeightedRegressionModel <: RegressionModel | ||||||||||||||||||||||||||||||||||||
wts::AbstractVector | ||||||||||||||||||||||||||||||||||||
end | ||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||
StatsAPI.modelmatrix(::MyRegressionModel) = [1 2; 3 4] | ||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||
function StatsAPI.modelmatrix(r::MyWeightedRegressionModel; weighted::Bool=false) | ||||||||||||||||||||||||||||||||||||
X = [1 2; 3 4] | ||||||||||||||||||||||||||||||||||||
weighted ? sqrt.(r.wts).*X : X | ||||||||||||||||||||||||||||||||||||
end | ||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||
w = [0.3, 0.2] | ||||||||||||||||||||||||||||||||||||
Comment on lines
+9
to
+20
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Given that the methods hardcode the matrix, probably not worth having a separate type which doesn't hardcode weights:
Suggested change
... and simplify tests below. |
||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||
@testset "TestRegressionModel" begin | ||||||||||||||||||||||||||||||||||||
m = MyRegressionModel() | ||||||||||||||||||||||||||||||||||||
r = MyWeightedRegressionModel(w) | ||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||
@test crossmodelmatrix(m) == [10 14; 14 20] | ||||||||||||||||||||||||||||||||||||
@test crossmodelmatrix(m; weighted=false) == [10 14; 14 20] | ||||||||||||||||||||||||||||||||||||
@test crossmodelmatrix(m) isa Symmetric | ||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||
@test crossmodelmatrix(r) == [10 14; 14 20] | ||||||||||||||||||||||||||||||||||||
@test crossmodelmatrix(r; weighted=false) == [10 14; 14 20] | ||||||||||||||||||||||||||||||||||||
@test crossmodelmatrix(r; weighted=true) ≈ [2.1 3.0; 3.0 4.4] | ||||||||||||||||||||||||||||||||||||
@test crossmodelmatrix(r; weighted=true) isa Symmetric | ||||||||||||||||||||||||||||||||||||
end | ||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||
end # module TestRegressionModel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does the square root come from exactly? Doesn't that assume a particular definition of residuals (i.e. using L2-norm rather than e.g. L1-norm)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, this is tricky. Also
modelmatrix
multiplies the entries ofX
by the square-root of the weights. Why?Think about the linear model. With weights, the crossmodel matrix is$X'WX$ . Then, to obtain it we can now do
modelmatrix(lm1; weighted = true)'modelmatrix(lm1; weighted = true)
.Notice that this is consistent with
R
; see, e.g., the function weighted.residuals which is in stats.With weights, any weights is
$$\sqrt{w_i}y_i = \sqrt{w_i}x_i \beta + \sqrt{w_i}u_i.$$ So the understanding is that weighting single constituents of the model (y,x,u) amount to weight by $\sqrt{w_i}$ .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, another tricky point. My understanding is that for residuals, the square root comes from the fact that deviance residuals themselves are defined as the square root of quantities which are partitions of the deviance. Right?
Note the R docstring for
weighted.residuals
says "Weighted residuals are based on the deviance residuals", which are only one kind of residual. Actually in Rresiduals
also returns weighted residuals, except for response residuals, which are always unweighted. Maybe to be completely accurate we could say "for deviance and Pearson residuals...", so that packages are free to use different definitions (or throw an error) if needed?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what @nalimilan says is that the assumption here (and in your change of
modelmatrix
) is that for all kinds of weights the weighted model matrix isX * sqrt.(W)
. Is it always true forFrequencyWeights
,AnalyticWeights
andProbabilityWeights
? x-ref: JuliaStats/GLM.jl#487There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bkamins weighted residuals, weighted model matrix do not exist in statistics. They are only useful from a coding point of view - they make it easier to write neater code.
I have always defined these quantities as multiplied by$\sqrt{w_i}$ as it is much more convenient. Some thing for R — which returns silently squared-root weighted residuals. Also other packages, notable FixedEffectModels.jl does that.
@nalimilan make sense what you propose - I will add more context to the doc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if these don't exist in statistics, the question can be phrased as "are there situations where the returned value is useful, even when you don't know the kind of weights used". I think the answer is yes, but it's tricky, so... R base only supports analytic weights so it's not a great reference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK let's just adapt the docstring then. Feel free to add more context if you have ideas.