I would like to bring fweights and pweights into the GLM package.
Different types of weights affect the vcov function in the GLM package. Here is a reference on how cov varies
One thing that I would like to add is the ability to combine fweights and pweights in a GLM. I think we can start here in StatsBase by exposing the combination of weights. For example, I believe the appropriate `varcorrection` function for combining fweights and pweights would be:
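```julia
@inline function varcorrection(fw::FrequencyWeights, pw::ProbabilityWeights, corrected::Bool=false)
    # The frequency weights determine how many observations the sample represents
    n_f = fw.sum
    if corrected
        # Number of observations with a nonzero probability weight
        n = count(!iszero, pw)
        n / (n_f * (n - 1))
    else
        1 / n_f
    end
end
```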
The intuition is that the fweights tell us how many data points we observe in our sample, while the pweights give a relative weighting within the sample based on the probability that a particular data point was sampled. The varcorrection would replace `s` (the sum of the probability weights in the `ProbabilityWeights` method) with the sum of the fweights, since that is the component that tells us how large the sample is.
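Written out as a formula, the proposal changes only the normalizing constant of the weighted variance. This is a sketch under stated assumptions: $f_i$ and $p_i$ are the frequency and probability weights, the weighted sum of squares is assumed to use the probability weights $p_i$ (as with `ProbabilityWeights` whose sum is replaced by `fw.sum`), and $\bar{x}$ denotes the weighted mean.

$$
\widehat{\operatorname{Var}}(x) = c \sum_{i=1}^{N} p_i \,(x_i - \bar{x})^2,
\qquad
c =
\begin{cases}
\dfrac{1}{s_f} & \text{uncorrected},\\[6pt]
\dfrac{n}{s_f\,(n-1)} & \text{corrected},
\end{cases}
\qquad
s_f = \sum_{i=1}^{N} f_i,
\quad
n = \#\{\, i : p_i \neq 0 \,\}.
$$

Here $s_f$ corresponds to `fw.sum` and $n$ to `count(!iszero, pw)`; the plain `ProbabilityWeights` correction is recovered by using $\sum_i p_i$ in place of $s_f$.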
This feels like it is just `ProbabilityWeights` with `sum = fw.sum`, so it doesn't need a separate `varcorrection` function. I wonder, though, whether it is clearer to write it like this to show that fweights and pweights are being combined?
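To check the claim that the combined correction is just the pweights correction with the sum swapped for `fw.sum`, here is a minimal, dependency-free sketch. It assumes plain vectors stand in for `FrequencyWeights`/`ProbabilityWeights`; `pweights_correction` and `combined_correction` are hypothetical names, not StatsBase functions.

```julia
# Hypothetical stand-in for the ProbabilityWeights correction, with the
# normalizing sum `s` passed explicitly instead of taken from the weights.
function pweights_correction(pw::AbstractVector{<:Real}, s::Real, corrected::Bool=false)
    if corrected
        n = count(!iszero, pw)   # observations with a nonzero weight
        return n / (s * (n - 1))
    else
        return 1 / s
    end
end

# Hypothetical stand-in for the proposed combined correction: the sample
# size comes from the frequency weights, the nonzero count from the pweights.
function combined_correction(fw::AbstractVector{<:Real}, pw::AbstractVector{<:Real},
                             corrected::Bool=false)
    n_f = sum(fw)
    if corrected
        n = count(!iszero, pw)
        return n / (n_f * (n - 1))
    else
        return 1 / n_f
    end
end

fw = [2, 1, 3, 4]
pw = [0.5, 0.25, 0.75, 1.0]

# Both formulations agree once the pweights sum is replaced by sum(fw).
for corrected in (false, true)
    @assert combined_correction(fw, pw, corrected) ≈ pweights_correction(pw, sum(fw), corrected)
end
```

The agreement suggests a dedicated method would mainly serve readability, making the combination of the two weight types explicit at the call site.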
In GLM, I plan to write a similar function that combines fweights and pweights into a single weight vector used for maximum likelihood.
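One possible shape for such a helper is sketched below. The elementwise product $w_i = f_i p_i$ is an assumption for illustration (the issue does not say how the two vectors would be combined), and `combine_weights` and `weighted_mle_mean` are hypothetical names, not existing StatsBase or GLM functions. For a normal model, maximizing the weighted log-likelihood over the mean gives the weighted mean, which makes the effect of the combined vector easy to see.

```julia
# Hypothetical helper: combine frequency weights f_i and probability weights
# p_i into one vector w_i = f_i * p_i (an assumed combination rule).
function combine_weights(fw::AbstractVector{<:Real}, pw::AbstractVector{<:Real})
    length(fw) == length(pw) ||
        throw(DimensionMismatch("fw and pw must have the same length"))
    return fw .* pw
end

# For a normal model with fixed σ, maximizing the weighted log-likelihood
# over μ yields the weighted mean of y.
weighted_mle_mean(y::AbstractVector{<:Real}, w::AbstractVector{<:Real}) =
    sum(w .* y) / sum(w)

fw = [1, 2, 1]
pw = [1.0, 0.5, 2.0]
y  = [1.0, 2.0, 4.0]

w = combine_weights(fw, pw)        # [1.0, 1.0, 2.0]
mu_hat = weighted_mle_mean(y, w)   # (1 + 2 + 8) / 4 = 2.75
```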
I'll preface this by saying that I have rarely had to work with weights, so take my opinions with a grain of salt... I believe @nalimilan and @rofinn are better versed in weights.
> In GLM, I plan to write a similar function that combines fweights and pweights into a single weight vector that is used for maximum likelihood
👍 Though it sounds like that function should probably live in this package instead, since it could be useful to other packages besides GLM.
The function that you've shown here seems reasonable, I think, though we'll need to do some zero checking. (For example, the probability weights could be `[0, 0, 1, 0]` or similar, in which case `n - 1 == 0` and we'd divide by zero in the corrected case.)
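A minimal sketch of the zero checking, again using plain vectors in place of the StatsBase weight types; `combined_correction_checked` is a hypothetical name, and throwing `ArgumentError` is one design choice among several (returning `NaN` would be another). Note that in Julia `1 / 0` evaluates to `Inf` rather than throwing, so without a guard the `[0, 0, 1, 0]` case would fail silently.

```julia
# Hypothetical guarded variant of the proposed correction.
function combined_correction_checked(fw::AbstractVector{<:Real}, pw::AbstractVector{<:Real},
                                     corrected::Bool=false)
    n_f = sum(fw)
    n_f > 0 || throw(ArgumentError("frequency weights must have a positive sum"))
    if corrected
        n = count(!iszero, pw)
        # With fewer than two nonzero probability weights, n - 1 <= 0 and the
        # unguarded formula would return Inf (or -0.0 when n == 0).
        n > 1 || throw(ArgumentError(
            "corrected variance needs at least two nonzero probability weights, got $n"))
        return n / (n_f * (n - 1))
    else
        return 1 / n_f
    end
end

combined_correction_checked([1, 1, 1, 1], [0, 0, 1, 0])          # 0.25
# combined_correction_checked([1, 1, 1, 1], [0, 0, 1, 0], true)  # throws ArgumentError
```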