Expanding fweights and pweights #283

jeffwong · 2017-07-15T06:14:48Z

I would like to bring fweights and pweights into the GLM package.

Different types of weights affect the vcov function in the GLM package. Here is a reference on how cov varies

One thing that I would like to add is the ability to combine fweights and pweights in a GLM. I think we can start here in StatsBase by exposing the combination of weights. For example, I believe the appropriate varcorrection function for combining fweights and pweights would be:

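@inline function varcorrection(fw::FrequencyWeights, pw::ProbabilityWeights, corrected::Bool=false)
    # The frequency weights determine the effective sample size.
    n_f = fw.sum
    if corrected
        # Count the observations with a nonzero probability weight.
        n = count(!iszero, pw)
        n / (n_f * (n - 1))
    else
        1 / n_f
    end
end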

The intuition is that the fweights tell us how many data points we are able to observe in our sample. The pweights tell us a relative weighting within the sample, depending on the probability that a particular data point was sampled. The varcorrection would replace the "s" with the s from the fweights, as that is the component that tells us how large the sample is.

This feels like it is just ProbabilityWeights with sum = fw.sum, so it doesn't strictly need a separate varcorrection function. I wonder, though, whether it is clearer to write it like this, to show that fweights and pweights are being combined?
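To make the arithmetic concrete, here is a minimal sketch of the same correction on plain vectors. The name `combined_varcorrection` and the keyword form of `corrected` are hypothetical, chosen for illustration only; nothing here is StatsBase API, and the function avoids the weight types so that it runs without any package.

```julia
# Hypothetical helper, not part of StatsBase: same formula as above,
# written for plain vectors so it runs without any package.
function combined_varcorrection(fw::AbstractVector{<:Real},
                                pw::AbstractVector{<:Real};
                                corrected::Bool=false)
    length(fw) == length(pw) ||
        throw(DimensionMismatch("fw and pw must have the same length"))
    n_f = sum(fw)               # sample size implied by the frequency weights
    corrected || return 1 / n_f
    n = count(!iszero, pw)      # observations with a nonzero probability weight
    return n / (n_f * (n - 1))
end

fw = [2, 1, 3]            # frequency weights: 6 observations in total
pw = [0.5, 0.25, 0.25]    # probability weights: all three are nonzero

combined_varcorrection(fw, pw)                  # 1 / 6
combined_varcorrection(fw, pw; corrected=true)  # 3 / (6 * 2) = 0.25
```

Only the sum of the frequency weights and the count of nonzero probability weights enter the result, which is why this looks like ProbabilityWeights with the sum swapped out.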

In GLM, I plan to write a similar function that combines fweights and pweights into a single weight vector that is used for maximum likelihood.

ararslan · 2017-07-15T19:09:43Z

I'll preface this by saying that I quite rarely have had to work with weights, so take my opinions with a grain of salt... I believe @nalimilan and @rofinn are better versed in weights.

> In GLM, I plan to write a similar function that combines fweights and pweights into a single weight vector that is used for maximum likelihood

👍 though it sounds like that function should probably live in this package instead since it could be more generally applicable to other packages in addition to GLM.

The function that you've shown here seems reasonable, I think, though we'll need to do some zero checking. (For example, it could be the case that the probability weights are [0,0,1,0] or whatever, in which case n - 1 is zero and we'd get a division by zero in the corrected case.)
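One way the zero checking could look is sketched below. The function name `checked_varcorrection`, the choice of `ArgumentError`, and the messages are all assumptions made for this sketch, not anything decided in this thread. Note that on plain numbers Julia's `/` returns `Inf` for a zero denominator rather than throwing, so without a guard the problem would surface as a silent `Inf`.

```julia
# Hypothetical guarded variant; the error type and messages are assumptions.
function checked_varcorrection(n_f::Real, pw::AbstractVector{<:Real};
                               corrected::Bool=false)
    n_f > 0 ||
        throw(ArgumentError("frequency weights must sum to a positive value"))
    corrected || return 1 / n_f
    n = count(!iszero, pw)      # observations with a nonzero probability weight
    n > 1 ||
        throw(ArgumentError("corrected estimate needs at least two nonzero probability weights"))
    return n / (n_f * (n - 1))
end

checked_varcorrection(4, [0.5, 0.5, 0, 0]; corrected=true)  # 2 / (4 * 1) = 0.5
# checked_varcorrection(4, [0, 0, 1, 0]; corrected=true) throws an ArgumentError
```

Whether to throw, return `NaN`, or fall back to the uncorrected value is a design choice the thread leaves open.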

nalimilan · 2022-09-03T19:04:37Z

Closing in favor of JuliaStats/GLM.jl#186.

ararslan mentioned this issue Jul 15, 2017

Path towards GLMs with fweights, pweights, and aweights JuliaStats/GLM.jl#186

Open

nalimilan closed this as completed Sep 3, 2022
