Histogram with WeightedMean storage returns wrong sum_of_weights_squared #924

Open
olbessid opened this issue Apr 22, 2024 · 4 comments

Comments

@olbessid

olbessid commented Apr 22, 2024

I want to create histograms and access their sum of weights squared. With WeightedMean storage, sum_of_weights_squared returns the number of entries rather than the sum of weights squared. The same is true for sum_of_weights (it also returns the counts), but this is a smaller issue for me.

I could in principle retrieve the correct sum of weights squared if I used accumulators instead of histograms. However, for the purpose of my data analysis, this would slow down the code a lot and I would need to replicate the large nested structure of the histograms into accumulators. So I would much prefer to just use histograms, if this bug can be fixed.

To test:

import boost_histogram as bh
h = bh.Histogram(bh.axis.Regular(1, 0, 2), storage=bh.storage.WeightedMean())  # Double() is the default
h.fill([1]*3, sample=[2]*3)
h.view().sum_of_weights_squared

The last line returns

array([3.])

while the sum of weights squared is actually 12.

I am using Python 3.8.
Attaching a screenshot of my notebook.
[Screenshot: notebook_weightssquared]

@HDembinski
Member

That's odd. @henryiii ?

@henryiii
Member

henryiii commented Jun 3, 2024

Using https://pyodide.org/en/stable/console.html because it's handy:

[Screenshot 2024-06-03 at 4 46 00 PM]

(Edit: chopped off the answer by mistake)

@matthewfeickert
Member

matthewfeickert commented Sep 13, 2024

Adding the copy-pasteable code from Henry's answer. The weight argument was missing from the fill command:

import boost_histogram as bh
h = bh.Histogram(bh.axis.Regular(1, 0, 2), storage=bh.storage.WeightedMean())
h.fill([1]*3, weight=2, sample=[2]*3)  # note use of weight here
# Histogram(Regular(1, 0, 2), storage=WeightedMean()) # Sum: WeightedMean(sum_of_weights=6, sum_of_weights_squared=12, value=2, variance=0)
h.view().sum_of_weights_squared
# array([12.])
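
The difference between the two fill calls can also be checked by hand. The following is a plain-Python sketch, not boost-histogram's implementation: `weighted_mean_summary` is a hypothetical helper, and it assumes that an omitted weight counts as 1 per entry, which is consistent with the outputs reported in this thread.

```python
def weighted_mean_summary(samples, weights=None):
    """Return (sum_of_weights, sum_of_weights_squared, weighted mean) for one bin.

    Hypothetical helper for illustration only; assumes a missing weight
    is treated as 1.0 for every entry.
    """
    if weights is None:
        weights = [1.0] * len(samples)
    sum_w = float(sum(weights))
    sum_w2 = float(sum(w * w for w in weights))
    mean = sum(w * x for w, x in zip(weights, samples)) / sum_w
    return sum_w, sum_w2, mean


# Original report: three entries with sample=2 and no weight given.
no_weight = weighted_mean_summary([2.0, 2.0, 2.0])

# Corrected call: the same entries, each with weight=2.
with_weight = weighted_mean_summary([2.0, 2.0, 2.0], weights=[2.0, 2.0, 2.0])

print(no_weight)    # (3.0, 3.0, 2.0)
print(with_weight)  # (6.0, 12.0, 2.0)
```

In other words, `sample` supplies the values being averaged, while `weight` supplies the weights; passing only `sample=[2]*3` leaves every weight at 1, so the sum of weights squared is 3 rather than 12.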

@matthewfeickert
Member

@olbessid, if this is clear, can the issue be closed?
