Additional documentation needed on difference between calculator type test statistic return type #1792
Comments
I don't understand how the first two results are exactly the same, but if this is a toy vs non-toy issue with a 4 sigma effect (which, looking at the setup, would make sense roughly), are you sure you have enough toys to evaluate this (several tens of thousands)?
Here I'm using the calculator API in a strange way, as only 1 experiment is being evaluated, so there really isn't any pseudoexperiment generation happening. I should give a better example later. See pyhf/src/pyhf/infer/calculators.py, lines 951 to 960 in 9fd99be.
I missed the sqrt required to go from q0 to the significance in the previous comment. This should be a 2 sigma effect, of course: sqrt(2500) = 50, so a 100 event excess over background-only with negligible background uncertainty will be a 2 sigma effect. That agrees perfectly with the first two numbers.
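As a quick check of the arithmetic above, here is a minimal sketch for a single counting bin with no background uncertainty. The function names are hypothetical helpers, not part of pyhf, and the yields `s = 100` and `b = 2500` are simply the numbers quoted in the comment. It compares the simple `s / sqrt(b)` estimate with the square root of the standard Asimov discovery test statistic.

```python
from math import log, sqrt


def simple_significance(s: float, b: float) -> float:
    """Gaussian approximation Z = s / sqrt(b), reasonable when s << b."""
    return s / sqrt(b)


def asimov_q0(s: float, b: float) -> float:
    """Asimov discovery test statistic for one counting bin, no background uncertainty."""
    return 2.0 * ((s + b) * log(1.0 + s / b) - s)


# Yields quoted in the comment above (100 event excess over 2500 background).
s, b = 100.0, 2500.0

z_simple = simple_significance(s, b)
q0 = asimov_q0(s, b)
z_asimov = sqrt(q0)  # significance is the square root of q0, not q0 itself

print(f"s/sqrt(b) = {z_simple:.2f}")
print(f"q0 = {q0:.2f}, sqrt(q0) = {z_asimov:.2f}")
```

Both estimates land close to 2 sigma, while q0 itself is close to 4, which is the factor that was missed in the earlier comment.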
The last two numbers scale with the number of signal events, so these are something else.
Yeah. The fact that the calculators are going in opposite directions as the signal increases tells us there's a problem.
At the moment this doesn't run any toys as far as I can tell, so I'm assuming this is a question about the calculators, and not about asymptotic vs toy agreement in general?
Yes, I didn't phrase the original text clearly, as I was dumping things in for myself to clarify later.
Okay, @kratsg has pointed out that the behavior of the calculator APIs is (known to be) inconsistent between the asymptotic and toy-based calculators: the asymptotic one returns p-values (so cumulative distributions of test statistics; see pyhf/src/pyhf/infer/__init__.py, lines 169 to 178 in 9fd99be), while the toy-based one returns q test stats. At the very least this needs to be made a lot clearer in the docs.
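To illustrate why a p-value and a q test stat cannot be compared directly, and why they move in opposite directions as the signal grows, here is a rough stdlib-only sketch. It does not call pyhf; `q0_to_pvalue` is a hypothetical helper, the signal yields are made up for illustration, and q0 is approximated as `(s / sqrt(b))**2` using the background yield of 2500 quoted earlier in the thread.

```python
from math import sqrt
from statistics import NormalDist


def q0_to_pvalue(q0: float) -> float:
    """Asymptotic one-sided p-value for a discovery test statistic q0 >= 0."""
    return 1.0 - NormalDist().cdf(sqrt(q0))


b = 2500.0  # background yield quoted earlier in the thread
rows = []
for s in (50.0, 100.0, 200.0):  # illustrative signal yields (assumption)
    q0 = s * s / b  # Gaussian approximation: q0 ~ (s / sqrt(b))**2
    rows.append((s, q0, q0_to_pvalue(q0)))

for s, q0, p in rows:
    print(f"s = {s:5.0f}  q0 = {q0:5.2f}  p = {p:.3e}")
```

As the signal yield increases, q0 rises while the p-value falls, so two return values on these different scales would be expected to disagree both in size and in trend.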
well, it's consistent here, it's that the
While all very true, a user should rightly complain that the public API is too confusing as is (given the current documentation).
My main complaint at the moment is that while we make it clear what the asymptotic test stats are in (pyhf/src/pyhf/infer/calculators.py, lines 85 to 89 in 9fd99be), we don't make this clear again in the calculator (pyhf/src/pyhf/infer/calculators.py, lines 330 to 332 in 9fd99be) or in the toy calculator, where it is different (pyhf/src/pyhf/infer/calculators.py, lines 922 to 924 in 9fd99be). We could also mention it in pyhf/src/pyhf/infer/calculators.py, lines 526 to 532 in 9fd99be.
So this really is a documentation issue, but a pretty big one in my mind. I forgot this, and if I've written some of this code and can make this mistake when tired, then I think a user definitely will.
This part, I think, is what confuses me, so if you have a better grasp, it would be great to clarify this for the upcoming release.
This all came up as I was trying to take a stab at Issue #1712 and was trying to figure out how to have things work for either calculator type.
Summary
There is a large discrepancy in the value of the discovery test statistic depending on whether it is generated via an asymptotic-based or a toy-based calculator.
Related Issues:
OS / Environment
Steps to Reproduce
File Upload (optional)
No response
Expected Results
The discovery test statistic would be the same regardless of calculator type.
Actual Results
pyhf Version
9fd99be886349a90e927672e950cc233fad0916c on master