Sum results of multiple ToyCalculator.distributions() #829
Comments
I'm not fully understanding. What do you mean by "summing"? They're the individual qmu values for the signal-like versus background-like distributions. I guess your fundamental question is: can we parallelize toys? We have an open issue, #807, describing this. The idea would be to allow a backend to do the parallelization (e.g. multi-core). This should generally help when you need 1M+ toys.
I guess @lawrenceleejr just wants to distribute each toy to a separate machine and then combine the results. I think Dask might be a nice way to do it in lieu of a batch-system-based approach.
Larry, do you use the jax backend with jit? This speeds up toy calculation significantly. (It's still true that ROOT is faster for a low number of bins, ~n_bins=1, and pyhf only kicks in for more complex models.)
Cheers,
Lukas
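Switching backends is a one-liner; a minimal sketch of trying the jax backend, where the simple model and numbers are placeholders rather than anything from this thread:

```python
# Minimal sketch of switching pyhf to the jax backend; the model below is a
# placeholder, not the analysis discussed in this issue.
import pyhf

pyhf.set_backend("jax")  # subsequent tensor operations (including toys) run on jax

model = pyhf.simplemodels.hepdata_like(
    signal_data=[5.0], bkg_data=[50.0], bkg_uncerts=[7.0]
)
data = [52.0] + model.config.auxdata

# any inference call after set_backend uses the jax backend
result = pyhf.infer.hypotest(1.0, data, model)
```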
Thanks both -- Yes indeed I'm interested in not just parallelizing but being able to persistify the results such that jobs can be distributed across machines and time. So yes, parallelizing, but not just a matter of starting new threads. But I can definitely try out the jax backend -- I'm currently using the numpy backend.
I can help set this up with jax. If you send a demo script, I can adapt it quickly; it's basically undocumented :)
Ah thanks for the offer. You can find my super simple script I've been playing around with in this gist: https://gist.github.com/lawrenceleejr/51244c639bcb400dbe56ab986456aa30
for these simple models jax doesn't seem to buy you anything.. but I put together a small script that shows how to save out the sample data and re-merge it afterwards: https://gist.github.com/lukasheinrich/87ecc8a1fd3181befd357008859da28f This brings you back to the result you'd get from a single ToyCalculator.distributions() call.
PS: once you have high-stats distribution objects we can work on getting you a CLs result.
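For reference, a rough sketch of one way the save-and-merge step could look (this is not the gist above; it assumes EmpiricalDistribution from the toycalc branch exposes its raw test-statistic values via a .samples attribute, and the file names and job-id scheme are illustrative only):

```python
# Sketch of persisting per-job toy samples and merging them afterwards.
# Assumes EmpiricalDistribution (pyhf.infer.calculators) wraps an array of
# test-statistic values reachable via its .samples attribute.
import numpy as np
from pyhf.infer.calculators import EmpiricalDistribution


def save_job(sig_plus_bkg_dist, bkg_only_dist, jobid):
    # each batch job writes out the raw test-statistic samples it produced
    np.save(f"sb_{jobid}.npy", np.asarray(sig_plus_bkg_dist.samples))
    np.save(f"b_{jobid}.npy", np.asarray(bkg_only_dist.samples))


def merge_jobs(jobids):
    # concatenate all jobs' samples into one high-statistics distribution pair
    sb = np.concatenate([np.load(f"sb_{j}.npy") for j in jobids])
    b = np.concatenate([np.load(f"b_{j}.npy") for j in jobids])
    return EmpiricalDistribution(sb), EmpiricalDistribution(b)
```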
Oh that's beautiful thanks! Ok I see -- I was hoping it would basically be something like this. Glad that the objects can just be concatenated. I'll start setting something up for a batch system. Thanks all!
Cool to hear. Sorry for the late follow up on my part, but glad to hear that you're using this pre-release feature. Please keep us updated on how things are going.
No worries @matthewfeickert! Actually while we're here, is it possible for someone to outline how to go from these distributions to observed and expected (+variations) CLs values? I think I see how to get the observed, but it's not clear to me how to get the rest of it using this calculator. Is there an example that lives somewhere? Thanks!
@lawrenceleejr it's basically this function https://github.com/scikit-hep/pyhf/blob/toycalc/src/pyhf/infer/utils.py#L147 but we can make the API a bit more modular so you can come in with pre-made distributions
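A rough reconstruction of what that linked function does with the two distributions; the names `calc` and `mu_test` are assumed to exist already, and the exact call pattern on the toycalc branch may differ:

```python
# Sketch of turning the two distributions into observed and expected CLs values;
# `calc` is an already-constructed ToyCalculator and `mu_test` the tested POI.
teststat = calc.teststatistic(mu_test)
sig_plus_bkg_dist, bkg_only_dist = calc.distributions(mu_test)

# observed p-values and CLs
CLsb_obs = sig_plus_bkg_dist.pvalue(teststat)
CLb_obs = bkg_only_dist.pvalue(teststat)
CLs_obs = CLsb_obs / CLb_obs

# expected CLs band: evaluate at quantiles of the background-only distribution
CLs_exp = []
for nsigma in (2, 1, 0, -1, -2):
    expected_teststat = bkg_only_dist.expected_value(nsigma)
    CLsb = sig_plus_bkg_dist.pvalue(expected_teststat)
    CLb = bkg_only_dist.pvalue(expected_teststat)
    CLs_exp.append(CLsb / CLb)
```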
I see -- so we can just grab those lines in our higher level scripts for now. If you're interested, we have a few people who'd like to learn these tools and could maybe help prep a PR for modularizing things. No promises, but are you open to that or would you rather keep that kind of thing in house? Thanks so much -- this is all super helpful! And in fact given all that info, it'd be fine for me if you want to close the issue.
we're always open to external PRs. In fact we're very happy to see them. I'll close this issue then; feel free to reopen. One easy refactoring is to have the bottom part of that function be its own function.
Question
Hey guys! I've been using the new branch toycalc for some tests of toy throwing with an analysis with small yields. With these analyses, I often find that we need on the order of millions of toys to get a reasonable result, which turns out to be a real pain. We've found HistFactory to crash with that many toys, so we're trying this out with pyhf. So far, we've found that it's much better able to handle these huge numbers of toys. The issue is that the fastest machine I could find can still only throw toys at ~100/s peak, so these jobs, a la the simple example here (*), still take a really long time.
ToyCalculator.distributions() returns two distribution objects, and I'm wondering if it's possible to calculate these distributions separately in separate jobs and then join them later before calculating a p-value. I couldn't find any way of summing these, but maybe I missed something obvious. Figuring out a way to do this would really open up some possibilities for us to blast a huge number of toys to a cluster and would be super useful.
-L
(*) https://github.com/scikit-hep/pyhf/blob/toycalc/src/pyhf/infer/calculators.py#L387
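For context, a minimal sketch of the workflow described in the question, assuming the ToyCalculator API on the toycalc branch matches the later released pyhf.infer.calculators.ToyCalculator; the model, data, and ntoys values below are placeholders:

```python
# Minimal sketch of producing the two distribution objects discussed above;
# the simple model, data, and ntoys are placeholders, not the actual analysis.
import pyhf
from pyhf.infer.calculators import ToyCalculator

pyhf.set_backend("numpy")

model = pyhf.simplemodels.hepdata_like(
    signal_data=[2.0], bkg_data=[10.0], bkg_uncerts=[3.0]
)
data = [12.0] + model.config.auxdata

calc = ToyCalculator(data, model, ntoys=1000)
# two distribution objects: signal+background-like and background-only-like
sig_plus_bkg_dist, bkg_only_dist = calc.distributions(1.0)
```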
Relevant Issues and Pull Requests
#790