-
Dear pyhf developers,

I have started using pyhf on some HistFitter workspaces (~100), which I am converting to JSON. Each has 35 regions/bins and ~150 systematics with up and down variations. Converting the XML files to JSON takes roughly 30 hours. Is there any way we could speed this up?

Looking at the source code, it seems that each file containing a histogram is opened one by one. Do you think there is a way to aggregate the files and histogram names first and then read them out in one go? While this might require a slight refactoring of readxml, it might increase the speed substantially.

Looking forward to your opinion and any further speed-up suggestions.
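To make the "aggregate first, then read in one go" idea concrete, here is a minimal sketch. It is not how readxml works; `read_grouped` and the `open_file` callable are hypothetical names, and the opener is assumed to return a context manager with dict-style access to histograms.

```python
from collections import defaultdict


def read_grouped(requests, open_file):
    """Read every requested histogram while opening each file only once.

    requests  -- iterable of (file_path, histogram_name) pairs
    open_file -- hypothetical callable returning a context manager that
                 supports dict-style access to the histograms in a file
    """
    # Step 1: collect the histogram names wanted from each file.
    names_by_file = defaultdict(list)
    for file_path, hist_name in requests:
        names_by_file[file_path].append(hist_name)

    # Step 2: open each file once and pull out everything needed from it.
    results = {}
    for file_path, hist_names in names_by_file.items():
        with open_file(file_path) as handle:
            for hist_name in hist_names:
                results[(file_path, hist_name)] = handle[hist_name]
    return results
```

With this layout the number of file opens scales with the number of distinct files rather than with the number of histograms, which is where the saving would come from if opening is the bottleneck.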
Replies: 3 comments 17 replies
-
Hi! Is the 30 hours for a single workspace or for all 100 workspaces? We do use a cache.
-
Hi again,

To compare the performance, I wrote a little script:

    python quickAccessTest.py
    ...
    It took 0 seconds to parse with pyRoot
    It took 8 seconds to parse with upRoot

I have also attached the profiling of these two. I am not sure whether the comparison of this to ...

Any further help would be highly appreciated!

Cheers,
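The script itself is not shown in the thread, so the following is only a sketch of how such a timing comparison could be structured. `time_reader` and the `reader` callable are hypothetical; each reader is assumed to take a file path and a histogram name.

```python
import time


def time_reader(label, reader, requests):
    """Time one reader over all (file_path, histogram_name) requests."""
    start = time.perf_counter()
    for file_path, hist_name in requests:
        reader(file_path, hist_name)
    elapsed = time.perf_counter() - start
    print(f"It took {elapsed:.2f} seconds to parse with {label}")
    return elapsed
```

Calling it once per backend with the same `requests` list gives numbers that can be compared directly.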
-
Ok, the problem is partially in `uproot` and in `pyhf`.

In the `uproot` code (thanks @jpivarski!) there is a `damerau_levenshtein` function being called when we hit a missing key, and it takes a long time because the number of keys in this file is very large (https://github.com/scikit-hep/uproot4/blob/85f219a36e76dffc18da4756227a7beb760657a0/src/uproot/_util.py#L810-L858).

In `pyhf`, when we hit the name of a histogram that is not retrievable without trying the full path first, it causes a (slow) `DeserializationError`, which is caught as an expected exception in `pyhf`. We need to change the way we check if a key exists in the file. This is a bug.
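As an illustration of the difference between "try the lookup and catch the error" and "check membership first", here is a self-contained sketch. `SlowMissDirectory` is a hypothetical stand-in for a file with many keys, not an `uproot` class, and `difflib.get_close_matches` only imitates the cost of the fuzzy matching on a miss; the actual fix in `pyhf` may look different.

```python
import difflib


class SlowMissDirectory:
    """Stand-in for a directory whose failed lookups are expensive."""

    def __init__(self, objects):
        self._objects = dict(objects)

    def __contains__(self, key):
        # Membership is a plain dictionary lookup: cheap even for many keys.
        return key in self._objects

    def __getitem__(self, key):
        if key not in self._objects:
            # Imitate a "did you mean ...?" hint: compare against every key.
            close = difflib.get_close_matches(key, list(self._objects), n=1)
            raise KeyError(f"{key!r} not found; closest match: {close}")
        return self._objects[key]


def get_histogram(directory, path, name):
    """Return the object stored under `name` or `path/name`, checking first."""
    for candidate in (name, f"{path}/{name}"):
        if candidate in directory:  # never triggers the slow miss path
            return directory[candidate]
    raise KeyError(f"histogram {name!r} not found (also tried under {path!r})")
```

Because the membership test never enters the failing `__getitem__` branch, the cost of a miss no longer grows with the number of keys in the file.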