This repository has been archived by the owner on Jun 5, 2024. It is now read-only.

Significant fraction of time is just spent in loading text objects #35

Open
ramirezdiego opened this issue Jul 8, 2021 · 0 comments

Comments

@ramirezdiego
Collaborator

Documenting something that has been well known since the early development stage but is worth recording for future upgrades: epix itself is fast and clean, but uproot still takes a long time to load the Geant4 branches that are stored as text (type of interaction, energy-depositing process, etc.). This was discussed with Jim Pivarski, the uproot and awkward developer: https://gitter.im/Scikit-HEP/uproot?at=5fad4a7bc10273610aff87d1

Attached is a very recent output from @HenningSE for 100k entries, in which loading the file takes close to 40% of the total time (182.6 s out of roughly 488 s across the timed steps):

Total entries in input file = 500000
Starting to read from output file entry 0
Ending read in at output file entry 100000
It took 182.5625 sec to load data.
Finding clusters of interactions with a dr = 0.05 mm and dt = 10.0 ns
It took 293.2970 sec to find clusters.
It took 6.8375 sec to merge clusters.
Removing clusters not in volumes: TPC BelowCathode
Number of clusters before: 127425
Number of clusters after: 126940
Assigning electric field to clusters
Generating photons and electrons for events
It took 5.7376 sec to get quanta.
Clean event separation
Min. S2 amp BEFORE MACRO-CLUSTERING: 1
Min. S2 amp AFTER MACRO-CLUSTERING: 1
Source finished!
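
To make the split explicit, the share of each timed step can be recomputed from the log above. This is a minimal sketch that only parses the four "It took ... sec" lines copied from that output; the helper name `timing_shares` is made up for illustration, and any untimed steps are not accounted for.

```python
import re

# The four timed steps, copied verbatim from the log above.
LOG = """\
It took 182.5625 sec to load data.
It took 293.2970 sec to find clusters.
It took 6.8375 sec to merge clusters.
It took 5.7376 sec to get quanta.
"""


def timing_shares(log):
    """Map each step name to its fraction of the summed timed steps."""
    pattern = re.compile(r"It took ([0-9.]+) sec to (.+?)\.$", re.MULTILINE)
    timings = {step: float(sec) for sec, step in pattern.findall(log)}
    total = sum(timings.values())
    return {step: sec / total for step, sec in timings.items()}


shares = timing_shares(LOG)
for step, frac in shares.items():
    print(f"{step}: {frac:.1%}")
```

With these numbers, loading and cluster finding together account for nearly all of the timed run, so they are the two places where an optimization would pay off.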

One idea that came up at some point was to load the data in chunks with uproot and aim for some parallelization, but it was never explored in detail. It may be worth revisiting if we run into memory issues.
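
As a starting point for that discussion, here is a rough sketch of what chunked, parallel loading could look like. It is not how epix currently loads data and has not been benchmarked. It assumes the tree is read with `TTree.arrays` using `entry_start` and `entry_stop`, and the file path, tree name, branch list, chunk size and worker count are all placeholders that the caller would supply. Whether worker processes actually speed up decoding of the text branches would need to be measured.

```python
from concurrent.futures import ProcessPoolExecutor


def entry_ranges(n_entries, chunk_size):
    """Split [0, n_entries) into consecutive (start, stop) ranges."""
    return [
        (start, min(start + chunk_size, n_entries))
        for start in range(0, n_entries, chunk_size)
    ]


def load_chunk(task):
    """Read one entry range of the requested branches as an awkward array."""
    path, tree_name, branches, start, stop = task
    import uproot  # imported here so each worker process opens its own file handle

    with uproot.open(path) as f:
        return f[tree_name].arrays(
            branches, entry_start=start, entry_stop=stop, library="ak"
        )


def load_parallel(path, tree_name, branches, n_entries, chunk_size, workers=4):
    """Read all chunks in separate processes and return them in entry order."""
    tasks = [
        (path, tree_name, branches, start, stop)
        for start, stop in entry_ranges(n_entries, chunk_size)
    ]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(load_chunk, tasks))
```

Note that `load_parallel` still returns every chunk, so by itself it does not lower peak memory. For the memory case, the clustering step would have to run per chunk (for example inside `load_chunk`) so that only reduced results are kept.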
