Chunk data on intervals of constant config #432

Open
jmosbacher opened this issue Apr 27, 2021 · 0 comments

jmosbacher commented Apr 27, 2021

What is the problem?

Data chunking is currently done semi-arbitrarily at two levels: the "run" and the "chunk". The concept of a run is not always relevant (e.g. for MC simulations). In addition, config options can be time dependent, and many variables are only known a week or so in advance of the data being processed, while the config itself is set globally. Bridging that gap required a workaround, the "correction management tool", which lets time-dependent config options evolve without affecting the hashes of unchanged time periods.

Proposed solution

Config options should be definable with a validity interval that also dictates the data chunking, so that when downstream plugins request already processed data for a given interval, the hash is affected only by the values valid in that interval. A chunk should be a piece of data from an interval over which the config does not change. The run_id should not have a special status: it is just a proxy for the interval over which the DAQ config is constant, since the DAQ config is not necessarily hashed by strax. It should therefore be included as just another config parameter with validity intervals. It is also useful as an index that uniquely maps intervals to strings, but there is no reason not to allow indexing over any other config parameter that has validity intervals defined. An implementation of this, along with git-style versioning, is almost ready and can be integrated into strax very soon if people approve. A simple visualization of such chunking can be seen here:
[Image: Screenshot_2021-04-27_22-52-10]

As can be seen above, some parameters may be constant while others have validity intervals. When a selection is made over a given range, the config is chunked into the longest possible intervals over which all the requested parameters are constant. (Here "constant" refers to the config value; the actual object may be, e.g., an array defining a more fine-grained time dependence used by the plugin.) This is implemented in the igit package I am working on, which also does versioning and visualization for these kinds of structures.
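
To make the chunking rule concrete, here is a minimal sketch in plain Python. It is not the igit implementation and does not use any strax API; the data layout (a dict mapping each parameter name to a list of half-open `(start, stop, value)` intervals) and the names `value_at`, `chunk_config`, `chunk_hash` and `example_config` are assumptions made only for illustration.

```python
import hashlib
import json
from typing import Any, Dict, List, Tuple

# Hypothetical layout: parameter name -> list of (start, stop, value),
# where each value is valid on the half-open interval [start, stop).
Interval = Tuple[int, int, Any]
Config = Dict[str, List[Interval]]
Chunk = Tuple[int, int, Dict[str, Any]]


def value_at(intervals: List[Interval], t: int) -> Any:
    """Return the value valid at time t, or None if no interval covers t."""
    for start, stop, value in intervals:
        if start <= t < stop:
            return value
    return None


def chunk_config(config: Config, names: List[str],
                 begin: int, end: int) -> List[Chunk]:
    """Split [begin, end) into the longest intervals over which all
    requested parameters are constant."""
    # Every interval edge strictly inside the selection may be a boundary.
    edges = {begin, end}
    for name in names:
        for start, stop, _ in config[name]:
            edges.update(e for e in (start, stop) if begin < e < end)
    ordered = sorted(edges)

    chunks: List[Chunk] = []
    for left, right in zip(ordered[:-1], ordered[1:]):
        values = {name: value_at(config[name], left) for name in names}
        if chunks and chunks[-1][2] == values:
            # Nothing changed at this edge, so extend the previous chunk.
            chunks[-1] = (chunks[-1][0], right, values)
        else:
            chunks.append((left, right, values))
    return chunks


def chunk_hash(values: Dict[str, Any]) -> str:
    """Hash only the values valid inside one chunk (assumed JSON-serializable)."""
    payload = json.dumps(values, sort_keys=True).encode()
    return hashlib.sha1(payload).hexdigest()[:10]


# run_id is treated as just another parameter with validity intervals.
example_config: Config = {
    "run_id": [(0, 100, "run_a"), (100, 200, "run_b")],
    "gain": [(0, 150, 1.0), (150, 200, 1.1)],
    "threshold": [(0, 200, 5)],
}

for left, right, values in chunk_config(
        example_config, ["run_id", "gain", "threshold"], 50, 180):
    print(left, right, values, chunk_hash(values))
```

With all three parameters requested over `[50, 180)`, the selection splits at 100 and 150, giving three chunks. Requesting only `gain` and `threshold` gives two chunks, split at 150, because the `run_id` boundary no longer matters. Since each hash is computed from the values of a single chunk, changing `gain` after time 150 leaves the hashes of the earlier chunks untouched.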
