What is the problem?
Data chunking is currently done semi-arbitrarily at two levels: the "run" and the "chunk". The concept of a run is not always relevant (e.g. for MC simulations), and the potential time dependence of config options creates complexity. That complexity led to a workaround, the "correction management tool", which bridges the gap between the config being set globally and the need to define evolving, time-dependent config options without affecting the hashes of unchanged time periods. (Many variables are only known a week or so in advance of the data being processed.)
Proposed solution
Config options should be definable with a validity interval that also dictates the data chunking, so that when downstream plugins request already processed data for a given interval, the hash is affected only by the values in the interval being considered. A chunk should be a piece of data from an interval over which the config does not change. The run_id should not have special status: it is just a proxy for the interval over which the DAQ config is constant, since the DAQ config is not necessarily hashed by strax. It should therefore be included as just another config parameter with validity intervals. It is also useful as an index that uniquely maps intervals to strings, but there is no reason not to allow indexing over any other config parameter that has validity intervals defined. An implementation of this, along with git-style versioning, is almost ready and can be integrated into strax very soon if people approve. A simple visualization of such chunking can be seen here:
As can be seen above, some parameters may be constant, while others have validity intervals. When a selection is made over a given range, the config is chunked into the longest possible intervals over which all the requested parameters are constant. (Here "constant" refers to the config; the actual object may be, e.g., an array defining more fine-grained time dependence used by the plugin.) This is implemented in the igit package I am working on, which also does versioning and visualization for these kinds of structures.
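To make the chunking rule concrete, here is a minimal Python sketch of the idea. It is not the igit or strax implementation: the function names (`value_at`, `chunk_config`, `chunk_hash`), the representation of a parameter as a list of half-open `(start, end, value)` tuples, and the example parameter names and values are all assumptions made for illustration.

```python
import hashlib
import json
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical representation: (start, end, value), valid on [start, end).
Interval = Tuple[int, int, Any]
Config = Dict[str, List[Interval]]


def value_at(intervals: List[Interval], t: int) -> Optional[Any]:
    """Return the value whose validity interval contains t, or None."""
    for start, end, value in intervals:
        if start <= t < end:
            return value
    return None


def chunk_config(config: Config, names: List[str], begin: int, end: int):
    """Split [begin, end) into the longest sub-intervals over which
    every requested parameter is constant."""
    # Every interval boundary inside the selection is a candidate chunk edge.
    edges = {begin, end}
    for name in names:
        for start, stop, _ in config[name]:
            edges.update(e for e in (start, stop) if begin < e < end)
    ordered = sorted(edges)

    chunks = []
    for left, right in zip(ordered[:-1], ordered[1:]):
        values = {name: value_at(config[name], left) for name in names}
        if chunks and chunks[-1][2] == values:
            # Same values as the previous piece: extend it instead of splitting.
            chunks[-1] = (chunks[-1][0], right, values)
        else:
            chunks.append((left, right, values))
    return chunks


def chunk_hash(values: Dict[str, Any]) -> str:
    """Hash only the values valid in one chunk (assumes JSON-serializable values)."""
    payload = json.dumps(values, sort_keys=True).encode()
    return hashlib.sha1(payload).hexdigest()[:10]


# Illustrative config: run_id is just another parameter with validity intervals.
config = {
    "run_id": [(0, 100, "run_000"), (100, 200, "run_001")],
    "gain_model": [(0, 150, "v1"), (150, 200, "v2")],
    "drift_velocity": [(0, 200, 1.3)],
}

for start, stop, values in chunk_config(config, ["run_id", "gain_model"], 0, 200):
    print(start, stop, values, chunk_hash(values))
```

Because each hash is computed only from the values valid in its own chunk, replacing the `gain_model` value on `[150, 200)` in this sketch changes the hash of that chunk alone and leaves the earlier chunks untouched. Selecting only `drift_velocity` yields a single chunk spanning the whole range, since that parameter never changes.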