Uniform and Non-Uniform Time Axes #237

Open
kkappler opened this issue Dec 26, 2024 · 0 comments
Recently there have been a number of small updates to mth5 addressing timing issues (refs include Nov/Dec 2024 PR DQ, and Autumn 2023 KK), such as making sample rate a settable property.

These updates, and some others, have been motivated by both performance and accuracy. The performance issues came from generating the time axis and then estimating the sample rate from its timestamps. Time axes are basically vectors of timestamps that get bound to data in an xarray or dataframe. In an older version of the code, the sample_rate was always computed on the fly by taking the median difference of the timestamps. This turned out to be impractically expensive for high-sample-rate data, because the sample rate is referenced many times during metadata validation and was being recomputed at each call. Two solutions were implemented:

  • Solution 1 made _sample_rate a property that could be set to None; when requested, _compute_sample_rate is called and the value stored. Future calls to sample_rate then return self._sample_rate once it is not None, skipping the repeated computations.
  • Solution 2 baked in the assumption that the data are uniformly sampled and that the time axis is perfectly accurate. I think this happens somewhere in the FC data, and perhaps elsewhere. In this case, the sample rate returned is based on the difference between the timestamps at positions 1 and 0.
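Solution 1 amounts to a lazily cached property. A minimal sketch, with illustrative names only (ChannelData is not the actual mth5 class):

```python
import numpy as np


class ChannelData:
    """Hypothetical sketch of Solution 1: compute sample rate once, then cache."""

    def __init__(self, time_stamps_ns):
        # time_stamps_ns: integer nanosecond timestamps, as in a pd.DatetimeIndex
        self._time = np.asarray(time_stamps_ns, dtype=np.int64)
        self._sample_rate = None  # computed lazily, then cached

    def _compute_sample_rate(self):
        # Median time step is robust to a few irregular stamps, but is
        # O(n log n) -- too slow to repeat on every metadata validation call.
        dt_ns = float(np.median(np.diff(self._time)))
        return 1e9 / dt_ns

    @property
    def sample_rate(self):
        if self._sample_rate is None:
            self._sample_rate = self._compute_sample_rate()
        return self._sample_rate
```

For example, 1000 samples spaced 1 ms apart would return 1000.0 on first access and reuse the cached value on every subsequent call.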

If all data were uniformly sampled and all timestamps were perfectly precise, this would be fine, but some recent updates have mostly concerned data that are not sampled with an integer-nanosecond time step, for example 3Hz, 30Hz, 24000Hz, etc. (TODO: confirm this is the case when the idealized sample rate, if an integer, has a prime factorization containing a prime besides 2 or 5 (such as 3, 7, 11, ...), or, equivalently, when the sample interval is only expressible as a decimal with infinitely many digits after the point.)
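One quick way to check this (a sketch, not mth5 code; the helper name is made up) is to ask whether the sample interval in nanoseconds is a whole number, using exact rational arithmetic:

```python
from fractions import Fraction


def has_integer_ns_interval(sample_rate):
    """True iff 1e9 / sample_rate is a whole number of nanoseconds."""
    interval_ns = Fraction(1_000_000_000) / Fraction(sample_rate)
    return interval_ns.denominator == 1


for sr in (1, 8, 50, 256, 3, 30, 24000):
    print(sr, has_integer_ns_interval(sr))
```

Rates like 1, 8, 50, and 256 Hz pass, while 3, 30, and 24000 Hz fail, consistent with the prime-factorization observation above (24000 = 2^5 · 3 · 5^3 contains the factor 3).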

Why are we having this trouble:

  • If the sample_interval cannot be expressed as an integer number of nanoseconds, there will always be non-uniformity in the time axis as long as we use the nanosecond-limited pd.Timestamp for our time-axis values (and use the time axis as the source of sample rate).
  • Another way to say this is that the sample rate needs more information than just an integer number of nanoseconds to be totally characterized.

Consequences of non-integer-nanosecond sampled data include hard-to-trace glitches when merging runs on the time axis, as well as the fact that the sample_rate property of a RunTS or ChannelTS is not a true and complete characterization of the time axis.

Proposed Solution 1:

  • Create an abstract base class called TimeAxis.
  • Two children of this class are UniformTimeAxis and NonUniformTimeAxis
  • This does not solve the problem completely, because data sampled at 3Hz are uniformly sampled, yet their delta-t cannot be represented exactly with ns-resolution timestamps, so for practical purposes they are non-uniform.
  • Thus, in the context of pd.Timestamp as the container for timestamps, they are non-uniform.
  • By adding an attribute called, say, minimum_resolution_of_time_stamp = 1e-9, a quick test can be done to tell if a given idealized sample rate will result in a uniform or non-uniform time axis.
  • The TimeAxis class can have the following methods:
  • resolution: This property tells how fine the sampling can be tracked
  • start_time (or start_sample), and/or end_time (or end_sample),
  • idealized_sample_rate (floating point resolution)
  • idealized_sample_interval
  • to_array or to_axis, which forms the actual vector of values that goes into the numpy array.
  • The main idea is that all logic to handle the usual attributes sample rate and sample interval that get called can live in these methods, and we can quickly identify by inspecting an object if it is uniform or non-uniform.
  • Probably users would be strongly encouraged to resample data before archiving (or at least before processing) to a UniformTimeAxis.
  • sample_rate and sample_interval could then be returned at a selected resolution for any axis, but this pushes the instances into two cases, Uniform and NonUniform; Uniform can be used for all time series whose sample interval is an integer multiple of the resolution.
  • The TimeAxis classes would become mt_metadata objects and would be embedded in RunTS, ChannelTS, Spectrogram or any other time-series-like data container.
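A rough sketch of what this hierarchy could look like, with all names hypothetical and only a couple of the methods listed above:

```python
from abc import ABC, abstractmethod
from fractions import Fraction

import numpy as np

NS_PER_SECOND = 1_000_000_000


def is_uniform_at_ns_resolution(sample_rate):
    """True iff the ideal interval is a whole number of nanoseconds."""
    return (Fraction(NS_PER_SECOND) / Fraction(sample_rate)).denominator == 1


class TimeAxis(ABC):
    """Abstract base for a time axis (sketch of Proposed Solution 1)."""

    minimum_resolution_of_time_stamp = 1e-9  # seconds; the pd.Timestamp floor

    def __init__(self, start_time_ns, n_samples, idealized_sample_rate):
        self.start_time_ns = int(start_time_ns)
        self.n_samples = int(n_samples)
        self.idealized_sample_rate = idealized_sample_rate

    @property
    def idealized_sample_interval(self):
        return 1.0 / self.idealized_sample_rate

    @abstractmethod
    def to_array(self):
        """The actual vector of ns timestamps bound to the data container."""


class UniformTimeAxis(TimeAxis):
    def to_array(self):
        # Exact integer step, valid only when the rate divides 1e9.
        step_ns = int(Fraction(NS_PER_SECOND) / Fraction(self.idealized_sample_rate))
        return self.start_time_ns + step_ns * np.arange(self.n_samples, dtype=np.int64)


class NonUniformTimeAxis(TimeAxis):
    def to_array(self):
        # Rounded to ns: up to 0.5 ns error per stamp, but no cumulative drift.
        ideal = np.arange(self.n_samples) * (NS_PER_SECOND / self.idealized_sample_rate)
        return np.round(self.start_time_ns + ideal).astype(np.int64)
```

Inspecting an object's class (or calling `is_uniform_at_ns_resolution`) then immediately tells you which case you are in, e.g. 50 Hz builds a `UniformTimeAxis` while 3 Hz falls through to `NonUniformTimeAxis`.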

Proposed Solution 2:

  • From rambling through Solution 1, it seems that the main issue is the base resolution. If we switch to the attotime package for timestamp handling we get yoctosecond resolution and these issues will possibly go away (at least for MT).
  • That said, it is still desirable to support non-uniformly sampled data, as this is the most general case, and any time series can be represented as a zipped pair of vectors, time and data.
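A minimal sketch of that zipped-pair representation (names are illustrative, not mth5 API):

```python
import numpy as np


def make_series(time_ns, data):
    """Bundle integer-ns timestamps with data values; no spacing is assumed."""
    time_ns = np.asarray(time_ns, dtype=np.int64)
    data = np.asarray(data, dtype=float)
    if time_ns.shape != data.shape:
        raise ValueError("time and data vectors must have equal length")
    return time_ns, data


def estimate_sample_rate(time_ns):
    """Median-diff estimate in Hz; robust to a few irregular stamps, but O(n log n)."""
    return 1e9 / float(np.median(np.diff(time_ns)))


# A 3 Hz series with ns-rounded (hence slightly non-uniform) timestamps
# is representable with no special casing.
t, d = make_series([0, 333333333, 666666667, 1000000000], [1.0, 2.0, 3.0, 4.0])
```

Here `estimate_sample_rate(t)` recovers a value within a few nanohertz of the idealized 3 Hz, even though the stamp differences are not all equal.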

Related to MTH5 Issue 225 Stress Tests.
