Recently there have been a number of small updates to mth5 addressing timing issues (refs include the Nov/Dec 2024 PR from DQ and the Autumn 2023 work from KK), such as making sample rate a settable property.
These updates and some others have been associated with both performance and accuracy. The performance issues stem from generating the time axis and estimating the sample rate from the timestamps in that axis. Time axes are essentially vectors of timestamps that get bound to data in an xarray or dataframe. In an old version of the code, the sample_rate was always computed on the fly by taking the median difference of the timestamps. This turned out to be impractically expensive for high-sample-rate data, because the sample rate is referred to many times during metadata validation and was being recomputed at each call.

Two solutions were implemented. Solution 1 was to make _sample_rate a property that could be set to None; when requested, _compute_sample_rate is called and the value stored. Future calls to sample_rate then return self._sample_rate once it is no longer None, skipping the repeated computations. Solution 2 baked in the assumption that the data are uniformly sampled and that the time axis is perfectly accurate; in that case the sample rate returned is the difference of the timestamps at positions 1 and 0. This I think happens somewhere in the FC data, and perhaps elsewhere.
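The caching fix (Solution 1) and the uniform-sampling shortcut (Solution 2) can be sketched roughly as follows; the class name and the integer-nanosecond time axis here are hypothetical stand-ins, not the actual mth5 implementation:

```python
import numpy as np

class ChannelTSLite:
    """Hypothetical stand-in for a time-series container such as ChannelTS."""

    def __init__(self, time_axis_ns: np.ndarray):
        self._time = time_axis_ns   # timestamps in integer nanoseconds
        self._sample_rate = None    # Solution 1: lazily computed cache

    def _compute_sample_rate(self) -> float:
        # Median timestamp difference (ns) converted to Hz -- robust but O(n).
        return 1e9 / np.median(np.diff(self._time))

    @property
    def sample_rate(self) -> float:
        # Solution 1: compute once, then reuse the cached value.
        if self._sample_rate is None:
            self._sample_rate = self._compute_sample_rate()
        return self._sample_rate

    @property
    def sample_rate_fast(self) -> float:
        # Solution 2: assume uniform sampling; use only the first interval.
        return 1e9 / (self._time[1] - self._time[0])
```

Both properties agree only when the axis really is uniform at nanosecond resolution, which is exactly the assumption that breaks down below.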
If all data were uniformly sampled and all timestamps were perfectly precise, this would be fine, but some recent updates have mostly been associated with data that are not sampled with an integer-nanosecond time step, for example 3 Hz, 30 Hz, 24000 Hz, etc. (TODO: confirm this happens when the idealized sample rate (if integer) has a prime factorization containing a prime besides 2 or 5 (such as 3, 7, 11, ...), or when the sample rate is only expressible as a floating point number with an infinite number of digits after the decimal.)
Why are we having this trouble?
If the sample interval cannot be expressed as an integer number of nanoseconds, there will always be non-uniformity in the time axis as long as we use the nanosecond-limited pd.Timestamp for our time-axis values (and use the time axis as the source of the sample rate).
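A small illustration: building a 3 Hz time axis at nanosecond resolution forces each ideal timestamp to be rounded, so the step between stored timestamps alternates between two values.

```python
import numpy as np

fs = 3.0                               # ideal interval = 1/3 s, not an integer number of ns
ideal_ns = np.arange(9) * (1e9 / fs)   # exact (floating point) timestamps in ns
rounded_ns = np.round(ideal_ns).astype("int64")  # what a ns-limited timestamp must store

steps = np.diff(rounded_ns)
print(sorted(int(s) for s in set(steps)))  # -> [333333333, 333333334]
```

Two distinct step sizes appear, so any code that infers the sample rate from consecutive timestamp differences sees a non-uniform axis.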
Another way to say this: the sample rate needs more information than just an integer number of nanoseconds to be completely characterized.
Consequences of non-integer-nanosecond-sampled data include hard-to-trace glitches when merging data between runs on the time axis, as well as the fact that the sample_rate property of a RunTS or ChannelTS is not a true and complete characterization of the time axis.
Proposed Solution 1:
Create an abstract base class called TimeAxis.
Two children of this class would be UniformTimeAxis and NonUniformTimeAxis.
This does not solve the problem completely, because data sampled at 3 Hz are uniformly sampled, yet they cannot be represented with a uniform delta-t if we are using nanosecond-resolution timestamps. Thus, in the context of pd.Timestamp as the container for timestamps, they are for practical purposes non-uniform.
By adding an attribute called, say, minimum_resolution_of_time_stamp = 1e-9, a quick test can be done to tell whether a given idealized sample rate will result in a uniform or non-uniform time axis.
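Such a test can be done exactly with rational arithmetic rather than floats. This is a hypothetical sketch (the function name and signature are made up here), hard-coding the nanosecond resolution discussed above:

```python
from fractions import Fraction

def is_uniform_at_ns_resolution(sample_rate: float) -> bool:
    """True if the ideal sample interval is an integer number of nanoseconds."""
    # Interval in ns as an exact rational: 10**9 / sample_rate.
    ticks = Fraction(10**9) / Fraction(sample_rate)
    return ticks.denominator == 1

# 40 Hz -> 25_000_000 ns per sample: uniform at ns resolution.
# 3 Hz  -> 10**9 / 3 ns per sample:  non-uniform at ns resolution.
```

This also makes the TODO conjecture above concrete: since 10**9 = 2**9 * 5**9, the interval reduces to an integer exactly when the sample rate contributes no prime factors other than 2 and 5.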
The TimeAxis class can have the following attributes and methods:
- resolution: a property telling how finely the sampling can be tracked
- start_time (or start_sample), and/or end_time (or end_sample)
- idealized_sample_rate (floating point) and idealized_sample_interval
- to_array or to_axis, which forms the actual vector of values that goes into the numpy array
The main idea is that all the logic for handling the usual attributes (sample rate, sample interval) can live in these methods, and we can quickly tell, by inspecting an object, whether it is uniform or non-uniform.
Users would probably be strongly encouraged to resample data to a UniformTimeAxis before archiving (or at least before processing).
sample_rate and sample_interval could then be returned at a selected resolution. This pushes the instances into two cases, Uniform and NonUniform; Uniform, in this case, can be used for all time series whose sample interval is an integer multiple of the resolution.
The TimeAxis classes would become mt_metadata objects and would be embedded in RunTS, ChannelTS, Spectrogram or any other time-series-like data container.
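A rough sketch of what the proposed hierarchy might look like. Everything beyond the names TimeAxis, UniformTimeAxis, and NonUniformTimeAxis, including all signatures and the internal nanosecond representation, is hypothetical:

```python
from abc import ABC, abstractmethod
import numpy as np

class TimeAxis(ABC):
    def __init__(self, start_ns: int, n_samples: int, idealized_sample_rate: float):
        self.start_ns = start_ns
        self.n_samples = n_samples
        self.idealized_sample_rate = idealized_sample_rate

    @property
    def resolution(self) -> float:
        return 1e-9  # pd.Timestamp is nanosecond-limited

    @abstractmethod
    def to_array(self) -> np.ndarray:
        """Return the vector of timestamps (ns since epoch) bound to the data."""

class UniformTimeAxis(TimeAxis):
    # Valid only when 1e9 / sample_rate is an integer number of ns.
    def to_array(self) -> np.ndarray:
        step_ns = round(1e9 / self.idealized_sample_rate)
        return self.start_ns + step_ns * np.arange(self.n_samples, dtype="int64")

class NonUniformTimeAxis(TimeAxis):
    # General case: each ideal timestamp is rounded to the nearest ns,
    # so consecutive differences are not all equal.
    def to_array(self) -> np.ndarray:
        ideal = self.start_ns + np.arange(self.n_samples) * (1e9 / self.idealized_sample_rate)
        return np.round(ideal).astype("int64")
```

Here isinstance(axis, UniformTimeAxis) is the quick inspection mentioned above, and the idealized_sample_rate attribute, not the timestamp differences, is the authoritative source of the sample rate.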
Proposed Solution 2:
From rambling through Solution 1, it seems that the main issue is the base resolution. If we switch to the attotime package for timestamp handling, we get yoctosecond resolution, and these issues will possibly go away (at least for MT).
That said, it is still desirable to support non-uniformly sampled data, as this is the most general case, and any time series can be represented as a zipped pair of vectors, time and data.
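For example, the general (time, data) pair representation at nanosecond resolution might look like this with pandas (purely illustrative values):

```python
import numpy as np
import pandas as pd

# A non-uniformly sampled 3 Hz series stored as an explicit (time, data) pair.
time_ns = np.array([0, 333333333, 666666667, 1000000000], dtype="int64")
data = np.array([0.1, 0.2, 0.3, 0.4])
series = pd.Series(data, index=pd.to_datetime(time_ns, unit="ns"))
```

Nothing here assumes uniform sampling; the cost is that the full time vector must be stored (or regenerable) rather than just a start time and a sample rate.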
Related to MTH5 Issue 225 Stress Tests.