You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Would we be interested to provide some functionality / tooling to make it easy to do "nan insertion" when timeseries have holes? This is already built into the spectra plotting that I recently pushed, and could be relevant to plenty of other cases (i.e. trajectories, statistics, etc). The idea is to avoid ugly linear interpolation in plots when there are some data holes.
See for example the difference between:
nan insertion:
no nan insertion (default as of now, ugly lines when instruments loose contact because buried in snow or whatever else reason)
difference highlighted by putting the no nan in black and other in color
I end up using stuff like this all the time, so I have prepared my own function. I would be happy to i) add it to some trajan tooling, ii) use it to refactor how I do the nan insertion in the spectra plotting, iii) start working / thinking about how this could be integrated into the other potting functionalities (a switch to use it, with argument forwarding for the hole duration threshold?)
# %%importnumpyasnpfromtypingimportListimportnumpy.typingasnpt# %%definsert_NaNs_in_time_holes(
common_time_index: npt.NDArray[np.datetime64],
hole_threshold: np.timedelta64,
list_data_arrays: List[npt.NDArray],
data_arrays_time_index: int=0):
"""Insert NaNs where holes are detected in common_time_index. This is useful, for example, to fill holes with NaNs before plotting to avoid ugly lines on the plots. Args: common_time_index: the time index, should be common to all arrays in list_data_arrays hole_threshold: the threshold over which we consider that there is a hole. The NaNs will be filled at start_hole + hole_threshold/4 and end_hole - hole_threshold/4 , for each hole list_data_arrays: a list of the data arrays using the common_time_index data_arrays_time_index: the index used for time in the arrays within list_data_arrays Returns: (new_common_time_index, list_new_data_arrays): new_time_index is a new time index (i.e. a separate copy) with the start and ends of NaN periods inserted, list_new_data_arrays is a new list of data arrays (i.e. separate copies) with the NaN hole delimiters inserted. """# common_time_index should be datetime64 and increasingassertnp.issubdtype(common_time_index.dtype, np.datetime64)
assertnp.all(np.diff(common_time_index)) >=0hole_threshold=hole_threshold.astype('timedelta64[ns]')
# each list_data_arrays array should have a time dimension equal to common_time_indexforcrrt_data_arrayinlist_data_arrays:
assertnp.shape(crrt_data_array)[data_arrays_time_index] ==np.shape(common_time_index)[0]
time_deltas=common_time_index[1:] -common_time_index[:-1]
time_jump_indexes=np.where(time_deltas>hole_threshold)[0]
time_hole_starts=common_time_index[time_jump_indexes] +hole_threshold/4time_hole_ends=common_time_index[time_jump_indexes+1] -hole_threshold/4# intertwindle the 2time_hole_limits=np.dstack((time_hole_starts, time_hole_ends)).flatten()
time_jump_indexes=np.repeat(time_jump_indexes, 2)
new_common_time_index=np.insert(common_time_index, time_jump_indexes+1, time_hole_limits)
list_new_data_arrays= []
forcrrt_data_arrayinlist_data_arrays:
to_insert=np.transpose(np.nan*crrt_data_array[0])
new_crrt_data_array=np.insert(crrt_data_array, time_jump_indexes+1, to_insert, data_arrays_time_index)
list_new_data_arrays.append(new_crrt_data_array)
return (new_common_time_index, list_new_data_arrays)
# %%# example / test# TODO: if moving to dedicated functionality, make this into the associated testscommon_time_index=np.array(
[
np.datetime64("2024-01-01T00:00:00"),
np.datetime64("2024-01-01T01:00:00"),
np.datetime64("2024-01-01T06:00:00"),
np.datetime64("2024-01-01T07:00:00"),
np.datetime64("2024-01-01T08:00:00"),
np.datetime64("2024-01-01T09:00:00"),
np.datetime64("2024-01-01T12:00:00"),
np.datetime64("2024-01-01T13:00:00"),
np.datetime64("2024-01-01T13:30:00"),
]
)
expected_new_common_time_index=np.array(
[
np.datetime64("2024-01-01T00:00:00"),
np.datetime64("2024-01-01T01:00:00"),
np.datetime64("2024-01-01T01:30:00"),
np.datetime64("2024-01-01T05:30:00"),
np.datetime64("2024-01-01T06:00:00"),
np.datetime64("2024-01-01T07:00:00"),
np.datetime64("2024-01-01T08:00:00"),
np.datetime64("2024-01-01T09:00:00"),
np.datetime64("2024-01-01T09:30:00"),
np.datetime64("2024-01-01T11:30:00"),
np.datetime64("2024-01-01T12:00:00"),
np.datetime64("2024-01-01T13:00:00"),
np.datetime64("2024-01-01T13:30:00"),
]
)
hole_threshold=np.timedelta64(2, "h")
list_data_arrays= [
np.array([1, 2, 5, 6, 7, 8, 11, 12, 13]).astype('float'),
np.array([10, 20, 50, 60, 70, 80, 110, 120, 130]).astype('float'),
np.array([[1,0], [2,0], [5,0], [6,0], [7,0], [8,0], [11,0], [12,0], [13,0],]).astype('float'),
]
expected_list_new_data_arrays= [
np.array([1, 2, np.nan, np.nan, 5, 6, 7, 8, np.nan, np.nan, 11, 12, 13]).astype('float'),
np.array([10, 20, np.nan, np.nan, 50, 60, 70, 80, np.nan, np.nan, 110, 120, 130]).astype('float'),
np.array([[1,0], [2,0], [np.nan,np.nan], [np.nan,np.nan], [5,0], [6,0], [7,0], [8,0], [np.nan,np.nan], [np.nan,np.nan], [11,0], [12,0], [13,0],]).astype('float'),
]
data_arrays_time_index=0
(new_common_time_index, list_new_data_arrays) =insert_NaNs_in_time_holes(common_time_index, hole_threshold, list_data_arrays)
defcompare_np_arrays_with_nan(a, b, epsilon=1.0e-12):
"""Given 2 arrays of floats, check for corresponding indexes that either both have values close to each other at epsilon prediction, or both have nans."""res=np.logical_or(
np.abs(a-b) <epsilon,
np.logical_and(np.isnan(a), np.isnan(b))
).all()
returnresassertnp.all(new_common_time_index==expected_new_common_time_index)
assertcompare_np_arrays_with_nan(list_new_data_arrays[0], expected_list_new_data_arrays[0])
assertcompare_np_arrays_with_nan(list_new_data_arrays[1], expected_list_new_data_arrays[1])
assertcompare_np_arrays_with_nan(list_new_data_arrays[2], expected_list_new_data_arrays[2])
# %%
The text was updated successfully, but these errors were encountered:
A question is if we like better to add this nan hole filling on the dataset and modify it, or if we just want some small function to edit the data "on the fly" when we need to plot it, without changing the underlying dataset (now I just call this function above in my code to modify the arrays to plot for example, so I am in the second case and I do not do modifications of the whole dataset).
Another question is how to handle cases when there are more time arrays (ie time for position / wave / temperature etc that differ from each others, common with buoys). If we modify the dataset we may have to think about the different groups of related variables and dimensions etc. If we just apply a bit of data modification before plotting we do not need to think about as many aspects, as we can just call the data modifying function on the fly on the exact data that we are preparing to plot :) .
I think that a very important principle in trajan is to stay close to xarray, numpy and matplotlib so that it is easy for the user to reason out what happens. The way things are done in xarray is usually to chain various limitations and selections on the dataset, then plot. These changes do not change the original dataset, but creates a copy or a view. The same with plotting, we should set up the future like xarray does and let regular matplotlib save.
Would we be interested to provide some functionality / tooling to make it easy to do "nan insertion" when timeseries have holes? This is already built into the spectra plotting that I recently pushed, and could be relevant to plenty of other cases (i.e. trajectories, statistics, etc). The idea is to avoid ugly linear interpolation in plots when there are some data holes.
See for example the difference between:
nan insertion:
no nan insertion (default as of now, ugly lines when instruments loose contact because buried in snow or whatever else reason)
difference highlighted by putting the no nan in black and other in color
I end up using stuff like this all the time, so I have prepared my own function. I would be happy to i) add it to some trajan tooling, ii) use it to refactor how I do the nan insertion in the spectra plotting, iii) start working / thinking about how this could be integrated into the other potting functionalities (a switch to use it, with argument forwarding for the hole duration threshold?)
The text was updated successfully, but these errors were encountered: