`DataSeries` object for time-based objects #247
base: main
Thanks @lewardo Yes, we need something like this container for time-series algorithms, but it's a can of worms. I'd started to think about it for my Echo State Network work (a kind of RNN), but ran out of time. So, aspects of the can-of-worms-ness: – The basic interface I had in mind was that each point would indeed be FWIW, |
as a first concept to allow parallel development of time-based algorithms, do you think a |
I think an adapted container that used a |
in terms of integrating that into the |
Hmm. Probably a |
Also, from what I understand, with a single ID-index map, if a user added several frames to one point and only later added frames to another point, then T=n for point X would not sit at the same index for every X.
so the |
or perhaps I'm thinking of it the wrong way around, would time be down the |
Ah, not quite how I was imagining the representation. In my mind, each ID maps to a time sequence, e.g:
So, for an RNN of any variety, each training point is itself a time sequence (and remember that, when training an RNN, we'll want to shuffle the order of the sequences without breaking their internal order, because the component frames aren't independent).
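To make the "each ID maps to a time sequence" idea concrete, here is a minimal sketch. The names `Frame`, `Series`, `SeriesMap` and `shuffledIds` are hypothetical, made up for illustration, and are not part of the existing codebase. The visiting order of whole sequences is randomised, while the frames inside each sequence are never reordered.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <random>
#include <string>
#include <vector>

// Hypothetical types for illustration only; not the library's API.
using Frame = std::vector<double>;               // one time step
using Series = std::vector<Frame>;               // ordered frames for one ID
using SeriesMap = std::map<std::string, Series>; // ID -> time sequence

// Return the IDs in a random order. Only the order in which whole
// sequences are visited changes; the frames inside each sequence
// are left exactly as they were stored.
std::vector<std::string> shuffledIds(const SeriesMap& data, unsigned seed)
{
  std::vector<std::string> ids;
  ids.reserve(data.size());
  for (const auto& entry : data) ids.push_back(entry.first);

  std::mt19937 rng(seed);
  std::shuffle(ids.begin(), ids.end(), rng);
  return ids;
}
```

A training loop would then iterate over `shuffledIds(data, seed)` and, for each ID, walk `data.at(id)` from the first frame to the last.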
Oh, our replies crossed |
After a brief meeting with @tremblap for example patches for use cases we concluded on an interface for reading and writing time series of data from a |
There are various slicing operations that could be useful, that would use |
Hey @lewardo do you want assert report (on print function) here or elsewhere? |
…lly check the max length dynamically zero pad ceil(log10(maxlen)) zero padding would do but maybe we should store a new "maxlength" key
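As a rough sketch of the zero-padding idea in that commit message, assuming the padding applies to frame indices running from 0 to `maxLen - 1`: the helpers `padWidth` and `paddedIndex` are hypothetical names, not functions from the codebase. The width is counted with integer division instead of `ceil(log10(maxLen))`, which sidesteps floating-point rounding. The two agree for `maxLen >= 2`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: number of decimal digits needed to print any
// index in [0, maxLen - 1]. Always at least 1.
std::size_t padWidth(std::size_t maxLen)
{
  std::size_t largest = maxLen > 0 ? maxLen - 1 : 0;
  std::size_t width = 1;
  while (largest >= 10)
  {
    largest /= 10;
    ++width;
  }
  return width;
}

// Hypothetical helper: print index left-padded with zeros so that all
// indices below maxLen have the same number of characters.
std::string paddedIndex(std::size_t index, std::size_t maxLen)
{
  assert(maxLen == 0 || index < maxLen);
  std::string digits = std::to_string(index);
  std::size_t width = padWidth(maxLen);
  if (digits.size() < width) digits.insert(0, width - digits.size(), '0');
  return digits;
}
```

Storing a "maxlength" key, as the message suggests, would avoid rescanning every series to find `maxLen` each time something is printed.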
@weefuzzy For many potential new objects, implementing a `DataSeries` object for datasets encompassing time would be incredibly useful.

In terms of implementation I had a few ideas, but I do not know which would fit better with the rest of the codebase:

- `DataSetClient`, with each point being `N * frameLen` long and dealing with the views and accessing in the wrapper/client
- `DataSet` of rank 2, and the time being captured in the second point dimension
- `DataSeries` algorithm that is similar to the `DataSet` but separates the subtleties out in a new alg for cleanliness

The issue I'm seeing with all of them is the memory allocation of the RT updating points. I wanted to confirm that the way `FluidTensors` grow and shrink is memory-efficient, so that updating the length of a point in the middle of a dataset won't have to shift the rest of it down in memory.

Currently, you can update a point but not add to it, so pushing a time frame onto a point would involve copying that point in its entirety (not a `FluidTensorView`), concatenating externally, then replacing that point in the `DataSet`, unless `FluidTensor` were to be given an equivalent of `push_back` that would allow in-place expansion of central elements.

For the interface, @tremblap enjoyed the idea of keeping a similar one to `DataSet`, namely that `addPoint` be the message to push another frame from a buffer to that Id in the dataset, and additional messages could be implemented to load a whole series from a buffer (the issue here would be that the `DataSeries` time dimension would not be the buffer time dimension; time in the buffer would have to be captured from channel 1 to T), but all of these ideas are still up for discussion.

This object is the gateway into being able to implement the more useful algs like `DTW` and various flavours of `RNN`.

Apologies for the barrage of UI and implementation questions. I am aware you haven't much time, but any guidance, however brief, would be appreciated: I don't know which style of implementation would be best, or which nuances of my implementation may or may not lead to terrible memory performance.
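To illustrate the memory concern, here is a minimal sketch of one possible layout, using only the standard library. `SeriesStore`, `addFrame`, `numFrames` and `frame` are hypothetical names and do not reflect how `DataSet` or `FluidTensor` are actually implemented. Each ID owns its own flat buffer of `numFrames * frameLen` values, so pushing a frame onto one series can at most reallocate that series' buffer and never shifts the others.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical container for illustration; not the library's DataSet.
class SeriesStore
{
public:
  explicit SeriesStore(std::size_t frameLen) : mFrameLen(frameLen)
  {
    assert(frameLen > 0);
  }

  // Append one frame to the series stored under id, creating the series
  // if it does not exist yet. Returns false if the frame has the wrong
  // length. Only this series' buffer can grow or move.
  bool addFrame(const std::string& id, const std::vector<double>& frame)
  {
    if (frame.size() != mFrameLen) return false;
    std::vector<double>& flat = mSeries[id];
    flat.insert(flat.end(), frame.begin(), frame.end());
    return true;
  }

  // Number of frames currently stored under id (0 if the id is unknown).
  std::size_t numFrames(const std::string& id) const
  {
    auto it = mSeries.find(id);
    return it == mSeries.end() ? 0 : it->second.size() / mFrameLen;
  }

  // Pointer to the first value of frame t of series id; the following
  // frameLen values are contiguous.
  const double* frame(const std::string& id, std::size_t t) const
  {
    assert(t < numFrames(id));
    return mSeries.at(id).data() + t * mFrameLen;
  }

private:
  std::size_t mFrameLen;
  std::unordered_map<std::string, std::vector<double>> mSeries;
};
```

The trade-off is that the whole collection is no longer one contiguous block, so any algorithm that expects a single rank-2 tensor would need a copy or a per-series view. `std::vector` growth is amortised, but it can still allocate, which matters if frames are pushed from a real-time thread.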