version 1 #33

Open
JeffreySarnoff opened this issue Nov 2, 2022 · 8 comments

Comments

@JeffreySarnoff
Owner

This is for discussion of issues specific to the design of version 1.
@bkamins

@JeffreySarnoff
Owner Author

JeffreySarnoff commented Nov 2, 2022

@bkamins Version 1 is where the padding work happens. Along the way, a few questions arose.

I am adopting an "accumulator" approach for statistics that are available incrementally,
e.g. accum = AccMax() or accum = AccMeanVar().

In the first example, with each new x the current value of the accumulator is immediately available and can be returned with little overhead. In the second example, variance is computed using fields internal to the accumulator. With each new x the current value of the accumulator is a 2-tuple (mean, variance); it is not immediately available, so returning it has some overhead.

For all accumulators, accum() does provide the current value[s] being accumulated. Is it better to return nothing for all calls like accum(x) and let the client get the current value[s] with accum() when desired, or to return accum() at the end of accum(x)?
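
A minimal sketch of the two cases, assuming callable structs where accum(x) updates the state and returns nothing, and accum() reads the current value. The field names and the Welford-style variance update are illustrative assumptions, not the package's implementation.

```julia
# Hypothetical sketch: field names and update rules are assumptions.

mutable struct AccMax
    value::Float64
    AccMax() = new(-Inf)
end

# accum() is a plain field read, so the value is immediately available.
(acc::AccMax)() = acc.value

# accum(x) updates the state and returns nothing.
function (acc::AccMax)(x)
    acc.value = max(acc.value, x)
    return nothing
end

mutable struct AccMeanVar
    n::Int
    mean::Float64
    m2::Float64   # running sum of squared deviations from the mean
    AccMeanVar() = new(0, 0.0, 0.0)
end

# Welford-style update: only the internal fields change here.
function (acc::AccMeanVar)(x)
    acc.n += 1
    delta = x - acc.mean
    acc.mean += delta / acc.n
    acc.m2 += delta * (x - acc.mean)
    return nothing
end

# accum() has to derive the (sample) variance from the internal fields,
# which is the extra work on each read.
(acc::AccMeanVar)() = (acc.mean, acc.n > 1 ? acc.m2 / (acc.n - 1) : NaN)
```

With this convention the client pays for the (mean, variance) tuple only when calling accum().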

@bkamins

bkamins commented Nov 2, 2022

What you plan for is very similar to https://github.com/joshday/OnlineStats.jl, so you might want to have a look at that implementation also.

Regarding your question - I think accum(x) should not return the statistic since, as you note, that would add overhead.
Instead, I think the natural thing would be for accum(x) to return accum, because that makes chaining easiest. You can then also write accum(x)() if you want the value immediately. Finally, you would probably define custom printing for accum that always computes the statistic (printing is expensive anyway). That way, calling accum(x) in the REPL still prints the value of the statistic, but when accum(x) is not displayed the statistic is not computed.
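
A minimal sketch of that convention, assuming a hypothetical ChainMeanVar accumulator (the name, fields, and printed format are made up for illustration): accum(x) returns the accumulator itself, accum() computes the statistic, and show computes it only when the object is displayed.

```julia
# Hypothetical accumulator used only to illustrate the calling convention.
mutable struct ChainMeanVar
    n::Int
    mean::Float64
    m2::Float64   # running sum of squared deviations from the mean
    ChainMeanVar() = new(0, 0.0, 0.0)
end

# accum(x): update the state and return the accumulator, so calls chain.
function (acc::ChainMeanVar)(x)
    acc.n += 1
    delta = x - acc.mean
    acc.mean += delta / acc.n
    acc.m2 += delta * (x - acc.mean)
    return acc
end

# accum(): compute the statistic on demand.
(acc::ChainMeanVar)() = (acc.mean, acc.n > 1 ? acc.m2 / (acc.n - 1) : NaN)

# Custom printing computes the statistic, so the REPL shows the value,
# while undisplayed calls to accum(x) never pay for it.
Base.show(io::IO, acc::ChainMeanVar) = print(io, "ChainMeanVar", acc())

acc = ChainMeanVar()
acc(1.0)(2.0)(3.0)          # chained updates
current = acc(4.0)()        # update, then read the value immediately

# Returning the accumulator also makes folding over a collection direct.
folded = foldl((a, x) -> a(x), [1.0, 2.0, 3.0, 4.0]; init = ChainMeanVar())
```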

@JeffreySarnoff
Owner Author

That's a good suggestion about accum(x)().
I am familiar with OnlineStats, and at first considered just using that pkg.
There were some fit issues, though, and I would rather allow it than subsume it.

@JeffreySarnoff
Owner Author

[@juliohm @bkamins] I am continuing the discussion from TableTransforms #121 here, to keep the information for RollingFunctions more contiguous.

Tables.jl, TableTransforms.jl, DataFrames.jl and .. understand e.g. xs::Vector{Float32} [where "understanding" is operational and abstractly applicative] as a realization of some AbstractColumn or RowAbstraction. That is helpful, as they come laden with capability, operational élan, reliability, and a dollop of correctness.

My perspective on rolling functions over windows into data and transformations is that none of the following should be excluded, and all should be similarly constructible and usable in a shared way.

  • rolling functions over windows into data
  • rolling functions over windows into transformed data
  • rolling transformed functions over windows into data
  • rolling functions over transformed windows into data
  • etc
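
The combinations above can be sketched with a naive reference helper. The rolling function here is hypothetical and deliberately simple (it recomputes every window from scratch); it is only meant to show where the transformation sits in each case.

```julia
# Naive reference helper (an assumption, not the package API):
# apply `f` to every full window of width `w`.
rolling(f, xs, w) = [f(view(xs, i:i+w-1)) for i in 1:length(xs)-w+1]

xs = [1.0, 4.0, 9.0, 16.0, 25.0]

# rolling function over windows into data
r1 = rolling(maximum, xs, 3)                             # [9.0, 16.0, 25.0]

# rolling function over windows into transformed data
r2 = rolling(maximum, sqrt.(xs), 3)                      # [3.0, 4.0, 5.0]

# rolling transformed function over windows into data
r3 = rolling(log ∘ maximum, xs, 3)

# rolling function over transformed windows into data
# (each window is shifted so that it starts at zero)
r4 = rolling(win -> maximum(win .- first(win)), xs, 3)   # [8.0, 12.0, 16.0]
```

In all four cases the call shape is the same; only the function or the data handed to it changes.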

In addition, the ability to pre- or post-pad with a given value or with a sequence of determinate values (tapering) must be available and essentially effortless.
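
A rough sketch of what padding could look like, assuming a hypothetical rolling_padded helper with padding and padlast keywords (these names are made up here). It reads "a sequence of determinate values" as caller-supplied padding values, which is one possible interpretation.

```julia
# Naive reference helper (an assumption, not the package API).
rolling(f, xs, w) = [f(view(xs, i:i+w-1)) for i in 1:length(xs)-w+1]

# Pad with a single value so the result is as long as the input.
function rolling_padded(f, xs, w; padding = missing, padlast = false)
    rolled = rolling(f, xs, w)
    pad = fill(padding, length(xs) - length(rolled))
    return padlast ? vcat(rolled, pad) : vcat(pad, rolled)
end

# Pad with a caller-supplied sequence of values instead.
function rolling_padded(f, xs, w, padvalues::AbstractVector; padlast = false)
    rolled = rolling(f, xs, w)
    needed = length(xs) - length(rolled)
    length(padvalues) == needed ||
        throw(ArgumentError("expected $needed padding values"))
    return padlast ? vcat(rolled, padvalues) : vcat(padvalues, rolled)
end

data = [1.0, 2.0, 3.0, 4.0]
p1 = rolling_padded(sum, data, 3)                                # pre-padded with missing
p2 = rolling_padded(sum, data, 3; padding = 0.0, padlast = true) # post-padded with 0.0
p3 = rolling_padded(sum, data, 3, [1.0, 3.0])                    # pre-padded with given values
```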

The first level of rollable functions are direct implementations of incremental algorithms that update the functional value (e.g. a descriptive statistic) with each next step within the windowed data. Each of these is performant. What is necessary both for this package and for seamless use with DataFrames and TableTransforms is to support melding two or more first level capabilities (incremental updating of the extrema, the mean, and an exponentially weighted mean) rather than simply stacking them. Wrapping them in a pipe that pumps each new observation through the shared API would work and offers the potential to use multiple threads effectively.
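
A single-threaded sketch of such a pipe, assuming every accumulator follows the same callable convention (acc(x) updates, acc() reads). The names AccPipe, meld, PipeExtrema, and PipeMean are hypothetical, and threading is left out.

```julia
# Two small hypothetical accumulators sharing one calling convention.
mutable struct PipeExtrema
    lo::Float64
    hi::Float64
    PipeExtrema() = new(Inf, -Inf)
end
(a::PipeExtrema)() = (a.lo, a.hi)
function (a::PipeExtrema)(x)
    a.lo = min(a.lo, x)
    a.hi = max(a.hi, x)
    return a
end

mutable struct PipeMean
    n::Int
    mean::Float64
    PipeMean() = new(0, 0.0)
end
(a::PipeMean)() = a.mean
function (a::PipeMean)(x)
    a.n += 1
    a.mean += (x - a.mean) / a.n
    return a
end

# The pipe holds a tuple of accumulators and pumps each observation
# through all of them in a single pass over the data.
struct AccPipe{T<:Tuple}
    accs::T
end
meld(accs...) = AccPipe(accs)

function (p::AccPipe)(x)
    foreach(acc -> acc(x), p.accs)
    return p
end
(p::AccPipe)() = map(acc -> acc(), p.accs)

pipe = meld(PipeExtrema(), PipeMean())
for x in (2.0, 4.0, 6.0)
    pipe(x)
end
melded = pipe()   # ((2.0, 6.0), 4.0)
```

Because the pipe itself follows the same convention, it can be used wherever a single accumulator is expected.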

OnlineStats.jl is not restricted to incremental updating, covers most of the first level descriptive functions in a similar way, and many more. It is important to let those stats be used (made rollable). The intent of that package is to process and provide with a single look at the items within a data[stream]. Rolling over windowed data involves structural subsequences by definition. So there is interplay, and smooth interuse takes careful consideration. @joshday

@JeffreySarnoff
Owner Author

My current approach to incremental stats is shown for rolling minimum in this gist.

@JeffreySarnoff
Owner Author

This gist shows my current approach to incremental stats with optional stream element preprocessing, again using rolling minimum.

@JeffreySarnoff
Owner Author

JeffreySarnoff commented Nov 22, 2022

These are the "single stepped incremental statistics" available.
I need to add some two+ argument incremental stats (cov, corr).
[open to suggestions: add? remove?]

    AccMinimum, AccMaximum, AccExtrema,
    AccSum, AccProd,
    AccMean, AccGeoMean, AccHarmMean,
    AccMeanVar, AccMeanStd, AccStats,

    AccMinimumAbs, AccMaximumAbs, AccExtremaAbs,
    AccSumAbs, AccProdAbs,

    # exponentially weighted versions
    # these also initialize  α, the decay parameter
    #    either directly or via span, halflife, center of mass
    #    there is auto initialization logic too

    AccMinimumEW, AccMaximumEW, AccExtremaEW,
    AccSumEW, AccProdEW,
    AccMeanEW, AccGeoMeanEW, AccHarmMeanEW,
    AccMeanVarEW, AccMeanStdEW, AccStatsEW,

    AccMinimumAbsEW, AccMaximumAbsEW, AccExtremaAbsEW,
    AccSumAbsEW, AccProdAbsEW,

Not shown (not yet finalized, part of the windowing facilities)
is the use of other / arbitrary normalized weights.
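
For the exponentially weighted versions, a sketch of how α might be derived and used. The conversions below are the conventions commonly used elsewhere (for example in pandas' ewm); whether the package uses the same ones is an assumption, and the helper and type names are hypothetical.

```julia
# Common conventions for the decay parameter α (assumed, not confirmed
# for this package).
alpha_from_span(span)   = 2 / (span + 1)
alpha_from_com(com)     = 1 / (1 + com)             # center of mass
alpha_from_halflife(hl) = 1 - exp(-log(2) / hl)

# Hypothetical exponentially weighted mean in its simple recursive form:
# the first observation seeds the value, later ones move it by α.
mutable struct SketchMeanEW
    alpha::Float64
    value::Float64
    started::Bool
    SketchMeanEW(alpha) = new(alpha, 0.0, false)
end

function (acc::SketchMeanEW)(x)
    if acc.started
        acc.value += acc.alpha * (x - acc.value)
    else
        acc.value = x
        acc.started = true
    end
    return acc
end
(acc::SketchMeanEW)() = acc.value

ew = SketchMeanEW(alpha_from_span(3))   # α = 0.5
ew(1.0)(3.0)
ewvalue = ew()                          # 1.0 + 0.5 * (3.0 - 1.0) = 2.0
```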

@JeffreySarnoff
Owner Author

This shows an accumulator with local short term memory.
