how do I compute the cumulative sum over a timeseries with Ibis? #6394
Answered
by
jcrist
lostmygithubaccount
asked this question in
Q&A
-
I have a import ibis
import pandas as pd
ts = pd.date_range(end=pd.Timestamp.today(), periods=90*5, freq='4.8H')
users = ["a"] * len(ts)
df = pd.DataFrame({"timestamp": ts, "users": user}) then the solution in pandas is: df_rolling = df.resample("W").count().cumsum() or to compute for a given window: df_rolling = df.resample("W").count().rolling(window=7).sum() what's the easiest/best/most proper way to achieve this in Ibis? I would think something along the lines of: t = ibis.memtable(df)
t.order_by(_.timestamp.desc()).mutate(timestamp = _.timestamp.truncate("W")).group_by(_.timestamp).agg(count=_.count()) but this isn't quite correct |
Beta Was this translation helpful? Give feedback.
Answered by
jcrist
Jun 7, 2023
Replies: 1 comment
-
This can be done using a windowed aggregation using an In [1]: import ibis, pandas as pd
In [2]: df = pd.DataFrame({"x": [1, 2, 3, 4, 5, 6, 7, 11]})
In [3]: t = ibis.memtable(df)
In [4]: ibis.options.interactive = True
In [5]: t.mutate(y=t.x.sum().over(rows=(None, 0))) # y is the sum of x over an unbounded preceding window
Out[5]:
┏━━━━━━━┳━━━━━━━┓
┃ x ┃ y ┃
┡━━━━━━━╇━━━━━━━┩
│ int64 │ int64 │
├───────┼───────┤
│ 1 │ 1 │
│ 2 │ 3 │
│ 3 │ 6 │
│ 4 │ 10 │
│ 5 │ 15 │
│ 6 │ 21 │
│ 7 │ 28 │
│ 11 │ 39 │
└───────┴───────┘
In [6]: ibis.show_sql(t.mutate(y=t.x.sum().over(rows=(None, 0)))) # view the generated SQL
SELECT
t0.x,
SUM(t0.x) OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS y
FROM _ibis_pandas_memtable_2ropysz7o2tt6t1yyt5humzkt AS t0 |
Beta Was this translation helpful? Give feedback.
0 replies
Answer selected by
jcrist
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
This can be done using a windowed aggregation using an
UNBOUNDED PRECEDING
window: