how do I compute the cumulative sum over a timeseries with Ibis? #6394

lostmygithubaccount · 2023-06-07T19:01:12Z

lostmygithubaccount
Jun 7, 2023
Maintainer

I have a timestamp column and a users column. for example, something like:

import ibis

import pandas as pd

ts = pd.date_range(end=pd.Timestamp.today(), periods=90*5, freq='4.8H')
users = ["a"] * len(ts)

df = pd.DataFrame({"timestamp": ts, "users": user})

then the solution in pandas is:

df_rolling = df.resample("W").count().cumsum()

or to compute for a given window:

df_rolling = df.resample("W").count().rolling(window=7).sum()

what's the easiest/best/most proper way to achieve this in Ibis? I would think something along the lines of:

t = ibis.memtable(df)
t.order_by(_.timestamp.desc()).mutate(timestamp = _.timestamp.truncate("W")).group_by(_.timestamp).agg(count=_.count())

but this isn't quite correct

Answered by jcrist

Jun 7, 2023

This can be done using a windowed aggregation using an UNBOUNDED PRECEDING window:

In [1]: import ibis, pandas as pd

In [2]: df = pd.DataFrame({"x": [1, 2, 3, 4, 5, 6, 7, 11]})

In [3]: t = ibis.memtable(df)

In [4]: ibis.options.interactive = True

In [5]: t.mutate(y=t.x.sum().over(rows=(None, 0)))  # y is the sum of x over an unbounded preceding window
Out[5]: 
┏━━━━━━━┳━━━━━━━┓
┃ x     ┃ y     ┃
┡━━━━━━━╇━━━━━━━┩
│ int64 │ int64 │
├───────┼───────┤
│     1 │     1 │
│     2 │     3 │
│     3 │     6 │
│     4 │    10 │
│     5 │    15 │
│     6 │    21 │
│     7 │    28 │
│    11 │    39 │
└───────┴───────┘

In [6]: ibis.show_sql(t.mutate(y=t.x.sum().over(rows=(None, 0))))  # view the…

View full answer

jcrist · 2023-06-07T19:15:04Z

jcrist
Jun 7, 2023
Maintainer

This can be done using a windowed aggregation using an UNBOUNDED PRECEDING window:

In [1]: import ibis, pandas as pd

In [2]: df = pd.DataFrame({"x": [1, 2, 3, 4, 5, 6, 7, 11]})

In [3]: t = ibis.memtable(df)

In [4]: ibis.options.interactive = True

In [5]: t.mutate(y=t.x.sum().over(rows=(None, 0)))  # y is the sum of x over an unbounded preceding window
Out[5]: 
┏━━━━━━━┳━━━━━━━┓
┃ x     ┃ y     ┃
┡━━━━━━━╇━━━━━━━┩
│ int64 │ int64 │
├───────┼───────┤
│     1 │     1 │
│     2 │     3 │
│     3 │     6 │
│     4 │    10 │
│     5 │    15 │
│     6 │    21 │
│     7 │    28 │
│    11 │    39 │
└───────┴───────┘

In [6]: ibis.show_sql(t.mutate(y=t.x.sum().over(rows=(None, 0))))  # view the generated SQL
SELECT
  t0.x,
  SUM(t0.x) OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS y
FROM _ibis_pandas_memtable_2ropysz7o2tt6t1yyt5humzkt AS t0

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

how do I compute the cumulative sum over a timeseries with Ibis? #6394

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 1 comment

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Select a reply

how do I compute the cumulative sum over a timeseries with Ibis? #6394

lostmygithubaccount Jun 7, 2023 Maintainer

Replies: 1 comment

jcrist Jun 7, 2023 Maintainer

lostmygithubaccount
Jun 7, 2023
Maintainer

jcrist
Jun 7, 2023
Maintainer