feat: grouping sets functionality #550

wesm · 2015-08-17T19:36:37Z

No description provided.

cpcloud · 2021-12-07T21:49:36Z

This could also capture generic GROUPING SETS idioms, not just ROLLUP.

cpcloud · 2021-12-28T18:06:40Z

GROUPING SETS is a useful tool for data scientists and engineers. It allows computing aggregations for different combinations of grouping keys, all in the same query.

PostgreSQL has great docs on the feature: https://www.postgresql.org/docs/current/queries-table-expressions.html#QUERIES-GROUPING-SETS.

I'm not entirely sure what the API should look like, but here are a few requirements and considerations for anyone who wants to work on this:

Whatever the API looks like, it needs to compose with the existing group_by method. It is both legitimate and useful to combine grouping sets, cube, rollup, and group by in a single query.
For example (from the above postgres docs):

GROUP BY a, CUBE (b, c), GROUPING SETS ((d), (e))

which expands to

GROUP BY GROUPING SETS (
    (a, b, c, d), (a, b, c, e),
    (a, b, d),    (a, b, e),
    (a, c, d),    (a, c, e),
    (a, d),       (a, e)
)

In my mind this rules out a method that exists only on GroupedTableExpr (in an effort to prevent people from accessing the API without calling group_by) because

t.group_by(...).cube()

doesn't make a whole lot of sense unless we allow group_by to take zero arguments, itself a change that I think will lead to unmaintainable hacks.

There are two viable approaches I see.

group_by grows the ability to accept special (new) objects ibis.grouping_sets(sequence_of_tuples), ibis.cube(tuple), ibis.rollup(tuple). The above postgres example would be written as:

import ibis

t = ibis.table([
    ("a", "string"),
    ("b", "string"),
    ("c", "string"),
    ("d", "string"),
    ("money", "float64"),
])

t.group_by([
    "a",
    ibis.cube(("b", "c")),
    ibis.grouping_sets([("d",), ("e",)])
])

I prefer this approach because it requires the least number of invasive changes and I believe will not require any hacks.

TableExpr grows some new methods (grouping_sets, cube, rollup).

This is pretty tantalizing, and superficially seems to compose well until you start trying to ... compose these methods:

# using t from the previous example, seems nice but see below
t.group_by("a").cube(("b", "c")).grouping_sets([("d",), ("e",)])

# equivalent except for column ordering
t.cube(("b", "c")).group_by("a").grouping_sets([("d",), ("e",)])
# t.cube would need to have a single object that tracks all the grouping set "things"
# to make sure it doesn't generate `GROUP BY` multiple times.

Since cube et al are limited to grouping in SQL, we should not introduce a new API that allows grouping set-like things to be use in an apparently-outside-of-group-by-but-not-actually context.

jcrist · 2024-06-26T21:08:19Z

Reopening this, as we'll probably want it for full TPC-DS support (#9447).

landonAcuityMD · 2024-06-26T21:50:56Z

Definite nice to have - was just looking for a way to do this today. :D

wesm added feature Features or general enhancements analytics Higher-level data analysis API related issues or PRs labels Aug 17, 2015

wesm added this to the 0.5 milestone Aug 17, 2015

wesm self-assigned this Aug 27, 2015

wesm modified the milestones: 0.7, 0.5 Sep 4, 2015

cpcloud modified the milestones: 0.11, Future Jun 12, 2017

datapythonista removed this from the Future milestone Nov 13, 2020

cpcloud changed the title ~~Simulate SQL ROLLUP with unions~~ feat: grouping sets functionality Dec 28, 2021

cpcloud added this to the 4.0.0 milestone Apr 19, 2022

cpcloud modified the milestones: 4.0.0, 4.x Aug 31, 2022

cpcloud mentioned this issue Nov 4, 2022

feat(api): add agg alias for aggregate #4765

Merged

cpcloud modified the milestones: 4.1, 5.0 Jan 24, 2023

cpcloud removed this from the 5.0 milestone Feb 2, 2023

cpcloud closed this as completed Mar 30, 2023

cpcloud added the unresolved Issues that are closed but still available for implementation label Mar 31, 2023

cpcloud reopened this Mar 31, 2023

cpcloud closed this as not planned Won't fix, can't repro, duplicate, stale Mar 31, 2023

jcrist reopened this Jun 26, 2024

jcrist unassigned wesm Jun 26, 2024

jcrist added this to the Q3 2024 milestone Jun 26, 2024

cpcloud self-assigned this Jul 3, 2024

cpcloud removed the unresolved Issues that are closed but still available for implementation label Jul 3, 2024

lostmygithubaccount removed this from the Q3 2024 milestone Jul 17, 2024

cpcloud linked a pull request Sep 11, 2024 that will close this issue

feat(api): support rollup, cube and grouping_sets APIs #9945

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: grouping sets functionality #550

feat: grouping sets functionality #550

wesm commented Aug 17, 2015

cpcloud commented Dec 7, 2021

cpcloud commented Dec 28, 2021

jcrist commented Jun 26, 2024

landonAcuityMD commented Jun 26, 2024

feat: grouping sets functionality #550

feat: grouping sets functionality #550

Comments

wesm commented Aug 17, 2015

cpcloud commented Dec 7, 2021

cpcloud commented Dec 28, 2021

jcrist commented Jun 26, 2024

landonAcuityMD commented Jun 26, 2024