From e81b1e86e79c888bed707be9e25ef8fa96d0e62d Mon Sep 17 00:00:00 2001
From: "Igoshev, Iaroslav"
Date: Fri, 26 Apr 2024 10:18:02 +0000
Subject: [PATCH] Update groupby section

Signed-off-by: Igoshev, Iaroslav
---
 docs/flow/modin/core/dataframe/algebra.rst | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/docs/flow/modin/core/dataframe/algebra.rst b/docs/flow/modin/core/dataframe/algebra.rst
index d5f92dc9fd6..c0a2b8f5374 100644
--- a/docs/flow/modin/core/dataframe/algebra.rst
+++ b/docs/flow/modin/core/dataframe/algebra.rst
@@ -90,7 +90,14 @@ equals to the number of CPUs so that each single axis partition gets processed i
 GroupBy operator
 ----------------
 Evaluates GroupBy aggregation for that type of functions that can be executed via TreeReduce approach.
-To be able to form groups engine broadcasts ``by`` partitions to each partition of the source frame.
+To be able to form groups, the engine either broadcasts ``by`` partitions to each partition of the source frame or
+applies the range-partitioning approach.
+
+This operator performs best when the cardinality of the ``by`` columns is low (a small number of output groups).
+At the ``Map`` stage, the operator computes the aggregation for each row partition individually, meaning
+that the ``Reduce`` stage receives a dataframe with the following number of rows:
+``num_groups * n_row_parts``. If the number of groups is too high, the dataframe at the ``Reduce`` stage
+risks ending up even larger than the initial one.
 
 Default-to-pandas operator
 --------------------------
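To illustrate the row-count reasoning in the new GroupBy paragraph, here is a minimal, stdlib-only sketch of a TreeReduce-style groupby sum. It is not Modin's implementation: rows are modeled as `(key, value)` tuples, and the helper names `split_rows`, `map_stage`, `reduce_stage` and `tree_reduce_groupby_sum` are hypothetical. Each row partition is aggregated on its own at the Map stage, so the Reduce stage receives up to `num_groups * n_row_parts` partial rows.

```python
from collections import defaultdict


def split_rows(rows, n_row_parts):
    """Hypothetical helper: split rows into n_row_parts contiguous row partitions."""
    size = max(1, -(-len(rows) // n_row_parts))  # ceiling division, at least 1
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def map_stage(partition):
    """Map: aggregate one row partition independently (a per-key sum).

    Produces one partial row per distinct key seen in this partition.
    """
    partial = defaultdict(int)
    for key, value in partition:
        partial[key] += value
    return list(partial.items())


def reduce_stage(partials):
    """Reduce: combine the partial results coming from all row partitions."""
    total = defaultdict(int)
    for rows in partials:
        for key, value in rows:
            total[key] += value
    return dict(total)


def tree_reduce_groupby_sum(rows, n_row_parts):
    """Return (result, number of rows fed into the Reduce stage)."""
    partials = [map_stage(part) for part in split_rows(rows, n_row_parts)]
    # Reduce input size is bounded by num_groups * n_row_parts.
    reduce_input_rows = sum(len(p) for p in partials)
    return reduce_stage(partials), reduce_input_rows


# Low cardinality: 1000 rows, 4 groups, 4 row partitions.
low = [(i % 4, 1) for i in range(1000)]
_, n_in = tree_reduce_groupby_sum(low, 4)
print(n_in)  # 16 = 4 groups * 4 row partitions

# High cardinality: every row is its own group.
high = [(i, 1) for i in range(1000)]
_, n_in = tree_reduce_groupby_sum(high, 4)
print(n_in)  # 1000: the Map stage did not shrink the data at all
```

With few groups the Map stage collapses 1000 rows into 16 partial rows, so the Reduce stage is cheap; with one group per row it passes every row through, which is the high-cardinality case the paragraph warns about.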