[DISCUSSION]: Inconsistent Behavior Between prefer_existing_sort and AggregateExec's required_input_ordering #14231
Comments
@alamb, before we attempt this, do you have any thoughts you can share?
The description of the issue on this ticket makes sense to me
Yes I agree this is non ideal
100% agree
Agree. Looking at the code, it seems to me like datafusion/datafusion/physical-plan/src/aggregates/mod.rs Lines 491 to 500 in 5edb276
What about the following:
This would mean:
The downside is that the optimizer would not re-sort data even when it might be beneficial (e.g. if the data could be sorted cheaply with a prefix and then use a smaller hash table). However, making that optimization work requires deciding whether partial sort + ordered group by is better than a hash group by. Our optimizer today has no cost model / framework to evaluate such tradeoffs (and there are known challenges with cost-estimate-based optimizers anyway).
Maybe we can create an API like
My preference is to try and keep the core of datafusion focused on executing the plans as provided as much as possible, and performing "always good optimizations". For optimizations where there is some tradeoff (like choosing between sorting for sort merge join or hashing, for example), I strongly suggest we keep as much of that out of the core as possible (and use user defined passes instead). The rationale is that when tradeoffs are present, no particular choice will be ideal for all use cases (hence why we already have If we make the optimizer passes in the core of datafusion have baked-in tradeoffs/heuristics, I think it will just get more and more complicated as people try to change how the tradeoffs work. I feel strongly enough about this to help with the project.
I agree with this. Therefore, I am trying to think of an API that will expose the information necessary for downstream rules/projects/users to use if they want, so that we won't have to make decisions for them unless they are of the "always good" kind. IIRC, we have the "benefits_from" kind of API for partitioning. I wonder if doing the same makes sense here too. I will think more about this.
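To make the idea in this comment concrete, here is a minimal sketch of what a "benefits_from"-style hint for ordering might look like. Everything in it is an assumption made for illustration: `ToyOperator`, `ToyHashAggregate`, `benefits_from_input_ordering`, and `core_rule_adds_sort` are hypothetical names, not DataFusion APIs, and columns are modeled as plain strings.

```rust
/// Hypothetical operator interface (NOT a DataFusion trait).
trait ToyOperator {
    /// Ordering the operator cannot run without (a hard requirement).
    fn required_input_ordering(&self) -> Option<Vec<String>>;

    /// Hypothetical hint: an ordering that would help the operator,
    /// but that it can run without (a soft preference).
    fn benefits_from_input_ordering(&self) -> Option<Vec<String>> {
        None
    }
}

/// Toy aggregate that can always fall back to hashing.
struct ToyHashAggregate {
    group_by: Vec<String>,
}

impl ToyOperator for ToyHashAggregate {
    fn required_input_ordering(&self) -> Option<Vec<String>> {
        // Hashing works on unordered input, so nothing is required.
        None
    }

    fn benefits_from_input_ordering(&self) -> Option<Vec<String>> {
        // Ordered input on the group-by columns would still help.
        Some(self.group_by.clone())
    }
}

/// A core rule limited to "always good" choices: it adds a sort only when
/// a hard requirement is not already satisfied by the input ordering.
/// Soft preferences are left for downstream (user-defined) rules to act on.
fn core_rule_adds_sort(op: &dyn ToyOperator, input_ordering: &[String]) -> bool {
    match op.required_input_ordering() {
        Some(required) => !input_ordering.starts_with(&required),
        None => false,
    }
}
```

Under this sketch, the core rule never inserts a sort for `ToyHashAggregate`, while a downstream rule could still read `benefits_from_input_ordering` and decide for itself whether sorting is worth the cost.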
Great, let's collaborate on this!
I noticed an inconsistency in how the optimizer handles ordering in certain scenarios, particularly involving the prefer_existing_sort configuration and the creation behavior of AggregateExec.
1. Background on prefer_existing_sort
The prefer_existing_sort configuration, part of the enforce_distribution optimizer rule, determines whether the optimizer should use an order-preserving RepartitionExec or a non-order-preserving one. If order needs to be satisfied above the RepartitionExec, a SortExec is added.
datafusion/datafusion/core/src/physical_optimizer/enforce_distribution.rs
Lines 1279 to 1293 in 5edb276
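The decision described above can be sketched roughly as follows. This is a simplified toy model, not the code from enforce_distribution.rs: `ToyPlan` and `plan_repartition` are hypothetical names, and the sketch assumes the input is already sorted.

```rust
/// Toy stand-ins for physical operators (NOT DataFusion types).
#[derive(Debug, PartialEq)]
enum ToyPlan {
    /// Repartition that keeps the input order intact.
    OrderPreservingRepartition,
    /// Plain repartition that does not keep the input order.
    Repartition,
    /// A sort placed on top of another operator.
    Sort(Box<ToyPlan>),
}

/// Chooses how to repartition an already-sorted input.
/// `order_needed_above` means some parent operator requires the ordering.
fn plan_repartition(prefer_existing_sort: bool, order_needed_above: bool) -> ToyPlan {
    if !order_needed_above {
        // Nobody above needs the order, so a plain repartition is enough.
        return ToyPlan::Repartition;
    }
    if prefer_existing_sort {
        // Keep the existing order through the repartition.
        ToyPlan::OrderPreservingRepartition
    } else {
        // Drop the order during repartitioning, then restore it with a sort.
        ToyPlan::Sort(Box::new(ToyPlan::Repartition))
    }
}
```

In this model the sort only appears when the order is needed above the repartition and the setting is off.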
2. Creation Behavior of AggregateExec
AggregateExec sets its required_input_ordering based solely on its group-by expressions without checking any configuration like prefer_existing_sort. This effectively makes the ordering a hard requirement.
datafusion/datafusion/physical-plan/src/aggregates/mod.rs
Lines 461 to 479 in 5edb276
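As a rough illustration of why this acts as a hard requirement, consider the toy model below. It is a deliberate simplification and not the real AggregateExec: `ToyAggregate` is a hypothetical type, and columns are modeled as plain strings. The point is that the requirement is computed from the group-by columns alone, with no optimizer setting as an input.

```rust
/// Toy aggregate operator (NOT the real AggregateExec).
struct ToyAggregate {
    group_by: Vec<String>,
}

impl ToyAggregate {
    /// Ordering requirement derived only from the group-by columns.
    /// No configuration such as prefer_existing_sort is consulted here,
    /// so a planner reading this value has to treat it as mandatory.
    fn required_input_ordering(&self) -> Option<Vec<String>> {
        if self.group_by.is_empty() {
            // No grouping columns, so there is nothing to order by.
            None
        } else {
            Some(self.group_by.clone())
        }
    }
}
```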
3. The issue
When these two behaviors interact, a problem arises: if the order is preserved below a RepartitionExec and there is an AggregateExec above that RepartitionExec, the optimizer adds a SortExec regardless of how prefer_existing_sort is set (because the ordering is now a hard requirement).
While AggregateExec benefits from receiving ordered input, adding a SortExec in this context can incur a significant performance cost, negating any benefits of preserving the order.
4. Possible solutions:
A straightforward approach could involve AggregateExec respecting the prefer_existing_sort configuration before adding ordering requirements. However, this introduces challenges:
The prefer_existing_sort setting exists at the optimizer level, and injecting it into AggregateExec may lead to poor design. Also, evaluating this configuration at runtime feels conceptually incorrect.
Given these challenges, I wanted to open a discussion on alternative solutions or design approaches to address this behavior.
Looking forward to hearing the community's thoughts on this!