target_partitions execution option is ignored when the input has 1 partition #12611
Comments
In DataFusion, the target_partitions setting doesn't necessarily increase the partition count. If DataFusion thinks that executing the query in a single partition is better for performance, it will do so even if target_partitions is larger than 1. Do you think parallelism would improve performance for this query? If so, we should definitely increase the partition count for it. What are your thoughts on this? In short, setting target_partitions to a value larger than 1 doesn't necessarily increase the number of partitions in DataFusion. |
Thanks for the explanation! I agree that optimizing for performance makes sense, as long as it doesn't compromise guarantees or hurt system predictability. In Ballista, a "task" is generated for each partition, and changing this behavior has caused some unit test assertions to fail. However, I don't think this is a major issue for Ballista. I’m not familiar with how this flag is being used in other systems, but @alamb might have some insights to share. |
I think the reason it is called "target_partitions" is that it is not a guarantee but instead a target used for performance optimizations, as @akurmustafa mentions. If you need more than one partition, you can always modify the plan / set the required input partitions. |
Thanks for the clarifications, closing this one. |
Describe the bug
target_partitions execution option is ignored when the input has 1 partition. The introduced condition here is the root cause.
To Reproduce
The code above produces a physical plan with a single output partition, and the plan doesn't contain a RepartitionExec.
When the input partition number is set to a value > 1, it works as expected (multi_partitions becomes true here).
Expected behavior
No response
Additional context
No response
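For readers skimming the thread, the reported behavior can be sketched as a toy model. This is not DataFusion's actual code: the function name `planned_output_partitions` is hypothetical, and the rule that a multi-partition input ends up with exactly `target_partitions` outputs is a simplifying assumption made for illustration. Only the `multi_partitions` flag name and the "single input partition stays single" outcome come from the report above.

```rust
/// Toy model of the behavior described in this issue (NOT DataFusion's code).
///
/// Assumptions, made only for illustration:
/// - `multi_partitions` is true when the input has more than one partition,
///   matching the flag name mentioned in the report.
/// - When `multi_partitions` is true, the plan is repartitioned to
///   `target_partitions`; otherwise the input partitioning is left untouched.
fn planned_output_partitions(input_partitions: usize, target_partitions: usize) -> usize {
    let multi_partitions = input_partitions > 1;
    if multi_partitions {
        // Reported as "works as expected": the target is honored.
        target_partitions
    } else {
        // Reported behavior: no RepartitionExec is added, so the
        // input partition count is carried through unchanged.
        input_partitions
    }
}
```

Under this model, `planned_output_partitions(1, 8)` stays at 1, which is the situation reported here, while `planned_output_partitions(4, 8)` follows the target. As the comments explain, the maintainers treat this as intended: target_partitions is a hint for the optimizer, not a guarantee.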