[CH] Duplicated expression evaluation in project before aggregate operator due to PullOutPreProject Rule #8183
Comments
@taiyang-li, just tested Velox backend. It has the same issue.
@liujiayi771, could you help take a look?
I will take a look.
@liujiayi771 in gluten we already had a rule |
@taiyang-li This rule was discussed a long time ago to be implemented in the logical plan, and we did indeed implement a version based on the logical plan before. Many cases are not supported if we rely only on the logical plan; for example, the aggregation generated by Bloom filters can only be seen in the physical plan. I remember there are quite a few such cases. Theoretically, duplicates should not appear here because there is logic to eliminate duplicates; I need to debug and take a look at this.
@PHILO-HE @taiyang-li The reason for the issue is that after the pre-project is extracted and then merged with the previous project in the CollapseProjectExecTransformer rule, n3 must be replaced with the actual computed expression; otherwise, bindReference will fail. To eliminate the duplicate execution in this case, the pulled-out pre-project should not be merged by CollapseProjectExecTransformer. That creates an additional project operator, but it prevents the expression from being computed multiple times. What do you think? cc @ulysses-you Do you have any suggestions?
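To make the trade-off above concrete, here is a minimal toy model in Python. It is not Gluten or Spark code: `Col`, `Call`, `collapse`, and `count` are hypothetical helpers, and the definition of `_pre_5` as `n3 + 1` is an assumption for illustration only, since the actual query is not shown here. The sketch only demonstrates that inlining an alias during a project merge copies its defining expression into every consumer.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Col:
    """Reference to a column or alias by name (hypothetical toy type)."""
    name: str


@dataclass(frozen=True)
class Call:
    """Function or operator call over child expressions (hypothetical toy type)."""
    op: str
    args: Tuple[object, ...]


Project = List[Tuple[str, object]]  # ordered (output name, expression) pairs


def substitute(expr: object, aliases: Dict[str, object]) -> object:
    """Replace every alias reference with the expression that defines it."""
    if isinstance(expr, Col):
        return aliases.get(expr.name, expr)
    if isinstance(expr, Call):
        return Call(expr.op, tuple(substitute(a, aliases) for a in expr.args))
    return expr  # literal


def collapse(upper: Project, lower: Project) -> Project:
    """Merge two adjacent projects by inlining the lower project's aliases."""
    aliases = dict(lower)
    return [(name, substitute(e, aliases)) for name, e in upper]


def count(expr: object, target: object) -> int:
    """Count how many times `target` occurs inside `expr`."""
    if expr == target:
        return 1
    if isinstance(expr, Call):
        return sum(count(a, target) for a in expr.args)
    return 0


id_col = Col("id")
# if(id % 2 = 0, id+3, id+4), the expression from the bug description
if_expr = Call("if", (
    Call("=", (Call("%", (id_col, 2)), 0)),
    Call("+", (id_col, 3)),
    Call("+", (id_col, 4)),
))

lower: Project = [("id", id_col), ("n3", if_expr)]
# Assumed shape of the pulled-out pre-project; `n3 + 1` is illustrative only.
pre_project: Project = [("n3", Col("n3")), ("_pre_5", Call("+", (Col("n3"), 1)))]

collapsed = collapse(pre_project, lower)
evals_collapsed = sum(count(e, if_expr) for _, e in collapsed)
evals_separate = sum(count(e, if_expr) for _, e in lower + pre_project)
print(evals_collapsed, evals_separate)
```

In this toy model the collapsed project contains the `if` expression once under `n3` and once inside `_pre_5`, while keeping the two projects separate leaves a single occurrence at the cost of one extra operator.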
@liujiayi771 That is OK for the ClickHouse backend, because CH plan optimization merges adjacent project operators. Besides, whether or not they are merged in CH matters less than the duplicated expression evaluation.
@taiyang-li I had a discussion with @ulysses-you offline, and in this case, we need to implement a more refined collapsing strategy. We will remove the column for |
Backend
CH (ClickHouse)
Bug description
In the query below, the expression `if(id % 2 = 0, id+3, id+4)` was evaluated twice, in both `n3` and `_pre_5`. I find that `_pre_5` is introduced by `PullOutPreProject` in #4213. cc @PHILO-HE @zhztheplayer can you find a similar issue in Velox?
Spark version
None
Spark configurations
No response
System information
No response
Relevant logs
No response