-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support convert_to_state
for AVG
accumulator
#11734
Conversation
There is something strange going on with AVG in this query -- it is giving different answers when convert to state is enabled vs not. Maybe it is due to float rounding, but I am not confident |
Kind of expected -- result set is sorted by I've got different results after two consecutive runs even on main branch (without any skipped aggregation). |
I debugged this a bit more and I think the issue may be that |
Oh, the |
7020dcf
to
42daa94
Compare
42daa94
to
b60e1aa
Compare
b60e1aa
to
f05c4cd
Compare
This PR is failing like the following when running on benchmarks on TPCH. I think there may be a bug related to types in the intermediates. I will keep debugging
Update: turns out it is q18: (venv) andrewlamb@Andrews-MBP-2:~/Software/datafusion/benchmarks/data/tpch_sf1$ /Users/andrewlamb/Software/datafusion/datafusion-cli/target/debug/datafusion-cli -f ../../queries/q18.sql Update: filed real issue here #11832 |
f05c4cd
to
e3eb80f
Compare
0c4df2e
to
f3bedc0
Compare
datafusion/physical-expr-common/src/aggregate/groups_accumulator/nulls.rs
Outdated
Show resolved
Hide resolved
LGTM, thank you @alamb. Regarding q28 slowdown from PR description -- I suppose it's not a stable slowdown, and just a result on single benchmark run (since the regexp over |
That is my understanding too. I also have high hopes that the StringView work will make that query in particualr faster as well |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great, thank you
/// │false│ │ │NULL │ │NULL │ | ||
/// │false│ │true │ │true │ | ||
/// └─────┘ └─────┘ └─────┘ | ||
/// array opt_filter output nulls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like output nulls has typo, should be false; true; false; false; fasle
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes you are correct -- thank you for catching that. I fixed it in 149406b
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm thanks @alamb
@@ -267,6 +282,20 @@ FROM aggregate_test_100_null GROUP BY c2 ORDER BY c2; | |||
4 11 14 | |||
5 8 7 | |||
|
|||
# Test avg for tinyint / float |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just wondering why the test is only for tinyint / floats?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea was that AVG
the accumulator is already tested elsewhere -- this test is only to exercise the partial aggregate skipping logic
Thanks @comphead -- we are making progress here slowly -- but I am pretty stoked to see it 🚀 |
Nice work 🎉 |
I am very excited to get the @Rachelint is also working on some good stuff. |
Note: There are ~20 lines of code in this PR, the rest is docmentation and tests
Which issue does this PR close?
Avg
aggregate: implementconvert_to_state
#11816Rationale for this change
To take advantage of the benefits of #11627 a new method must be implemented for each
GroupsAccumulator
.At least one ClickBench query (the one in #6937) uses
AVG
so let's implement thatWhat changes are included in this PR?
convert_to_state
forAVG
accumulatorAre these changes tested?
Yes with new unit tests
Performance benchmarks:
Clickbench on the whole looks better
Interestingly my benchmark shows Q31 and Q32 get faster (they both have avg):
datafusion/benchmarks/queries/clickbench/queries.sql
Lines 32 to 33 in 4e278ca
But Q28 gets slower
datafusion/benchmarks/queries/clickbench/queries.sql
Line 29 in 4e278ca
Details
Are there any user-facing changes?
Faster performance