-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve grouping performance via better vectorization in accumulate functions #6954
Conversation
value_fn(group_index, new_value); | ||
} | ||
// update seen values in separate loop |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the idea is to use a separate loop
Note we can skip this update entirely once all the groups have been seen -- I will explore if I can figure out how to do that next.
value_fn(group_index, new_value); | ||
} | ||
// update seen values in separate loop | ||
for &group_index in group_indices.iter() { | ||
seen_values.set_bit(group_index, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we assume seen_values
is true for every group_index for "no nulls, no filter" until we observed a first null-value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is an excellent point. I will try and take that into account
af3f62b shows a little benefit bit maybe in the noise range
|
I am now benchmarking the special case of "don't track nulls they can't possibly be present |
Special casing the nulls seems to have helped a little, though the fact that some queries now report slower is surprising to me
|
Noise maybe? Your previous run seems to have some noisy results as well. |
Yeah, I am trying to control for noise by running these experiments on a GCP machine (not my laptop) 🤔 |
Tracking this basic idea in #7066 , I don't plan ti pursue this PR anymore for now |
Draft as I am running performance tests and will clean up the code a bit if possible.
Which issue does this PR close?
Part of #6889
This is a follow on to #6904 where some last minute changes (to track null state correctly) decreased performance slightly -- I am trying to get some back
Rationale for this change
I am trying to help the compiler to vectorize the inner loops of the accumulators. Let's do this by doing the minimal work as possible
What changes are included in this PR?
Update null state with a separate loop
Are these changes tested?
covered by existing tests
Are there any user-facing changes?