Speed up `pad_nulls` for `FixedLenByteArrayBuffer` #6297
base: master
Benchmark results for these changes (pad loop is as in master branch):
I must have hallucinated last night...turns out Linux with newer xeon
Hmm, this is pretty system specific. Old mac laptop
Sorry, I didn't follow all the nuance of the results here. It seems like you have concluded that which approach is faster depends on the target architecture? If this is the case, I think my preference would be to go with the simplest code (e.g. |
Sorry, I tend to ramble...I concluded that
agreed!
I am depressed about the large review backlog in this crate. We are looking for more help from the community reviewing PRs -- see #6418 for more
I'm starting to review this one. Just give me a little bit to dig into the decoding.
Which issue does this PR close?
Closes #6296.
Rationale for this change
See issue.
What changes are included in this PR?
Changes `pad_nulls` for `FixedLenByteArrayBuffer` as described in the issue. Benchmarking seems to indicate that the new approach is optimal for `byte_length > 4`. The loop is better for `byte_length <= 4` because the compiler will eliminate the loop via unrolling.

This PR does include extensive changes to the `arrow_reader` bench to allow benchmarking these changes.

Are there any user-facing changes?
No
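
For readers who have not seen the issue, here is a rough, self-contained sketch of the two copy strategies discussed under "What changes are included in this PR?". It is not the code from this PR. The text above does not spell out the new approach, so the sketch assumes it is a slice-level copy (here `copy_within`) replacing the per-byte loop. The names `pad_nulls_with`, `copy_loop`, and `copy_slice` are hypothetical, and the validity mask is simplified to a `&[bool]` rather than a packed bitmask.

```rust
/// Spread `values_read` densely packed fixed-width values out to their final
/// slots, leaving gaps where `valid` is false. `move_value` copies one value
/// of `len` bytes from byte offset `src` to byte offset `dst`.
/// Hypothetical helper for illustration only.
fn pad_nulls_with<F>(buffer: &mut Vec<u8>, byte_length: usize, valid: &[bool], mut move_value: F)
where
    F: FnMut(&mut [u8], usize, usize, usize),
{
    let values_read = valid.iter().filter(|v| **v).count();
    assert_eq!(buffer.len(), values_read * byte_length);
    buffer.resize(valid.len() * byte_length, 0);

    // Walk backwards so a value is never overwritten before it has been moved.
    let mut value_pos = values_read;
    for level_pos in (0..valid.len()).rev() {
        if !valid[level_pos] {
            continue;
        }
        value_pos -= 1;
        if level_pos == value_pos {
            // Every remaining value is already in its final slot.
            break;
        }
        move_value(buffer, value_pos * byte_length, level_pos * byte_length, byte_length);
    }
}

/// Per-byte loop; for small constant widths the compiler can unroll this.
fn copy_loop(buf: &mut [u8], src: usize, dst: usize, len: usize) {
    for i in 0..len {
        buf[dst + i] = buf[src + i];
    }
}

/// Slice-level copy (assumed stand-in for the "new approach").
fn copy_slice(buf: &mut [u8], src: usize, dst: usize, len: usize) {
    buf.copy_within(src..src + len, dst);
}

fn main() {
    // Three 3-byte values packed at the front; slots 1 and 4 are null.
    let valid = [true, false, true, true, false];
    let packed: Vec<u8> = vec![1, 1, 1, 2, 2, 2, 3, 3, 3];

    let mut a = packed.clone();
    let mut b = packed.clone();
    pad_nulls_with(&mut a, 3, &valid, copy_loop);
    pad_nulls_with(&mut b, 3, &valid, copy_slice);

    // Both strategies perform the same moves, so the buffers must match.
    assert_eq!(a, b);
    println!("{:?}", a);
}
```

Both variants leave whatever bytes happened to be in the null slots, so only the valid slots carry meaningful data. Which variant is faster for a given `byte_length` is exactly what the benchmarks in this thread were trying to settle, and the sketch makes no claim about that.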