feat: Add ArrowArrayView accessors to inspect buffer properties #638
Haven't looked at the implementation in detail; just a few thoughts on the API.
@@ -623,7 +623,8 @@ enum ArrowBufferType {
NANOARROW_BUFFER_TYPE_UNION_OFFSET,
NANOARROW_BUFFER_TYPE_DATA_OFFSET,
NANOARROW_BUFFER_TYPE_DATA,
NANOARROW_BUFFER_TYPE_DATA_VIEW
NANOARROW_BUFFER_TYPE_VARIADIC_DATA,
If we are going to introduce new members like this, should we think about renaming the existing member to `NANOARROW_BUFFER_TYPE_FIXED_DATA`? I think the distinction can easily get lost.
That is definitely a better name, although a bit pedantic if you're not specifically dealing with view arrays (and most of the time, we're not!). There is room to revisit that name in the future...the buffer type is helpful for things like the R and Python bindings that are truly generic wrappers, but for the most part these names don't show up in user code (that I've seen).
(It could also be argued that the data buffer of a string/binary array is more similar to a variadic data buffer than a normal data buffer, which we could also change at some point).
///
/// In many cases this can also be obtained from the ArrowLayout member of the
/// ArrowArrayView or ArrowSchemaView; however, for binary view and string view types,
/// the element width of each buffer may be different between two arrays of the same type
Is the idea here to be forward-looking and support other non-char variadic buffers?
The element width I'm talking about here is how `ArrowArrayViewGetBufferElementBitWidth(array_view, 2)` might be 64 if the second buffer is the sizes or 0 if the second buffer is a data buffer. The existing (if imperfect) system is that string/binary buffer data has an element bitwidth of 0 instead of 8 to signify "blob" rather than "uint8".
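To make that concrete, here is a rough standalone sketch (not the nanoarrow implementation) of how the bit width at a given buffer index can differ between two string view arrays. It assumes the buffer order validity, views, zero or more variadic data buffers, then one int64 sizes buffer. Every `Sketch*` name is hypothetical and made up for illustration, and the 1-bit and 128-bit widths for the validity and views buffers are assumptions of the sketch too.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration only; not the nanoarrow API. */
enum SketchBufferType {
  SKETCH_BUFFER_VALIDITY,
  SKETCH_BUFFER_VIEWS,
  SKETCH_BUFFER_VARIADIC_DATA,
  SKETCH_BUFFER_VARIADIC_SIZES
};

/* Assumed buffer order for a string/binary view array:
 *   [0] validity, [1] views, [2 .. 2+n) variadic data, [2+n] variadic sizes */
static enum SketchBufferType SketchViewBufferType(int64_t n_variadic, int64_t i) {
  assert(n_variadic >= 0 && i >= 0 && i <= n_variadic + 2);
  if (i == 0) return SKETCH_BUFFER_VALIDITY;
  if (i == 1) return SKETCH_BUFFER_VIEWS;
  if (i < n_variadic + 2) return SKETCH_BUFFER_VARIADIC_DATA;
  return SKETCH_BUFFER_VARIADIC_SIZES;
}

/* Element width in bits for buffer i; 0 means "blob" rather than uint8. */
static int64_t SketchViewBufferElementBitWidth(int64_t n_variadic, int64_t i) {
  switch (SketchViewBufferType(n_variadic, i)) {
    case SKETCH_BUFFER_VALIDITY:
      return 1;
    case SKETCH_BUFFER_VIEWS:
      return 128; /* each view is 16 bytes */
    case SKETCH_BUFFER_VARIADIC_DATA:
      return 0;
    case SKETCH_BUFFER_VARIADIC_SIZES:
      return 64; /* int64 sizes */
  }
  return -1; /* unreachable for valid input */
}
```

With no variadic data buffers, index 2 is the sizes buffer (width 64); with one or more, index 2 is a data buffer (width 0), which is why the width cannot be read from the type alone.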
@WillAyd I'm happy to take another pass if you feel strongly about the names here (otherwise, I'm hoping to merge this to focus on the two remaining release blockers!)
lgtm - merge away!
This PR abstracts accessors for the buffer view, buffer type, buffer data type, and element bit width of the `ArrowArrayView`. Before adding string/binary view support, this was done by directly accessing the `layout` and `buffer_view` members; however, this required special-casing plus some duplicated code for the string view case in the R/Python bindings.
This PR also removes the dependence on the `ArrowArrayView::array` member, since this member is optional (i.e., the data backing an `ArrowArrayView` need not be related to an actual `ArrowArray`).
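As a rough illustration of why index-based accessors help generic wrappers, the sketch below models a view whose first two buffers are stored inline and whose variadic buffers live elsewhere. The structs and `Sketch*` functions are hypothetical stand-ins, not the nanoarrow types or the accessors added in this PR. The point is only that the special-casing sits in one place and that nothing reads from an owning array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration only; not the nanoarrow structs. */
struct SketchBufferView {
  const void* data;
  int64_t size_bytes;
};

struct SketchArrayView {
  struct SketchBufferView fixed[2]; /* validity, views */
  int64_t n_variadic;               /* number of variadic data buffers */
  const void* const* variadic_data; /* n_variadic data pointers */
  const int64_t* variadic_sizes;    /* n_variadic sizes, in bytes */
};

/* Fixed buffers + variadic data buffers + one trailing sizes buffer. */
static int64_t SketchGetNumBuffers(const struct SketchArrayView* v) {
  return 2 + v->n_variadic + 1;
}

/* One accessor hides where buffer i actually lives. */
static struct SketchBufferView SketchGetBufferView(const struct SketchArrayView* v,
                                                   int64_t i) {
  struct SketchBufferView out;
  assert(i >= 0 && i < SketchGetNumBuffers(v));
  if (i < 2) {
    out = v->fixed[i];
  } else if (i < 2 + v->n_variadic) {
    out.data = v->variadic_data[i - 2];
    out.size_bytes = v->variadic_sizes[i - 2];
  } else {
    out.data = v->variadic_sizes;
    out.size_bytes = v->n_variadic * (int64_t)sizeof(int64_t);
  }
  return out;
}
```

A binding can then loop from 0 to `SketchGetNumBuffers(v) - 1` and treat every buffer uniformly, whether or not the view is backed by an actual array.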