feat: Add ArrowArrayView accessors to inspect buffer properties #638

Merged: 9 commits, Oct 1, 2024


Member

@paleolimbot paleolimbot commented Sep 30, 2024

This PR adds accessors for the buffer view, buffer type, buffer data type, and element bit width of each buffer in an ArrowArrayView. Before string/binary view support was added, these could be read directly from the layout and buffer_view members; with view types, however, direct access requires special-casing and some duplicated string view code in the R and Python bindings.

This PR also removes the dependence on the ArrowArrayView::array member, since this member is optional (i.e., the data backing an ArrowArrayView need not be related to an actual ArrowArray).
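
To illustrate why an accessor helps here, the following is a minimal sketch rather than the nanoarrow implementation. Every name in it (`ToyArrayView`, `ToyBufferType`, `ToyArrayViewGetNumBuffers`, `ToyArrayViewGetBufferType`) is a hypothetical stand-in, and it assumes a view-type array exposes its fixed buffers first, then the variadic data buffers, then one trailing buffer holding their sizes. A generic wrapper can then loop over buffer indices without special-casing view types:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; NOT the nanoarrow definitions. */
enum ToyBufferType {
  TOY_BUFFER_TYPE_NONE,
  TOY_BUFFER_TYPE_VALIDITY,
  TOY_BUFFER_TYPE_DATA,
  TOY_BUFFER_TYPE_VARIADIC_DATA,
  TOY_BUFFER_TYPE_VARIADIC_SIZE
};

#define TOY_MAX_FIXED_BUFFERS 3

struct ToyArrayView {
  /* Buffer types known from the type's layout alone. */
  enum ToyBufferType fixed_types[TOY_MAX_FIXED_BUFFERS];
  int64_t n_fixed;
  /* Nonzero for string view / binary view arrays (assumption of this sketch). */
  int is_view_type;
  /* Number of variadic data buffers; varies between arrays of the same type. */
  int64_t n_variadic;
};

/* Total buffer count: fixed buffers, plus (for view types only) the variadic
 * data buffers and one trailing buffer of variadic sizes. */
static int64_t ToyArrayViewGetNumBuffers(const struct ToyArrayView* view) {
  if (!view->is_view_type) return view->n_fixed;
  return view->n_fixed + view->n_variadic + 1;
}

/* Buffer type for index i, hiding the fixed/variadic split from the caller. */
static enum ToyBufferType ToyArrayViewGetBufferType(const struct ToyArrayView* view,
                                                    int64_t i) {
  assert(i >= 0 && i < ToyArrayViewGetNumBuffers(view));
  if (i < view->n_fixed) return view->fixed_types[i];
  if (i < view->n_fixed + view->n_variadic) return TOY_BUFFER_TYPE_VARIADIC_DATA;
  return TOY_BUFFER_TYPE_VARIADIC_SIZE;
}
```

In this sketch, a caller iterates `i` from 0 to `ToyArrayViewGetNumBuffers(view) - 1` and asks for the type of each buffer. The branch on view types lives in one place instead of in every binding.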

Contributor

@WillAyd WillAyd left a comment


Haven't looked at the implementation in detail; just a few thoughts on the API.

@@ -623,7 +623,8 @@ enum ArrowBufferType {
   NANOARROW_BUFFER_TYPE_UNION_OFFSET,
   NANOARROW_BUFFER_TYPE_DATA_OFFSET,
   NANOARROW_BUFFER_TYPE_DATA,
-  NANOARROW_BUFFER_TYPE_DATA_VIEW
+  NANOARROW_BUFFER_TYPE_VARIADIC_DATA,
Contributor


If we are going to introduce new members like this, should we think about renaming the existing member to NANOARROW_BUFFER_TYPE_FIXED_DATA? I think the distinction can easily get lost.

Member Author


That is definitely a better name, although a bit pedantic if you're not specifically dealing with view arrays (and most of the time, we're not!). There is room to revisit that name in the future...the buffer type is helpful for things like the R and Python bindings that are truly generic wrappers, but for the most part these names don't show up in user code (that I've seen).

(It could also be argued that the data buffer of a string/binary array is more similar to a variadic data buffer than a normal data buffer, which we could also change at some point).

///
/// In many cases this can also be obtained from the ArrowLayout member of the
/// ArrowArrayView or ArrowSchemaView; however, for binary view and string view types,
/// the element width of each buffer may be different between two arrays of the same type
Contributor


Is the idea here to be forward-looking and support other non-char variadic buffers?

Member Author


The element width I'm talking about here is how ArrowArrayViewGetBufferElementBitWidth(array_view, 2) might be 64 if the second buffer is the sizes or 0 if the second buffer is a data buffer. The existing (if imperfect) system is that string/binary buffer data has an element bitwidth of 0 instead of 8 to signify "blob" rather than "uint8".
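
The convention described above can be sketched as follows. This is an illustration under stated assumptions, not nanoarrow code: `DemoBufferKind`, `DemoElementBitWidth`, and `DemoElementCount` are hypothetical names. Only the 64-bit width for the sizes buffer and the 0 ("blob") width for data buffers come from the comment above; the 1-bit validity and 128-bit views widths are assumptions of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical buffer kinds for a string view / binary view array. */
enum DemoBufferKind {
  DEMO_VALIDITY,
  DEMO_VIEWS,
  DEMO_VARIADIC_DATA,
  DEMO_VARIADIC_SIZE
};

/* Element bit width per buffer kind. A width of 0 signals "blob": raw bytes
 * with no per-element structure, as opposed to 8, which would mean "uint8". */
static int64_t DemoElementBitWidth(enum DemoBufferKind kind) {
  switch (kind) {
    case DEMO_VALIDITY:
      return 1; /* assumption: one bit per element */
    case DEMO_VIEWS:
      return 128; /* assumption: one 16-byte view per element */
    case DEMO_VARIADIC_DATA:
      return 0; /* blob */
    case DEMO_VARIADIC_SIZE:
      return 64; /* one int64 size per variadic data buffer */
  }
  return 0;
}

/* Number of whole elements in a buffer, or -1 when the buffer is a blob and
 * an element count is not meaningful. */
static int64_t DemoElementCount(int64_t size_bytes, int64_t bit_width) {
  assert(size_bytes >= 0 && bit_width >= 0);
  if (bit_width == 0) return -1;
  return size_bytes * 8 / bit_width;
}
```

A generic wrapper could use the 0 width to decide to expose a buffer as opaque bytes and any nonzero width to expose it as a typed sequence. Because the number of variadic data buffers differs between arrays, the width at a given buffer index has to be asked of the array view rather than read from the type's layout.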

@paleolimbot paleolimbot marked this pull request as ready for review September 30, 2024 22:20
@paleolimbot
Member Author

@WillAyd I'm happy to take another pass if you feel strongly about the names here (otherwise, I'm hoping to merge this to focus on the two remaining release blockers!)

@WillAyd
Contributor

WillAyd commented Oct 1, 2024

lgtm - merge away!

@paleolimbot paleolimbot merged commit e52ff0d into apache:main Oct 1, 2024
34 checks passed

2 participants