arrow-array ffi: FFI_ArrowArray.null_count is always interpreted as unsigned and initialized during conversion from C to Rust. #6497

Closed
pianoyeg94 opened this issue Oct 2, 2024 · 3 comments · Fixed by #6674
Labels: arrow (Changes to the arrow crate), bug, good first issue (Good for newcomers), help wanted

Comments


pianoyeg94 commented Oct 2, 2024

Describe the bug
The Arrow C data interface spec states that the null_count field of an ArrowArray structure
"MAY be -1 if not yet computed".
Currently, the arrow-array crate's ffi always treats this field as unsigned and initialized. That assumption is
false whenever a producer follows the Arrow C data interface specification and leaves the count uncomputed. As a result,
arrays coming from C can report a wildly wrong null count, which leads to hard-to-trace bugs downstream.

To Reproduce

  1. Construct an FFI_ArrowArray instance and set its null_count field to -1.
  2. Convert that instance into an ArrayData instance using the from_ffi
    (or from_ffi_and_data_type) function.
  3. Call the null_count() method on the resulting ArrayData instance (this calls into NullBuffer's null_count()).
  4. The result is usize::MAX, because the signed -1 is converted to an unsigned integer.
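
The arithmetic behind step 4 can be shown without the arrow crates. This is a minimal sketch: `naive_null_count` is a hypothetical stand-in for the conversion, not the crate's actual code, and it assumes the C-side field is a signed 64-bit integer, as in the spec.

```rust
/// Hypothetical stand-in for the conversion described above: the signed
/// C-side `null_count` is reinterpreted as an unsigned `usize`.
/// This is NOT the arrow-array implementation, only the same cast in isolation.
fn naive_null_count(ffi_null_count: i64) -> usize {
    // `as` wraps a negative value instead of failing, so the
    // "not yet computed" sentinel -1 turns into the largest usize.
    ffi_null_count as usize
}

/// Returns true when the C-side value is the spec's "not yet computed" sentinel
/// (any negative value is treated as uncomputed in this sketch).
fn is_uncomputed(ffi_null_count: i64) -> bool {
    ffi_null_count < 0
}
```

Calling `naive_null_count(-1)` yields `usize::MAX`, whereas a non-negative count such as `3` passes through unchanged. Checking `is_uncomputed` before the cast is what the conversion would need in order to tell the two cases apart.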

Expected behavior
If the null_count field of an FFI_ArrowArray instance is -1 (uninitialized):
- Either initialize the null_count field of the NullBuffer instance during ffi conversion by inspecting the null buffer itself,
- or stick to the Arrow C data interface spec by making the null_count field of the NullBuffer an Option and adding
support for lazy initialization during the call to its null_count() method.
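
The first option (computing the count eagerly during conversion) could look roughly like the sketch below. It uses only the standard library. `resolve_null_count` and its parameters are hypothetical names introduced for illustration and are not part of arrow-array. It assumes the Arrow validity bitmap layout (least-significant-bit numbering, a set bit means the slot is valid) and treats a missing validity buffer as "no nulls".

```rust
/// Hypothetical helper: resolve the null count for an imported array.
///
/// * `ffi_null_count` - value of the C-side field, negative if not computed
/// * `validity`       - validity bitmap bytes, or None if the buffer is absent
/// * `offset`, `len`  - logical offset and length of the array, in elements
fn resolve_null_count(
    ffi_null_count: i64,
    validity: Option<&[u8]>,
    offset: usize,
    len: usize,
) -> usize {
    // A non-negative value was provided by the producer; trust it.
    if ffi_null_count >= 0 {
        return ffi_null_count as usize;
    }
    match validity {
        // Assumption: no validity bitmap means every slot is valid.
        None => 0,
        Some(bits) => {
            // Count set (valid) bits in [offset, offset + len), LSB first.
            let valid = (offset..offset + len)
                .filter(|&i| (bits[i / 8] & (1u8 << (i % 8))) != 0)
                .count();
            len - valid
        }
    }
}
```

For example, with the single bitmap byte `0b0000_0101`, offset 0 and length 4, slots 0 and 2 are valid, so `resolve_null_count(-1, Some(&[0b0000_0101]), 0, 4)` returns 2. A production version would likely count bits a word at a time rather than one by one; the per-bit loop is kept here for clarity.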

@pianoyeg94 pianoyeg94 added the bug label Oct 2, 2024
Contributor

tustvold commented Oct 2, 2024

Computing the null count if it is not provided makes sense to me. Various kernels rely on it being precomputed to perform kernel selection efficiently, so making it lazy would be highly disruptive.

Contributor

adbmal commented Oct 29, 2024

take

Contributor

alamb commented Nov 16, 2024

label_issue.py automatically added labels {'arrow'} from #6674

4 participants