Support list_view_array & large_list_view layout and basic construction #5576
arrow-buffer/src/buffer/size.rs
use std::ops::Deref;

#[derive(Debug, Clone)]
pub struct SizeBuffer<O: ArrowNativeType>(ScalarBuffer<O>);
I'm not sure this really adds anything over ScalarBuffer?
let value_offsets = unsafe { get_offsets(&data) };
let value_sizes = unsafe { get_sizes(&data) };
This is unsound as the OffsetBuffer constraints are not applicable to view types
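To illustrate the concern: list-array offsets must be monotonically non-decreasing, while the Arrow list-view layout allows offsets in any order as long as each offset/size pair stays inside the child values. A minimal sketch in plain Rust follows; `is_monotonic` and `views_in_bounds` are hypothetical helpers for illustration, not arrow-rs APIs.

```rust
/// Invariant required of list-array offsets: each offset is >= the previous one.
/// (Hypothetical helper for illustration; not part of arrow-rs.)
fn is_monotonic(offsets: &[i32]) -> bool {
    offsets.windows(2).all(|w| w[0] <= w[1])
}

/// Invariant assumed here for a list-view layout: offsets and sizes have the
/// same length, and every (offset, size) pair stays inside the child values.
/// Ordering of the offsets does not matter.
fn views_in_bounds(offsets: &[i32], sizes: &[i32], values_len: usize) -> bool {
    // Compare lengths first so that `zip` cannot silently truncate.
    offsets.len() == sizes.len()
        && offsets.iter().zip(sizes).all(|(&o, &s)| {
            o >= 0 && s >= 0 && (o as usize) + (s as usize) <= values_len
        })
}
```

For example, offsets `[4, 0, 2]` with sizes `[2, 2, 2]` over six child values pass `views_in_bounds` but fail `is_monotonic`, so code that relies on the list-array invariant cannot be reused unchanged.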
    assert_eq!(list_array.len(), 0)
}

#[test]
Could we possibly use the constructors in these tests instead of ArrayData? This should be much easier to follow and maintain.
    DataType::ListView
};

/// Returns the data type of the list view array
This comment appears to be incorrect
Thank you for this. I only took a quick look as I am very far behind on reviews from being away and then sick, but this mostly looks good. I think the major changes needed are to remove the uses of OffsetBuffer and probably remove SizeBuffer as well, and just use ScalarBuffer.
@tustvold 🚀 Thank you very much for your review, I will make the modifications based on the comments later this week.
@tustvold If you have some free time, perhaps you could help review it again. 😄
It is on my list but I probably won't get to it until Friday
Sorry for the slow review, I am a little concerned that this PR is very large and appears to copy-pasta a lot of the ListArray code incorrectly.
Perhaps we might split this up into smaller pieces, something like:
- Add basic ArrayData construction and validation
- Add ArrayData equality
- Add GenericListViewArray
- Add GenericListViewBuilder
As it stands it is quite hard to gauge what is tested and what isn't, and there are definitely a number of code paths that are simply incorrect.
        )));
    }
}
if len != sizes.len() {
I think this test should be moved ahead of the iteration above
/// Returns ith value of this list view array.
/// # Safety
/// Caller must ensure that the index is within the array bounds
pub unsafe fn value_unchecked(&self, i: usize) -> ArrayRef {
This method does not appear to be correct (and could probably do with a test)
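The review does not say what is wrong with the method. One plausible cause, given that the code was adapted from ListArray, is computing the end of the slice from the next offset rather than from the element's own size. A minimal sketch of the list-view lookup, using plain slices instead of arrow-rs arrays (`view_value` is a hypothetical name):

```rust
/// Returns the i-th list of a list-view layout as a slice of the child values.
/// Hypothetical stand-in using plain slices; not the arrow-rs implementation.
/// Panics if `i` is out of bounds or the view exceeds `values`.
fn view_value<'a>(values: &'a [i64], offsets: &[i32], sizes: &[i32], i: usize) -> &'a [i64] {
    // The end of the range comes from sizes[i], not from offsets[i + 1].
    let start = offsets[i] as usize;
    let end = start + sizes[i] as usize;
    &values[start..end]
}
```

A test along these lines, with offsets that are deliberately out of order, would exercise exactly the case where list-array logic gives a different answer.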
}

fn is_empty(&self) -> bool {
    self.value_offsets.len() <= 1
This does not appear to be correct
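A likely reason: a list array stores `len + 1` offsets, so `offsets.len() <= 1` means "no elements" there, but a list view stores exactly one offset and one size per element. A minimal sketch under that assumption (`ListViewSketch` is a hypothetical type, not the PR's `GenericListViewArray`):

```rust
/// Hypothetical stand-in for a list-view array: one offset and one size per element.
struct ListViewSketch {
    offsets: Vec<i32>,
    sizes: Vec<i32>,
}

impl ListViewSketch {
    fn new(offsets: Vec<i32>, sizes: Vec<i32>) -> Self {
        assert_eq!(offsets.len(), sizes.len(), "offsets and sizes must match");
        Self { offsets, sizes }
    }

    /// Number of list elements; there is no trailing extra offset.
    fn len(&self) -> usize {
        self.offsets.len()
    }

    /// Empty only when there are no elements at all.
    fn is_empty(&self) -> bool {
        self.len() == 0
    }
}
```

With the `<= 1` check, a one-element list view would be reported as empty.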
///
/// Panics if the length of [`Self::values`] exceeds `OffsetSize::MAX`
#[inline]
pub fn append(&mut self, is_valid: bool, size: usize) {
The size should be inferred from the length of the values_builder, much like it is for GenericListBuilder.
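A minimal sketch of that suggestion, assuming a simplified builder over `i32` values: the builder remembers where the current list started and derives the size when `append` is called. `ListViewBuilderSketch` and its fields are hypothetical, not the arrow-rs builder API.

```rust
/// Hypothetical minimal list-view builder over i32 values (not the arrow-rs API).
#[derive(Default)]
struct ListViewBuilderSketch {
    values: Vec<i32>,
    offsets: Vec<i32>,
    sizes: Vec<i32>,
    validity: Vec<bool>,
    /// Length of `values` when the current list started.
    start: usize,
}

impl ListViewBuilderSketch {
    /// Push a child value belonging to the list currently being built.
    fn push_value(&mut self, v: i32) {
        self.values.push(v);
    }

    /// Finish the current list; the size is derived rather than passed in.
    /// Panics if the offset or size does not fit in an i32.
    fn append(&mut self, is_valid: bool) {
        let size = self.values.len() - self.start;
        self.offsets
            .push(i32::try_from(self.start).expect("offset overflows i32"));
        self.sizes
            .push(i32::try_from(size).expect("size overflows i32"));
        self.validity.push(is_valid);
        self.start = self.values.len();
    }
}
```

Deriving the size this way keeps the caller from passing a value that disagrees with what was actually pushed into the child builder.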
/// Create a new [`ScalarBuffer`] containing a single 0 value
pub fn new_empty() -> Self {
    let buffer = MutableBuffer::from_len_zeroed(std::mem::size_of::<T>());
    Self::from(buffer.into_buffer())
}
This method is not necessary, and is incorrect as it creates a ScalarBuffer that isn't empty.
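The arithmetic behind this remark can be sketched with the standard library alone: a zeroed byte buffer of `size_of::<T>()` bytes holds exactly one `T`, so the typed view has length 1, not 0. `element_count` is a hypothetical helper, and the sketch assumes `T` is not zero-sized.

```rust
use std::mem::size_of;

/// Number of `T` elements that fit in a byte buffer of `byte_len` bytes.
/// Hypothetical helper; assumes `T` is not a zero-sized type.
fn element_count<T>(byte_len: usize) -> usize {
    byte_len / size_of::<T>()
}
```

A buffer of `size_of::<i32>()` bytes yields one `i32` element, while a zero-length buffer yields none, which is what "empty" would require.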
arrow-data/src/data.rs
/// Returns a reference to the data in `buffer` as a typed slice
/// after validating. The returned slice is guaranteed to have at
/// least `len` entries.
fn typed_sizes<T: ArrowNativeType + num::Num>(&self) -> Result<&[T], ArrowError> {
This method isn't necessary
arrow-data/src/data.rs
@@ -937,11 +972,23 @@ impl ArrayData {
    self.validate_offsets::<i32>(values_data.len)?;
    Ok(())
}
DataType::ListView(field) => {
    let values_data = self.get_single_valid_child_data(field.data_type())?;
    self.validate_offsets::<i32>(values_data.len)?;
This validates the offsets w.r.t. the ListArray semantics, which is incorrect for list views.
@@ -929,6 +944,26 @@ impl ArrayData {
    Ok(())
}

fn validate_sizes<T: ArrowNativeType + num::Num + std::fmt::Display>(
This appears to be incorrect
@tustvold Sounds good. I'll split it into several smaller pull requests separately. I'll ping you again once I'm ready.
Which issue does this PR close?
Closes #5501 .
Rationale for this change
What changes are included in this PR?
Add the basic implementation of ListView and LargeListView, as well as unit testing.
Are there any user-facing changes?