Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MINOR]: Rename get_arrayref_at_indices to take_arrays #12654

Merged
merged 1 commit into from
Sep 28, 2024

Conversation

akurmustafa
Copy link
Contributor

@akurmustafa akurmustafa commented Sep 27, 2024

Which issue does this PR close?

Closes #.

Rationale for this change

What changes are included in this PR?

As in the suggestion, this PR renames the util function get_arrayref_at_indices to take_arrays.

Also, type of the indices parameter is changed from &PrimitiveArray<UInt32Type> to &dyn Array to make it similar to compute::take's API.

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added physical-expr Physical Expressions common Related to common crate functions labels Sep 27, 2024
arrays: &[ArrayRef],
indices: &PrimitiveArray<UInt32Type>,
) -> Result<Vec<ArrayRef>> {
pub fn take_arrays(arrays: &[ArrayRef], indices: &dyn Array) -> Result<Vec<ArrayRef>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just wondering how it differs from arrow::compute::take, looks like the same but for 2dim arrays instead of a single array.

What I'm thinking should we move this method to arrow-rs kernel?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly, this util is just to abstract away same outer iteration for the compute::take. If community thinks, this is beneficial, we can move this to arrow-rs. I think, this pattern is common enough to move it to arrow-rs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to move to arrow. And maybe add a not int he docs for RecordBatch pointing people at it

It is common enough that it would be nice not to have to write it in each downstream crate

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @akurmustafa
I feel we can move it to arrow-rs as for me it is natural API evolvement on top of take which is used by downstream projects.

Another thing came to my mind we probably want to optimize take as it likely a bottleneck for SMJ we found in apache/datafusion-comet#901 (comment)

@alamb
Copy link
Contributor

alamb commented Sep 27, 2024

thanks @akurmustafa I feel we can move it to arrow-rs as for me it is natural API evolvement on top of take which is used by downstream projects.

Another thing came to my mind we probably want to optimize take as it likely a bottleneck for SMJ we found in apache/datafusion-comet#901 (comment)

Figuring out how to make take faster would certainly be a good idea, though I think that will require some significant effort. There is not a lot of "low hanging fruit" in that kernel that I know of

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @akurmustafa and @comphead

@comphead comphead merged commit 689500f into apache:main Sep 28, 2024
26 checks passed
@akurmustafa
Copy link
Contributor Author

By the way, I opened a PR to move this util to arrow-rs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
common Related to common crate functions physical-expr Physical Expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants