Make `StringViewArray::slice()` and `BinaryViewArray::slice()` faster / non allocating #6408

alamb · 2024-09-17T10:57:11Z

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
While working on an upstream project, apache/datafusion#12092, which switches DataFusion to use StringViewArray rather than StringArray, I noticed that one of the queries got much slower.

Profiling revealed that the time difference was almost entirely explained by the time spent in StringViewArray::slice().

Here is the flamegraph with StringArray:

Here is the same query with StringViewArray:

I am pretty sure the additional time is due to the time spent allocating / copying / deallocating the Vecs of buffers here:

arrow-rs/arrow-array/src/array/byte_view_array.rs, line 118 in 341ec35:

    buffers: Vec<Buffer>,

By contrast, calling slice on a StringArray only requires a few Arc increments:

arrow-rs/arrow-array/src/array/byte_array.rs, lines 88 to 91 in 3490639:

    data_type: DataType,
    value_offsets: OffsetBuffer<T::Offset>,
    value_data: Buffer,
    nulls: Option<NullBuffer>,
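To make the cost difference concrete, here is a minimal sketch using only standard-library stand-ins. It assumes `Buffer` can be modelled as `Arc<[u8]>`, and `VecBackedViews` is a hypothetical, simplified view array, not arrow-rs's actual type. Slicing it shares the views cheaply but must clone the `Vec<Buffer>`, which allocates a new `Vec` and bumps one refcount per data buffer. The cost therefore grows with the number of buffers rather than with the number of rows kept.

```rust
use std::sync::Arc;

// Stand-in for arrow's `Buffer` (assumption): a reference-counted byte region.
type Buffer = Arc<[u8]>;

/// Hypothetical, simplified view array that keeps its data buffers in a `Vec`.
struct VecBackedViews {
    views: Arc<[u128]>, // one 16-byte view per row, shared by all slices
    offset: usize,
    len: usize,
    buffers: Vec<Buffer>,
}

impl VecBackedViews {
    /// The views are shared with one refcount bump, but cloning the
    /// `Vec<Buffer>` allocates a new `Vec` and increments every buffer's refcount.
    fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        VecBackedViews {
            views: Arc::clone(&self.views),
            offset: self.offset + offset,
            len,
            buffers: self.buffers.clone(),
        }
    }
}

fn main() {
    let buffers: Vec<Buffer> = (0..1000).map(|i| Arc::from(vec![i as u8; 16])).collect();
    let array = VecBackedViews { views: Arc::from(vec![0u128; 10]), offset: 0, len: 10, buffers };

    let sliced = array.slice(2, 3);
    // Each of the 1000 buffers now has two owners: `array` and `sliced`.
    println!(
        "rows kept: {}, buffers cloned: {}, refcount of buffer 0: {}",
        sliced.len,
        sliced.buffers.len(),
        Arc::strong_count(&array.buffers[0])
    );
}
```

Keeping 3 rows out of 10 still touches all 1000 buffers, which is the pattern the profile points at.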

Describe the solution you'd like
I would like StringViewArray::slice to be faster (i.e., to not allocate).

Describe alternatives you've considered

We can (and probably should) change DataFusion not to use slice in this case (see apache/datafusion#6906, specifically apache/datafusion#6906 (comment)), but I think making slice faster / non-allocating for StringViewArray will be useful in general.

Additional context


alamb · 2024-09-17T10:57:26Z

fyi @a10y and @XiangpengHao

alamb · 2024-09-17T11:04:40Z

One solution here would be to replace Vec<Buffer> with Arc<[Buffer]> after construction; that would make managing the buffers on slice extremely cheap.
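A minimal sketch of this proposal, again with `Arc<[u8]>` standing in for `Buffer` and a hypothetical `ArcBackedViews` type (not the arrow-rs implementation). The buffer list is built as a `Vec` and converted once via the standard `From<Vec<T>> for Arc<[T]>`. After that, slice only bumps refcounts and never allocates, regardless of how many buffers there are.

```rust
use std::sync::Arc;

// Stand-in for arrow's `Buffer` (assumption).
type Buffer = Arc<[u8]>;

/// Hypothetical view array whose buffer list is frozen into an `Arc<[Buffer]>`.
struct ArcBackedViews {
    views: Arc<[u128]>,
    offset: usize,
    len: usize,
    buffers: Arc<[Buffer]>,
}

impl ArcBackedViews {
    /// Build with a growable `Vec`, then convert once: the elements are moved
    /// into a single allocation that also holds the refcounts.
    fn new(views: Vec<u128>, buffers: Vec<Buffer>) -> Self {
        let len = views.len();
        ArcBackedViews { views: views.into(), offset: 0, len, buffers: buffers.into() }
    }

    /// Two refcount increments and no allocation, independent of buffer count.
    fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        ArcBackedViews {
            views: Arc::clone(&self.views),
            offset: self.offset + offset,
            len,
            buffers: Arc::clone(&self.buffers),
        }
    }
}

fn main() {
    let buffers: Vec<Buffer> = (0..1000).map(|i| Arc::from(vec![i as u8; 16])).collect();
    let array = ArcBackedViews::new(vec![0u128; 10], buffers);

    let sliced = array.slice(2, 3);
    // Both arrays share one buffer list; the individual buffers are untouched.
    println!(
        "shared list: {}, list refcount: {}, refcount of buffer 0: {}",
        Arc::ptr_eq(&array.buffers, &sliced.buffers),
        Arc::strong_count(&array.buffers),
        Arc::strong_count(&array.buffers[0])
    );
}
```

The one-time conversion in `new` does allocate, but only at construction; any operation that currently clones the buffer list would get the same O(1) sharing.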

XiangpengHao · 2024-09-17T14:32:00Z

> One solution here would be to replace Vec<Buffer> with Arc<[Buffer]> after construction

I agree, this should fix the problem. In fact, many other operations (e.g., take) also need to clone the Vec, so changing to Arc<[Buffer]> will benefit them as well.

However, I would be a bit more careful about why cloning the buffers takes so long: it often indicates the Vec is large, which often means gc is not being called in a timely manner. So in addition to changing to Arc<[Buffer]>, I would also examine the plan to make sure gc (in CoalesceBatchesExec) is being invoked properly. cc @WetABQ who might be interested in this discussion.

I plan to work on this later this week, but anyone else, feel free to run faster than me!
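The idea behind gc can be sketched with standard-library types: copy only the bytes that live views still reference into one fresh buffer, and rewrite each view to point into it. This is a simplified illustration, not arrow-rs's gc implementation. `View` is a hypothetical (buffer_index, offset, len) triple, whereas real Arrow views are packed 16-byte values that also inline short strings.

```rust
/// Hypothetical, simplified view: which buffer a value lives in and where.
/// (Real Arrow views are packed 16-byte values that inline short strings.)
#[derive(Clone, Copy, Debug, PartialEq)]
struct View {
    buffer_index: usize,
    offset: usize,
    len: usize,
}

/// Compact the referenced bytes into a single buffer, dropping everything else.
/// Afterwards a slice only has to carry one buffer instead of many.
fn gc(views: &[View], buffers: &[Vec<u8>]) -> (Vec<View>, Vec<Vec<u8>>) {
    let total: usize = views.iter().map(|v| v.len).sum();
    let mut compact: Vec<u8> = Vec::with_capacity(total);
    let mut new_views = Vec::with_capacity(views.len());
    for v in views {
        let bytes = &buffers[v.buffer_index][v.offset..v.offset + v.len];
        new_views.push(View { buffer_index: 0, offset: compact.len(), len: v.len });
        compact.extend_from_slice(bytes);
    }
    (new_views, vec![compact])
}

fn main() {
    // Two buffers, of which only a few bytes are still referenced.
    let buffers = vec![b"hello-unused-bytes".to_vec(), b"more-unused-world".to_vec()];
    let views = vec![
        View { buffer_index: 0, offset: 0, len: 5 },
        View { buffer_index: 1, offset: 12, len: 5 },
    ];
    let (new_views, new_buffers) = gc(&views, &buffers);
    println!("{} buffer(s): {:?}", new_buffers.len(), String::from_utf8_lossy(&new_buffers[0]));
    println!("{:?}", new_views);
}
```

If gc runs before batches accumulate many small, mostly dead buffers, the buffer list that slice has to carry stays short.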

XiangpengHao · 2024-09-17T14:33:38Z

> replace Vec<Buffer> with Arc<[Buffer]> after construction; that would make managing the buffers on slice extremely cheap

But that also means an extra indirection to read buffers; I'm not sure if it matters.

Rachelint · 2024-09-17T14:47:56Z

The slice probably can't be eliminated until the epic apache/datafusion#7065 is finished, because it will still be called here: https://github.com/apache/datafusion/blob/a08f923c2acb1a46614970231d9a672c36ce3ad2/datafusion/physical-plan/src/aggregates/row_hash.rs#L713

alamb · 2024-09-17T14:54:50Z

> But that also means an extra indirection to read buffers; I'm not sure if it matters

I think an Arc<Vec<Buffer>> would add another indirection. I was thinking/hoping that Arc<[Buffer]> would still have only one indirection.
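The indirection question can be checked with standard-library types alone, again using `Arc<[u8]>` as a stand-in for `Buffer` (an assumption). `Arc<[T]>` is a fat pointer (address plus length) whose single allocation holds the refcounts and the elements themselves, so reaching `buffers[i]` takes one pointer hop, the same as a `Vec` stored inline in the struct. `Arc<Vec<T>>` is a thin pointer to a `Vec` header that points to a second allocation, which adds a hop. The helper functions below are illustrative only.

```rust
use std::mem::size_of;
use std::sync::Arc;

// Stand-in for arrow's `Buffer` (assumption).
type Buffer = Arc<[u8]>;

/// True when the Arc's own allocation holds the buffer list directly,
/// i.e. the pointer stored in the Arc already points at the elements.
fn elements_inline_flat(flat: &Arc<[Buffer]>) -> bool {
    Arc::as_ptr(flat) as *const Buffer == flat.as_ptr()
}

/// For `Arc<Vec<_>>` the Arc points at the `Vec` header, and the elements
/// live in a separate heap allocation: one extra hop.
fn elements_inline_nested(nested: &Arc<Vec<Buffer>>) -> bool {
    Arc::as_ptr(nested) as *const Buffer == nested.as_ptr()
}

fn main() {
    let bufs: Vec<Buffer> = vec![Arc::from(vec![1u8, 2]), Arc::from(vec![3u8])];
    let flat: Arc<[Buffer]> = Arc::from(bufs.clone());
    let nested: Arc<Vec<Buffer>> = Arc::new(bufs);

    // On a 64-bit target: 16 bytes (fat pointer) vs 8 bytes (thin pointer).
    println!("Arc<[Buffer]>: {} bytes, elements inline: {}",
             size_of::<Arc<[Buffer]>>(), elements_inline_flat(&flat));
    println!("Arc<Vec<Buffer>>: {} bytes, elements inline: {}",
             size_of::<Arc<Vec<Buffer>>>(), elements_inline_nested(&nested));
}
```

The trade-off is that the fat pointer is one word wider than a thin `Arc`, but it is still smaller than the three-word `Vec` it would replace.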

alamb · 2024-09-17T14:57:11Z

> The slice probably can't be eliminated until the epic apache/datafusion#7065 is finished

FWIW the slice that I looked at in apache/datafusion#12092 (comment) is a different one:

https://github.com/apache/datafusion/blob/a08f923c2acb1a46614970231d9a672c36ce3ad2/datafusion/functions-aggregate-common/src/aggregate/groups_accumulator.rs#L435-L438

(This is called once for each distinct group in each batch being aggregated, which is quite bad. The better way to solve this is to implement a Min/Max accumulator for strings that avoids slicing entirely, which we are tracking in apache/datafusion#6906.)

I think the fact that slice is used in many different places makes it all the more important to optimize in arrow-rs.
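Here is a rough sketch of what a Min/Max accumulator for strings that avoids slicing could look like. Instead of slicing the input once per group, it walks the batch once and compares each value against its group's current minimum. `MinStringAccumulator` and `update_batch` are hypothetical names with a simplified signature. They are not DataFusion's GroupsAccumulator API or the eventual fix for apache/datafusion#6906.

```rust
/// Hypothetical per-group running minimum for string values.
struct MinStringAccumulator {
    mins: Vec<Option<String>>, // one slot per group; None = no non-null value yet
}

impl MinStringAccumulator {
    fn new() -> Self {
        Self { mins: Vec::new() }
    }

    /// One pass over the batch: `group_indices[i]` is the group of `values[i]`.
    /// No per-group slice of the input array is ever created.
    fn update_batch(&mut self, values: &[Option<&str>], group_indices: &[usize], total_num_groups: usize) {
        assert_eq!(values.len(), group_indices.len());
        self.mins.resize(total_num_groups, None);
        for (value, &group) in values.iter().zip(group_indices) {
            let Some(v) = value else { continue }; // nulls do not participate
            let slot = &mut self.mins[group];
            let replace = match slot.as_deref() {
                Some(cur) => *v < cur,
                None => true,
            };
            if replace {
                *slot = Some(v.to_string());
            }
        }
    }
}

fn main() {
    let mut acc = MinStringAccumulator::new();
    acc.update_batch(&[Some("b"), Some("a"), None, Some("c")], &[0, 0, 1, 1], 3);
    println!("{:?}", acc.mins);
}
```

A production version would avoid the per-group `String` allocations (e.g. by keeping values in a shared byte buffer), but the key point is that work scales with rows, not with groups times slices.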

alamb · 2024-09-17T14:57:54Z

> However, I would be a bit more careful about why cloning the buffers takes so long: it often indicates the Vec is large, which often means gc is not being called in a timely manner

This is a good point

Rachelint · 2024-09-17T15:08:59Z

> I think the fact that slice is used in many different places makes it all the more important to optimize in arrow-rs

Yes, it should probably be guaranteed to be a cheap operation; it is used in many, many places.

ShiKaiWi · 2024-09-20T15:46:47Z

@alamb @XiangpengHao Could you assign this ticket to me? I think I can implement it now that I understand the proposal in the discussion.

BTW, I noticed there is no benchmark for the slice method, and I think one is necessary to demonstrate the improvement from the proposed implementation.

XiangpengHao · 2024-09-20T22:06:58Z

You can self-assign by commenting "take".

I think adding a benchmark is the right first step! Looking forward to it!

alamb · 2024-09-20T22:32:23Z

Thank you @ShiKaiWi -- can't wait to check it out

alamb added the enhancement label Sep 17, 2024

alamb changed the title ~~Make StringViewArray::slice() and BinaryViewArray::slice() faster~~ Make StringViewArray::slice() and BinaryViewArray::slice() faster / non allocating Sep 17, 2024

alamb mentioned this issue Sep 17, 2024 in [Epic] Complete Initial StringView in DataFusion apache/datafusion#11752 (closed, 21 tasks)

This was referenced Sep 17, 2024:

Implement fast min/max accumulator for binary / strings (now it uses the slower path) apache/datafusion#6906 (closed)

Enable reading StringViewArray by default from Parquet apache/datafusion#12092 (closed)

alamb added arrow Changes to the arrow crate help wanted labels Sep 17, 2024

ShiKaiWi linked a pull request Sep 20, 2024 that will close this issue: Use Arc<[Buffer]> instead of raw Vec<Buffer> in GenericByteViewArray for faster slice #6427 (open)

a10y mentioned this issue Sep 20, 2024 in feat: update IPC format to hold buffer_index spiraldb/vortex#903 (merged)
