Optimize base64/hex decoding by pre-allocating output buffers (~2x faster) #12675

Open
simonvandel wants to merge 4 commits into main

Conversation

simonvandel (Contributor):

Which issue does this PR close?

Closes #.

Rationale for this change

It is generally faster to make one large allocation up front than to make many small allocations.

What changes are included in this PR?

  • Add a benchmark
  • Refactor to reduce code duplication
  • Change decoding methods to write into a pre-allocated buffer (see the sketch below)
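
As a rough sketch of the pattern (a minimal standalone version; the function name and the String error type are placeholders, not the exact code in the diff):

// Sketch of the pre-allocation pattern: one large zero-filled buffer,
// decode into it, then trim the unused tail. `decode` stands in for the
// hex/base64 slice decoders.
fn decode_preallocated<F>(
    input: &[u8],
    upper_bound: usize,
    decode: F,
) -> Result<Vec<u8>, String>
where
    F: Fn(&[u8], &mut [u8]) -> Result<usize, String>,
{
    // Zero-initialize so the buffer has a real length and can be handed
    // out as a mutable slice.
    let mut values = vec![0u8; upper_bound];
    // The decoder reports how many bytes it actually wrote.
    let written = decode(input, &mut values[..])?;
    // `upper_bound` was only a conservative estimate; drop the excess.
    values.truncate(written);
    Ok(values)
}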

Are these changes tested?

Relying on existing SQL tests.

Are there any user-facing changes?

Yes, base64 and hex decoding are ~2x faster:

base64_decode/1024      time:   [26.362 µs 26.459 µs 26.597 µs]
                        change: [-48.501% -47.320% -45.897%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

hex_decode/1024         time:   [92.789 µs 92.904 µs 93.035 µs]
                        change: [-58.973% -58.895% -58.816%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

base64_decode/4096      time:   [100.23 µs 100.30 µs 100.38 µs]
                        change: [-62.678% -62.609% -62.545%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) low severe
  1 (1.00%) high mild
  1 (1.00%) high severe

hex_decode/4096         time:   [373.11 µs 377.63 µs 383.37 µs]
                        change: [-58.265% -58.081% -57.870%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low severe
  5 (5.00%) high mild
  6 (6.00%) high severe

base64_decode/8192      time:   [205.00 µs 205.26 µs 205.62 µs]
                        change: [-62.173% -61.549% -60.938%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe

hex_decode/8192         time:   [735.37 µs 737.54 µs 740.51 µs]
                        change: [-59.434% -59.332% -59.214%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe

alamb (Contributor) left a comment:

Looks like a very nice improvement to me, @simonvandel.

There is probably additional performance to be had by using unsafe, but this seems like an improvement over the current state to me. We can always optimize it further if/when necessary.

where
    F: Fn(&[u8], &mut [u8]) -> Result<usize>,
{
    let mut values = vec![0; conservative_upper_bound_size];

alamb (Contributor):

I think you could potentially call Vec::with_capacity rather than having to clear it all and then truncate at the end

simonvandel (Contributor, Author):

I don't think using with_capacity is possible here, as we need to hand out mutable slices that the hex/base64 methods can decode into.
With with_capacity alone, the length of the vector would be zero, so we can't take a mutable slice of it.
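
A minimal standalone illustration of that constraint:

fn main() {
    // `Vec::with_capacity` reserves memory but leaves the length at 0,
    // and slicing is bounds-checked against the length, not the capacity.
    let v: Vec<u8> = Vec::with_capacity(16);
    assert_eq!(v.len(), 0);
    // v[..16] would panic at runtime: len is 0 even though capacity is 16.

    // `vec![0; n]` zero-fills, so the buffer can be mutably sliced and the
    // unused tail trimmed once the decoder reports how much it wrote.
    let mut buf = vec![0u8; 16];
    let out = &mut buf[..16]; // fine: len is 16
    out[0] = b'x';
    buf.truncate(1);
    assert_eq!(buf, b"x");
}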

match self {
    Self::Base64 => {
        let upper_bound =
            base64::decoded_len_estimate(input_value.values().len());

alamb (Contributor):

I double checked, and the docs do say this is a conservative estimate.
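
For reference, a standalone sketch of how that estimate pairs with a slice-based decode in the base64 crate (the STANDARD engine is an assumption here; the PR may be configured differently):

use base64::{engine::general_purpose::STANDARD, Engine as _};

fn main() {
    let input = b"aGVsbG8gd29ybGQ="; // "hello world"

    // decoded_len_estimate never under-estimates, so a buffer of this
    // size is always large enough for decode_slice.
    let upper_bound = base64::decoded_len_estimate(input.len());
    let mut buf = vec![0u8; upper_bound];

    // decode_slice writes into the buffer and returns the byte count.
    let written = STANDARD.decode_slice(input, &mut buf).expect("valid base64");
    buf.truncate(written);

    assert_eq!(buf, b"hello world");
}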
