GH-43956: [C++][Format] Add initial Decimal32/Decimal64 implementations #43957

zeroshade · 2024-09-04T19:40:18Z

Rationale for this change

Extending the decimal types beyond Decimal128/256 to include bit widths of 32 and 64 improves interoperability with other libraries and utilities that already support these types. It also creates more opportunities for zero-copy interaction with systems such as libcudf and various databases.

What changes are included in this PR?

This PR contains the basic C++ implementations of the Decimal32/Decimal64 types, arrays, builders and scalars. It also includes the minimum necessary to get everything compiling and tests passing, without extending the Acero kernels or the Parquet handling (both will be handled in follow-up PRs).

Are these changes tested?

Yes, tests were extended where applicable to add decimal32/decimal64 cases.

Are there any user-facing changes?

Currently, a user who calls decimal(precision, scale) rather than decimal128(precision, scale) gets a Decimal128Type if the precision is <= 38 (the maximum precision for Decimal128) and a Decimal256Type if the precision is higher. Following the same pattern, with this change a call to decimal(precision, scale), as opposed to one of the specific decimal32/decimal64/decimal128/decimal256 functions, returns the following types:

for precisions [1 : 9] => Decimal32Type
for precisions [10 : 18] => Decimal64Type
for precisions [19 : 38] => Decimal128Type
for precisions [39 : 76] => Decimal256Type
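
The boundaries above follow from how many full decimal digits fit in a signed integer of each width, which is floor((bits - 1) * log10(2)). As a minimal, self-contained sketch (the helper names are hypothetical and are not Arrow APIs, and log10(2) is approximated as 0.30103), the mapping can be written as:

```cpp
#include <cassert>
#include <cstdint>

// Largest precision P such that every P-digit value fits in a signed integer
// of the given width: floor((bits - 1) * log10(2)), using 0.30103 for log10(2).
constexpr int32_t MaxPrecisionForBits(int32_t bits) {
  return static_cast<int32_t>((static_cast<int64_t>(bits) - 1) * 30103 / 100000);
}

// Hypothetical helper (not an Arrow API): the narrowest width among
// {32, 64, 128, 256} that can hold `precision` digits, or 0 if out of range.
constexpr int32_t SmallestBitWidth(int32_t precision) {
  if (precision < 1) return 0;
  for (int32_t bits = 32; bits <= 256; bits *= 2) {
    if (precision <= MaxPrecisionForBits(bits)) return bits;
  }
  return 0;
}

static_assert(MaxPrecisionForBits(32) == 9, "Decimal32 holds up to 9 digits");
static_assert(MaxPrecisionForBits(64) == 18, "Decimal64 holds up to 18 digits");
static_assert(MaxPrecisionForBits(128) == 38, "Decimal128 holds up to 38 digits");
static_assert(MaxPrecisionForBits(256) == 76, "Decimal256 holds up to 76 digits");
```

The integer approximation of log10(2) is only checked here for the four widths listed; it is not meant as a general-purpose formula.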

Many of our tests assumed that decimal with a low precision would be Decimal128 and had to be updated. Users who make the same assumption may be surprised by the new behavior.

GitHub Issue: [Format] Add Decimal32 and Decimal64 to Arrow #43956

github-actions · 2024-09-04T19:40:45Z

⚠️ GitHub issue #43956 has been automatically assigned in GitHub to PR creator.

cpp/src/arrow/type.cc

cpp/src/arrow/array/builder_dict.h

cpp/src/arrow/compute/kernels/codegen_internal.h

lidavidm · 2024-09-05T00:37:17Z

cpp/src/arrow/testing/gtest_util.h

@@ -171,7 +171,8 @@ using PrimitiveArrowTypes =
 using TemporalArrowTypes =
    ::testing::Types<Date32Type, Date64Type, TimestampType, Time32Type, Time64Type>;

-using DecimalArrowTypes = ::testing::Types<Decimal128Type, Decimal256Type>;
+using DecimalArrowTypes =
+    ::testing::Types</*Decimal32Type, Decimal64Type,*/ Decimal128Type, Decimal256Type>;


lidavidm (Sep 5, 2024): Ditto here. Should we file issues to come back to these?

zeroshade (Sep 5, 2024): These are commented out because casting is not yet implemented for the new decimal types. The issue currently tracks this as checkboxes rather than as an entirely separate issue.

pitrou (Sep 16, 2024): But it's going to be a separate PR, right?

zeroshade (Sep 16, 2024): Yes, I didn't want to make this already large PR even larger. I'll implement the cast kernels and related pieces in a follow-up PR.

pitrou · 2024-09-05T10:01:26Z

Following the same pattern, this change means that using decimal(precision, scale) instead of the specific decimal32/decimal64/decimal128/decimal256 functions results in the following functionality

I'm afraid this may massively break user code. I would suggest another approach:

deprecate the decimal() factory while keeping its current behavior of always returning at least decimal128
introduce a new smallest_decimal() factory that is documented to return the smallest possible type, and explicitly makes no guarantees about the stability of the return type
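
To make the difference between the two factories concrete, here is a rough sketch of the behavior being proposed. It is not the Arrow implementation: the enum and function names are stand-ins invented for illustration, and precision validation (the 1 to 76 range) is left out.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for Arrow's decimal type classes; illustrative only.
enum class DecimalKind { kDecimal32, kDecimal64, kDecimal128, kDecimal256 };

// Models decimal() as the proposal keeps it: never narrower than 128 bits,
// so existing callers keep getting the type they already expect.
constexpr DecimalKind StableDecimalKind(int32_t precision) {
  return precision <= 38 ? DecimalKind::kDecimal128 : DecimalKind::kDecimal256;
}

// Models the proposed smallest_decimal(): the narrowest kind that fits.
// Its result may change if narrower types are added later.
constexpr DecimalKind SmallestDecimalKind(int32_t precision) {
  if (precision <= 9) return DecimalKind::kDecimal32;
  if (precision <= 18) return DecimalKind::kDecimal64;
  return StableDecimalKind(precision);
}

// The same precision yields different kinds depending on the factory.
static_assert(StableDecimalKind(5) == DecimalKind::kDecimal128, "stable result");
static_assert(SmallestDecimalKind(5) == DecimalKind::kDecimal32, "smallest result");
```

With this split, code that assumed Decimal128 for low precisions keeps working, and only callers that opt in to the "smallest" factory see the narrower types.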

cpp/src/arrow/type.cc

wgtmac · 2024-09-05T15:26:12Z

Following the same pattern, this change means that using decimal(precision, scale) instead of the specific decimal32/decimal64/decimal128/decimal256 functions results in the following functionality

I'm afraid this may massively break user code. I would suggest another approach:

deprecate the decimal() factory while keeping its current behavior of always returning at least decimal128

introduce a new smallest_decimal() factory that is documented to return the smallest possible type, and explicitly makes no guarantees about the stability of the return type

I just have the same concern. +1 on the proposed workaround.

zeroshade · 2024-09-05T17:51:37Z

@pitrou @bkietz @wgtmac I've updated this based on the suggestion: I created a smallest_decimal function and added a deprecation notice to the docstring for decimal.


conbench-apache-arrow · 2024-10-01T04:52:09Z

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit d55d4c6.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 27 possible false positives for unstable benchmarks that are known to sometimes produce them.

pitrou · 2024-10-01T08:14:09Z

Hmm, did you notice the UBSAN failure in Decimal32Test.LeftShift?
https://github.com/apache/arrow/actions/runs/11115928849/job/30885255275#step:6:6273

(you can easily run this build locally using archery docker if you don't want to wait for CI every time :-))
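
The comment does not say which operation UBSAN flagged, so the following is only an assumption about the general class of problem. Before C++20, left-shifting a negative signed integer is undefined behavior, and shifting by the full width or more is undefined in every C++ version. A common way to avoid both is to shift in the unsigned domain. The helper below is hypothetical and is not the fix that was applied in Arrow:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not Arrow code: shifts a signed 32-bit value left by
// `bits` without invoking undefined behavior.
inline int32_t SafeLeftShift(int32_t value, uint32_t bits) {
  // Shifting by the width or more is undefined; this sketch defines it as 0.
  if (bits >= 32) return 0;
  // Shift as unsigned, where bits pushed past the top are simply discarded.
  const uint32_t shifted = static_cast<uint32_t>(value) << bits;
  // Converting back is modular on two's-complement targets
  // (guaranteed by the standard since C++20).
  return static_cast<int32_t>(shifted);
}
```

For example, SafeLeftShift(-1, 4) yields -16, the result two's-complement arithmetic would give, without ever shifting a negative signed operand.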

zeroshade requested review from felipecrv, lidavidm, bkietz, pitrou and joellubi September 4, 2024 19:40

zeroshade requested review from wgtmac and westonpace as code owners September 4, 2024 19:40

github-actions bot added the labels Component: Parquet, Component: C++, awaiting committer review, Component: Documentation and Component: Python on Sep 4, 2024

lidavidm reviewed Sep 5, 2024


github-actions bot added the labels awaiting changes and awaiting change review, and removed the labels awaiting committer review and awaiting changes, on Sep 5, 2024

wgtmac reviewed Sep 5, 2024


github-actions bot added the labels awaiting changes and awaiting change review, and removed the labels awaiting change review and awaiting changes, on Sep 5, 2024

zeroshade and others added 17 commits September 30, 2024 11:27

updates from feedback and comments

a8c0c75

remove commented out code

46c84f1

ran pre-commit for linting

4973864

sumtype and accumulator type for decimals should be consistent

b7d2bf3

simplify check

512d464

linting

8f46e50

Add tests for Decimal32 and Decimal64

ae3d8a2

linting

bf9eb74

remove abs from FromReal, only constexpr in C++23 and newer

2fb271a

simplify a bunch of tests with a generic typed_test

b44888d

use FromRealApprox

e2957a9

static_cast instead of implicit cast

980d6fb

remove special cases, adjust tests

2a3e5c4

Update cpp/src/arrow/util/decimal.cc

c723754

Co-authored-by: Antoine Pitrou <[email protected]>

more updates from comments

5382eb4

add reference to issue for decimal32 approx

9fda783

make RoundedRightShift a no-op

af8c722

zeroshade force-pushed the cpp-decimal32-64 branch from 154ea65 to af8c722 Compare September 30, 2024 15:27

github-actions bot removed Component: Java Component: C# labels Sep 30, 2024

zeroshade added 4 commits September 30, 2024 12:32

fix tests

48639e3

avoid ASAN issue

b110605

fix ubsan test

1d97e27

fix ubsan

39032f2

zeroshade merged commit d55d4c6 into apache:main Sep 30, 2024
38 of 41 checks passed

zeroshade deleted the cpp-decimal32-64 branch September 30, 2024 21:15

kou mentioned this pull request Oct 1, 2024

[C++] Decimal32 support introduced an UBSAN error #44276

Closed

mapleFU mentioned this pull request Oct 9, 2024

[C++][Parquet] arrow Decimal32/Decimal64 write Parquet and testing #44345

Open
