[AVRO-4081][C++] Add big decimal support and update documentation #3148

Merged 1 commit on Dec 24, 2024

8 changes: 5 additions & 3 deletions doc/content/en/docs/++version++/Specification/_index.md
@@ -787,6 +787,8 @@ A logical type is always serialized using its underlying Avro type so that value
Language implementations must ignore unknown logical types when reading, and should use the underlying Avro type. If a logical type is invalid, for example a decimal with scale greater than its precision, then implementations should ignore the logical type and use the underlying Avro type.

### Decimal
+
+#### Fixed precision
The `decimal` logical type represents an arbitrary-precision signed decimal number of the form _unscaled × 10<sup>-scale</sup>_.

A `decimal` logical type annotates Avro _bytes_ or _fixed_ types. The byte array must contain the two's-complement representation of the unscaled integer value in big-endian byte order. The scale is fixed, and is specified using an attribute.
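
To make the `decimal` representation in this hunk concrete, here is a minimal C++ sketch that reads a value of the form unscaled × 10^-scale from big-endian two's-complement bytes. The helper names `unscaledFromBytes` and `formatDecimal` are hypothetical and are not part of the Avro C++ library. The sketch also assumes the unscaled value fits in 64 bits and that the scale is not negative, whereas the specification allows arbitrary precision.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: interpret big-endian two's-complement bytes as a
// signed 64-bit integer. Assumption: at most 8 bytes (illustration only).
std::int64_t unscaledFromBytes(const std::vector<std::uint8_t> &bytes) {
    if (bytes.empty() || bytes.size() > 8) {
        throw std::invalid_argument("sketch supports 1 to 8 bytes");
    }
    // Sign-extend: start from all ones if the top bit of the first byte is set.
    std::uint64_t acc = (bytes[0] & 0x80) ? ~std::uint64_t{0} : std::uint64_t{0};
    for (std::uint8_t b : bytes) {
        acc = (acc << 8) | b;
    }
    return static_cast<std::int64_t>(acc);
}

// Hypothetical helper: render unscaled * 10^-scale as text.
// Assumption: scale >= 0, as the specification requires.
std::string formatDecimal(std::int64_t unscaled, int scale) {
    const bool negative = unscaled < 0;
    // Take the magnitude in unsigned arithmetic so INT64_MIN does not overflow.
    const std::uint64_t mag = negative
        ? std::uint64_t{0} - static_cast<std::uint64_t>(unscaled)
        : static_cast<std::uint64_t>(unscaled);
    std::string digits = std::to_string(mag);
    if (scale > 0) {
        const std::size_t s = static_cast<std::size_t>(scale);
        // Left-pad with zeros so at least one digit precedes the decimal point.
        if (digits.size() <= s) {
            digits.insert(0, s - digits.size() + 1, '0');
        }
        digits.insert(digits.size() - s, ".");
    }
    return (negative ? "-" : "") + digits;
}
```

Under these assumptions, the bytes `0x30 0x39` (unscaled value 12345) with scale 2 decode to `123.45`.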
@@ -810,19 +812,19 @@ Scale must be zero or a positive integer less than or equal to the precision.

For the purposes of schema resolution, two schemas that are `decimal` logical types _match_ if their scales and precisions match.

-**alternative**
+#### Scalable precision

As it's not always possible to fix scale and precision in advance for a decimal field, `big-decimal` is another `decimal` logical type restricted to Avro _bytes_.

-_Currently only available in Java and Rust_.
+_Currently only available in C++, Java and Rust_.

```json
{
  "type": "bytes",
  "logicalType": "big-decimal"
}
```
-Here, as scale property is stored in value itself it needs more bytes than preceding `decimal` type, but it allows more flexibility.
+Here, the bytes array contains two serialized properties. The first part is an Avro bytes value holding the two's-complement representation of the unscaled integer value in big-endian byte order. The second part is the scale property stored as an Avro integer. Scale must be zero or a positive integer less than or equal to the precision. The value itself needs more bytes than the preceding `decimal` type, but it allows more flexibility.

### UUID
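
The `big-decimal` payload layout described in the specification change above (an Avro bytes value for the unscaled integer, followed by the scale as an Avro int) can be sketched as follows. This is one reading of that paragraph, not code from this pull request: the names `putVarint`, `minimalTwosComplement` and `encodeBigDecimal` are hypothetical, and the unscaled value is assumed to fit in 64 bits.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Avro int/long binary encoding: zig-zag, then base-128 variable length.
void putVarint(Bytes &out, std::int64_t value) {
    const std::uint64_t sign = value < 0 ? ~std::uint64_t{0} : std::uint64_t{0};
    std::uint64_t z = (static_cast<std::uint64_t>(value) << 1) ^ sign;
    while (z >= 0x80) {
        out.push_back(static_cast<std::uint8_t>(z | 0x80)); // low 7 bits + continuation bit
        z >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(z));
}

// Hypothetical helper: shortest big-endian two's-complement form of a value.
// Assumption: the unscaled value fits in 64 bits.
Bytes minimalTwosComplement(std::int64_t value) {
    Bytes be(8);
    std::uint64_t u = static_cast<std::uint64_t>(value);
    for (int i = 7; i >= 0; --i) {
        be[static_cast<std::size_t>(i)] = static_cast<std::uint8_t>(u);
        u >>= 8;
    }
    // Drop leading bytes that only repeat the sign bit of the byte after them.
    std::size_t start = 0;
    while (start + 1 < be.size()) {
        const bool redundantZero = be[start] == 0x00 && (be[start + 1] & 0x80) == 0;
        const bool redundantOnes = be[start] == 0xFF && (be[start + 1] & 0x80) != 0;
        if (!redundantZero && !redundantOnes) {
            break;
        }
        ++start;
    }
    return Bytes(be.begin() + static_cast<std::ptrdiff_t>(start), be.end());
}

// Hypothetical helper: build the payload stored inside the `bytes` value.
Bytes encodeBigDecimal(std::int64_t unscaled, std::int32_t scale) {
    const Bytes unscaledBytes = minimalTwosComplement(unscaled);
    Bytes out;
    // First part: Avro bytes = length prefix followed by the raw bytes.
    putVarint(out, static_cast<std::int64_t>(unscaledBytes.size()));
    out.insert(out.end(), unscaledBytes.begin(), unscaledBytes.end());
    // Second part: the scale as an Avro int.
    putVarint(out, scale);
    return out;
}
```

With these assumptions, `encodeBigDecimal(12345, 2)` (the value 123.45) yields the four bytes `04 30 39 04`. When that payload is then written as an Avro `bytes` field, it receives its own length prefix like any other bytes value.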

6 changes: 5 additions & 1 deletion lang/c++/impl/Compiler.cc
@@ -359,7 +359,11 @@ static LogicalType makeLogicalType(const Entity &e, const Object &m) {
     }

     LogicalType::Type t = LogicalType::NONE;
-    if (typeField == "date")
+    if (typeField == "big-decimal"
+        && !containsField(m, "precision")
+        && !containsField(m, "scale"))
+        t = LogicalType::BIG_DECIMAL;
+    else if (typeField == "date")
         t = LogicalType::DATE;
     else if (typeField == "time-millis")
         t = LogicalType::TIME_MILLIS;
3 changes: 3 additions & 0 deletions lang/c++/impl/LogicalType.cc
@@ -51,6 +51,9 @@ void LogicalType::setScale(int32_t scale) {
 void LogicalType::printJson(std::ostream &os) const {
     switch (type_) {
         case LogicalType::NONE: break;
+        case LogicalType::BIG_DECIMAL:
+            os << R"("logicalType": "big-decimal")";
+            break;
         case LogicalType::DECIMAL:
             os << R"("logicalType": "decimal")";
             os << ", \"precision\": " << precision_;
7 changes: 7 additions & 0 deletions lang/c++/impl/Node.cc
@@ -139,6 +139,13 @@ void Node::setLogicalType(LogicalType logicalType) {
     // Check that the logical type is applicable to the node type.
     switch (logicalType.type()) {
         case LogicalType::NONE: break;
+        case LogicalType::BIG_DECIMAL: {
+            if (type_ != AVRO_BYTES) {
+                throw Exception("BIG_DECIMAL logical type can annotate "
+                                "only BYTES type");
+            }
+            break;
+        }
         case LogicalType::DECIMAL: {
             if (type_ != AVRO_BYTES && type_ != AVRO_FIXED) {
                 throw Exception("DECIMAL logical type can annotate "
1 change: 1 addition & 0 deletions lang/c++/include/avro/LogicalType.hh
@@ -29,6 +29,7 @@ class AVRO_DECL LogicalType {
 public:
     enum Type {
         NONE,
+        BIG_DECIMAL,
         DECIMAL,
         DATE,
         TIME_MILLIS,
22 changes: 19 additions & 3 deletions lang/c++/test/SchemaTests.cc
@@ -314,6 +314,7 @@ const char *roundTripSchemas[] = {
     R"({"type":"fixed","name":"Test","size":1})",

     // Logical types
+    R"({"type":"bytes","logicalType":"big-decimal"})",
     R"({"type":"bytes","logicalType":"decimal","precision":12,"scale":6})",
     R"({"type":"fixed","name":"test","size":16,"logicalType":"decimal","precision":38,"scale":9})",
     R"({"type":"fixed","name":"test","size":129,"logicalType":"decimal","precision":310,"scale":155})",
@@ -361,6 +362,7 @@ const char *roundTripSchemas[] = {

 const char *malformedLogicalTypes[] = {
     // Wrong base type.
+    R"({"type":"long","logicalType": "big-decimal"})",
     R"({"type":"long","logicalType": "decimal","precision": 10})",
     R"({"type":"string","logicalType":"date"})",
     R"({"type":"string","logicalType":"time-millis"})",
@@ -379,9 +381,12 @@ const char *malformedLogicalTypes[] = {
     R"({"type":"fixed","logicalType":"decimal","size":4,"name":"a","precision":20})",
     R"({"type":"fixed","logicalType":"decimal","size":129,"name":"a","precision":311})",
     // Scale is larger than precision.
-    R"({"type":"bytes","logicalType":"decimal","precision":5,"scale":10})"
-};
-
+    R"({"type":"bytes","logicalType":"decimal","precision":5,"scale":10})",
+    // Precision is not supported by the big-decimal logical type
+    // and scale is integrated in bytes.
+    R"({"type":"bytes","logicalType": "big-decimal","precision": 9})",
+    R"({"type":"bytes","logicalType": "big-decimal","scale": 2})",
+    R"({"type":"bytes","logicalType": "big-decimal","precision": 9,"scale": 2})"};
 const char *schemasToCompact[] = {
     // Schema without any whitespace
     R"({"type":"record","name":"Test","fields":[]})",
@@ -469,6 +474,10 @@ static void testCompactSchemas() {
 }

 static void testLogicalTypes() {
+    const char *bytesBigDecimalType = R"({
+        "type": "bytes",
+        "logicalType": "big-decimal"
+    })";
     const char *bytesDecimalType = R"({
         "type": "bytes",
         "logicalType": "decimal",
@@ -496,6 +505,13 @@ static void testLogicalTypes() {
     const char *uuidType = R"({"type": "string","logicalType": "uuid"})";
     // AVRO-2923 Union with LogicalType
     const char *unionType = R"([{"type":"string", "logicalType":"uuid"},"null"]})";
+    {
+        BOOST_TEST_CHECKPOINT(bytesBigDecimalType);
+        ValidSchema schema = compileJsonSchemaFromString(bytesBigDecimalType);
+        BOOST_CHECK(schema.root()->type() == AVRO_BYTES);
+        LogicalType logicalType = schema.root()->logicalType();
+        BOOST_CHECK(logicalType.type() == LogicalType::BIG_DECIMAL);
+    }
     {
         BOOST_TEST_CHECKPOINT(bytesDecimalType);
         ValidSchema schema1 = compileJsonSchemaFromString(bytesDecimalType);