-
Notifications
You must be signed in to change notification settings - Fork 100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Updated output formats to output BigDecimal fields as numeric values instead of a byte representation. #1640
base: develop
Are you sure you want to change the base?
Conversation
|
||
// Wrap BigDecimal fields in a wrapper which handles the conversion to String. | ||
if (value instanceof BigDecimal) { | ||
value = new BigDecimalWrapper((BigDecimal) value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This wrapper is used because the original JsonWriter class uses toString()
to generate the representation of the Number
instance.
Becase of internal private methods, I can't simply implement this logic directly, so I'm using the wrapper class defined below as a workaround.
* the License. | ||
*/ | ||
|
||
package io.cdap.cdap.format.io; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wrote this in this package so I had access to the underlaying JsonWriter instance.
c30d43c
to
929b962
Compare
* Custom Datum Writer for Structured Records, this class writes BigDecimal values as numbers instead of byte arrays in | ||
* the JSON output. | ||
*/ | ||
public class BigDecimalAwareJsonStructuredRecordDatumWriter extends JsonStructuredRecordDatumWriter { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In which cases do we need the original represenatation? Also, I am wondering why is it a byte array given io.cdap.cdap.format.io.JsonStructuredRecordDatumWriter#encode(io.cdap.cdap.common.io.Encoder, io.cdap.cdap.api.data.schema.Schema, java.lang.Object) handles DECIMAL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because we pass the value directly from the StructuredRecord. So decimals are simply a byte array internally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, why this logic does not go into JsonStructuredRecordDatumWriter itself?
|
||
if (value != null && logicalType == Schema.LogicalType.DECIMAL) { | ||
BigDecimal bdValue = fromObject(value, schema); | ||
getJsonWriter(encoder).value(bdValue); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like a hack. Why can't we do it without it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the code as implemented in the platform doesn't have logic to handle BigDecimals directly. Only primitive CDAP types are handled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
io.cdap.cdap.format.io.JsonStructuredRecordDatumWriter#encode(io.cdap.cdap.common.io.Encoder, io.cdap.cdap.api.data.schema.Schema, java.lang.Object)
has Decimal support. If it does not work, we should fix it over there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since it's cdap-formats it's shipped in plugin, so we can have a new plugin released with a cdap-formats fix without new CDAP release
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class only handles Decimals if the logicalTypeAsString
flag is set (which we only do for preview). It also writes the value as a String which would wrap the output in quotes
.../src/test/java/io/cdap/plugin/format/BigDecimalAwareStructuredRecordStringConverterTest.java
Show resolved
Hide resolved
private static final StructuredRecord RECORD = StructuredRecord.builder(SCHEMA) | ||
.set("i", 1) | ||
.set("f", 256.1f) | ||
.setDecimal("bd", BD) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also test for `.setDecimal(.., null)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
...mmon/src/main/java/io/cdap/plugin/format/BigDecimalAwareStructuredRecordStringConverter.java
Show resolved
Hide resolved
if (fieldSchema.getLogicalType() != null && fieldSchema.getLogicalType() == Schema.LogicalType.DECIMAL) { | ||
BigDecimal decimalValue = record.getDecimal(fieldName); | ||
|
||
// Throw exception if the field is expected tu be decimal, but it could not be processed as such. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: tu
Schema.LogicalType logicalType = nonNullableSchema.getLogicalType(); | ||
|
||
if (value != null && logicalType == Schema.LogicalType.DECIMAL) { | ||
BigDecimal bdValue = fromObject(value, schema); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW: Can we have a BigDecimalWrapper right here? This would remove the need for a special JsonWriter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We still to call the writer directly, as the superclass doesn't expose methods to write number instances.
This is the reason I have to use the writer directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand why do you use it directly (and that another question above, we should just expose it). My question is if you do wrapping right here, you would not need BigDecimalAwareJsonWriter
and use just vanilla JsonWriter
, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is the JsonWriter's value(Number number)
method uses the Number class toString()
method internally to generate the representation of the value that it needs to output.
The problem with this is that BigQuery's toString()
method can represent values in scientific notation. In order to avoid this, I create that wrapper which delegates to BigQuery's toPlainString()
method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. If you pass this Wrapper to vanilla JsonWriter.value(Number)
it would do the job, no? You don't need to create wrapper inside JsonWriter
929b962
to
efa31f4
Compare
…instead of a byte representation.
efa31f4
to
14a4992
Compare
No description provided.