Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AVRO-4024: [Rust] support nan/inf/-inf as float/double default #3051

Merged
merged 1 commit into from
Aug 7, 2024

Conversation

xxchan
Copy link
Contributor

@xxchan xxchan commented Jul 27, 2024

What is the purpose of the change

As title

[AVRO-4024] [Rust]: Fix default value support for double and float as "NaN", "Infinity" and "-Infinity" - ASF JIRA

Verifying this change

CI must pass

Documentation

  • Does this pull request introduce a new feature? no

@github-actions github-actions bot added the Rust label Jul 27, 2024
@KalleOlaviNiemitalo
Copy link
Contributor

Does this pull request introduce a new feature? no

The "field default values" table at https://avro.apache.org/docs/1.11.1/specification/#schema-record does not mention that the default value for the "float,double" Avro types could be specified as a JSON string, so, this seems a new feature to me.

@xxchan
Copy link
Contributor Author

xxchan commented Jul 27, 2024

Thanks for your reply

  1. This behavior has existed in java (the de-facto standard) for >10 years, and in the real world such schema do exists. So I don't think there are any practical reason why the Rust implementation should reject it.
    < FLOATING_POINT_LITERAL:
    ("-")?
    ( "NaN" | "Infinity" |
    <DECIMAL_FLOATING_POINT_LITERAL> | <HEXADECIMAL_FLOATING_POINT_LITERAL> )
    >
    (AVRO-972)
  2. Perhaps we can "guess the intention" of the spec: First, float/double are defined as IEEE 754 floating-point number, and NaN/Inf should be valid values; Then JSON encoding and the default value encoding are the same; JSON encoding should have the same position and allow all valid values.

Therefore, I would say this is not a "new feature", but it's vagueness in the spec. We should fix the spec instead of the Rust implementation. I just created a separate Jira ticket to update the spec https://issues.apache.org/jira/browse/AVRO-4025

@KalleOlaviNiemitalo
Copy link
Contributor

The code you linked to is in the Java implementation of Avro IDL. The Java code that parses Avro schemas from JSON does not allow "default": "NaN" for a floating-point field: https://github.com/apache/avro/blob/42c54e68120bb5c2393954b5990a0aa3b3960510/lang/java/avro/src/main/java/org/apache/avro/Schema.java

For Avro IDL, the Java implementation seems to expect float val = NaN; without quotation marks, rather than float val = "NaN";. Also, it allows hexadecimal floating-point literals, hexadecimal integer literals, and octal integer literals, even though those are not valid in JSON.

So, this part of the Avro IDL specification at https://avro.apache.org/docs/1.11.1/idl-language/#default-values

Default values for fields may be optionally specified by using an equals sign after the field name followed by a JSON expression indicating the default value. This JSON is interpreted as described in the spec.

fails to describe the de-facto support for those non-JSON syntaxes.

Now if you change the Avro specification to allow "NaN", "Infinity", and "-Infinity" as default values in JSON, I think the Avro IDL specification must also be changed to state that those quoted forms are not valid in Avro IDL and that the unquoted literals must be used instead.

@KalleOlaviNiemitalo
Copy link
Contributor

More specific link:

case FLOAT:
case DOUBLE:
return defaultValue.isNumber();

@xxchan
Copy link
Contributor Author

xxchan commented Jul 28, 2024

Thanks for these information! This problem seems more complex than I expected. Let me do more research on it.

@xxchan
Copy link
Contributor Author

xxchan commented Jul 29, 2024

@KalleOlaviNiemitalo I've tested with the java lib. And "NaN" is indeed allowed, but not NaN

    final String schemaString = "{\"type\" : \"record\", \"name\" : \"R\",\n"
        + "  \"fields\" : [ \n   {\"name\" : \"f\", \"type\" : \"float\", \"default\": \"NaN\"}]}";
    final Schema schema = new Schema.Parser().parse(schemaString);

Note that this is handled inside jackason:

  • "NaN" is parsed into a DoubleNode, and isNumber returns true in isValidDefault you linked.
  • On the contrary, NaN will returns exception:
org.apache.avro.SchemaParseException: com.fasterxml.jackson.core.JsonParseException: Non-standard token 'NaN': enable `JsonReadFeature.ALLOW_NON_NUMERIC_NUMBERS` to allow

Anyway, thanks for pointing out that Avro IDL is a different thing. I should have tested with the java lib directly at the beginning :)

@martin-g
Copy link
Member

martin-g commented Aug 5, 2024

@xxchan Did you intentionally disable "Maintainers are allowed to edit this pull request." ?
I'd like to make some improvements to your code.

@xxchan
Copy link
Contributor Author

xxchan commented Aug 6, 2024

@martin-g It seems that if a PR is created from a fork living under an organisation (rather than an individual) it's not possible to allow edits by maintainers: https://github.com/orgs/community/discussions/5634 🤦

So I just invited you to our fork, so that you can commit directly.

Thank you!

@martin-g
Copy link
Member

martin-g commented Aug 6, 2024

I will do the improvements in a follow up!

@martin-g
Copy link
Member

martin-g commented Aug 6, 2024

@xxchan #3066

/// So they are represented in JSON as strings.
fn parse_special_float(s: &str) -> Option<f32> {
match s.trim().to_ascii_lowercase().as_str() {
"nan" | "+nan" | "-nan" => Some(f32::NAN),
Copy link
Member

@martin-g martin-g Aug 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Java and C++ use case-sensitive values.
I suggest to compare against NaN, Inf, Infinity, etc., i.e. let's remove to_ascii_lowercase()

@martin-g
Copy link
Member

martin-g commented Aug 7, 2024

I am going to merge this PR and then adapt it to behave as the ones for Java and C#

@martin-g martin-g merged commit 7701526 into apache:main Aug 7, 2024
15 checks passed
martin-g pushed a commit that referenced this pull request Aug 7, 2024
Signed-off-by: xxchan <[email protected]>
Co-authored-by: Xiangjin <[email protected]>
(cherry picked from commit 7701526)
@xxchan xxchan deleted the xxchan/patch-1 branch September 20, 2024 07:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants