Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"Cannot infer common argument type for comparison operation Union..." #10180

Open
samuelcolvin opened this issue Apr 22, 2024 · 7 comments · May be fixed by apache/arrow-rs#6218
Open

"Cannot infer common argument type for comparison operation Union..." #10180

samuelcolvin opened this issue Apr 22, 2024 · 7 comments · May be fixed by apache/arrow-rs#6218
Labels
bug Something isn't working

Comments

@samuelcolvin
Copy link
Contributor

samuelcolvin commented Apr 22, 2024

Describe the bug

See datafusion-contrib/datafusion-functions-json#3

I have a union defined by

        DataType::Union(
            UnionFields::new(
                vec![0, 1, 2, 3, 4, 5, 6],
                vec![
                    Field::new("null", DataType::Boolean, true),
                    Field::new("bool", DataType::Boolean, false),
                    Field::new("int", DataType::Int64, false),
                    Field::new("float", DataType::Float64, false),
                    Field::new("string", DataType::Utf8, false),
                    Field::new("array", DataType::Utf8, false),
                    Field::new("object", DataType::Utf8, false),
                ]
            ),
            UnionMode::Sparse,
        )

When I try to compare it to an integer with json_get(json_data, 'foo')=123, I get the error:

called `Result::unwrap()` on an `Err` value: Plan("Cannot infer common argument type for comparison operation Union([(0, Field { name: \"null\", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), (1, Field { name: \"bool\", data_type: Boolean, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }), (2, Field { name: \"int\", data_type: Int64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }), (3, Field { name: \"float\", data_type: Float64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }), (4, Field { name: \"string\", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }), (5, Field { name: \"array\", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }), (6, Field { name: \"object\", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} })], Sparse) = Int64")

Is there any way to rewire the logic plan to tell datafusion how to implement these comparisons?

If not, I might have to abandon the json_get method, and instead implemtn json_get_str, json_get_int etc., which would be unfortunate.

I tried implementing FunctionRewrite, but the error occurs before it's called.

To Reproduce

see tests in datafusion-contrib/datafusion-functions-json#3

Expected behavior

No response

Additional context

No response

@jayzhan211
Copy link
Contributor

jayzhan211 commented Apr 23, 2024

I took a look a bit, and I found that your return_type is Union, but If I understand correctly, you should compute the return type based on args. For example, in your test json_get(json_data, 'foo'), if you compute the return_type based on foo, you return Int, then you will not meet the error.

The error is because Union is not yet supported in comparison_coercion

@samuelcolvin
Copy link
Contributor Author

samuelcolvin commented Apr 23, 2024

@jayzhan211 that doesn't work since the argument types don't tell you what type will be returned.

e.g.:

  • if the value in column foo is {"x": "abc"}, then json_get(foo, 'x') will return a string
  • but if the value in column foo is {"x": 123}, then json_get(foo, 'x') will return an integer

However I think I have a work around, I'm requiring a cast, so you have to do json_get(foo, 'x')::string or json_get(foo, 'x')::int, then I'm using a FunctionRewrite to rewrite the function from json_get to json_get_str or json_get_int.

With that the only remaining issue is making the error less ugly and more informative.

@jayzhan211
Copy link
Contributor

jayzhan211 commented Apr 24, 2024

@samuelcolvin Did you also consider return_type_from_exprs, if it does not work, I think we can either support Union in comparison_coercion or better design ScalarUDFImpl to do more customize about the return type.

@samuelcolvin
Copy link
Contributor Author

ye, return_type_from_exprs doesn't help.

I got around it mostly, with I think good performance by rewriting the query when there's a cast, so:

select * from foo where json_get(attributes, 'bar')::string='ham'

Will be rewritten to:

select * from foo where json_get_str(attributes, 'bar')='ham'

The main requirement I have now is that the error when you do try to compare a union is more helpful and less ugly.

@jayzhan211
Copy link
Contributor

The main requirement I have now is that the error when you do try to compare a union is more helpful and less ugly.

Do you mean the error message in comparision_coerion?

called `Result::unwrap()` on an `Err` value: Plan("Cannot infer common argument type for comparison operation Union([(0, Field { name: \"null\", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), (1, Field { name: \"bool\", data_type: Boolean, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }), (2, Field { name: \"int\", data_type: Int64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }), (3, Field { name: \"float\", data_type: Float64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }), (4, Field { name: \"string\", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }), (5, Field { name: \"array\", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }), (6, Field { name: \"object\", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} })], Sparse) = Int64")

@samuelcolvin
Copy link
Contributor Author

sorry for the slow reply, yes exactly.

@samuelcolvin
Copy link
Contributor Author

I'm working on this, initially in arrow-rs.

@samuelcolvin samuelcolvin linked a pull request Aug 9, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants