Casting to and from unions #6247
Comments
Much as route 1. (very lax casting of unions) would simplify my use case, in writing this up I realised that it probably doesn't make much sense in general.
Yeah, I think of a Union as the way in arrow to represent a dynamically typed value: each row of a union can be one of a set of different types. So I guess if we are casting from a union array to another type, I would as a user expect that each row of the union (regardless of what variant it was) would be cast to the target type.
Hey @samuelcolvin, if I get it right, your first option looks like union_extract from DuckDB. I'm trying to implement it at apache/datafusion#12116 in case it helps.
Interesting. I don't know what @alamb thinks, but I'd say it would be best to implement it in this repo rather than datafusion.
I think implementing a
I can port the PR here; it will take a few days. Hopefully this would avoid duplicate review work, especially since most of the tests should be rewritten from sqllogictests to unit tests. Do you agree?
Continuing from #6218 (review) — I thought it worth creating a dedicated issue to discuss this before writing any more code.
Well, `pyarrow` doesn't help much (or maybe it helps a lot by giving us flexibility!). All four cases fail:
Python Code
Here's my proposal for what we support and don't support (yet):
Casting to sparse and dense union
We choose the most appropriate child to cast to using the current logic - choose the exact matching type, otherwise the first type you can cast to, left to right.
I think this is fairly simple, uncontroversial and already implemented in #6218.
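The selection rule above can be sketched in plain Python. This is only an illustration of the rule as described, not the code in #6218: the `CASTABLE` table, `can_cast` and `choose_child` are hypothetical names, and the type names are plain strings rather than Arrow data types.

```python
# Hypothetical castability table: (source_type, target_type) pairs that are allowed.
# The real rules live in the cast kernel; these entries are made up for the example.
CASTABLE = {
    ("int32", "int64"),
    ("int32", "float64"),
    ("int32", "utf8"),
    ("utf8", "int64"),
}


def can_cast(source, target):
    """Return True if `source` can be cast to `target` in this toy model."""
    return source == target or (source, target) in CASTABLE


def choose_child(source, union_children):
    """Pick the union child to cast `source` into, or None if no child fits.

    An exact type match wins; otherwise the first castable child,
    scanning left to right.
    """
    for index, child in enumerate(union_children):
        if child == source:
            return index
    for index, child in enumerate(union_children):
        if can_cast(source, child):
            return index
    return None
```

With these made-up rules, `choose_child("int32", ["utf8", "int32"])` picks the exact match at index 1, while `choose_child("int32", ["utf8", "float64"])` falls back to index 0 because `utf8` is the first castable child.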
Casting from sparse and dense unions
I think we can support both sparse and dense using either `zip`, `interleave` or `take`; any suggestion on which will be fastest is much appreciated.
We can do this, either:
null
I think @alamb suggested he'd prefer 2. I started implementing 1. in #6218 so that we can use this union cast logic for `datafusion-functions-json`, to match postgres behaviour.
When the user queries:
The value returned from `thing->'field'` is a `JsonUnion`, hence I need that to be cast to an int even though that union includes stuff like string, object and array that can't be cast to an int.
(I'm trying to roughly match PostgreSQL, where `select ('{"foo": 123}'::jsonb->'foo')::int` is valid.)
If we go with route 2. above, this expression would raise an error.
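To make the difference between the two routes concrete, here is a toy model of casting a dense union to an int. It rests on assumptions that should be read as such: route 1 is taken to mean "rows whose variant can't be cast become null", route 2 is taken to mean "reject the cast if any child type can't be cast", and the `INT_CASTABLE` set and `cast_dense_union_to_int` function are hypothetical. Each child is cast once and the rows are then gathered by `(type_id, offset)`, which is roughly what a `take` or `interleave` based implementation would do.

```python
# Hypothetical set of child types that can be cast to int in this toy model.
INT_CASTABLE = {"int", "float"}


def cast_dense_union_to_int(type_ids, offsets, children, lax):
    """Cast a toy dense union to a list of ints.

    `children` is a list of (type_name, values) pairs; row i of the union is
    children[type_ids[i]][1][offsets[i]], mirroring the dense union layout.
    lax=True  (assumed route 1): uncastable variants become None.
    lax=False (assumed route 2): any uncastable child type raises TypeError.
    """
    cast_children = []
    for type_name, values in children:
        if type_name in INT_CASTABLE:
            # Cast the whole child once, not row by row.
            cast_children.append([int(v) for v in values])
        elif lax:
            cast_children.append(None)
        else:
            raise TypeError(f"cannot cast union child {type_name!r} to int")

    # Gather the rows back into union order.
    result = []
    for type_id, offset in zip(type_ids, offsets):
        child = cast_children[type_id]
        result.append(None if child is None else child[offset])
    return result


children = [("int", [123, 7]), ("str", ["foo"]), ("array", [[1, 2]])]
type_ids = [0, 1, 0, 2]
offsets = [0, 0, 1, 0]
lax_result = cast_dense_union_to_int(type_ids, offsets, children, lax=True)
```

In this model the lax call yields `[123, None, 7, None]`, whereas the same call with `lax=False` raises because the `str` and `array` children have no cast to int, even for rows that happen to hold ints.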
Note: for the above case of `(thing->'field')::int`, we already do an optimisation pass where we convert `json_get_union(thing, 'field')::int` to `json_get_int(thing, 'field')` and therefore avoid this problem. My reason for implementing casting from unions in the first place was to support expressions where a `JsonUnion` is compared to values but the optimization won't or can't work, e.g. if `thing->'field'` is in a CTE, then used later.
I guess if we decide that route 2. is correct, I have a few options:
JsonUnion
, e.g. replace all casts in the query with a UDF that does something custom forJsonUnion
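For readers unfamiliar with the optimisation pass mentioned in the note, the idea can be sketched as a rewrite over a tiny expression tree. This is not the `datafusion-functions-json` implementation: expressions are modelled as nested tuples, and `rewrite` is a hypothetical helper. Only the function names `json_get_union` and `json_get_int` come from the discussion.

```python
def rewrite(expr):
    """Rewrite cast(json_get_union(args...), int) into json_get_int(args...).

    Expressions are nested tuples, for example
    ("cast", ("call", "json_get_union", "thing", "field"), "int").
    Anything that does not match the pattern is returned unchanged.
    """
    if not isinstance(expr, tuple):
        return expr
    # Rewrite children first so nested casts are handled too.
    expr = tuple(rewrite(part) for part in expr)
    is_int_cast = len(expr) == 3 and expr[0] == "cast" and expr[2] == "int"
    if is_int_cast and isinstance(expr[1], tuple) and expr[1][:2] == ("call", "json_get_union"):
        return ("call", "json_get_int") + expr[1][2:]
    return expr


direct = rewrite(("cast", ("call", "json_get_union", "thing", "field"), "int"))
via_cte = rewrite(("cast", ("column", "cte_field"), "int"))
```

The direct form becomes a `json_get_int` call, but when the union arrives through a column reference (as it would from a CTE) the pattern does not match and the cast is left in place, which is the case where a real union cast is needed.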