Add NULL in list simplifications #8688
Comments
I would like to take a look at this. If my understanding is correct, the first case i.e. As for the second case, |
Thank you!
I think it actually would simplify to Using datafusion-cli, you can check it:
❯ create table t (x int) as values (1), (2);
0 rows in set. Query took 0.007 seconds.
❯ select NULL IN (1, 2);
+----------------------------------------------------------------------+
| NULL IN (Map { iter: Iter([Literal(Int64(1)), Literal(Int64(2))]) }) |
+----------------------------------------------------------------------+
| |
+----------------------------------------------------------------------+
1 row in set. Query took 0.002 seconds.
I actually tried this out and it seems like this is NOT a correct simplification (sorry for my confusion). So maybe we can't simplify such select lists 🤔
DataFusion CLI v34.0.0
❯ create table t (x int) as values (1), (2);
0 rows in set. Query took 0.006 seconds.
❯ select x IN (NULL, 2) from t;
+-----------------------------------------------------------------+
| t.x IN (Map { iter: Iter([Literal(NULL), Literal(Int64(2))]) }) |
+-----------------------------------------------------------------+
| |
| true |
+-----------------------------------------------------------------+
2 rows in set. Query took 0.002 seconds.
❯ select x IN (2) from t;
+--------------------------------------------------+
| t.x IN (Map { iter: Iter([Literal(Int64(2))]) }) |
+--------------------------------------------------+
| false |
| true |
+--------------------------------------------------+
2 rows in set. Query took 0.001 seconds.
Also, the same in postgres:
postgres=# create table t (x int);
CREATE TABLE
postgres=# insert into t values (1), (2);
INSERT 0 2
postgres=# select x in (NULL, 2) from t;
?column?
----------
t
(2 rows)
postgres=# select x in (2) from t;
?column?
----------
f
t
(2 rows)
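The results above follow from SQL three-valued logic. As a minimal sketch (this is not DataFusion code; the function name `sql_in_list` and the use of `Option` to model NULL are assumptions made for illustration), the IN-list semantics can be written out like this:

```rust
/// Evaluate `needle IN (list)` under SQL three-valued logic.
/// `None` models SQL NULL; `Some(b)` models a known boolean result.
/// Hypothetical helper for illustration only, not a DataFusion API.
fn sql_in_list(needle: Option<i64>, list: &[Option<i64>]) -> Option<bool> {
    // A NULL needle is never known to be equal to anything, so the result is NULL.
    let needle = needle?;
    let mut saw_null = false;
    for item in list {
        match item {
            // A definite match wins, even if the list also contains NULL.
            Some(v) if *v == needle => return Some(true),
            // A definite non-match: keep looking.
            Some(_) => {}
            // Comparing against NULL is unknown: remember it.
            None => saw_null = true,
        }
    }
    // No match: the answer is unknown if a NULL was compared, otherwise false.
    if saw_null { None } else { Some(false) }
}
```

Under this model, `sql_in_list(None, ...)` is always `None`, which is why `NULL IN (1, 2)` can be replaced by NULL. A call such as `sql_in_list(Some(2), &[None, Some(2)])` returns `Some(true)`, which is why `x IN (NULL, 2)` cannot be replaced by NULL without knowing `x`.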
I see how they are different for the first case. Thank you for explaining. As for the second case, I agree we might not be able to simplify the list. Any Updated the PR to only address the first case.
Thanks for explaining this! 👍
Is your feature request related to a problem or challenge?
SELECT .. WHERE NULL IN (1,2,3) and SELECT ... WHERE x in (NULL, 2, 3) are always NULL (and thus will filter out all rows). However, DataFusion will still try and evaluate a predicate:
Note there is a FilterExec with a non trivial expression in both of the following queries:
In both cases the predicate could have been reduced to a single NULL
Describe the solution you'd like
I would like to extend the ExprSimplifier rules to handle the case of NULL IN (...) and when the InList contains NULL.
Here are some similar rules:
https://github.com/apache/arrow-datafusion/blob/cc3042a6343457036770267f921bb3b6e726956c/datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs#L474-L549
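As a rough sketch of what such a rule could look like, here is a toy version written against a made-up expression type. The `Expr` enum and `simplify_null_in_list` below are hypothetical stand-ins for illustration and are not DataFusion's actual `Expr` or simplifier API. Following the discussion in the comments, the sketch only rewrites the case where the tested expression itself is a NULL literal and leaves lists that merely contain NULL untouched.

```rust
/// Toy expression tree (hypothetical, much smaller than DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Null,
    Int(i64),
    Column(String),
    InList { expr: Box<Expr>, list: Vec<Expr>, negated: bool },
}

/// Rewrite `NULL IN (...)` and `NULL NOT IN (...)` to a NULL literal.
/// Assumes a non-empty list; every other expression is returned unchanged.
fn simplify_null_in_list(e: Expr) -> Expr {
    match e {
        // The tested expression is a NULL literal, so every comparison
        // is unknown and the whole predicate is NULL.
        Expr::InList { expr, list, .. }
            if matches!(*expr, Expr::Null) && !list.is_empty() =>
        {
            Expr::Null
        }
        // `x IN (NULL, 2)` can still be true for x = 2, so keep it as is.
        other => other,
    }
}
```

With this sketch, an `InList` whose `expr` is `Expr::Null` collapses to `Expr::Null`, while `x IN (NULL, 2)` comes back unchanged.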
Describe alternatives you've considered
No response
Additional context
These types of expressions are sometimes generated programmatically during rewrites in IOx.
It also came up in discussions with @yahoNanJing on #8669
I think this would be a good first issue as the patterns exist already and the need is well defined.