-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Double-check whether predicate expressions have correct nullability semantics #51
Comments
@jvanstraten Curious what is the "validation code"? Is there some sort of a test that specifies query plans and inputs and expected outputs? I'd be curious to see what we have for the joins. |
The validator (this repository) only checks whether a plan is valid, and sort of describes in English how the individual relations and expressions should behave; it doesn't know how to actually execute a plan given some input tables. The descriptions are probably not precise enough for joins in this case, because all the descriptions are currently limited to my own personal understanding. I would love some double-checking of the descriptions it does emit, though. Running the validator should be super easy if you have some plans already (see toplevel readme file); if you disagree with something it says or if you think the spec disagrees with it, please open an issue here. |
@jvanstraten This sounds super interesting. I'd love to try it out. Did you mean to include a link for "(this repository)"? For some reason it shows up as regular text. |
Oh... perhaps, you literally meant this repository: https://github.com/substrait-io/substrait-validator Let me try it. |
@jvanstraten I was able to install and run the validator on one of Substrait plan checked in to Velox as JSON: velox/substrait/tests/data/q6_first_stage.json I see a lot of errors, but I assume we can fix these. I also see the query plan. This is pretty cool. The description for the Filter node says "This relation discards all rows for which and:bool_bool(and:bool_bool(and:bool_bool(..), ..), lt:fp64_fp64(?.0, ..)) yields false. Behavior for a null condition is unspecified." I wonder why it says "Behavior for a null condition is unspecified." Should this be removed or, perhaps, clarified to explain that if the filter condition is evaluated to NULL, the row is not included? |
Yeah, that's exactly what this issue is for. @mbrobbel is already working on it. |
@jvanstraten Great! Thank you for clarifying. |
@jvanstraten you can assign this one to me. |
When I wrote the bulk of the validation code, I was properly confused about nullable booleans in the context of predicate expressions (both things that want one, and things that return one, like set containment subqueries). There is a good summary about what the behavior of these things should be here, but it's probably worth double-checking corner cases against query engine documentation or just by experimentation. When a nullable boolean is received by a predicate expression slot, the validator should probably also emit a note of some kind about what null means in that particular context.
Todo list (to check):
The text was updated successfully, but these errors were encountered: