Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PIG domains for planner and plan evaluator #584

Merged
merged 9 commits into from
May 10, 2022

Conversation

dlurton
Copy link
Member

@dlurton dlurton commented Apr 29, 2022

Originally part of #582 and based on #586 (which must be merged first) this PR contains all the PIG domain (tree data structure) changes needed for the planner and plan evaluator. Documentation in partiql.ion should be relatively self-explanatory, please let me know if not.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@dlurton dlurton changed the title PIG domains for plan evaluator PIG domains for planner and plan evaluator Apr 29, 2022
@dlurton dlurton force-pushed the physical-plan-staging-pig-domains branch from b26029e to 3c583bc Compare April 29, 2022 20:28
@dlurton dlurton marked this pull request as draft April 29, 2022 21:04
@dlurton dlurton changed the base branch from physical-plan-staging-pipeline to physical-plan-staging-test-exclusions April 29, 2022 21:15
@dlurton dlurton marked this pull request as ready for review April 29, 2022 21:22
Copy link
Member

@alancai98 alancai98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gave a cursory review of the PIG domains defined in partiql.ion. Looks reasonable based off what I remember from some offline discussions. I think @jpschorr and @am357 should also take a look to check the plans align with the reviewed query plan docs.

I also think it would be good to include some of the docs part of those offline discussions in the partiql-docs repo since the query plan and planner are to be open-sourced and public APIs.

// (struct (struct_field (lit a) (lit 42))
// Returns: { a: 42 }
//
// Example as a struct-union. Given a global environment `{| foo: { a: 42 }, bar: { b: 43} |}`, then:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do the vbars (i.e. |) surrounding foo: { a: 42 }, bar: { b: 43} mean?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's supposed to denote a bindings environment which is similar to but different than a struct. However, I really don't feel like providing a section describing notation here so I'll change this to be more clear without the notation.

// - `name`: the name of the variable as specified by the query author or determined statically.
(product var_decl name::symbol)

// The operators of PartiQL's relational algebra. See `$projectDir/docs/dev/RELATIONAL-ALGEBRA.md` for
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doc RELATIONAL-ALGEBRA.md isn't included yet on this branch. Can it be included in this PR? Or possibly ported to partiql-docs/drafts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be great to port it to to the partiql-docs/drafts and referencing it here.

Copy link
Member Author

@dlurton dlurton May 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That particular document is out of date. I'll remove the reference to it for now. I have a draft of a better version but it needs to be cleaned up and made ready for public consumption, so I'll add this ticket to the current milestone: #599

lang/resources/org/partiql/type-domains/partiql.ion Outdated Show resolved Hide resolved
Comment on lines 689 to 692
// Elements:
// - `name`: the unique name of the implementation. Each operator has a different namespace containing its
// default and PartiQL-specific
(product impl name::symbol static_args::(* ion 0))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could also add a description for the static_args element

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

// If `<expr>` is not a struct, The field `_n` will be included in the struct, where `n` is the
// ordinal of the field in the final merged struct. If `expr` returns a container that is not a struct,
// field names will be assigned in the format of `_n` where `n` is the ordinal position of the
// field within the struct. If `expr` returns a scalar value, it will be coerced into a singleton bag
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the merged struct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure.

// - `name`: the name of the variable as specified by the query author or determined statically.
(product var_decl name::symbol)

// The operators of PartiQL's relational algebra. See `$projectDir/docs/dev/RELATIONAL-ALGEBRA.md` for
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be great to port it to to the partiql-docs/drafts and referencing it here.

// For `.*` in SELECT list
// If `<expr>` is a struct, the fields of that struct will be part of the merged struct.
// If `<expr>` is not a struct, The field `_n` will be included in the struct, where `n` is the
// ordinal of the field in the final merged struct. If `expr` returns a container that is not a struct,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit.: <expr>? also could we change the name of the lhs from expr?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I'm confused as to what you are asking here. lhs typically refers to the "left hand side" of a binary expression but this isn't a binary expression?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry, indeed, it's confusing. I meant in expr::expr, changing the property name expr; or making the reference in the documentation more clear b/c IMO for fresh eyes, it's confusing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How's part_expr?

Comment on lines 564 to 565
// Basic join operator. Covers cross, inner, left, right and full joins.
// For cross joins, set `join_type` to `(inner)` and the `predicate` to `(lit true)`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this and the other join definition in physical_plan: curious why going w/ this approach for representing cross join? Does going w/ the following make sense?

(sum join_type (cross) (inner) (left) (right) (full))

(join
    join_type::join_type
    left::bexpr
    right::bexpr
    predicate::(? expr))

Copy link
Member Author

@dlurton dlurton May 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I bristled when I first saw this particular modeling of cross joins as well but on deep inspection IMHO it really is the best way to model cross joins. (And this is fundamentally similar to how they are modeled with the AST today.)

To demonstrate why it doesn't make sense to add (cross) to join_type: how should the evaluator behave when the predicate is not null and the join_type is (cross)?

  • Throw an exception? (That would be an edge case that would have to be handled many places, which would increase the noise level in the implementation.)
  • Honor the predicate anyway? But that would be identical to an inner join, making (cross) redundant.

Copy link
Contributor

@am357 am357 May 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could ignore the predicate when the join_type is cross. Although still not ideal, having it defined explicitly is better than implicit IMO. Thinking about this further and also looking at the grammar, perhaps we can also go with the following model (or something similar)?

(sum join
  (product left left::bexpr predicate::expr)
  (product right right::bexpr predicate::expr)
  (product inner left::bexpr right::bexpr predicate::expr)
  (product cross  left::bexpr right::bexpr))

Copy link
Member Author

@dlurton dlurton May 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. left, right should have the same elements as inner--they each require a left and right bexpr and a predicate.
  2. With this proposal, the code that consumes the generated classes is actually more complex. I.e. to transform all joins within a VisitorTransform, we would have to override 4 different methods instead of 1. Pattern matching too has similar concerns.
  3. With inner, I can still set the predicate to (lit true) and get something that's semantically equivalent to cross). This makes cross kind of redundant. Such redundancies generally increase complexity.

It's funny--I distinctly remember having this exact discussion (but related to ExprNode) with @almann and @therapon. I had your position back then and they argued strongly for modeling cross as an inner with a (lit true) predicate. It's taken time but I've come around.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For 1, yes that was a miss from my side. For 2 we could have a wrapper type with having a member property as kind that references the join sum. I see the 3 orthogonal to choice of having cross or not as this is still the case without it.

I hope I come around one day, but at this stage, I just can't see having an implicit logic to model rather than a precise model is a better choice. Let me know if you want to discuss offline and we can take it from there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From offline discussion we will make predicate nullable so that if the user did not specify a predicate (as with cross join), the AST will not lie by saying there was a (lit true).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I approved, but putting this non-blocking comment here: Although we disregard a join w/ predicate as cross join, to be more defensive, I think we could still go w/ having the (cross_join) as a join type to ensure the absence of predicate is not accidental to infer the cross join.


(with expr
// At this point, there should be no undefined variables in the plan since the area all rewritten to
// dynamic lookup function call sites (if enabled).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please elaborate, enabled by what? CompilerPipeline? Also perhaps there is an assume knowledge here that requires expanding.

Copy link
Member Author

@dlurton dlurton May 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would keep references to implementation specific details such as CompilerPipeline (actually PlannerPipeline, introduced in #590) out of this document since it can/should be used to generate the data structures of multiple partiql implementations. However, based on that argument, dynamic lookup function call sites are also an implementation detail. I'll remove that.


// Global variable reference--typically a table although it can actually be bound to any value. Unlike
// local variables, global variables are not stored in registers. Instead, they are typically stored
// in persistent storage. Evaluating a `global_id` will return a value with an open iterator. There
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to change persistent storage to non-ephemeral meaning the value persists after query evaluation (or something along those lines)? This is to cater for in-memory database systems or RDBMSs that can map a table/schema to memory.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the term "non-ephemeral" wouldn't apply to an in-memory only database, would it? I'll rephrase this to be more inclusive of different types of storage systems.

@dlurton dlurton requested review from am357 and alancai98 May 6, 2022 17:15
Base automatically changed from physical-plan-staging-test-exclusions to physical-plan-staging May 9, 2022 18:19
@dlurton dlurton mentioned this pull request May 9, 2022
@dlurton dlurton merged commit c57fdff into physical-plan-staging May 10, 2022
@dlurton dlurton deleted the physical-plan-staging-pig-domains branch May 10, 2022 17:21
dlurton added a commit that referenced this pull request May 27, 2022
Merges the following PRs to `main`:

- #583 
- #584 
- #587 
- #588 
- #589 
- #609 
- #590 
- #592 

Also, this adds a TODO comment (ea40e4a, requested by @am357 in offline chat) and excludes `PlannerPipeline` from the new aggregate tests which were pulled in when merging from `main` and were not part of the other PRs (34025b3).

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
RCHowell pushed a commit that referenced this pull request Jun 1, 2022
Merges the following PRs to `main`:

- #583 
- #584 
- #587 
- #588 
- #589 
- #609 
- #590 
- #592 

Also, this adds a TODO comment (ea40e4a, requested by @am357 in offline chat) and excludes `PlannerPipeline` from the new aggregate tests which were pulled in when merging from `main` and were not part of the other PRs (34025b3).

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
RCHowell pushed a commit that referenced this pull request Jun 1, 2022
Merges the following PRs to `main`:

- #583 
- #584 
- #587 
- #588 
- #589 
- #609 
- #590 
- #592 

Also, this adds a TODO comment (ea40e4a, requested by @am357 in offline chat) and excludes `PlannerPipeline` from the new aggregate tests which were pulled in when merging from `main` and were not part of the other PRs (34025b3).

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
RCHowell pushed a commit that referenced this pull request Jun 1, 2022
Merges the following PRs to `main`:

- #583 
- #584 
- #587 
- #588 
- #589 
- #609 
- #590 
- #592 

Also, this adds a TODO comment (ea40e4a, requested by @am357 in offline chat) and excludes `PlannerPipeline` from the new aggregate tests which were pulled in when merging from `main` and were not part of the other PRs (34025b3).

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants