Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is there any restriction to ReferenceRel.subtree_ordinal wrt the ordinal of the current plan? #613

Open
ingomueller-net opened this issue Mar 21, 2024 · 6 comments

Comments

@ingomueller-net
Copy link
Contributor

I am wondering if the subtree_ordinal field in the ReferenceRel has any restrictions with respect to the ordinal of the current plan. One conceivable restriction would be that it is only allowed to have values lower than the one of the current plan. The specification does say that the relations in a plan form a DAG, and such a restriction would (1) enforce the fact that the relations form a DAG and (2) make it easier to reconstruct the references when converting the Substrait plan into any other format that cannot handle dangling references.

If the answer is yes (or people think that such a restriction would be a good addition), then this should probably made explicit in the documentation.

If the answer is no, what happens to a plan where the references of relations form a cycle? Would such a plan simply be invalid?

@EpsilonPrime
Copy link
Member

There is a class of essentially recursive plans that we don't have an explicit definition for at the moment but probably should. These are Common Table Expressions (CTEs). As such I don't think we want to outlaw recursion but I do agree the potentiality exists to make an invalid plan. At the moment it is up to the consumer to reject any plan it finds uncomfortably recursive.

After we have CTEs defined and we find there are no other ways that need to be expressed we can look into restrictions that are easy to enforce at construction time.

https://docs.snowflake.com/en/user-guide/queries-cte

@ingomueller-net
Copy link
Contributor Author

OK, I see how recursive CTEs are a reason to support cyclic references. If/when that should be supported, the documentation has to be changed, though: it currently says that the plan should be a DAG, which would forbid cycles/recursion.

The concrete answer to my question, I guess, is that there is currently no restriction on subtree_ordinal and there shouldn't be in the long run because of eventual support for recursive CTEs. (I guess there are other constraints about which form of recursion are allowed but they go beyond what I am currently interested in.)

At the moment it is up to the consumer to reject any plan it finds uncomfortably recursive.

Right, that makes sense. But I guess we'd eventually want to know what a maximally conforming consumer is expected to do/accept, right? My understanding is that that is currently still open: it will eventually be a graph, potentially with cycles, but further restrictions on those cycles (related to recursive CTEs).

@EpsilonPrime
Copy link
Member

I agree that we do need to know and describe those limits as well as the overall graph structure. If it matters to you we can try to tackle those problems now.

@ingomueller-net
Copy link
Contributor Author

OK, sounds good!

For me, knowing that cycles can exist is good for. (#612 has a higher priority for me, personally.)

@ingomueller-net
Copy link
Contributor Author

Small addendum: I have checked several SQL dialects and all of them only allow self-references, not general cycles.

The structure of recursive CTEs is usually a non-recursive base case and a recursive case that adds tuples to the base case until no more tuples are added. What is "added" this way is computed based on the content of "self" in the previous iteration, hence the self-references. I wouldn't know how that could be extended to general cycles...

With one exception, all dialects I checked also only allowed backward references, i.e., each CTE in the WITH clause can only refer to other CTEs defined before (or itself, in case of a recursive CTE). This would correspond to restricting subtree_ordinal to be less or equal than the current ordinal. The one exception I saw is the BigQuery dialect, which also allows forward references in WITH RECURSIVE clauses, but still no cycles. So such a plan could still be ordered to consist of only forward references.

I don't have a clear preference at this point. I think it's a trade-off between simplifying consumers and the ability to express things that might be desirable in the future but for which we don't have examples at this point.

I am on the go right now but hopefully I'll add links later.

@ingomueller-net
Copy link
Contributor Author

My question is answered: there can be cycles (at least self-references) and there is no restriction on subtree_ordinal.

Unless people feel that we might want to change that, which I don't have the impression, we can close this issue now.

jacques-n pushed a commit that referenced this issue Aug 1, 2024
#612 and #613 highlighted a number of ambiguities in plan
interpretation. This should address those ambiguities. 

Co-authored-by: Ingo Müller <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants