feat: add operators to support duplicate eliminated joins #695
base: main
I'm not entirely sure I understand (which is not surprising, as I am not an expert in relational algebra and had difficulty understanding the paper). From my reading of the paper, it seems that general unnesting can be used to convert a query with dependent joins into a query without them. Duplicate eliminated joins seem to be an optimization that is useful for simplifying plans created by general unnesting, but not strictly needed to enable it.
Also, duplicate eliminated joins seem to be a general optimization and not specific to query unnesting, though perhaps they are mostly useful in that context.
The paper is indeed very difficult to understand. There is also a video from @Mytherin explaining the topic.
The duplicate eliminated join is not only an optimization, but rather a necessary technique for getting rid of dependent joins. In some cases you don't need a duplicate eliminated join to de-correlate, but in others it is necessary. I'm not sure what you mean by:
"Also, duplicate eliminated joins seems to be a general optimization and not specific to query unnesting. Though perhaps it is mostly useful in that context."
I'm not aware of any scenarios other than correlated subqueries.
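As a rough illustration of de-correlation with duplicate elimination, here is a minimal Python sketch. It assumes relations are plain lists of dicts, and the table, column, and function names (customers, orders, dependent_join_count, duplicate_eliminated_count) are hypothetical, not Substrait or DuckDB APIs. The first function evaluates a correlated subquery (count of orders per customer) once per outer row, as a dependent join would; the second deduplicates the correlated column from the outer side, evaluates the now uncorrelated subquery once over that distinct set, and joins the result back.

```python
from collections import Counter


def dependent_join_count(customers, orders):
    """Naive dependent join: re-evaluate the correlated subquery
    (number of orders for this customer) once for every outer row."""
    result = []
    for c in customers:
        n = sum(1 for o in orders if o["cust_id"] == c["id"])
        result.append({**c, "order_count": n})
    return result


def duplicate_eliminated_count(customers, orders):
    """Unnested form (hypothetical sketch): deduplicate only the correlated
    column, evaluate the subquery once for that distinct set, join back."""
    # 1. Duplicate elimination on the correlated column only.
    distinct_ids = {c["id"] for c in customers}
    # 2. The subquery is now uncorrelated: group orders by the correlation key,
    #    restricted to keys that actually occur on the outer side.
    counts = Counter(o["cust_id"] for o in orders if o["cust_id"] in distinct_ids)
    # 3. Join the per-key results back to every outer row. Keys without any
    #    orders default to 0, mirroring an outer join plus COALESCE.
    return [{**c, "order_count": counts.get(c["id"], 0)} for c in customers]


# Example data: customer id 1 appears twice, so deduplication pays off.
customers = [
    {"id": 1, "name": "a"},
    {"id": 1, "name": "b"},
    {"id": 2, "name": "c"},
    {"id": 3, "name": "d"},
]
orders = [{"cust_id": 1}, {"cust_id": 1}, {"cust_id": 2}, {"cust_id": 4}]
```

Both functions produce the same rows; the difference is that the second touches the subquery once per distinct key rather than once per outer row.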
If I'm following things correctly, the goal here is to take the hash table from the build side and push it into the branch that calculates the probe input?
Also, we keep talking about the "deduplicated result", but we are only talking about the key columns and not the entire column set, correct?
So, it's not the hash table but actually the deduplicated side. You can see that the Duplicate Eliminated Get has a `repeated Expression.FieldReference column_ids = 3;` field representing the deduplicated columns.
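To illustrate that only the referenced columns are deduplicated, here is a minimal Python sketch. It assumes rows are tuples and that column_ids is a list of positional indices (a simplification of the Expression.FieldReference values in the proposal); duplicate_eliminated_get, duplicate_eliminated_join, and right_branch are hypothetical names, not the Substrait definitions or any engine's implementation. The branch that consumes the Duplicate Eliminated Get sees only the distinct key tuples, never the full left rows or a hash table.

```python
def duplicate_eliminated_get(rows, column_ids):
    """Project each row onto column_ids and keep the first occurrence of
    each distinct key tuple (only these columns are deduplicated)."""
    seen = set()
    out = []
    for row in rows:
        key = tuple(row[i] for i in column_ids)
        if key not in seen:
            seen.add(key)
            out.append(key)
    return out


def duplicate_eliminated_join(left, column_ids, right_branch):
    """Hypothetical sketch: the right branch is computed from the distinct
    keys only, then its output is inner-joined back onto the full left rows."""
    keys = duplicate_eliminated_get(left, column_ids)
    # right_branch receives the key tuples and returns (key_tuple, value) pairs.
    index = {}
    for key, value in right_branch(keys):
        index.setdefault(key, []).append(value)
    return [
        row + (value,)
        for row in left
        for value in index.get(tuple(row[i] for i in column_ids), [])
    ]


# Example: deduplicate on column 0 only; columns 1 and 2 are not part of the key.
left = [(1, "x", 10), (1, "y", 20), (2, "z", 30)]
right_branch = lambda keys: [(k, k[0] * 100) for k in keys]
```

With column_ids = [0], the right branch is evaluated for the two keys (1,) and (2,) rather than for all three left rows.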