-
-
Notifications
You must be signed in to change notification settings - Fork 209
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Recost LookupJoins to remove special casing for indexes where every column is used in a filter. #1889
Conversation
…olumn is used in a filter.
…nique. Rather than changes those tests, I made the index unique.
…al due to a bug, link to the bug. (Linking to the bug is not possible in `query_plans.go` because the tests are generated.)
b2e8c54
to
01dac86
Compare
@@ -326,7 +326,7 @@ where | |||
" │ │ └─ org1 (longtext)\n" + | |||
" │ └─ TableAlias(dimension)\n" + | |||
" │ └─ IndexedTableAccess(asset)\n" + | |||
" │ ├─ index: [asset.orgId,asset.assetId]\n" + | |||
" │ ├─ index: [asset.orgId,asset.name,asset.val]\n" + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should be picking (
orgId,
name,
assetId)
here, bit concerned that I missed us changing it to (orgId,assetID)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's happening because of the low cardinality of the joined tables.
Basically, when costing non-unique lookups, we assume that each key expression cuts the table in half. If the table is small enough, and if the index has enough key expressions, this can result in a cost lower than the cost of a unique lookup.
As the size of the input tables increases, this effect disappears. I added just four extra rows to the test table and it now chooses the correct index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only one I am concerned about is the customer query where we should be picking the unique index, not competing between non-unique.
At extremely low cardinalities, we may pick a non-unique index with more columns than a unique index with fewer columns. This behavior disappears at higher cardinalities, which is where we care more about the performance.
The only joins that are affected by this recosting are joins that use an index where every column in the index has a key expression, but we can't prove that lookups are constant.
We currently special case this, giving these a slightly elevated priority over joins that don't use every column in the index (and we can't prove that lookups are constant.)
So for example a lookup with one key expression on an index with one column ends up being preferred over say, a lookup with three key expressions on an index with four columns, even though it looks like in practice the latter is a more efficient join. Removing this special casing resulted in improved plans in specific client queries.
Most of the changed plans in this PR are workarounds for #1893, #1894, and #1895. A few are actually better plans than we were previously generating.