[BugFix] Disable duplicate sort key when creating table (backport #43206) #43376
This is an automatic backport of pull request #43206 done by [Mergify](https://mergify.com).

## Why I'm doing:

If we create a table with duplicate sort key columns, the BE may crash. For example:

1. We create a primary key table with a duplicate sort key column (`k1,v1,v1`).
2. In vertical compaction, we first create a schema `k1,v1,v1` to read the sort key columns from segment data.
3. When we create the `segment_iterator`, we create a `column_iterator` for each column according to its column id. However, the two `v1` columns have the same column id, so only one column iterator is created for them.
4. When reading data, we generate a chunk with three columns and read the data column by column. The two `v1` columns share the same column iterator, so its read offset becomes inconsistent with that of column `k1`. The columns in a chunk can then end up with inconsistent row counts, which may crash the BE.
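To make steps 3 and 4 concrete, here is a small Java simulation of the failure mode. It is only a sketch, not StarRocks BE code (the BE is written in C++): `ColumnIterator`, `buildIterators`, `readChunk`, and the column ids `k1 -> 0`, `v1 -> 1` are hypothetical stand-ins. It assumes iterators are kept in a map keyed by column id, so the repeated `v1` collapses into one iterator that is advanced twice per chunk.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DuplicateSortKeyDemo {
    // Hypothetical stand-in for a per-column reader with its own read offset.
    static final class ColumnIterator {
        private final int[] data;
        private int offset = 0;

        ColumnIterator(int[] data) { this.data = data; }

        // Reads up to n values starting at the current offset and advances the offset.
        List<Integer> next(int n) {
            List<Integer> out = new ArrayList<>();
            for (int i = 0; i < n && offset < data.length; i++) {
                out.add(data[offset++]);
            }
            return out;
        }
    }

    // Creates one iterator per distinct column id, so duplicate ids share one iterator.
    static Map<Integer, ColumnIterator> buildIterators(int[] schemaColumnIds, Map<Integer, int[]> segment) {
        Map<Integer, ColumnIterator> iters = new LinkedHashMap<>();
        for (int id : schemaColumnIds) {
            iters.computeIfAbsent(id, k -> new ColumnIterator(segment.get(k)));
        }
        return iters;
    }

    // Reads one chunk of chunkSize rows, column by column, in schema order.
    static List<List<Integer>> readChunk(int[] schemaColumnIds, Map<Integer, ColumnIterator> iters, int chunkSize) {
        List<List<Integer>> chunk = new ArrayList<>();
        for (int id : schemaColumnIds) {
            chunk.add(iters.get(id).next(chunkSize));
        }
        return chunk;
    }

    public static void main(String[] args) {
        // Hypothetical column ids: k1 -> 0, v1 -> 1. The schema k1,v1,v1 repeats id 1.
        int[] schema = {0, 1, 1};
        Map<Integer, int[]> segment = Map.of(
                0, new int[] {10, 20, 30, 40},
                1, new int[] {100, 200, 300, 400});
        Map<Integer, ColumnIterator> iters = buildIterators(schema, segment);
        System.out.println("iterators created: " + iters.size()); // 2, not 3
        for (List<Integer> col : readChunk(schema, iters, 3)) {
            System.out.println(col.size() + " row(s): " + col);
        }
        // Prints 3, 3 and 1 row(s): the second v1 column is out of step with k1.
    }
}
```

In this simulation the chunk ends up with columns of 3, 3, and 1 rows, which is the kind of per-column inconsistency that step 4 describes.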
## What I'm doing:

Disable duplicate sort key columns when creating a table.
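One way to express this check is a duplicate scan over the sort key column names before the table is created. The Java sketch below is illustrative only and is not the actual StarRocks FE code: the class `SortKeyValidator`, the method `checkNoDuplicateSortKeys`, and the error message are hypothetical, and it assumes column names compare case-insensitively.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class SortKeyValidator {
    // Hypothetical check: rejects a sort key list that names the same column twice.
    static void checkNoDuplicateSortKeys(List<String> sortKeys) {
        Set<String> seen = new HashSet<>();
        for (String key : sortKeys) {
            // Assumption: column names are case-insensitive, so normalize before comparing.
            if (!seen.add(key.toLowerCase(Locale.ROOT))) {
                throw new IllegalArgumentException("Duplicate sort key column: " + key);
            }
        }
    }

    public static void main(String[] args) {
        checkNoDuplicateSortKeys(List.of("k1", "v1")); // accepted
        try {
            checkNoDuplicateSortKeys(List.of("k1", "v1", "v1"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Duplicate sort key column: v1
        }
    }
}
```

Rejecting the definition at creation time means a schema like `k1,v1,v1` is never persisted, so later readers such as vertical compaction never have to handle two sort key columns that share one column id.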
What type of PR is this:
Does this PR entail a change in behavior?
If yes, please specify the type of change:
Checklist: