Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update RangeTree MaxUpperBound when removing nodes #2152

Merged
merged 9 commits into from
Nov 22, 2023
Merged

Update RangeTree MaxUpperBound when removing nodes #2152

merged 9 commits into from
Nov 22, 2023

Conversation

jycor
Copy link
Contributor

@jycor jycor commented Nov 19, 2023

Verification code from in the last RangeTree related PR caught some bad ranges in the sqllogictests.
The cause was not properly updating the MaxUpperBound in nodes for the rangetree when performing a remove operation, which led to missing connections when pruning ranges.

This also prints MaxUpperBound in the String/DebugString methods for easier debugging.

Copy link
Contributor

@max-hoffman max-hoffman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following up on these correctness bugs is great work, index ranges really needed the attention.

I don't fully understand how the RB tree works, it's kind of hard for me to comment on whether we're missing anything additional with the bound checks. @Hydrocharged might be better positioned to catch additional bugs. Another thing we could do is add more of the failing correctness tests as tests here if for some reason this bug isn't directly related to all of the regressions.

The thing we've already talked about and might be worth an RFC for improving the reliability of this whole module is switching from the range tree to a single interval merge algorithm. We only build one range collection for each filter expression after my refactors, and we can collect all of the filters before invoking merging. I.e. the mutability of ranges that the range tree was designed for is kind of moot. I think a linear collect, nlogn sort, linear merge would do the same thing as a range tree currently in a way that is easier to test. Adding tests for specific failure patterns is still useful regardless of whether it's before or after switching the algo.

"",
rng: r(req(5)),
exp: "RangeColumnExprTree\n" +
"│ ┌── [11, 11] max: >11 color: 0\n" +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the formatting is a bit confusing

Copy link
Contributor

@Hydrocharged Hydrocharged left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Need more test variety for sure, but the tree logic looks good from some quick napkin diagrams.

},
{
name: "insert rebalance root from right",
setupRngs: []sql.Range{r(req(7)), r(req(3)), r(req(11)), r(req(8)), r(req(9)), r(req(10)), r(req(12))},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like some tests that use more than equals. We can have tests of all less than, etc. and also combine them. Hypothetically, it could all work great for equals since it's the most basic case, and break down on the other types.

},
{
name: "remove root",
setupRngs: []sql.Range{r(req(7)), r(req(3)), r(req(11)), r(req(9)), r(req(13))},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment goes for this test too, need more than equals.

sql/range_cut.go Outdated
@@ -157,6 +159,11 @@ func (a Above) String() string {
return fmt.Sprintf("Above[%v]", a.Key)
}

// DebugString implements RangeCut.
func (a Above) DebugString() string {
return fmt.Sprintf(">%v", a.Key)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get why these were added, but considering ranges have the concept of greater than and less than, these will cause more confusion than clarity. You'd have to look at the DebugString functions to know that this isn't greater than key, but is instead above key. The original strings are much more clear with that, so I think these should be removed, or internally just call a.String().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants