-
Notifications
You must be signed in to change notification settings - Fork 452
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
manifest: virtual iterator bounds checking virtual sstable metadata #2698
manifest: virtual iterator bounds checking virtual sstable metadata #2698
Conversation
8ef8748
to
aa51c74
Compare
table_cache_test.go
Outdated
v2.ValidateVirtual(parentFile) | ||
internalIterator, _, err := d.newIters(context.TODO(), v1, nil, internalIterOpts{}) | ||
require.NoError(t, err) | ||
v1.ValidateVirtual(parentFile, internalIterator, d.cmp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be best to pass d.newIters
into ValidateVirtual
, so we only incur the cost of creating the iterators if invariants are enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I was trying that as well but there is some issues because some of the arguments such as internalIterOpts
is not exported (and used in a lot of places), also packages such as keyspan
and pebble
cause import issues if used in the manifest
package (so im not sure how to type the arguments required). Also, I think some places such as sstable/reader_test.go:317:26
do not have a db
object, not sure if I should try creating one or is there an alternative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would solve that by passing in an adapter version of newIters
to ValidateVirtual
that doesn't take the internalIterOpts
field, instead passing in the zero value for it. And that test can probably just pass a nil
value for newIters
and ValidateVirtual
can make a check for whether that function is non-nil before calling it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I tried doing that, but the issue I was facing was how to type the return value of the new function, specifically keyspan.FragmentIterator
since I don't think I can import the keyspan
module.
internal/manifest/version.go
Outdated
@@ -402,6 +404,14 @@ func (m *FileMetadata) ValidateVirtual(createdFrom *FileMetadata) { | |||
if m.Size == 0 { | |||
panic("pebble: virtual sstable size must be set upon creation") | |||
} | |||
|
|||
if invariants.Enabled { | |||
for key, _ := it.First(); key != nil; key, _ = it.Next() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check we'd wanna do is that there's a point key at exactly the same user key as the lower bound, and another at exactly the upper bound, or a straddling range key/del at those bounds. That would confirm the bounds were generated as tight and there isn't a gap.
This check is really only mimicking the check that we're doing in the iterator stack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense yeah i'll check that as well
4113e50
to
84eb77a
Compare
312d59f
to
2244422
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks mostly good! Mind throwing in ValidateVirtual()
+ checkVirtualBounds calls to both leftFile
and rightFile
in d.excise
, as well as at the bottom of ingestLoad1Shared
?
db.go
Outdated
|
||
it, _, err := d.newIters(context.TODO(), m, opts, iterOpts) | ||
if err != nil { | ||
panic("pebble: error retrieving virtual iterator") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be better as it'd surface the error:
panic(fmt.Sprintf("pebble: error creating iterator: %s", err.Error()))
db.go
Outdated
panic("pebble: error retrieving virtual iterator") | ||
} | ||
|
||
// check that virtual sstable bounds are tight from both ends. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: s/check/Check/
db.go
Outdated
// check that virtual sstable bounds are tight from both ends. | ||
first, _ := it.First() | ||
if d.cmp(first.UserKey, m.Smallest.UserKey) != 0 { | ||
panic("pebble: virtual sstable lower bound is not tight") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would also make sense to print out the two keys in question here, as well as the file number
panic(fmt.Sprintf("pebble: virtual sstable %s lower bound is not tight: %s != %s", m.FileNum, m.Smallest.UserKey, first.UserKey))
db.go
Outdated
} | ||
last, _ := it.Last() | ||
if d.cmp(last.UserKey, m.Largest.UserKey) != 0 { | ||
panic("pebble: virtual sstable upper bound is not tight") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
db.go
Outdated
@@ -2489,3 +2489,31 @@ func (d *DB) SetCreatorID(creatorID uint64) error { | |||
func (d *DB) ObjProvider() objstorage.Provider { | |||
return d.objProvider | |||
} | |||
|
|||
func (d *DB) checkVirtualBounds(m *fileMetadata, opts *IterOptions, iterOpts internalIterOpts) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
opts and iterOpts seem unused by all callers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup! for the change in ingestLoad1Shared
is there a way to get a *FileMetadataType
from SharedSSTMetadata
that can be used for ValidateVirtual?
Reviewable status: 0 of 4 files reviewed, 7 unresolved discussions (waiting on @itsbilal)
db.go
line 2493 at r2 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
opts and iterOpts seem unused by all callers
yeah, is it every worth keeping parameters like these to have this function be more versatile for people to use in the future if they need this function used in other places (or do we not care about that)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 4 files reviewed, 6 unresolved discussions (waiting on @itsbilal)
db.go
line 2500 at r2 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
This would be better as it'd surface the error:
panic(fmt.Sprintf("pebble: error creating iterator: %s", err.Error()))
done
db.go
line 2503 at r2 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
nit: s/check/Check/
done
db.go
line 2506 at r2 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
would also make sense to print out the two keys in question here, as well as the file number
panic(fmt.Sprintf("pebble: virtual sstable %s lower bound is not tight: %s != %s", m.FileNum, m.Smallest.UserKey, first.UserKey))
done
db.go
line 2510 at r2 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
Same here
done
2244422
to
7f872b3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ingestLoad1Shared produces a *fileMetadata
, you should be able to verify that, right?
Though one complication is that you won't be able to run checkVirtualBounds
until after ingestLink
has run, as only after that would you be able to create an iterator on the FileMetadata. If you can navigate the ingestion code to run checkVirtualBounds
on only the shared files and after ingestLink
, that'd be good.
ingest.go
Outdated
@@ -1409,6 +1409,9 @@ func (d *DB) excise( | |||
SmallestSeqNum: m.SmallestSeqNum, | |||
LargestSeqNum: m.LargestSeqNum, | |||
} | |||
leftFile.ValidateVirtual(m) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be down where we add leftFile to ve.NewFiles
, as most of its arguments are not set yet.
ingest.go
Outdated
@@ -1511,6 +1514,8 @@ func (d *DB) excise( | |||
SmallestSeqNum: m.SmallestSeqNum, | |||
LargestSeqNum: m.LargestSeqNum, | |||
} | |||
rightFile.ValidateVirtual(m) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, this will be at the bottom
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of the old comments still stand?
db.go
Outdated
// Check that virtual sstable bounds are tight from both ends. | ||
first, _ := it.First() | ||
if d.cmp(first.UserKey, m.Smallest.UserKey) != 0 { | ||
panic(fmt.Sprintf("pebble: virtual sstable %s lower bound is not tight: %s != %s", m.FileNum, m.Smallest.UserKey, first.UserKey)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can use d.opts.Comparer.FormatKey(m.smallest.UserKey)
for better-formatted outputs instead of just printing the slice here. And same with first.UserKey and the two keys below.
7f872b3
to
50e84fe
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mb, I forgot to push those
Reviewable status: 0 of 5 files reviewed, 9 unresolved discussions (waiting on @itsbilal)
ingest.go
line 1412 at r3 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
This will be down where we add leftFile to
ve.NewFiles
, as most of its arguments are not set yet.
done
ingest.go
line 1517 at r3 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
Same here, this will be at the bottom
done
50e84fe
to
1f93c44
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 5 files reviewed, 9 unresolved discussions (waiting on @itsbilal)
db.go
line 2506 at r3 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
you can use
d.opts.Comparer.FormatKey(m.smallest.UserKey)
for better-formatted outputs instead of just printing the slice here. And same with first.UserKey and the two keys below.
done
355e58c
to
3df8aab
Compare
3df8aab
to
4959a91
Compare
00c9d8e
to
f8f219a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost there!
Reviewed 3 of 4 files at r6, all commit messages.
Reviewable status: 3 of 5 files reviewed, 8 unresolved discussions (waiting on @jbowens, @RaduBerinde, and @RahulAggarwal1016)
db.go
line 2493 at r2 (raw file):
Previously, RahulAggarwal1016 (Rahul Aggarwal) wrote…
yeah, is it every worth keeping parameters like these to have this function be more versatile for people to use in the future if they need this function used in other places (or do we not care about that)?
We usually remove it. Easy to add in the future if the need is ever there. I'd just remove it.
db.go
line 2512 at r6 (raw file):
} pointIter, rangeDelIter, err := d.newIters(context.TODO(), m, iterOpts, internalIterOpts{})
I'd throw all the code below this behind if m.HasPointKeys { ... }
db.go
line 2536 at r6 (raw file):
if (rangeDel == nil || d.cmp(rangeDel.SmallestKey().UserKey, m.SmallestPointKey.UserKey) != 0) && (pointKey == nil || d.cmp(pointKey.UserKey, m.SmallestPointKey.UserKey) != 0) { panic(errors.Wrap(err, fmt.Sprintf("pebble: virtual sstable %s lower bound is not tight.", m.FileNum)))
nit: we usually don't put periods at the end of error messages
db.go
line 2546 at r6 (raw file):
// Check that the upper bound is tight. if (rangeDel == nil || d.cmp(rangeDel.LargestKey().UserKey, m.LargestPointKey.UserKey) != 0) &&
We should also do these checks for range keys. It's easier there as you only have one iterator and one set of bounds. That one would be inside an if m.HasRangeKeys {... }
ingest.go
line 1606 at r6 (raw file):
} rightFile.ValidateVirtual(m) d.checkVirtualBounds(rightFile, nil)
We usually denote zero arguments with a small comment naming them:
d.checkVirtualBounds(rightFile, nil /* iterOptions */)
But seeing as you'll be removing that argument anyway...
9fcdc3d
to
657436e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the review
Reviewable status: 0 of 5 files reviewed, 8 unresolved discussions (waiting on @itsbilal, @jbowens, and @RaduBerinde)
db.go
line 2493 at r2 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
We usually remove it. Easy to add in the future if the need is ever there. I'd just remove it.
oh, I realized I actually need it for the sharedFileMetadata (to pass the level)
db.go
line 2500 at r2 (raw file):
Previously, jbowens (Jackson Owens) wrote…
I'd panic with the error value itself to preserve the original error's stack:
panic(errors.Wrap(err, "creating iterator"))
fixed!
db.go
line 2512 at r5 (raw file):
Previously, jbowens (Jackson Owens) wrote…
nit: s/rangeIter/rangeDelIter/
Since we have two disparate kinds of ranged keys (range deletions and range keys), we should disambiguate.
fixed!
db.go
line 2523 at r5 (raw file):
Previously, jbowens (Jackson Owens) wrote…
The same also applies to range deletions. The smallest key in the file may be a range deletion, which we're not considering here.
fixed!
db.go
line 2512 at r6 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
I'd throw all the code below this behind
if m.HasPointKeys { ... }
added!
db.go
line 2536 at r6 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
nit: we usually don't put periods at the end of error messages
oh yeah forgot about that, fixed!
db.go
line 2546 at r6 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
We should also do these checks for range keys. It's easier there as you only have one iterator and one set of bounds. That one would be inside an
if m.HasRangeKeys {... }
added!
ingest.go
line 1606 at r6 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
We usually denote zero arguments with a small comment naming them:
d.checkVirtualBounds(rightFile, nil /* iterOptions */)
But seeing as you'll be removing that argument anyway...
oh yeah, forgot about that too (I realized I actually need the second argument)
2248e19
to
8d3a4a4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
once comments are addressed and CI is green
I think the test failures will be fixed once you thread through the level to the newIter calls?
Reviewed 4 of 4 files at r7, all commit messages.
Reviewable status: 4 of 5 files reviewed, 9 unresolved discussions (waiting on @jbowens, @RaduBerinde, and @RahulAggarwal1016)
db.go
line 2524 at r7 (raw file):
pointKey, _ := pointIter.First() var rangeDel *rangekey.Span
this should be *keyspan.Span
, which I know is the same type but we only use the exported rangekey version to refer to range keys in practice.
db.go
line 2550 at r7 (raw file):
for key, _ := pointIter.First(); key != nil; key, _ = pointIter.Next() { if d.cmp(key.UserKey, m.SmallestPointKey.UserKey) < 0 || d.cmp(key.UserKey, m.LargestPointKey.UserKey) > 0 { panic(errors.Wrap(err, fmt.Sprintf("pebble: virtual sstable %s point key %s is not within bounds", m.FileNum, key.UserKey)))
this can be errors.New, not errors.Wrap as err is likely nil. And same for all the cases below except for the panic where err != nil.
db.go
line 2567 at r7 (raw file):
} if m.HasRangeKeys {
nit: you can flatten the code a little bit by doing:
if !m.HasRangeKeys {
return
}
... all the other stuff
266a3eb
to
7514a3c
Compare
Tests in cockroachdb#2698 combined with `TestIngestShared` caught that we were not returning range keys in the order expected by other internal iterators when reading from a foreign sstable. This change re-sorts range keys in trailer descending order once they've been coalesced in ForeignSSTTransformer.
7514a3c
to
db7f5bd
Compare
Tests in cockroachdb#2698 combined with `TestIngestShared` caught that we were not returning range keys in the order expected by other internal iterators when reading from a foreign sstable. This change removes the ForeignSSTTransformer as it's no longer necessary given all range keys are emitted at the ingestion sequence number.
Tests in #2698 combined with `TestIngestShared` caught that we were not returning range keys in the order expected by other internal iterators when reading from a foreign sstable. This change removes the ForeignSSTTransformer as it's no longer necessary given all range keys are emitted at the ingestion sequence number.
Added a check inside of `FileMetadata.ValidateVirtual()` to ensure that the keys returned from a virtual sstable are within the bounds of the Smallest and Largest metadata keys. Informs: cockroachdb#2593
db7f5bd
to
2ff121d
Compare
Added a check inside of
FileMetadata.ValidateVirtual()
to check if the keys returned from a virtual sstable are within the bounds of the Smallest and Largest metadata keys.