backupccl: create backup compaction iterator #137529
9c30e58
to
044915a
Compare
Nice!
044915a
to
817e116
Compare
@msbutler Thanks for all the feedback! I've updated the tests/comments.
817e116
to
4e49590
Compare
Could you provide some context for this PR, such as where this iterator will be used?
Does this need to implement SimpleMVCCIterator? Looking at the example of ReadAsOfIterator, it seems to me that all the callers only use the concrete type and call only {SeekGE, Valid, NextKey, UnsafeKey, UnsafeValue}. If that is correct, we should also remove unnecessary methods from ReadAsOfIterator (in a separate PR of course).
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dt, @kev-cao, and @msbutler)
pkg/storage/backup_compaction_iterator.go
line 27 at r2 (raw file):
asOf hlc.Timestamp
// valid tracks if the current key is valid
nit: missing period.
pkg/storage/backup_compaction_iterator.go
line 30 at r2 (raw file):
valid bool
// err tracks if iterating to the current key returned an error
ditto
pkg/storage/backup_compaction_iterator.go
line 63 at r2 (raw file):
func (f *BackupCompactionIterator) SeekGE(originalKey MVCCKey) {
	// See ReadAsOfIterator comment for explanation of this
nit: missing period.
pkg/storage/backup_compaction_iterator.go
line 76 at r2 (raw file):
}
// advance moves past keys with timestamps later than f.asOf
nit: missing period.
pkg/storage/backup_compaction_iterator.go
line 136 at r2 (raw file):
func (f *BackupCompactionIterator) assertInvariants() error {
	if err := assertSimpleMVCCIteratorInvariants(f); err != nil {
never mind my earlier comment -- I suppose reusing this function is why this implements SimpleMVCCIterator.
Those panics in the earlier methods don't trip up this invariant checking because all those Range* methods are only called when HasPointAndRange returns hasRange=true. As a nit, I am not sure that is properly documented as requirements in SimpleMVCCIterator, so consider strengthening the commentary in SimpleMVCCIterator.
pkg/storage/backup_compaction_iterator.go
line 140 at r2 (raw file):
}
if f.asOf.IsEmpty() {
This assertion belongs in NewBackupCompactionIterator.
pkg/storage/backup_compaction_iterator.go
line 146 at r2 (raw file):
if ok, err := f.iter.Valid(); !ok || err != nil {
	errMsg := err.Error()
	return errors.AssertionFailedf("invalid underlying iter with err=%s", errMsg)
iterators can have errors because of underlying detected corruption. Such errors are not assertion failures. If the iterator did not detect an error (which happens via block checksumming etc.), we may still want to check application level (CRDB being the application) errors or certain invariants that should never be violated. Those should be wrapped in AssertionFailedf. If you look at assertSimpleMVCCIteratorInvariants that's roughly the pattern that is followed in examples like:
// Keys can't be empty.
if len(key.Key) == 0 {
	return errors.AssertionFailedf("valid iterator returned empty key")
}
value, err := iter.UnsafeValue()
if err != nil {
	return err
}
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dt, @msbutler, and @sumeerbhola)
pkg/storage/backup_compaction_iterator.go
line 136 at r2 (raw file):
Previously, sumeerbhola wrote…
never mind my earlier comment -- I suppose reusing this function is why this implements SimpleMVCCIterator.
Those panics in the earlier methods don't trip up this invariant checking because all those Range* methods are only called when HasPointAndRange returns hasRange=true. As a nit, I am not sure that is properly documented as requirements in SimpleMVCCIterator, so consider strengthening the commentary in SimpleMVCCIterator.
To provide some context on this work, feel free to check out the prototype. Another reason for re-using SimpleMVCCIterator is that we will update the iterator in MergedSST to a generic type so that it can be used in both restores and compactions.
Regarding the requirement on hasRange=true, it looks like our existing underlying iterators simply return zero values if the range does not exist. @msbutler I did notice that ReadAsOfIterator also does the same; to preserve the behavior, perhaps we should remove the panics.
pkg/storage/backup_compaction_iterator.go
line 140 at r2 (raw file):
Previously, sumeerbhola wrote…
This assertion belongs in NewBackupCompactionIterator.
NewBackupCompactionIterator sets asOf to the MaxTimestamp if not provided (same behavior as the ReadAsOfIterator).
pkg/storage/backup_compaction_iterator.go
line 146 at r2 (raw file):
Previously, sumeerbhola wrote…
iterators can have errors because of underlying detected corruption. Such errors are not assertion failures. If the iterator did not detect an error (which happens via block checksumming etc.), we may still want to check application level (CRDB being the application) errors or certain invariants that should never be violated. Those should be wrapped in AssertionFailedf. If you look at assertSimpleMVCCIteratorInvariants that's roughly the pattern that is followed in examples like:
// Keys can't be empty.
if len(key.Key) == 0 {
	return errors.AssertionFailedf("valid iterator returned empty key")
}
value, err := iter.UnsafeValue()
if err != nil {
	return err
}
From the looks of it, assertSimpleMVCCIteratorInvariants already wraps errors that should be considered assertion errors as assertion errors, so I suppose just returning the error as is is sufficient.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dt, @msbutler, and @sumeerbhola)
pkg/storage/backup_compaction_iterator.go
line 146 at r2 (raw file):
Previously, kev-cao (Kevin Cao) wrote…
From the looks of it, assertSimpleMVCCIteratorInvariants already wraps errors that should be considered assertion errors as assertion errors, so I suppose just returning the error as is is sufficient.
Oh whoops, was reading the wrong section -- in this case, f.iter.Valid() is being checked if f.valid is true, I think that's a fine assertion to make. Also the assertSimpleMVCCIteratorInvariants call at the top handles those assertions. I'm not quite sure what change you are proposing here, could you elaborate?
4e49590
to
72ab307
Compare
@sumeerbhola for more context, check out this design doc https://docs.google.com/document/d/1ZslwAxnr46VvrwOL-P_rAUnWPo2ztXl8F2SkoB3VpK4/edit?tab=t.0#heading=h.9ndo02nsmu
Regarding the context, why are there no range keys (MVCC tombstones) handled here? Is this related to the idea that once a range tombstone is applied, no point keys will ever be written on top (an invariant maintained by a higher layer), so the BackupCompactionIterator.iter is already configured to apply the effect of the range tombstone but not surface it?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dt, @kev-cao, and @msbutler)
At least for the MVP of this backup compaction feature, we are assuming that range keys will not get written to the key space that we need to back up. Jeff is currently exploring the performance implications of rolling back imports without range tombstones, for example (design doc coming soon). To deal with backups of whole tenants, this compaction iterator will need to handle range keys, but we'd like to address them after we build the backup compaction feature for all other backups first.
Reviewed 1 of 3 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dt, @kev-cao, and @msbutler)
pkg/storage/backup_compaction_iterator.go
line 136 at r2 (raw file):
Previously, kev-cao (Kevin Cao) wrote…
To provide some context on this work, feel free to check out the prototype. Another reason for re-using SimpleMVCCIterator is because we will update the iterator in MergedSST to a generic type so that it can be used in both restores as well as compactions.
Regarding the requirement on hasRange=true, it looks like our existing underlying iterators simply return zero values if the range does not exist. @msbutler I did notice that ReadAsOfIterator also does the same, to preserve the behavior perhaps we should remove the panics.
The change to remove the panics is good enough -- no need to update the SimpleMVCCIterator interface.
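As a rough illustration of the outcome agreed on here, a point-key-only iterator can answer the Range* methods with zero values instead of panicking. The sketch below is an assumption about the shape of that change, not the code merged in this PR: span and rangeKeyStack are simplified stand-ins for roachpb.Span and MVCCRangeKeyStack, and pointOnlyIter is a hypothetical type.

```go
package main

import "fmt"

// span is a simplified stand-in for roachpb.Span.
type span struct {
	Key, EndKey []byte
}

// rangeKeyStack is a simplified stand-in for MVCCRangeKeyStack.
type rangeKeyStack struct {
	Bounds   span
	Versions []int64
}

// pointOnlyIter is a hypothetical iterator that never surfaces range keys.
type pointOnlyIter struct{}

// HasPointAndRange always reports a point key and never a range key.
func (it *pointOnlyIter) HasPointAndRange() (hasPoint, hasRange bool) {
	return true, false
}

// RangeBounds returns the zero span because no range key is ever surfaced.
func (it *pointOnlyIter) RangeBounds() span { return span{} }

// RangeKeys returns an empty stack rather than panicking.
func (it *pointOnlyIter) RangeKeys() rangeKeyStack { return rangeKeyStack{} }

// RangeKeyChanged is always false for a point-key-only iterator.
func (it *pointOnlyIter) RangeKeyChanged() bool { return false }

func main() {
	it := &pointOnlyIter{}
	hasPoint, hasRange := it.HasPointAndRange()
	fmt.Println(hasPoint, hasRange, len(it.RangeKeys().Versions), it.RangeKeyChanged())
}
```

With this shape, a generic invariant checker can call the Range* methods unconditionally without tripping a panic.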
pkg/storage/backup_compaction_iterator.go
line 140 at r2 (raw file):
Previously, kev-cao (Kevin Cao) wrote…
NewBackupCompactionIterator sets asOf to the MaxTimestamp if not provided (same behavior as the ReadAsOfIterator).
Sure, but I am not clear on why an assertion about an immutable field of f belongs in a method that will be called frequently. If the concern is that someone will not use NewBackupCompactionIterator and just initialize the fields, perhaps hide this in a different package (though I don't think we need to be that paranoid).
pkg/storage/backup_compaction_iterator.go
line 146 at r2 (raw file):
in this case, f.iter.Valid() is being checked if f.valid is true, I think that's a fine assertion to make.
I missed that. Can you add a code comment stating this here, and add something like the following at the top of the method:
// REQUIRES: f.valid
func (f *BackupCompactionIterator) assertInvariants() error {
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dt, @kev-cao, and @sumeerbhola)
pkg/storage/backup_compaction_iterator.go
line 140 at r2 (raw file):
Previously, sumeerbhola wrote…
Sure, but I am not clear on why an assertion about an immutable field of f belongs in a method that will be called frequently. If the concern is that someone will not use NewBackupCompactionIterator and just initialize the fields, perhaps hide this in a different package (though I don't think we need to be that paranoid).
+1 to Sumeer's point. It seems like NewBackupCompactionIterator should simply return an error if the user passes a nil asOf.
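The suggestion to validate asOf once in the constructor, rather than in a frequently called invariant check, might look like the following. This is a sketch under stated assumptions: Timestamp is a simplified stand-in for hlc.Timestamp, and compactionIter and newCompactionIter are hypothetical names, not the actual NewBackupCompactionIterator signature.

```go
package main

import (
	"errors"
	"fmt"
)

// Timestamp is a simplified stand-in for hlc.Timestamp.
type Timestamp struct {
	WallTime int64
	Logical  int32
}

// IsEmpty reports whether the timestamp is the zero value.
func (t Timestamp) IsEmpty() bool { return t == Timestamp{} }

// compactionIter is a hypothetical, stripped-down iterator carrying only
// the immutable asOf field discussed in this thread.
type compactionIter struct {
	asOf Timestamp
}

// newCompactionIter validates asOf once at construction time, so later
// per-key invariant checks do not need to re-check an immutable field.
func newCompactionIter(asOf Timestamp) (*compactionIter, error) {
	if asOf.IsEmpty() {
		return nil, errors.New("asOf timestamp must be set")
	}
	return &compactionIter{asOf: asOf}, nil
}

func main() {
	if _, err := newCompactionIter(Timestamp{}); err != nil {
		fmt.Println("rejected:", err)
	}
	if it, err := newCompactionIter(Timestamp{WallTime: 10}); err == nil {
		fmt.Println("accepted asOf wall time:", it.asOf.WallTime)
	}
}
```

Returning an error here is one of the two options raised in the thread; the other, defaulting an empty asOf to the maximum timestamp, would replace the error branch with an assignment.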
72ab307
to
ffd1d47
Compare
c2a68e0
to
7ce916e
Compare
For the purposes of backup compaction, a custom iterator is required that behaves similarly to the ReadAsOfIterator, but also surfaces live tombstone point keys.
Epic: none
Release note: None
7ce916e
to
16f1fe9
Compare
TFTR! bors r=msbutler
👎 Rejected by code reviews
dm'd that changes are not blocking
bors r+
For the purposes of backup compaction, a custom iterator is required that behaves similarly to the ReadAsOfIterator, but also surfaces live tombstone point keys.
Epic: none
Release note: None
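The behavior described above can be illustrated with a toy model. The sketch below is not the BackupCompactionIterator itself: version and latestAsOf are hypothetical, and the model assumes point keys only, sorted by key ascending and timestamp descending. For each key it picks the newest version at or below asOf; the keepTombstones flag switches between the ReadAsOfIterator-like behavior (tombstones hidden) and the compaction behavior (tombstones surfaced).

```go
package main

import "fmt"

// version is a simplified MVCC point key: a user key, a timestamp, and a
// flag marking it as a tombstone (a deletion marker).
type version struct {
	key       string
	ts        int64
	tombstone bool
}

// latestAsOf expects versions sorted by key ascending, then timestamp
// descending. For each key it selects the newest version with ts <= asOf.
// Tombstones are emitted only when keepTombstones is true.
func latestAsOf(sorted []version, asOf int64, keepTombstones bool) []version {
	var out []version
	for i := 0; i < len(sorted); {
		key := sorted[i].key
		// Move past versions of this key that are newer than asOf.
		for i < len(sorted) && sorted[i].key == key && sorted[i].ts > asOf {
			i++
		}
		// The first remaining version of this key, if any, is the latest
		// one visible at asOf.
		if i < len(sorted) && sorted[i].key == key {
			if v := sorted[i]; !v.tombstone || keepTombstones {
				out = append(out, v)
			}
		}
		// Skip the older versions shadowed by the one just examined.
		for i < len(sorted) && sorted[i].key == key {
			i++
		}
	}
	return out
}

func main() {
	data := []version{
		{"a", 5, false}, {"a", 2, false},
		{"b", 4, true}, {"b", 1, false},
		{"c", 9, false},
	}
	fmt.Println("read-as-of style:", latestAsOf(data, 6, false))
	fmt.Println("compaction style:", latestAsOf(data, 6, true))
}
```

In this data set the tombstone on key b shadows its older value in both modes, but only the compaction-style pass emits the tombstone itself, which a compacted backup needs in order to record the deletion.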