Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

backupccl: create backup compaction iterator #137529

Merged

Conversation

kev-cao
Copy link
Contributor

@kev-cao kev-cao commented Dec 16, 2024

For the purposes of backup compaction, a custom iterator is required that behaves similarly to the ReadAsOfIterator, but also surfaces live tombstone point keys.

Epic: none

Release note: None

@kev-cao kev-cao requested a review from a team as a code owner December 16, 2024 16:38
@kev-cao kev-cao requested a review from sumeerbhola December 16, 2024 16:38
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@kev-cao kev-cao requested review from dt and msbutler December 16, 2024 16:38
@kev-cao kev-cao force-pushed the backupccl/backup-compaction-iterator branch from 9c30e58 to 044915a Compare December 16, 2024 17:50
Copy link
Collaborator

@msbutler msbutler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

pkg/storage/backup_compaction_iterator.go Outdated Show resolved Hide resolved
pkg/storage/backup_compaction_iterator.go Show resolved Hide resolved
pkg/storage/backup_compaction_iterator.go Show resolved Hide resolved
pkg/storage/backup_compaction_iterator.go Show resolved Hide resolved
pkg/storage/backup_compaction_iterator_test.go Outdated Show resolved Hide resolved
pkg/storage/backup_compaction_iterator_test.go Outdated Show resolved Hide resolved
pkg/storage/backup_compaction_iterator_test.go Outdated Show resolved Hide resolved
@kev-cao kev-cao force-pushed the backupccl/backup-compaction-iterator branch from 044915a to 817e116 Compare December 17, 2024 21:39
@kev-cao
Copy link
Contributor Author

kev-cao commented Dec 17, 2024

@msbutler Thanks for all the feedback! I've updated the tests/comments

@kev-cao kev-cao requested a review from msbutler December 17, 2024 21:41
@kev-cao kev-cao force-pushed the backupccl/backup-compaction-iterator branch from 817e116 to 4e49590 Compare December 17, 2024 21:42
Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you provide some context for this PR, such as where this iterator will be used?

Does this need to implement SimpleMVCCIterator? Looking at the example of ReadAsOfIterator, it seems to me that all the callers only use the concrete type and call only {SeekGE, Valid, NextKey, UnsafeKey, UnsafeValue}. If that is correct, we should also remove unnecessary methods from ReadAsOfIterator (in a separate PR of course).

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dt, @kev-cao, and @msbutler)

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dt, @kev-cao, and @msbutler)


pkg/storage/backup_compaction_iterator.go line 27 at r2 (raw file):

	asOf hlc.Timestamp

	// valid tracks if the current key is valid

nit: missing period.


pkg/storage/backup_compaction_iterator.go line 30 at r2 (raw file):

	valid bool

	// err tracks if iterating to the current key returned an error

ditto


pkg/storage/backup_compaction_iterator.go line 63 at r2 (raw file):

func (f *BackupCompactionIterator) SeekGE(originalKey MVCCKey) {
	// See ReadAsOfIterator comment for explanation of this

nit: missing period.


pkg/storage/backup_compaction_iterator.go line 76 at r2 (raw file):

}

// advance moves past keys with timestamps later than f.asOf

nit: missing period.


pkg/storage/backup_compaction_iterator.go line 136 at r2 (raw file):

func (f *BackupCompactionIterator) assertInvariants() error {
	if err := assertSimpleMVCCIteratorInvariants(f); err != nil {

never mind my earlier comment -- I suppose reusing this function is why this implements SimpleMVCCIterator.
Those panics in the earlier methods don't trip up this invariant checking because all those Range* methods are only called when HasPointAndRange returns hasRange=true. As a nit, I am not sure that is properly documented as requirements in SimpleMVCCIterator, so consider strengthening the commentary in SimpleMVCCIterator.


pkg/storage/backup_compaction_iterator.go line 140 at r2 (raw file):

	}

	if f.asOf.IsEmpty() {

This assertion belongs in NewBackupCompactionIterator.


pkg/storage/backup_compaction_iterator.go line 146 at r2 (raw file):

	if ok, err := f.iter.Valid(); !ok || err != nil {
		errMsg := err.Error()
		return errors.AssertionFailedf("invalid underlying iter with err=%s", errMsg)

iterators can have errors because of underlying detected corruption. Such errors are not assertion failures. If the iterator did not detect an error (which happens via block checksumming etc.), we may still want to check application level (CRDB being the application) errors or certain invariants that should never be violated. Those should be wrapped in AssertionFailedf. If you look at assertSimpleMVCCIteratorInvariants that's roughly the pattern that is followed in examples like:

	// Keys can't be empty.
	if len(key.Key) == 0 {
		return errors.AssertionFailedf("valid iterator returned empty key")
	}
		value, err := iter.UnsafeValue()
		if err != nil {
			return err
		}

Copy link
Contributor Author

@kev-cao kev-cao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dt, @msbutler, and @sumeerbhola)


pkg/storage/backup_compaction_iterator.go line 136 at r2 (raw file):

Previously, sumeerbhola wrote…

never mind my earlier comment -- I suppose reusing this function is why this implements SimpleMVCCIterator.
Those panics in the earlier methods don't trip up this invariant checking because all those Range* methods are only called when HasPointAndRange returns hasRange=true. As a nit, I am not sure that is properly documented as requirements in SimpleMVCCIterator, so consider strengthening the commentary in SimpleMVCCIterator.

To provide some context on this work, feel free to check out the prototype. Another reason for re-using SimpleMVCCIterator is because we will update the iterator in MergedSST to a generic type so that it can be used in both restores as well as compactions.

Regarding the requirement on hasRange=true, it looks like our existing underlying iterators simply return zero values if the range does not exist. @msbutler I did notice that ReadAsOfIterator also does the same, to preserve the behavior perhaps we should remove the panics.


pkg/storage/backup_compaction_iterator.go line 140 at r2 (raw file):

Previously, sumeerbhola wrote…

This assertion belongs in NewBackupCompactionIterator.

NewBackupCompactionIterator sets asOf to the MaxTimestamp if not provided (same behavior as the ReadAsOfIterator)


pkg/storage/backup_compaction_iterator.go line 146 at r2 (raw file):

Previously, sumeerbhola wrote…

iterators can have errors because of underlying detected corruption. Such errors are not assertion failures. If the iterator did not detect an error (which happens via block checksumming etc.), we may still want to check application level (CRDB being the application) errors or certain invariants that should never be violated. Those should be wrapped in AssertionFailedf. If you look at assertSimpleMVCCIteratorInvariants that's roughly the pattern that is followed in examples like:

	// Keys can't be empty.
	if len(key.Key) == 0 {
		return errors.AssertionFailedf("valid iterator returned empty key")
	}
		value, err := iter.UnsafeValue()
		if err != nil {
			return err
		}

From the looks of it, assertSimpleMVCCIteratorInvariants already wraps errors that should be considered assertion errors as assertion errors, so I suppose just returning the error as is is sufficient.

Copy link
Contributor Author

@kev-cao kev-cao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dt, @msbutler, and @sumeerbhola)


pkg/storage/backup_compaction_iterator.go line 146 at r2 (raw file):

Previously, kev-cao (Kevin Cao) wrote…

From the looks of it, assertSimpleMVCCIteratorInvariants already wraps errors that should be considered assertion errors as assertion errors, so I suppose just returning the error as is is sufficient.

Oh whoops was reading the wrong section — in this case, f.iter.Valid() is being checked if f.valid is true, I think that's a fine assertion to make. Also the assertSimpleMVCCIteratorInvariants call at the top handles those assertions. I'm not quite sure what you change you are proposing here, could you elaborate?

@kev-cao kev-cao force-pushed the backupccl/backup-compaction-iterator branch from 4e49590 to 72ab307 Compare December 18, 2024 19:36
@msbutler
Copy link
Collaborator

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the context, why are there no range keys (MVCC tombstones) handled here? Is this related to the idea that once a range tombstone is applied, no point keys will ever be written on top (an invariant maintained by a higher layer), so the BackupCompactionIterator.iter is already configured to apply the effect of the range tombstone but not surface it?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dt, @kev-cao, and @msbutler)

@msbutler
Copy link
Collaborator

msbutler commented Dec 19, 2024

Regarding the context, why are there no range keys (MVCC tombstones) handled here?

At least for the MVP of this backup compaction feature, we are assuming that rangekeys will not get written to the key space that we need to back up. Jeff is currently exploring the performance implications of rolling back imports without range tombstones, for example (design doc coming soon).

To deal with backups of whole tenants, this compaction iterator will need to handle range keys, but we'd like to address them after we build the backup compaction feature for all other backups first.

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 3 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dt, @kev-cao, and @msbutler)


pkg/storage/backup_compaction_iterator.go line 136 at r2 (raw file):

Previously, kev-cao (Kevin Cao) wrote…

To provide some context on this work, feel free to check out the prototype. Another reason for re-using SimpleMVCCIterator is because we will update the iterator in MergedSST to a generic type so that it can be used in both restores as well as compactions.

Regarding the requirement on hasRange=true, it looks like our existing underlying iterators simply return zero values if the range does not exist. @msbutler I did notice that ReadAsOfIterator also does the same, to preserve the behavior perhaps we should remove the panics.

The change to remove the panics is good enough -- no need to update the SimpleMVCCIterator interface.


pkg/storage/backup_compaction_iterator.go line 140 at r2 (raw file):

Previously, kev-cao (Kevin Cao) wrote…

NewBackupCompactionIterator sets asOf to the MaxTimestamp if not provided (same behavior as the ReadAsOfIterator)

Sure, but I am not clear on why an assertion about an immutable field of f belongs in a method that will be called frequently. If the concern is that someone will not use NewBackupCompactionIterator and just initialize the fields, perhaps hide this in a different package (though I don't think we need to be that paranoid).


pkg/storage/backup_compaction_iterator.go line 146 at r2 (raw file):

in this case, f.iter.Valid() is being checked if f.valid is true, I think that's a fine assertion to make.

I missed that. Can you add a code comment stating this here, and add something like the following at the top of the method:

// REQUIRES: f.valid
func (f *BackupCompactionIterator) assertInvariants() error {

Copy link
Collaborator

@msbutler msbutler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dt, @kev-cao, and @sumeerbhola)


pkg/storage/backup_compaction_iterator.go line 140 at r2 (raw file):

Previously, sumeerbhola wrote…

Sure, but I am not clear on why an assertion about an immutable field of f belongs in a method that will be called frequently. If the concern is that someone will not use NewBackupCompactionIterator and just initialize the fields, perhaps hide this in a different package (though I don't think we need to be that paranoid).

+1 to Sumeer's point. It seems like NewBackupCompactionIterator should simply return an error if the user passes a nil asOf.

@kev-cao kev-cao force-pushed the backupccl/backup-compaction-iterator branch from 72ab307 to ffd1d47 Compare December 23, 2024 21:13
@kev-cao kev-cao force-pushed the backupccl/backup-compaction-iterator branch 3 times, most recently from c2a68e0 to 7ce916e Compare December 24, 2024 01:24
For the purposes of backup compaction, a custom iterator is required
that behaves similarly to the ReadAsOfIterator, but also surfaces live
tombstones point keys.

Epic: none

Release note: None
@kev-cao kev-cao force-pushed the backupccl/backup-compaction-iterator branch from 7ce916e to 16f1fe9 Compare December 24, 2024 01:33
@kev-cao
Copy link
Contributor Author

kev-cao commented Dec 24, 2024

TFTR!

bors r=msbutler

@craig
Copy link
Contributor

craig bot commented Dec 24, 2024

👎 Rejected by code reviews

@msbutler msbutler dismissed sumeerbhola’s stale review December 25, 2024 01:04

dm'd that changes are not blocking

@msbutler
Copy link
Collaborator

bors r+

@craig craig bot merged commit 270ba1e into cockroachdb:master Dec 25, 2024
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants