Summary
Today, backup verification is serial, which could pose a challenge in rare, high-urgency recovery scenarios where we need to assess quickly whether a candidate backup is free of corruption and eligible for restore. Timeliness will become increasingly important with disaggregated storage.
Semantics
Given the very simple thread pool implementation in `backup_engine` today, we do not really have control over the initialized threads and consequently have no option to unschedule or cancel in-progress tasks. As a result, `VerifyBackup` won't bail out on the very first mismatch (as was the case for the serial implementation) and will instead iterate over all the files, logging success / degree_of_failure for each. We could, in theory, not `.wait()` on the remaining `std::future<WorkItem>`s (upon a previously detected failure) and thereby decrease the observed API latency, but that could cause more confusion down the road, as verification threads would still be occupied with in-flight/scheduled work and would not be reclaimed by the pool for a while. It's a tradeoff where we choose the solution with clear and intuitive semantics.
Test plan
Make sure the existing test collateral for backup verification in `BackupEngineTest` (`VerifyBackup`, `CorruptFileMaintainSize`, `CorruptBlobFileMaintainSize`, `TableFileCorruptionBeforeIncremental`, `RateLimitingVerifyBackup`, `RateLimitingWithLowRefillBytesPerPeriod`, `Concurrency`) runs in `db_stress`.
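For reviewers, the wait-for-all behavior described under Semantics can be pictured with the minimal standalone sketch below. It is not the `backup_engine` code: `FileToVerify`, `VerifyResult`, `VerifyOneFile`, `VerifyAllFiles`, and `AllOk` are hypothetical names, the size comparison is a placeholder for real verification, and `std::async` stands in for the engine's thread pool. The point it illustrates is that every future is drained even after a mismatch has been seen, so no worker is left busy when the call returns.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <string>
#include <vector>

// Hypothetical stand-in for one file recorded in a backup.
struct FileToVerify {
  std::string name;
  std::size_t expected_size;
  std::size_t actual_size;
};

// Hypothetical per-file outcome (plays the role of a completed work item).
struct VerifyResult {
  std::string name;
  bool ok;
};

// Placeholder check: only compares sizes. This is an assumption made for
// the sketch, not what the real verification does.
VerifyResult VerifyOneFile(const FileToVerify& f) {
  return VerifyResult{f.name, f.expected_size == f.actual_size};
}

// Schedules every file, then waits for every future, even after a failure
// has been observed. Returns one result per file, in input order.
std::vector<VerifyResult> VerifyAllFiles(
    const std::vector<FileToVerify>& files) {
  std::vector<std::future<VerifyResult>> pending;
  pending.reserve(files.size());
  for (const FileToVerify& f : files) {
    // std::async stands in for handing the task to a thread pool. Capturing
    // f by reference is safe because all futures are drained before return.
    pending.push_back(
        std::async(std::launch::async, [&f] { return VerifyOneFile(f); }));
  }

  std::vector<VerifyResult> results;
  results.reserve(pending.size());
  for (auto& fut : pending) {
    results.push_back(fut.get());  // no early exit on a mismatch
  }
  assert(results.size() == files.size());
  return results;
}

// Overall verdict: the backup is good only if every file verified.
bool AllOk(const std::vector<VerifyResult>& results) {
  for (const VerifyResult& r : results) {
    if (!r.ok) return false;
  }
  return true;
}
```

Calling `VerifyAllFiles` on a list with one mismatching file still yields a result for every file, and `AllOk` then reports the overall failure. Returning early from the second loop would shorten the call but leave tasks running in the background, which is the confusion the description chooses to avoid.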