Parallelize backup verification #13292

Open
wants to merge 1 commit into
base: main

mszeszko-meta
Contributor

@mszeszko-meta mszeszko-meta commented Jan 11, 2025

Summary

Today, backup verification is serial, which can pose a challenge in rare, high-urgency recovery scenarios where we want to quickly assess whether a candidate backup is free of corruption and eligible for restore. Timeliness will become increasingly important with disaggregated storage.

Semantics

Given the very simple thread pool implementation in backup_engine today, we do not really have control over initialized threads and consequently have no option to unschedule or cancel in-progress tasks. As a result, VerifyBackup won't bail out on the very first mismatch (as was the case for the serial implementation) and will instead iterate over all the files, logging success / degree_of_failure for each. We could, in theory, skip .wait() on the remaining std::future<WorkItem>s once a failure has been detected and thereby decrease the observed API latency, but that could cause more confusion down the road: verification threads would still be occupied with in-flight / scheduled work and would not be reclaimed by the pool for a while. It's a tradeoff, and we choose the solution with clear and intuitive semantics.
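To illustrate the "wait on everything, report every result" semantics described above, here is a minimal standalone sketch. It is not the backup_engine code: `FileCheckResult`, `FileChecker`, `VerifyAllFiles`, and `CountFailures` are hypothetical names, and `std::async` stands in for the engine's own thread pool purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <string>
#include <vector>

// Hypothetical per-file outcome; not a RocksDB type.
struct FileCheckResult {
  std::string file;
  bool ok;
};

// Hypothetical checker: returns true when the file passes verification.
using FileChecker = std::function<bool(const std::string&)>;

// Schedules one check per file, then waits on every future.
// A failed check is recorded but never short-circuits the loop, so no
// task is left running after this function returns.
// Assumption: std::async replaces the real thread pool for this sketch.
std::vector<FileCheckResult> VerifyAllFiles(
    const std::vector<std::string>& files, const FileChecker& check) {
  std::vector<std::future<bool>> pending;
  pending.reserve(files.size());
  for (const auto& f : files) {
    pending.push_back(std::async(std::launch::async, check, f));
  }
  assert(pending.size() == files.size());

  std::vector<FileCheckResult> results;
  results.reserve(files.size());
  for (std::size_t i = 0; i < pending.size(); ++i) {
    // get() blocks until the i-th check finishes, success or not.
    results.push_back({files[i], pending[i].get()});
  }
  return results;
}

// Counts how many files failed verification.
std::size_t CountFailures(const std::vector<FileCheckResult>& results) {
  std::size_t failures = 0;
  for (const auto& r : results) {
    if (!r.ok) {
      ++failures;
    }
  }
  return failures;
}
```

A caller would pass the list of backup files and a checker, then log each entry of the returned vector. Returning early after the first failed `get()` would lower latency but leave the remaining tasks running in the background, which is the confusion the paragraph above chooses to avoid.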

Test plan

Make sure existing test collateral for backup verification in BackupEngineTest (VerifyBackup, CorruptFileMaintainSize, CorruptBlobFileMaintainSize, TableFileCorruptionBeforeIncremental, RateLimitingVerifyBackup, RateLimitingWithLowRefillBytesPerPeriod, Concurrency) runs in db_stress.

@facebook-github-bot
Contributor

@mszeszko-meta has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
