Skip to content

Commit

Permalink
Fix an assertion failure in error handler (#13251)
Browse files Browse the repository at this point in the history
Summary:
we saw this [assertion](https://github.com/facebook/rocksdb/blob/02b4197544f758bdf84d80fe9319238611848c48/db/error_handler.cc#L576) failing in crash test. The LOG shows that there's a call to SetOptions() concurrent to ResumeImpl(). It's possible that while waiting for error recovery flush (with mutex released), SetOptions() failed to write to MANIFEST and added a file to be quarantined. This triggered the assertion failure when ResumeImpl() calls ClearBGError().

This PR fixes the issue by setting background error when SetOptions() fails to write to MANIFEST.

Pull Request resolved: #13251

Test Plan: monitor future crash test failures.

Reviewed By: hx235

Differential Revision: D67660106

Pulled By: cbi42

fbshipit-source-id: 1b52bb23005c4b544f8f9bceefd3b9dcbaf0edfa
  • Loading branch information
cbi42 authored and facebook-github-bot committed Jan 2, 2025
1 parent e48ccc2 commit 159fa77
Show file tree
Hide file tree
Showing 3 changed files with 9 additions and 1 deletion.
6 changes: 5 additions & 1 deletion db/db_impl/db_impl.cc
Original file line number Diff line number Diff line change
Expand Up @@ -1299,11 +1299,15 @@ Status DBImpl::SetOptions(
VersionEdit dummy_edit;
s = versions_->LogAndApply(cfd, new_options, read_options, write_options,
&dummy_edit, &mutex_, directories_.GetDbDir());
if (!versions_->io_status().ok()) {
assert(!s.ok());
error_handler_.SetBGError(versions_->io_status(),
BackgroundErrorReason::kManifestWrite);
}
// Trigger possible flush/compactions. This has to be before we persist
// options to file, otherwise there will be a deadlock with writer
// thread.
InstallSuperVersionAndScheduleWork(cfd, &sv_context, new_options);

persist_options_status =
WriteOptionsFile(write_options, true /*db_mutex_already_held*/);
bg_cv_.SignalAll();
Expand Down
2 changes: 2 additions & 0 deletions db/error_handler.cc
Original file line number Diff line number Diff line change
Expand Up @@ -573,6 +573,8 @@ Status ErrorHandler::ClearBGError() {

// Signal that recovery succeeded
if (recovery_error_.ok()) {
// If this assertion fails, it means likely bg error is not set after a
// file is quarantined during MANIFEST write.
assert(files_to_quarantine_.empty());
Status old_bg_error = bg_error_;
// old_bg_error is only for notifying listeners, so may not be checked
Expand Down
2 changes: 2 additions & 0 deletions db/version_set.cc
Original file line number Diff line number Diff line change
Expand Up @@ -5939,6 +5939,8 @@ Status VersionSet::LogAndApply(
}
TEST_SYNC_POINT_CALLBACK("VersionSet::LogAndApply:WakeUpAndDone", mu);
#endif /* !NDEBUG */
// FIXME: One MANIFEST write failure can cause all writes to SetBGError,
// should only SetBGError once.
return first_writer.status;
}

Expand Down

0 comments on commit 159fa77

Please sign in to comment.