PXC-434 PXC-436 Crash in galera at certification #34

Open. Wants to merge 3 commits into base: 3.x
22 changes: 18 additions & 4 deletions galera/src/replicator_smm.cpp
@@ -1838,10 +1838,24 @@ wsrep_status_t galera::ReplicatorSMM::cert(TrxHandle* trx)
     case Certification::TEST_FAILED:
         if (gu_unlikely(trx->is_toi() && applicable)) // small sanity check
         {
-            // may happen on configuration change
-            log_warn << "Certification failed for TO isolated action: "
-                     << *trx;
-            assert(0);
+            // In some rare scenarios (e.g., when multiple
+            // transactions are awaiting certification, the last
+            // node remaining in the cluster becomes PRIMARY after
+            // the previous primary node fails, and
+            // assign_initial_position() is called), a sequence
+            // number mismatch occurs on configuration change and
+            // certification fails. We cannot move the server
+            // forward (with last_seen_seqno < initial_position,
+            // see galera::Certification::do_test() for details)
+            // without risking data loss, and hence have
+            // to shut it down. Before shutting it down, we need
+            // to mark the state as unsafe to trigger SST at the next
+            // server restart:
+            log_fatal << "Certification failed for TO isolated action: "
+                      << *trx;
+            st_.mark_unsafe();
+            local_monitor_.leave(lo);
+            abort();
         }
         local_cert_failures_ += trx->is_local();
         trx->set_state(TrxHandle::S_MUST_ABORT);
10 changes: 8 additions & 2 deletions galera/src/replicator_str.cpp
@@ -29,13 +29,19 @@ ReplicatorSMM::state_transfer_required(const wsrep_view_info_t& view_info)
     {
         if (local_seqno > group_seqno)
         {
-            close();
-            gu_throw_fatal
+            // The local state sequence number is greater than the group
+            // sequence number: states diverged on SST. We cannot
+            // move the server forward (with local_seqno > group_seqno)
+            // without risking data loss, and hence have
+            // to shut it down. The user must remove the state file and
+            // then restart the server to continue:
+            log_fatal
                 << "Local state seqno (" << local_seqno
                 << ") is greater than group seqno (" <<group_seqno
                 << "): states diverged. Aborting to avoid potential "
                 << "data loss. Remove '" << state_file_
                 << "' file and restart if you wish to continue.";
+            abort();
         }

         return (local_seqno != group_seqno);
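The seqno comparison in the replicator_str.cpp hunk above can be read as a three-way decision. The following is a minimal standalone sketch of that decision only, not Galera code: the `SeqnoCheck` enum and the `classify_seqno` function are hypothetical names introduced for illustration, and the sketch returns a value where the patch logs a fatal message and calls `abort()`.

```cpp
#include <cstdint>

// Hypothetical names: this enum and function are illustrative only and
// are not part of Galera's API.
enum class SeqnoCheck { InSync, NeedsStateTransfer, Diverged };

// Classify the local state against the group state by sequence number.
inline SeqnoCheck classify_seqno(std::int64_t local_seqno,
                                 std::int64_t group_seqno)
{
    if (local_seqno > group_seqno)
    {
        // Local state is ahead of the group. The patch treats this as
        // divergence and aborts the server instead of continuing.
        return SeqnoCheck::Diverged;
    }
    // Otherwise a state transfer is needed only when the seqnos differ.
    return (local_seqno != group_seqno) ? SeqnoCheck::NeedsStateTransfer
                                        : SeqnoCheck::InSync;
}
```

Returning a value instead of aborting keeps the sketch testable in isolation. In the patch itself the diverged case never returns to the caller.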