Fix reorg race condition that can cause rare crashes #409

LarryRuane · 2022-07-29T19:09:22Z

Fixes issue #408.

This bug was introduced by PR #393, which changed how txids are
determined. That PR replaced each zcashd getblock RPC call with a
pair of calls: the first gets the raw block data, and the second
retrieves the txids in the block. (Unfortunately, you can't get both in a
single getblock RPC.) But this ordering introduced a timing window: if a
reorg occurs between the two calls, the block at the given height can
change, so the two calls can return data from different blocks.

This PR reorders the getblock calls so that the first call gets the
transaction IDs. That call also returns the block hash, so the second
getblock call can specify the block by hash rather than by height. This
guarantees that the two RPC calls return consistent data for the same
block.

defuse

utACK

common/common.go

LarryRuane · 2022-08-18T16:02:12Z

Force-pushed to improve a code comment in response to review feedback (thanks, @daira). Thanks also for your review, @defuse.

LarryRuane · 2022-08-23T14:18:40Z

Force-pushed to fix a comment typo

daira · 2022-08-23T14:35:15Z

common/common_test.go

 		}
-		// Simulate that we're synced (caught up);
+		// Simulate that we're synced (caught up, latest block 380641);
 		// this should cause one 10s sleep (then retry).


Suggested change

-		// this should cause one 10s sleep (then retry).
+		// this should cause one sleep (then retry).

The sleep interval isn't 10s.

Good catch, and actually there isn't any sleep-retry in this case (I single-stepped in the debugger to make sure), so I removed that comment line. I think that comment was copy-pasted from elsewhere.

daira

utACK with minor suggestion.

LarryRuane · 2022-08-23T15:36:51Z

Force-pushed to address @daira's review comments.

LarryRuane added the bug label Jul 29, 2022

LarryRuane requested review from pacu, softminus, defuse and r3ld3v July 29, 2022 19:09

LarryRuane self-assigned this Jul 29, 2022

LarryRuane mentioned this pull request Aug 1, 2022

Server resets its latest block from time to time #400

Closed

defuse approved these changes Aug 18, 2022

daira reviewed Aug 18, 2022

common/common.go

daira reviewed Aug 18, 2022

common/common.go

LarryRuane force-pushed the 2022-07-reorg-race branch from 91fae0d to 7afa136 Compare August 18, 2022 15:59

LarryRuane force-pushed the 2022-07-reorg-race branch from 7afa136 to 94a135e Compare August 23, 2022 14:18

daira reviewed Aug 23, 2022

daira approved these changes Aug 23, 2022

LarryRuane force-pushed the 2022-07-reorg-race branch from 94a135e to 1a556c3 Compare August 23, 2022 15:34

pacu removed their request for review August 23, 2022 19:06

LarryRuane merged commit f53511c into zcash:master Aug 23, 2022

LarryRuane deleted the 2022-07-reorg-race branch August 23, 2022 19:14

LarryRuane mentioned this pull request Aug 27, 2022

Reorg race condition that can cause rare crashes #408

Closed

defuse mentioned this pull request Feb 13, 2023

Missing the fix to the reorg race condition crash adityapk00/lightwalletd#13

Open
