Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix reorg race condition that can cause rare crashes #409

Merged
merged 1 commit into from
Aug 23, 2022

Conversation

LarryRuane
Copy link
Collaborator

Fixes issue #408.

This bug was introduced by PR #393, which changed how txids are
determined. That PR changed each call to the zcash getblock call into a
pair of calls, the first to get the raw block data, the second to
retrieve the txids in the block. (Unfortunately, you can't get both in a
single getblock RPC.) But this ordering introduced a timing window in
which the block at the given height can change, if a reorg occurred
between the two calls.

This PR reorders the getblock calls, so that the first call gets the
transaction IDs, which also happens to return the block hash, so then
the second getblock call can specify the block hash, rather than the
height. This ensures that the two RPC calls return consistent data,
definitely the same block.

@LarryRuane LarryRuane added the bug Something isn't working label Jul 29, 2022
@LarryRuane LarryRuane self-assigned this Jul 29, 2022
Copy link
Collaborator

@defuse defuse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK

@LarryRuane
Copy link
Collaborator Author

LarryRuane commented Aug 18, 2022

Force-pushed to improve comment as a result of review comment (thanks, @daira), and also thanks for your review, @defuse

@LarryRuane
Copy link
Collaborator Author

Force-pushed to fix a comment typo

}
// Simulate that we're synced (caught up);
// Simulate that we're synced (caught up, latest block 380641);
// this should cause one 10s sleep (then retry).
Copy link
Contributor

@daira daira Aug 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// this should cause one 10s sleep (then retry).
// this should cause one sleep (then retry).

The sleep interval isn't 10s.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, and actually there isn't any sleep-retry in this case (I single-stepped in the debugger to make sure), so I removed that comment line. I think that comment was copy-pasted from elsewhere.

Copy link
Contributor

@daira daira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK with minor suggestion.

Fixes issue 408.

This bug was introduced by PR 393, which changed how txids are
determined. That PR changed each call to the zcash getblock call into a
pair of calls, the first to get the raw block data, the second to
retrieve the txids in the block. (Unfortunately, you can't get both in a
single getblock RPC.) But this ordering introduced a timing window in
which the block at the given height can change, if a reorg occurred
between the two calls.

This PR reorders the getblock calls, so that the first call gets the
transaction IDs, which also happens to return the block hash, so then
the second getblock call can specify the block hash, rather than the
height. This ensures that the two RPC calls return consistent data,
definitely the same block.
@LarryRuane
Copy link
Collaborator Author

Force-pushed to address @daira's review comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants