
Implement traversal based early termination #3337

Open: yacovm wants to merge 8 commits into master from earlyterm_yes_traversal


yacovm (Contributor) commented Aug 27, 2024

This commit makes the early termination logic of the snowflake poll also consider transitive voting. A block b0 is transitively voted for by every vote for block b1 if b0 is an ancestor of b1.

This transitive termination logic also handles shared prefixes of block IDs. If block b0 is the direct parent of b1 and b2, and b1 and b2 share an ID prefix, then the logic in this commit may refrain from terminating the poll early if a future vote could still increase the confidence of the shared prefix, since that prefix corresponds to a snowflake instance.
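To make "shared prefix" concrete, the sketch below counts how many leading bits two sibling block IDs have in common. It is a minimal illustration, not the PR's code: a byte slice stands in for ids.ID, `sharedPrefixLen` is a hypothetical helper, and reading bits least-significant-first within each byte is an assumption of this sketch.

```go
package main

import "fmt"

// bit returns the i-th bit of id, walking the bytes in order and, within
// each byte, from the least significant bit upward. This bit order is an
// assumption made for the sketch.
func bit(id []byte, i int) int {
	return int(id[i/8]>>(uint(i)%8)) & 1
}

// sharedPrefixLen returns how many leading bits a and b have in common.
// Both IDs are assumed to have the same length.
func sharedPrefixLen(a, b []byte) int {
	n := len(a) * 8
	for i := 0; i < n; i++ {
		if bit(a, i) != bit(b, i) {
			return i
		}
	}
	return n
}

func main() {
	b1 := []byte{0b0000_0101, 0xFF}
	b2 := []byte{0b0000_1101, 0xFF}
	fmt.Println(sharedPrefixLen(b1, b2)) // 3
}
```

Here b1 and b2 agree on their first three bits, so under this bit ordering those three bits form the prefix that votes for either block both support.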

Note: I will rename early_term_no_traversal.go after this is merged. I didn't rename it in the same commit so that it is easy to distinguish between the old code and the new code.

Why this should be merged

Considering transitive votes in the early termination logic improves consensus performance: polls can terminate earlier than when transitive votes are ignored, which reduces the time to block finalization.

How this works

When the early termination logic is asked whether to terminate, it builds a graph from the votes in which each block ID corresponds to a vertex. Parent vertices point to their descendants, and descendants point to their parents.
When counting the votes for a block ID, both the direct votes for it and the votes for its descendants are now taken into account.
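The counting rule can be sketched in Go. This is a minimal sketch under stated assumptions rather than the PR's implementation: string IDs stand in for ids.ID, `getParent` plays the role of the engine-provided parent lookup, `transitiveVotes` is a hypothetical name, and only blocks that received at least one vote become vertices.

```go
package main

import "fmt"

// transitiveVotes returns, for every voted block, its direct votes plus the
// votes of all voted descendants. getParent reports a block's parent and
// whether that parent is known; parent links are assumed to be acyclic.
func transitiveVotes(votes map[string]int, getParent func(string) (string, bool)) map[string]int {
	// Link each voted block to its parent when the parent was voted too;
	// blocks whose parent is unknown or unvoted become roots.
	children := make(map[string][]string)
	var roots []string
	for id := range votes {
		parent, known := getParent(id)
		if _, voted := votes[parent]; known && voted {
			children[parent] = append(children[parent], id)
			continue
		}
		roots = append(roots, id)
	}

	// A vertex's total is its direct votes plus its children's totals.
	totals := make(map[string]int, len(votes))
	var count func(id string) int
	count = func(id string) int {
		total := votes[id]
		for _, child := range children[id] {
			total += count(child)
		}
		totals[id] = total
		return total
	}
	for _, root := range roots {
		count(root)
	}
	return totals
}

func main() {
	// b0 is the parent of b1 and b3; b1 is the parent of b2.
	parents := map[string]string{"b1": "b0", "b2": "b1", "b3": "b0"}
	getParent := func(id string) (string, bool) {
		parent, ok := parents[id]
		return parent, ok
	}
	votes := map[string]int{"b0": 1, "b1": 2, "b2": 3, "b3": 4}
	totals := transitiveVotes(votes, getParent)
	for _, id := range []string{"b0", "b1", "b2", "b3"} {
		fmt.Println(id, totals[id])
	}
}
```

With these votes, b1 is credited with 5 (its own 2 plus b2's 3) and b0 with 10, since every vote in the poll is for b0 or one of its descendants.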

Additionally, to follow the same logic that snowman uses for the bit decomposition of block IDs, the termination logic also takes into account votes for shared bit prefixes of block IDs. For each vertex, its direct descendants are examined and a graph of their shared prefixes is built. Every inner vertex of this graph that represents a shared prefix, i.e. that corresponds to a unary node, is also taken into account in transitive voting.
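The shared-prefix step can be sketched as a binary trie over the IDs of one vertex's children. This is a hypothetical illustration, not the PR's prefixGroup code: bit strings stand in for block IDs, `build` and `sharedPrefixVotes` are made-up names, and the sketch reports every bifurcation whose shared prefix is non-empty, crediting it with the votes of all its members.

```go
package main

import "fmt"

// prefixNode groups sibling IDs (members) that agree on their first
// prefixLen bits. Bit strings such as "0110" stand in for block IDs.
type prefixNode struct {
	prefixLen int
	members   []string
	zero, one *prefixNode
}

// allSame reports whether every member has the same bit at index i.
func allSame(members []string, i int) bool {
	for _, m := range members[1:] {
		if m[i] != members[0][i] {
			return false
		}
	}
	return true
}

// build extends the shared prefix of members bit by bit and splits them
// into two children at the first bit on which they disagree (a bifurcation).
// All members are assumed to have equal length.
func build(members []string, depth int) *prefixNode {
	if len(members) == 1 {
		return &prefixNode{prefixLen: len(members[0]), members: members}
	}
	d := depth
	for d < len(members[0]) && allSame(members, d) {
		d++
	}
	n := &prefixNode{prefixLen: d, members: members}
	if d == len(members[0]) {
		return n // duplicate IDs: nothing left to split
	}
	var zeros, ones []string
	for _, m := range members {
		if m[d] == '0' {
			zeros = append(zeros, m)
		} else {
			ones = append(ones, m)
		}
	}
	n.zero, n.one = build(zeros, d+1), build(ones, d+1)
	return n
}

// sharedPrefixVotes reports each bifurcation with a non-empty shared prefix,
// together with the summed votes of the IDs underneath it.
func sharedPrefixVotes(n *prefixNode, votes map[string]int, report func(prefix string, total int)) {
	if n == nil {
		return
	}
	if n.zero != nil && n.one != nil && n.prefixLen > 0 {
		total := 0
		for _, m := range n.members {
			total += votes[m]
		}
		report(n.members[0][:n.prefixLen], total)
	}
	sharedPrefixVotes(n.zero, votes, report)
	sharedPrefixVotes(n.one, votes, report)
}

func main() {
	// Bit strings stand in for the IDs of one parent's children.
	siblings := []string{"0010", "0011", "0110", "1000"}
	votes := map[string]int{"0010": 2, "0011": 1, "0110": 3, "1000": 4}
	sharedPrefixVotes(build(siblings, 0), votes, func(prefix string, total int) {
		fmt.Printf("prefix %q: %d votes\n", prefix, total)
	})
}
```

For these four siblings the sketch reports prefix "0" with 6 votes and prefix "001" with 3 votes: no vote names a prefix directly, yet each prefix is credited with the votes of the IDs underneath it.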

How this was tested

Unit tests for added code are included.

I ran a modified build on mainnet for 24 hours alongside a regular build and observed that transitive voting reduced the share of consensus rounds lasting more than 1200 milliseconds from 14% to 1.5%.

The following graphs show the latency of consensus rounds over time on mainnet.

Without transitive voting:
[Screenshot 2024-08-26 at 16:56:52: consensus round latency on mainnet]

With transitive voting:
[Screenshot 2024-08-26 at 17:15:43: consensus round latency on mainnet]

yacovm self-assigned this on Aug 27, 2024.
yacovm force-pushed the earlyterm_yes_traversal branch 3 times, most recently from ccfaac5 to 7b6f256, on August 27, 2024 at 16:37.
yacovm added the consensus label (This involves consensus) on Aug 27, 2024.
michaelkaplan13 requested reviews from aaronbuchwald, marun, joshua-kim, and tsachiherman and removed the review request for marun on September 24, 2024 at 16:13.
marun (Contributor) left a comment:

  • How confident are you that the limited mainnet testing performed thus far represents the expected behavior when the feature is deployed to a majority of nodes?
  • The description only mentions the advantages of this change. Any downsides or tradeoffs? Presumably there is more computational cost, but hopefully that would be a worthy tradeoff.

snow/consensus/snowman/poll/early_term_no_traversal.go (outdated, resolved)
snow/consensus/snowman/poll/early_term_no_traversal.go (outdated, resolved)
snow/consensus/snowman/poll/graph.go (resolved)
snow/consensus/snowman/poll/graph.go (outdated, resolved)
snow/consensus/snowman/poll/graph.go (resolved)
snow/consensus/snowman/poll/graph_test.go (outdated, resolved)
snow/consensus/snowman/poll/graph_test.go (outdated, resolved)
snow/consensus/snowman/poll/prefix.go (resolved)
yacovm (Contributor, Author) commented Sep 27, 2024

  • How confident are you that the limited mainnet testing performed thus far represents the expected behavior when the feature is deployed to a majority of nodes?

The unit tests I added cover much more extreme cases than what happens on mainnet. On mainnet, the block depth this code operates on is only 1 or 2 blocks, so in that respect I'm confident.

However, before we deploy to mainnet, we will deploy to testnet, and from what I've seen, testnet behavior is not that different from mainnet.

  • The description only mentions the advantages of this change. Any downsides or tradeoffs? Presumably there is more computational cost, but hopefully that would be a worthy tradeoff.

The rest of the snowman code already performs transitive voting; I'm only completing the last missing piece. Without this code, polls run longer than necessary and needlessly slow down consensus. So even if there are downsides, it makes no sense to withhold this, as implementing it was always the intent.

chains/manager.go (resolved)
chains/manager.go (resolved)
snow/context.go (outdated)
Comment on lines 99 to 101:

```go
type BlockTraversal interface {
	GetParent(id ids.ID) ids.ID
}
```
Collaborator:

I'm not sure this fits here, but I'm also not sure there's a better place for it, since we need to supply an implementation from the snowman engine to the snowman consensus instance.

@@ -16,7 +16,7 @@ import (

```go
// Config wraps all the parameters needed for a snowman engine
type Config struct {
	common.AllGetsServer

	BlockTraversal snow.BlockTraversal
```
Collaborator:

Should we set this field in TestBootstrapPartiallyAccepted as well to make sure the new field is populated everywhere it's used?

Contributor (Author):

I'm not sure I follow. Since I don't set this field, and the test passes, why set it if it's apparently not tested?

Comment on lines 683 to 686:

```go
if block.blk == nil {
	return ids.Empty
}
return block.blk.Parent()
```
Collaborator:

Can we skip block.blk == nil? We should have the invariant that block.blk is never nil since we only assign to it in one place and always with a value that we've already called Parent() on

Collaborator:

Could we switch from using ids.Empty to signify that we don't have the block to returning an extra boolean?

Contributor (Author):

Can we skip block.blk == nil? We should have the invariant that block.blk is never nil since we only assign to it in one place and always with a value that we've already called Parent() on

We have the invariant today, but what if it stops holding? I think it's better to be safe than sorry and keep this check to avoid crashing the node.

Could we switch from using ids.Empty to signify that we don't have the block to returning an extra boolean?

Yes, I switched.
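A sketch of what the engine-side lookup could look like after the switch to an extra boolean, keeping the defensive nil check. Apart from the `blk` field and the `GetParent` name, everything here is a hypothetical stand-in: `ID` replaces ids.ID, and `engine`, `blockWrapper`, and `testBlock` are made up for illustration.

```go
package main

import "fmt"

// ID stands in for ids.ID.
type ID string

// Block stands in for the engine's block type; only Parent is needed here.
type Block interface {
	Parent() ID
}

type testBlock struct{ parent ID }

func (b testBlock) Parent() ID { return b.parent }

// blockWrapper mirrors a cache entry whose blk field should never be nil.
type blockWrapper struct {
	blk Block
}

// engine is a hypothetical stand-in for the snowman engine's block cache.
type engine struct {
	blocks map[ID]*blockWrapper
}

// GetParent returns the parent of id and true, or false when the block is
// unknown. The nil checks guard the "blk is never nil" invariant defensively
// instead of relying on it.
func (e *engine) GetParent(id ID) (ID, bool) {
	block, ok := e.blocks[id]
	if !ok || block == nil || block.blk == nil {
		return "", false
	}
	return block.blk.Parent(), true
}

func main() {
	e := &engine{blocks: map[ID]*blockWrapper{
		"b1": {blk: testBlock{parent: "b0"}},
		"b2": {}, // blk left nil to exercise the defensive check
	}}
	for _, id := range []ID{"b1", "b2", "b3"} {
		parent, ok := e.GetParent(id)
		fmt.Printf("%s: parent=%q ok=%v\n", id, parent, ok)
	}
}
```

Returning a boolean lets callers tell "unknown block" apart from a real parent ID, instead of overloading ids.Empty as a sentinel.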

```go
	return descendant
}

// prefixGroup represents a bunch of IDs (stored in the members field),
```
Collaborator:

Should the type definition be at the top of the file rather than here? Reading through this code, I see a new type in the first function and have to jump here to see what it is.

Contributor (Author):

Sure, moved it to the top of the file.

Comment on lines 100 to 101:

```go
// bifurcationsWithCommonPrefix invokes f() on this and descendant prefixGroups
// which represent common prefixes and not an ID in its entirety.
```
Collaborator:

How does this comment match the condition that's actually enforced?

Contributor (Author):

longestSharedPrefixes returns a tree where each vertex is either the head, has a bifurcation, or is a leaf.

len(prefixGroup.prefix) > 0 filters out the leaves.
if prefixGroup.isBifurcation() filters out the head.

```go
	return originPG
}

func determineDescendant(pg *prefixGroup) *prefixGroup {
```
Collaborator:

Would it be cleaner to remove this function and handle this within a switch statement that handles the existing case pg0 != nil && pg1 != nil and then handles the case that either one of them is nil to determine the descendant?

Contributor (Author):

It would be cleaner code-wise, but I want to emphasize the different semantics here: the case where both are not nil is a bifurcation, while its complement means we extend the current prefix, as you correctly pointed out. These two cases are not the same, so to make that clearer in the code, I did not put them in a single switch statement.

If you feel strongly about it, I can collapse them in a switch statement, but I thought it would be more clear to the reader this way.

snow/consensus/snowman/poll/prefix.go (resolved)
Comment on lines 16 to 20:

```go
type parentGetter func(id ids.ID) ids.ID

func (p parentGetter) GetParent(id ids.ID) ids.ID {
	return p(id)
}
```
Collaborator:

Ah, I had not seen a function type used directly to implement a single-method interface this way before. Nice.
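The idiom here is a named function type with a method, which lets any plain function satisfy a single-method interface; the standard library's net/http.HandlerFunc works the same way. A self-contained sketch, assuming a string `ID` in place of ids.ID and using the single-return GetParent signature from the test helper quoted above; `depth` is a hypothetical helper.

```go
package main

import "fmt"

// ID stands in for ids.ID.
type ID string

// BlockTraversal mirrors the single-method interface from snow/context.go
// as quoted in this review (GetParent returning only an ID).
type BlockTraversal interface {
	GetParent(id ID) ID
}

// parentGetter lets any func(ID) ID satisfy BlockTraversal, the same idiom
// as net/http.HandlerFunc.
type parentGetter func(id ID) ID

func (p parentGetter) GetParent(id ID) ID {
	return p(id)
}

// depth counts parent hops until the traversal reports the empty ID.
func depth(trav BlockTraversal, id ID) int {
	n := 0
	for p := trav.GetParent(id); p != ""; p = trav.GetParent(p) {
		n++
	}
	return n
}

func main() {
	parents := map[ID]ID{"b2": "b1", "b1": "b0"}
	// A closure over a map becomes a BlockTraversal via a type conversion.
	var trav BlockTraversal = parentGetter(func(id ID) ID { return parents[id] })
	fmt.Println(depth(trav, "b2")) // 2
}
```

The conversion `parentGetter(...)` is all a test needs to plug a closure in wherever a BlockTraversal is expected, without declaring a struct type.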

Comment on lines 28 to 30:

```go
func returnSelfID(id ids.ID) ids.ID {
	return id
}
```
aaronbuchwald (Collaborator) commented Oct 23, 2024:

With the current handling of ids.Empty, would it make more sense to return ids.Empty as a default value rather than returning the ID itself as its parent, which could result in a loop?

Contributor (Author):

done
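Why returning the ID itself is risky: code that walks parent links until it sees ids.Empty never stops if a test getter reports a block as its own parent. A minimal sketch, assuming a string `ID` and an `Empty` constant as stand-ins for ids.ID and ids.Empty; `ancestors` and its `maxDepth` bound are hypothetical.

```go
package main

import "fmt"

// ID stands in for ids.ID.
type ID string

// Empty stands in for ids.Empty.
const Empty ID = ""

// ancestors walks parent links from id until the getter reports Empty.
// maxDepth bounds the walk so that a getter that returns its input
// (a self-loop) cannot spin forever.
func ancestors(id ID, getParent func(ID) ID, maxDepth int) ([]ID, bool) {
	var out []ID
	for i := 0; i < maxDepth; i++ {
		p := getParent(id)
		if p == Empty {
			return out, true // reached a block with no known parent
		}
		out = append(out, p)
		id = p
	}
	return out, false // gave up at the bound, possibly because of a loop
}

func main() {
	returnSelf := func(id ID) ID { return id }
	returnEmpty := func(ID) ID { return Empty }

	_, ok := ancestors("b1", returnSelf, 10)
	fmt.Println(ok) // false: "b1" keeps reporting itself as its parent

	path, ok := ancestors("b1", returnEmpty, 10)
	fmt.Println(len(path), ok) // 0 true
}
```

Returning Empty by default makes the walk terminate immediately, whereas the self-returning getter only stops here because of the artificial bound.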

yacovm force-pushed the earlyterm_yes_traversal branch 6 times, most recently from b94bb26 to ea2813c, on October 23, 2024 at 22:59.

Signed-off-by: Yacov Manevich <[email protected]>
3 participants