-
Notifications
You must be signed in to change notification settings - Fork 671
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement traversal based early termination #3337
base: master
Are you sure you want to change the base?
Conversation
ccfaac5
to
7b6f256
Compare
ff5a4d6
to
68e69aa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- How confident are you that the limited mainnet testing performed thus far represents the expected behavior when the feature is deployed to a majority of nodes?
- The description only mentions the advantages of this change. Any downsides or tradeoffs? Presumably there is more computational cost, but hopefully that would be a worthy tradeoff.
The unit tests I added represent much more extreme cases than what happens on mainnet. On mainnet, the block depth this entire code operates on, is either 1 or 2 blocks deep. So in that aspect I'm confident. However before we deploy to mainnet, we will deploy to testnet, and the testnet behavior is not that different than mainnet, from what I've seen.
The rest of the snowman code already performs transitive voting. I'm only completing the last piece that is needed. Without this code, we terminate the poll longer than necessary and needlessly slow down consensus. So even if there are downsides, it makes no sense to withhold this, as it was always the intent to implement this. |
70fb2dc
to
6ff8140
Compare
snow/context.go
Outdated
type BlockTraversal interface { | ||
GetParent(id ids.ID) ids.ID | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this fits here, but also not sure if there's a better place to put it since we need to supply an implementation from the snowman engine to the snowman consensus instance
@@ -16,7 +16,7 @@ import ( | |||
// Config wraps all the parameters needed for a snowman engine | |||
type Config struct { | |||
common.AllGetsServer | |||
|
|||
BlockTraversal snow.BlockTraversal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we set this field in TestBootstrapPartiallyAccepted
as well to make sure the new field is populated everywhere it's used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I follow. Since I don't set this field, and the test passes, why set it if it's apparently not tested?
if block.blk == nil { | ||
return ids.Empty | ||
} | ||
return block.blk.Parent() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we skip block.blk == nil
? We should have the invariant that block.blk
is never nil
since we only assign to it in one place and always with a value that we've already called Parent()
on
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we switch from using ids.Empty
to signify that we don't have the block to returning an extra boolean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we skip block.blk == nil? We should have the invariant that block.blk is never nil since we only assign to it in one place and always with a value that we've already called Parent() on
We have the invariant, however, what if we won't have it? I think it's better to be safe than sorry and just keep this check for safety to avoid crashing the node.
Could we switch from using ids.Empty to signify that we don't have the block to returning an extra boolean?
Yes I switched.
return descendant | ||
} | ||
|
||
// prefixGroup represents a bunch of IDs (stored in the members field), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the type definition be at the top of the file rather than here? Reading through this code I see a new type in the first function and jump to here to see what it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, moved it to the top of the file.
// bifurcationsWithCommonPrefix invokes f() on this and descendant prefixGroups | ||
// which represent common prefixes and not an ID in its entirety. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how does this comment match with the actual condition that's enforced?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
longestSharedPrefixes
returns a tree where each vertex is either the head, has a bifurcation, or is a leaf.
len(prefixGroup.prefix) > 0
filters out the leaves.
if prefixGroup.isBifurcation()
filters out the head.
return originPG | ||
} | ||
|
||
func determineDescendant(pg *prefixGroup) *prefixGroup { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be cleaner to remove this function and handle this within a switch statement that handles the existing case pg0 != nil && pg1 != nil
and then handles the case that either one of them is nil to determine the descendant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be cleaner code wise, but I want to emphasize the different semantics here - the case where both are not nil is a bifurcation, and the complement of that is that we extend the current prefix as you correctly pointed out. These two cases are not the same and therefore to make it clearer in the code, I did not put them in a single switch statement.
If you feel strongly about it, I can collapse them in a switch statement, but I thought it would be more clear to the reader this way.
type parentGetter func(id ids.ID) ids.ID | ||
|
||
func (p parentGetter) GetParent(id ids.ID) ids.ID { | ||
return p(id) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I had not seen a function turned directly into a single function interface this way before, nice
func returnSelfID(id ids.ID) ids.ID { | ||
return id | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with the current handling of ids.Empty
would it make more sense to returns ids.Empty
as a default value rather than returning self as a parent, which could result in a loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
b94bb26
to
ea2813c
Compare
This commit makes the early termination logic of the snowflake poll consider also transitive voting. A block `b0` is transitively voted by all votes for block `b1` if `b0` is an ancestor of `b1`. This transitive termination logic also works for shard common block prefixes. If block `b0` is a direct ancestor of `b1` and `b2` and the latter have a shared prefix, then the logic in this commit may not early terminate the vote if a future vote might improve the confidence of the shared prefix, as it correspons to a snowflake instance. Signed-off-by: Yacov Manevich <[email protected]>
Signed-off-by: Yacov Manevich <[email protected]>
Signed-off-by: Yacov Manevich <[email protected]>
Signed-off-by: Yacov Manevich <[email protected]>
Signed-off-by: Yacov Manevich <[email protected]>
Signed-off-by: Yacov Manevich <[email protected]>
Signed-off-by: Yacov Manevich <[email protected]>
87784e6
to
99deea8
Compare
Signed-off-by: Yacov Manevich <[email protected]>
This commit makes the early termination logic of the snowflake poll consider also transitive voting. A block
b0
is transitively voted by all votes for blockb1
ifb0
is an ancestor ofb1
.This transitive termination logic also works for shard common block prefixes. If block
b0
is a direct ancestor ofb1
andb2
and the latter have a shared prefix, then the logic in this commit may not early terminate the vote if a future vote might improve the confidence of the shared prefix, as it correspons to a snowflake instance.Note: I will rename the file name
early_term_no_traversal.go
after this is merged. I didn't rename them in the same commit so that it will be easy to distinguish between old code and new code.Why this should be merged
Considering transitive votes in early termination logic improves consensus performance by reducing the time to block finalization due to earlier termination than when not taking into account transitive voting.
How this works
When the early termination logic is asked whether to terminate or not, it creates a graph out of the votes, where each ID of a block corresponds to a vertex. Parent vertices point to their descendants and descendants to their parents.
When considering how many votes for a block ID exist, the direct vote as well as the votes for the descendants are now taken int account.
Additionally, in order to adhere to the same logic that snowman employs for the bit decomposition of blocks, the termination logic also takes into account votes for shared bit prefixes of block IDs. For each vertex, its direct descendants are observed, and a graph of shared prefixes is built. Every inner vertex in the graph that corresponds to a unary node that represents a shared prefix is also taken account in transitive voting.
How this was tested
Unit tests for added code are included.
I ran a modified build on mainnet for 24 hours alongside a regular build and observed that transitive voting reduces occurrence of consensus rounds lasting more than 1200 milliseconds from 14% to 1.5% on mainnet.
The following graphs show the latency of consensus rounds as a function of time on mainnet.
Without transitive voting:
With transitive voting: