Speed up leader search #320
Conversation
This makes it easier to inject and test connection attempt failures. Signed-off-by: Cole Miller <[email protected]>
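As a sketch of what this enables (hypothetical helper, not code from this PR), a test can wrap go-dqlite's `client.DialFunc` to inject connection failures deterministically:

```go
package example

import (
	"context"
	"fmt"
	"net"

	"github.com/canonical/go-dqlite/client"
)

// failFirstDial wraps a client.DialFunc so that the first connection
// attempt fails with an injected error and later attempts pass through.
// This helper is hypothetical; it only illustrates the kind of fault
// injection that a pluggable dial function makes possible.
func failFirstDial(inner client.DialFunc) client.DialFunc {
	failed := false
	return func(ctx context.Context, address string) (net.Conn, error) {
		if !failed {
			failed = true
			return nil, fmt.Errorf("injected failure dialing %s", address)
		}
		return inner(ctx, address)
	}
}
```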
+1
Pushed new commits to update the interface following discussion with @marco6. Instead of attaching leader tracking to the […]. For the […]. For the […]. We save slightly less work with the new design because […].
First, avoid infinite recursion when logging in `connectAttemptOne`. Second, fix the tests by logging the error from `Semaphore.Acquire`. It seems that before we bumped the version of `golang.org/x/sync/semaphore`, this type didn't honor the context deadline, which is why the tests passed for me locally without merging in the go.mod updates that are on master. Signed-off-by: Cole Miller <[email protected]>
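For context, the current `golang.org/x/sync/semaphore` makes `Acquire` return once the context's deadline expires; a minimal standalone demonstration (not from this PR):

```go
package main

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/sync/semaphore"
)

func main() {
	sem := semaphore.NewWeighted(1)

	// Take the only slot so the next Acquire has to wait.
	if err := sem.Acquire(context.Background(), 1); err != nil {
		panic(err)
	}

	// With a deadline-honoring semaphore, this Acquire returns
	// context.DeadlineExceeded instead of blocking forever.
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Millisecond)
	defer cancel()
	if err := sem.Acquire(ctx, 1); err != nil {
		fmt.Println("Acquire:", err) // Acquire: context deadline exceeded
	}
}
```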
This PR contains two changes that are intended to address #300 by cutting the number of dqlite connections that need to be opened in typical cases.
The first change is to remember the address of the cluster leader whenever we connect to it successfully, and to try this remembered address first when opening a new leader connection. When the cluster is stable, this means that we open only one connection in `Connector.Connect`; previously we would open on the order of N (= cluster size) connections. The tradeoff is that `Connector.Connect` is now slower and less efficient when the cluster is not stable. This optimization applies to both `client.FindLeader` and the driver implementation.
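A minimal sketch of the fast path, with hypothetical names (`leaderTracker`, `dialLeader`); the PR keeps this state in the `NodeStore` rather than a standalone struct, as described below:

```go
package example

import (
	"context"
	"errors"
	"net"
	"sync"
)

// leaderTracker remembers the last address we successfully connected to
// as leader. Hypothetical type, for illustration only.
type leaderTracker struct {
	mu   sync.Mutex
	addr string
}

func (t *leaderTracker) get() string {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.addr
}

func (t *leaderTracker) set(addr string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.addr = addr
}

// dialLeader stands in for dialing a node and verifying it is the leader.
func dialLeader(ctx context.Context, addr string) (net.Conn, error) {
	return nil, errors.New("stub: dial and verify leadership here")
}

// connect tries the remembered leader first, so a stable cluster costs
// one connection instead of O(N); only on failure does it probe everyone.
func connect(ctx context.Context, t *leaderTracker, nodes []string) (net.Conn, error) {
	if addr := t.get(); addr != "" {
		if conn, err := dialLeader(ctx, addr); err == nil {
			return conn, nil
		}
	}
	for _, addr := range nodes {
		if conn, err := dialLeader(ctx, addr); err == nil {
			t.set(addr)
			return conn, nil
		}
	}
	return nil, errors.New("no leader found")
}
```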
The second change is to enable `client.FindLeader` to reuse an existing connection to the leader instead of opening a new one. This is valid because the operations that can be performed on the connection using the returned `Client` do not depend on the logical state of the connection (open database, prepared statements). When the leader is stable, this saves one new connection per `client.FindLeader` call after the first change has been implemented. The long-lived connection is checked for validity and leadership before being returned for reuse.
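In sketch form (hypothetical names; `stillLeader` stands in for whatever round-trip confirms the remote node is still leader), the reuse path looks like:

```go
package example

import "context"

// conn stands in for a long-lived leader connection; stillLeader stands
// in for the health-and-leadership check mentioned above. Both are
// hypothetical placeholders, not the PR's actual types.
type conn struct{ /* ... */ }

func (c *conn) close() {}

func stillLeader(ctx context.Context, c *conn) bool { return false }

// reuseOrDial returns the cached connection when it is still valid and
// still pointed at the leader; otherwise it discards it and dials fresh.
func reuseOrDial(ctx context.Context, cached *conn, dial func(context.Context) (*conn, error)) (*conn, error) {
	if cached != nil {
		if stillLeader(ctx, cached) {
			return cached, nil
		}
		cached.close()
	}
	return dial(ctx)
}
```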
Both of these changes rely on storing some new state in the `NodeStore` using some fun embedding tricks. I did it this way because the `NodeStore` is the only object that is passed around to all the right places and lives long enough to persist state between connection attempts.
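The embedding trick amounts to wrapping any `client.NodeStore` in a struct that carries the extra state; since the store travels everywhere, the state does too. A rough sketch (field names invented):

```go
package example

import (
	"sync"

	"github.com/canonical/go-dqlite/client"
)

// trackingStore decorates an ordinary NodeStore with leader-tracking
// state. Embedding promotes Get and Set unchanged, so it still
// satisfies client.NodeStore and can be passed wherever a store is
// expected.
type trackingStore struct {
	client.NodeStore

	mu         sync.Mutex
	leaderAddr string // last known leader address, "" if unknown
}
```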
On my machine, using @masnax's repro script, these two changes combined cut the spikes in CPU usage from 30% to 8-10%, with the first change being responsible for most of that improvement. The remaining spike is due to opening N connections (in parallel) within `makeRolesChanges`, and could perhaps be dealt with by augmenting the `NodeStore` further with a pool of connections to all nodes instead of just the last known leader, but I've left that for a possible follow-up.

Signed-off-by: Cole Miller <[email protected]>