
Speed up leader search #320

Merged: 23 commits into canonical:master, Oct 18, 2024

Conversation

cole-miller
Contributor

This PR contains two changes that are intended to address #300 by cutting the number of dqlite connections that need to be opened in typical cases.

The first change is to remember the address of the cluster leader whenever we connect to it successfully, and to try this remembered address first when opening a new leader connection. When the cluster is stable, this means that we open only one connection in Connector.Connect; previously we would open on the order of N (= cluster size) connections. The tradeoff is that Connector.Connect is now slower and less efficient when the cluster is not stable. This optimization applies to both client.FindLeader and the driver implementation.
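A minimal sketch of this first change, assuming hypothetical names (`leaderTracker` and `dialAndCheckLeader` are illustrative stand-ins, not the actual go-dqlite API):

```go
package sketch

import (
	"context"
	"errors"
	"net"
	"sync"
)

// dialAndCheckLeader stands in for "open a dqlite connection to addr and
// verify that the node believes it is the current leader"; the real logic
// lives in the protocol package.
func dialAndCheckLeader(ctx context.Context, addr string) (net.Conn, error) {
	var d net.Dialer
	return d.DialContext(ctx, "tcp", addr) // leadership check omitted here
}

// leaderTracker is a hypothetical stand-in for the new state kept between
// connection attempts.
type leaderTracker struct {
	mu         sync.Mutex
	lastLeader string // empty until a leader connection has succeeded
}

// connect tries the remembered leader first and only falls back to probing
// every node (on the order of N connections) when that fails.
func (t *leaderTracker) connect(ctx context.Context, nodes []string) (net.Conn, error) {
	t.mu.Lock()
	last := t.lastLeader
	t.mu.Unlock()

	if last != "" {
		if conn, err := dialAndCheckLeader(ctx, last); err == nil {
			return conn, nil // fast path: one connection when the cluster is stable
		}
	}

	for _, addr := range nodes {
		conn, err := dialAndCheckLeader(ctx, addr)
		if err != nil {
			continue
		}
		t.mu.Lock()
		t.lastLeader = addr // remember the leader for next time
		t.mu.Unlock()
		return conn, nil
	}
	return nil, errors.New("no leader found")
}
```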

The second change is to enable client.FindLeader to reuse an existing connection to the leader instead of opening a new one. This is valid because the operations that can be performed on the connection using the returned Client do not depend on the logical state of the connection (open database, prepared statements). When the leader is stable, this saves one new connection per client.FindLeader call after the first change has been implemented. The long-lived connection is checked for validity and leadership before returning it to be reused.
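The reuse can be sketched by extending the hypothetical tracker above with a cached connection that is validated before being handed back (again, illustrative names only, assumed to sit alongside the previous snippet):

```go
// cachingTracker extends the leaderTracker sketch above with a long-lived
// connection that FindLeader-style code can hand back instead of dialing.
type cachingTracker struct {
	leaderTracker
	cachedConn net.Conn
}

// stillLeader is a placeholder for "check that the connection is still usable
// and that the remote node is still the leader".
func stillLeader(ctx context.Context, conn net.Conn) bool {
	_ = ctx
	return conn != nil
}

// takeCached returns the cached connection only if it passes the validity and
// leadership check; otherwise the caller dials a fresh connection.
func (t *cachingTracker) takeCached(ctx context.Context) (net.Conn, bool) {
	t.mu.Lock()
	conn := t.cachedConn
	t.cachedConn = nil
	t.mu.Unlock()
	if conn == nil {
		return nil, false
	}
	if !stillLeader(ctx, conn) {
		conn.Close()
		return nil, false
	}
	return conn, true
}
```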

Both of these changes rely on storing some new state in the NodeStore using some fun embedding tricks. I did it this way because the NodeStore is the only object that is passed around to all the right places and lives long enough to persist state between connection attempts.
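For readers unfamiliar with the trick, Go struct embedding lets extra state ride along with an existing store; roughly (simplified, illustrative types only, not the real NodeStore interface):

```go
// NodeStore is a simplified stand-in for the store interface that gets passed
// around; the real one returns richer node metadata.
type NodeStore interface {
	Get(ctx context.Context) ([]string, error)
	Set(ctx context.Context, addrs []string) error
}

// trackingNodeStore wraps an existing store by embedding it, so Get and Set
// are promoted unchanged while the extra leader-tracking fields travel
// wherever the store is passed.
type trackingNodeStore struct {
	NodeStore
	mu         sync.Mutex
	lastLeader string
}
```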

On my machine, using @masnax's repro script, these two changes combined cut the spikes in CPU usage from 30% to 8-10%, with the first change being responsible for most of that improvement. The remaining spike is due to opening N connections (in parallel) within makeRolesChanges, and could perhaps be dealt with by augmenting the NodeStore further with a pool of connections to all nodes instead of just the last known leader, but I've left that for a possible follow-up.

Signed-off-by: Cole Miller <[email protected]>

This makes it easier to inject and test connection attempt failures.

Signed-off-by: Cole Miller <[email protected]>
Signed-off-by: Cole Miller <[email protected]>
Signed-off-by: Cole Miller <[email protected]>
Signed-off-by: Cole Miller <[email protected]>
Signed-off-by: Cole Miller <[email protected]>
cole-miller marked this pull request as ready for review, October 1, 2024 02:48
Signed-off-by: Cole Miller <[email protected]>

just-now left a comment

+1

client/leader_test.go (review thread, outdated, resolved)
internal/protocol/connector_test.go (review thread, resolved)
@cole-miller
Contributor Author

Pushed new commits to update the interface following discussion with @marco6. Instead of attaching leader tracking to the NodeStore, now we attach it to protocol.Connector. This is a more natural separation of concerns and cleans things up nicely within the protocol package. It requires some changes to client and driver to take advantage of leader tracking, since previously these packages would just create one-off protocol.Connectors instead of reusing them.

For the driver package, the speedup from leader tracking remains transparent: we store a *protocol.Connector in the driver.Connector, and the database/sql connection pool will internally reuse a driver.Connector many times.

For the client package, there's nowhere in the signature of FindLeader that we could smuggle in a persistent *protocol.Connector, so I've just introduced a new LeaderConnector type that takes care of this. App has been updated to keep a LeaderConnector around.

We save slightly less work with the new design because client and driver can't share a LeaderTracker (although in the future I'd love to turn LeaderConnector into a place for sharing common functionality between client and driver, instead of duplicating options).
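In rough outline, the revised arrangement looks something like this (hypothetical shapes only; the real types live in the protocol, client, and driver packages and may differ):

```go
// Connector carries the leader-tracking state, so a single value should be
// reused rather than recreated for every connection attempt.
type Connector struct {
	lastLeaderAddr string
}

// Connect would try lastLeaderAddr first and fall back to probing the
// cluster; the body is elided in this sketch.
func (c *Connector) Connect(ctx context.Context) (net.Conn, error) {
	panic("sketch only")
}

// LeaderConnector is the client-facing wrapper mentioned above: it owns a
// persistent *Connector so that repeated FindLeader calls share its state.
// On the driver side, the analogous move is storing the same kind of
// persistent connector inside driver.Connector, which the database/sql pool
// then reuses internally.
type LeaderConnector struct {
	connector *Connector
}

func (lc *LeaderConnector) FindLeader(ctx context.Context) (net.Conn, error) {
	return lc.connector.Connect(ctx)
}
```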

Signed-off-by: Cole Miller <[email protected]>
Signed-off-by: Cole Miller <[email protected]>
First, avoid infinite recursion when logging in connectAttemptOne.

Second, fix the tests by logging the error from Semaphore.Acquire. It
seems that before we bumped the version of golang.org/x/sync/semaphore,
this type didn't honor the context deadline, which is why the tests
passed for me locally without merging in the go.mod updates that are on
master.
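For context, a minimal standalone illustration of the semaphore behavior being relied on, using the real golang.org/x/sync/semaphore API (the scenario itself is invented for illustration):

```go
package main

import (
	"context"
	"log"
	"time"

	"golang.org/x/sync/semaphore"
)

func main() {
	sem := semaphore.NewWeighted(1)

	// Hold the only slot so that the next Acquire has to wait.
	if err := sem.Acquire(context.Background(), 1); err != nil {
		log.Fatal(err)
	}

	// With a recent golang.org/x/sync, Acquire respects the context deadline
	// and returns an error here; logging that error instead of dropping it is
	// what the commit above describes.
	ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
	defer cancel()
	if err := sem.Acquire(ctx, 1); err != nil {
		log.Printf("semaphore acquire failed: %v", err)
	}
}
```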

Signed-off-by: Cole Miller <[email protected]>
Signed-off-by: Cole Miller <[email protected]>
Signed-off-by: Cole Miller <[email protected]>
Signed-off-by: Cole Miller <[email protected]>
cole-miller merged commit 6702ded into canonical:master, Oct 18, 2024
9 checks passed