It seems that as soon as each cluster member begins regularly querying for the leader info, the CPU usage of `go-dqlite` balloons.
This was noticed in microcluster, where we have been forced to set our cluster heartbeats to fire every 30 seconds to avoid excessive CPU usage.
The plaintext implementation appears fine, but when starting with `WithExternalConn` or `WithTLS`, the service appears to become very inefficient.
Take the following example using `WithTLS` in the `dqlite-demo` example daemon. The only change is adding the following loop to each peer, querying the dqlite leader every 3 seconds:
Following the call to `http.Serve`:

```go
go func() {
	// Initially sleep to give the daemon time to set up.
	time.Sleep(5 * time.Second)
	for {
		// Get a client to the leader.
		c, err := app.Leader(context.Background())
		if err != nil {
			panic(err)
		}

		// Fetch the leader's node info.
		c.Leader(context.Background())

		// Sleep 3s before trying again.
		time.Sleep(3 * time.Second)
	}
}()
```
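Now launch 3 `dqlite-demo` peers with TLS enabled. A sketch of the commands, assuming the demo's `--crt`/`--key` flags; the certificate paths are placeholders for whatever keypair the cluster shares:

```shell
# Terminal 1: bootstrap node (cert/key paths are placeholders).
dqlite-demo --api 127.0.0.1:8001 --db 127.0.0.1:9001 --crt cluster.crt --key cluster.key

# Terminals 2 and 3: peers joining via the bootstrap node's dqlite address.
dqlite-demo --api 127.0.0.1:8002 --db 127.0.0.1:9002 --join 127.0.0.1:9001 --crt cluster.crt --key cluster.key
dqlite-demo --api 127.0.0.1:8003 --db 127.0.0.1:9003 --join 127.0.0.1:9001 --crt cluster.crt --key cluster.key
```

Each command starts a long-running daemon, so each needs its own terminal (or a `&` to background it).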
Meanwhile `top` zig-zags between 0 and 10% CPU usage at each "heartbeat" interval:

```
   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
163326 root      20   0 2813608  33852  10752 S   9.0   0.1   0:08.66 dqlite-demo
```
For comparison, I set the same 3-second dqlite leader check running continually on every node of a 3-member LXD cluster, and oddly LXD still uses less CPU, with comparable context switches, never going above 2% CPU usage. This is on top of LXD's already-present continual cluster heartbeats. It's worth noting that LXD does not use the main `dqlite.App` implementation.
Just to add: an easier reproducer, instead of adding the goroutine mentioned in the issue, is to set a lower roles-adjustment frequency, like `app.WithRolesAdjustmentFrequency(time.Second * 5)`.
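A minimal sketch of that reproducer, assuming go-dqlite's `app` package API (`app.SimpleTLSConfig`, `app.WithTLS`, `app.WithRolesAdjustmentFrequency`); the data directory, address, and certificate paths are placeholders:

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"log"
	"os"
	"time"

	"github.com/canonical/go-dqlite/app"
)

func main() {
	// Load the shared cluster keypair (placeholder paths).
	cert, err := tls.LoadX509KeyPair("cluster.crt", "cluster.key")
	if err != nil {
		log.Fatal(err)
	}
	data, err := os.ReadFile("cluster.crt")
	if err != nil {
		log.Fatal(err)
	}
	pool := x509.NewCertPool()
	pool.AppendCertsFromPEM(data)
	listen, dial := app.SimpleTLSConfig(cert, pool)

	dqliteApp, err := app.New("/tmp/dqlite-repro",
		app.WithAddress("127.0.0.1:9001"),
		app.WithTLS(listen, dial),
		// The low interval alone should be enough to surface the CPU
		// spikes, since role adjustment also queries the leader.
		app.WithRolesAdjustmentFrequency(5*time.Second),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer dqliteApp.Close()

	select {} // Run until killed; watch CPU usage with top.
}
```

The goroutine from the original report then becomes unnecessary: the roles-adjustment loop already performs the periodic leader queries on its own.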
When deploying a dqlite cluster with TLS, `RolesAdjustmentFrequency` set to 5s, and `WithMaxConcurrentLeaderConns` set to 1, I see the following CPU spikes with `top`. Although they are very brief, they occur every couple of seconds before the CPU returns to idle:
```
top -d 0.5 -p $(pgrep -f "dqlite-demo.*8001")

Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.2 us,  0.2 sy,  0.0 ni, 98.4 id,  0.1 wa,  0.0 hi,  0.1 si,  0.0 st
MiB Mem :  32078.2 total,  23412.6 free,   2042.6 used,   6623.1 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  28138.5 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
21330 root      20   0 2666584  33396  10752 S  30.0   0.1   0:21.51 dqlite-demo
```