ERROR couldn't connect to zsys daemon: timed out waiting for server handshake #193

Open
farcaller opened this issue Feb 17, 2021 · 8 comments

@farcaller

Describe the bug

zsysctl commands fail, e.g.

# zsysctl show
ERROR couldn't connect to zsys daemon: timed out waiting for server handshake

Interestingly enough, after the server is "primed" by e.g. grpcurl, zsysctl seems to work:

# zsysctl list
ERROR couldn't connect to zsys daemon: timed out waiting for server handshake
# time grpcurl -proto zsys.proto -unix -v -plaintext -connect-timeout 1000 -H 'requesterid: 0' -H 'loglevel: 3' /run/zsysd.sock zsys.Zsys.Version

Resolved method descriptor:
rpc Version ( .zsys.Empty ) returns ( stream .zsys.VersionResponse );

Request metadata to send:
loglevel: 3
requesterid: 0

Response headers received:
content-type: application/grpc
requestid: 0:431b12a0

Response contents:
{
  "log": "."
}

Response contents:
{
  "log": "."
}

Response contents:
{
  "log": "."
}

Response contents:
{
  "version": "0.4.8"
}

Response trailers received:
(empty)
Sent 0 requests and received 4 responses

real    0m2.388s
user    0m0.029s
sys     0m0.007s
# zsysctl list
ID                        ZSys  Last Used
--                        ----  ---------
rpool/ROOT/ubuntu_j6h7lo  true  current
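
Since the session above suggests that only the call against a cold daemon times out, one stopgap is to retry the command a few times. This is a minimal sketch, not a fix: `retry` is a hypothetical helper that is not part of zsys, and it is only an assumption that a failed zsysctl call warms the daemon the way the grpcurl call did.

```shell
# retry: hypothetical helper, not part of zsys.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
# Runs the command until it succeeds or the attempts run out.
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while ! "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            echo "giving up after $n attempts: $*" >&2
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
}
```

For example, `retry 5 2 zsysctl list` would try up to five times with a two-second pause between attempts.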

To Reproduce

Having a non-trivial number of ZFS volumes (e.g. via containerd) seems to help reproduce it:

# zfs list|wc -l
1168

Expected behavior

zsysctl should work, even if slowly

For ubuntu users, please run and copy the following:

the log isn't trivially short, pasted in here

Installed versions:

  • OS: Ubuntu 20.04.2 LTS
  • Zsysd running version: 0.4.8


@taisph

taisph commented Jul 10, 2021

I'm getting frequent timeouts too, especially during apt package changes.

I'm using ZFS for Docker which results in a large number of volumes as well.

# zfs list|wc -l
6118

@TheGrave

TheGrave commented Jun 8, 2022

Same errors for me. On top of this zsys-gc fails constantly:

~$ sudo systemctl status zsys-gc
● zsys-gc.service - Clean up old snapshots to free space
Loaded: loaded (/lib/systemd/system/zsys-gc.service; static; vendor preset: enabled)
Active: failed (Result: exit-code) since Wed 2022-06-08 11:03:55 CEST; 10h ago
TriggeredBy: ● zsys-gc.timer
Process: 1374531 ExecStart=/sbin/zsysctl service gc (code=exited, status=1/FAILURE)
Main PID: 1374531 (code=exited, status=1/FAILURE)

Jun 08 11:03:35 zfs-backup-host systemd[1]: Starting Clean up old snapshots to free space...
Jun 08 11:03:55 zfs-backup-host zsysctl[1374531]: level=error msg="couldn't connect to zsys daemon: timed out waiting for server h>
Jun 08 11:03:55 zfs-backup-host systemd[1]: zsys-gc.service: Main process exited, code=exited, status=1/FAILURE
Jun 08 11:03:55 zfs-backup-host systemd[1]: zsys-gc.service: Failed with result 'exit-code'.
Jun 08 11:03:55 zfs-backup-host systemd[1]: Failed to start Clean up old snapshots to free space.

I'm sure it's the same for you guys, you probably haven't noticed yet.

I've got a feeling it might be related to the large number of snapshots I have:

$ zfs list -t snapshot | wc -l
12398

Most of these are not on rpool/bpool but on an external drive, so I'm not sure it's related. The system ones number only:

$ zfs list -t snapshot | grep -v backup | wc -l
391

As far as I understand, zsys shouldn't be touching snapshots of non-system datasets, but maybe the service times out while waiting for some output?
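
To check where the snapshots actually live, the count can be broken down per pool. This is a minimal sketch, assuming snapshot names in the usual `pool/dataset@snapshot` form; `count_snapshots_by_pool` is a hypothetical helper name.

```shell
# count_snapshots_by_pool: hypothetical helper.
# Reads snapshot names (one per line, "pool/dataset@snap") on stdin
# and prints "<pool> <count>" per pool, sorted by pool name.
count_snapshots_by_pool() {
    awk -F'[/@]' 'NF { count[$1]++ }
        END { for (pool in count) print pool, count[pool] }' | sort
}
```

It would be fed from `zfs list -H -t snapshot -o name | count_snapshots_by_pool`, where `-H` drops the header line so that only names are counted.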

@64knl

64knl commented Oct 10, 2023

I have the same issue, also with a large number of snapshots:

zfs list -t snapshot | wc -l
16863

@TheGrave

The workaround I use is:

sudo ./zfs-prune-snapshots -R -v 1M

This wipes all snaps older than 1 month. Daemon works fine after this cleanup.
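
Before deleting anything, it may help to list which snapshots fall past the cutoff. This is a dry-run sketch only, not how zfs-prune-snapshots works internally: `snapshots_older_than` is a hypothetical helper that filters `name<TAB>creation` lines and prints the matching names without destroying anything.

```shell
# snapshots_older_than: hypothetical helper, prints names only (dry run).
# Usage: snapshots_older_than <days> [now-epoch-seconds]
# Reads "name<TAB>creation-epoch" lines on stdin.
snapshots_older_than() {
    days=$1
    now=${2:-$(date +%s)}
    awk -F'\t' -v cutoff=$((now - days * 86400)) \
        '$2 != "" && $2 < cutoff { print $1 }'
}
```

The input would come from `zfs list -H -p -t snapshot -o name,creation | snapshots_older_than 30`, where `-p` makes `creation` print as epoch seconds. Treating one month as 30 days is an assumption here.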

@Lockszmith-GH

> The workaround I use is:
>
> sudo ./zfs-prune-snapshots -R -v 1M
>
> This wipes all snaps older than 1 month. Daemon works fine after this cleanup.

Is this what you are using?

@TheGrave

Yep

@ReSearchITEng

Thanks @TheGrave for sharing zfs-prune-snapshots.
Personally I first delete using docker commands:

docker system prune -a -f --volumes

and afterwards using zfs commands.
clean zfs snapshots.md

@xnox
Collaborator

xnox commented Mar 25, 2024

@awhitcroft please locate somebody to subscribe and respond to zsys things

7 participants