"failed to find snapshot readonly" error occurs when creating an instance. #14728

Open
3 of 6 tasks
tam1192 opened this issue Jan 4, 2025 · 1 comment

tam1192 commented Jan 4, 2025

  • Distribution: Ubuntu 24.04.1 LTS

snap list --all lxd core20 core22 core24 snapd microceph output:

Name       Version                Rev    Tracking       Publisher   Notes
core20     20240911               2434   latest/stable  canonical✓  base
core22     20241001               1663   latest/stable  canonical✓  base,disabled
core22     20241119               1722   latest/stable  canonical✓  base
lxd        5.21.2-2f4ba6b         30131  5.21/stable    canonical✓  held
microceph  18.2.4+snapc9f2b08f92  1139   reef/stable    canonical✓  held
snapd      2.63                   21759  latest/stable  canonical✓  snapd,disabled
snapd      2.66.1                 23258  latest/stable  canonical✓  snapd

Issue description

My environment is an LXD cluster backed by MicroCeph.
I launched an instance for the first time in about a month and got an error that looks Ceph-related to me.

lxc launch ubuntu:24.10 --vm
Creating the instance
Error: Failed instance creation: Failed creating instance from image: Failed to run: rbd --id admin --cluster ceph --image-feature layering clone lxd/image_86a133f5a92a26b8c6fe9fc0f0df2cc8bc51250ffec8ed282f54c78f0f7c220b_ext4@readonly lxd/virtual-machine_k8s-test_vast-flounder: exit status 2 (2025-01-04T01:32:37.284+0000 7e4ccbe00640 -1 librbd::image::OpenRequest: failed to find snapshot readonly
2025-01-04T01:32:37.284+0000 7e4cbe000640 -1 librbd::image::CloneRequest: 0x5e2976f11b80 handle_open_parent: failed to open parent image: (2) No such file or directory
rbd: clone error: (2) No such file or directory)
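
To make it easier to see which parent image and snapshot the failing clone refers to, here is a minimal sketch that extracts them from an error message shaped like the one above. The helper names `parse_clone_error` and `snap_ls_command` are hypothetical and are not part of LXD or Ceph; the regular expression assumes the `rbd ... clone POOL/IMAGE@SNAP TARGET:` layout seen in this report.

```python
import re

# Hypothetical helper, not part of LXD or Ceph. It assumes the error text
# contains "rbd ... clone POOL/IMAGE@SNAP TARGET:" as in the message above.
CLONE_RE = re.compile(
    r"rbd .*?\bclone\s+(?P<pool>[^/\s]+)/(?P<image>[^@\s]+)@(?P<snap>\S+)"
    r"\s+(?P<target>\S+?):"
)


def parse_clone_error(message):
    """Return pool, image, snap and target as a dict, or None if no match."""
    match = CLONE_RE.search(message)
    if match is None:
        return None
    return match.groupdict()


def snap_ls_command(info):
    """Build the argument list for listing snapshots of the parent image."""
    return ["rbd", "snap", "ls", f"{info['pool']}/{info['image']}"]


if __name__ == "__main__":
    sample = (
        "Failed to run: rbd --id admin --cluster ceph --image-feature layering "
        "clone lxd/image_86a133f5a92a26b8c6fe9fc0f0df2cc8bc51250ffec8ed282f54c78f0f7c220b_ext4@readonly "
        "lxd/virtual-machine_k8s-test_vast-flounder: exit status 2 (...)"
    )
    info = parse_clone_error(sample)
    print(info["snap"], "expected on", info["image"])
    print(" ".join(snap_ls_command(info)))
```

Running `rbd snap ls` on the extracted image is one way to check whether the `readonly` snapshot that the clone expects is actually present.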

The thread at https://discuss.linuxcontainers.org/t/lxd-3-21-more-ceph-issues/6868 describes the same symptom, but it appears to say that the problem has already been resolved.

I have not confirmed the steps to reproduce this.
My impression is that it happens after deleting an instance.

Information to attach

  • Any relevant kernel output (dmesg)
    I removed entries that were clearly caused by an unrelated error.
[1265834.553858] tg3 0000:02:00.0 eno1: Link is down
[1265837.357978] tg3 0000:02:00.0 eno1: Link is up at 1000 Mbps, full duplex
[1265837.358007] tg3 0000:02:00.0 eno1: Flow control is off for TX and off for RX
[1265837.358013] tg3 0000:02:00.0 eno1: EEE is disabled
[1269295.095615] tg3 0000:02:00.0 eno1: Link is down
[1269297.936574] tg3 0000:02:00.0 eno1: Link is up at 1000 Mbps, full duplex
[1269297.936582] tg3 0000:02:00.0 eno1: Flow control is off for TX and off for RX
[1269297.936585] tg3 0000:02:00.0 eno1: EEE is disabled
[1275085.985000] audit: type=1400 audit(1735948806.652:4629): apparmor="DENIED" operation="mount" class="mount" info="failed flags match" error=-13 profile="lxd-tailscale-service_tailscale-node-adw1_</var/snap/lxd/common/lxd>" name="/run/systemd/mount-rootfs/proc/loadavg" pid=503925 comm="(ogrotate)" flags="rw, nosuid, nodev, noexec, remount, bind"
[1275085.985010] audit: type=1400 audit(1735948806.652:4630): apparmor="DENIED" operation="mount" class="mount" info="failed flags match" error=-13 profile="lxd-tailscale-service_tailscale-node-adw1_</var/snap/lxd/common/lxd>" name="/run/systemd/mount-rootfs/proc/stat" pid=503925 comm="(ogrotate)" flags="rw, nosuid, nodev, noexec, remount, bind"
[1275085.985014] audit: type=1400 audit(1735948806.652:4631): apparmor="DENIED" operation="mount" class="mount" info="failed flags match" error=-13 profile="lxd-tailscale-service_tailscale-node-adw1_</var/snap/lxd/common/lxd>" name="/run/systemd/mount-rootfs/proc/swaps" pid=503925 comm="(ogrotate)" flags="rw, nosuid, nodev, noexec, remount, bind"
[1275085.985031] audit: type=1400 audit(1735948806.652:4632): apparmor="DENIED" operation="mount" class="mount" info="failed flags match" error=-13 profile="lxd-tailscale-service_tailscale-node-adw1_</var/snap/lxd/common/lxd>" name="/run/systemd/mount-rootfs/proc/uptime" pid=503925 comm="(ogrotate)" flags="rw, nosuid, nodev, noexec, remount, bind"
[1275085.985051] audit: type=1400 audit(1735948806.652:4633): apparmor="DENIED" operation="mount" class="mount" info="failed flags match" error=-13 profile="lxd-tailscale-service_tailscale-node-adw1_</var/snap/lxd/common/lxd>" name="/run/systemd/mount-rootfs/proc/cpuinfo" pid=503925 comm="(ogrotate)" flags="rw, nosuid, nodev, noexec, remount, bind"
[1275085.985069] audit: type=1400 audit(1735948806.652:4634): apparmor="DENIED" operation="mount" class="mount" info="failed flags match" error=-13 profile="lxd-tailscale-service_tailscale-node-adw1_</var/snap/lxd/common/lxd>" name="/run/systemd/mount-rootfs/proc/slabinfo" pid=503925 comm="(ogrotate)" flags="rw, nosuid, nodev, noexec, remount, bind"
[1275085.985087] audit: type=1400 audit(1735948806.652:4635): apparmor="DENIED" operation="mount" class="mount" info="failed flags match" error=-13 profile="lxd-tailscale-service_tailscale-node-adw1_</var/snap/lxd/common/lxd>" name="/run/systemd/mount-rootfs/proc/" pid=503925 comm="(ogrotate)" flags="rw, nosuid, nodev, noexec, remount, bind"
[1275085.985104] audit: type=1400 audit(1735948806.652:4636): apparmor="DENIED" operation="mount" class="mount" info="failed flags match" error=-13 profile="lxd-tailscale-service_tailscale-node-adw1_</var/snap/lxd/common/lxd>" name="/run/systemd/mount-rootfs/proc/diskstats" pid=503925 comm="(ogrotate)" flags="rw, nosuid, nodev, noexec, remount, bind"
[1275085.985123] audit: type=1400 audit(1735948806.652:4637): apparmor="DENIED" operation="mount" class="mount" info="failed flags match" error=-13 profile="lxd-tailscale-service_tailscale-node-adw1_</var/snap/lxd/common/lxd>" name="/run/systemd/mount-rootfs/proc/meminfo" pid=503925 comm="(ogrotate)" flags="rw, nosuid, nodev, noexec, remount, bind"
[1275128.464310] audit: type=1400 audit(1735948849.130:4638): apparmor="DENIED" operation="mount" class="mount" info="failed flags match" error=-13 profile="lxd-global-service_ins4-rtmc_</var/snap/lxd/common/lxd>" name="/run/systemd/mount-rootfs/proc/" pid=505085 comm="(ogrotate)" flags="rw, nosuid, nodev, noexec, remount, bind"
[1275128.464320] audit: type=1400 audit(1735948849.130:4639): apparmor="DENIED" operation="mount" class="mount" info="failed flags match" error=-13 profile="lxd-global-service_ins4-rtmc_</var/snap/lxd/common/lxd>" name="/run/systemd/mount-rootfs/proc/meminfo" pid=505085 comm="(ogrotate)" flags="rw, nosuid, nodev, noexec, remount, bind"
[1275128.464323] audit: type=1400 audit(1735948849.130:4640): apparmor="DENIED" operation="mount" class="mount" info="failed flags match" error=-13 profile="lxd-global-service_ins4-rtmc_</var/snap/lxd/common/lxd>" name="/run/systemd/mount-rootfs/proc/slabinfo" pid=505085 comm="(ogrotate)" flags="rw, nosuid, nodev, noexec, remount, bind"
[1275128.464335] audit: type=1400 audit(1735948849.130:4641): apparmor="DENIED" operation="mount" class="mount" info="failed flags match" error=-13 profile="lxd-global-service_ins4-rtmc_</var/snap/lxd/common/lxd>" name="/run/systemd/mount-rootfs/proc/loadavg" pid=505085 comm="(ogrotate)" flags="rw, nosuid, nodev, noexec, remount, bind"
[1275128.464356] audit: type=1400 audit(1735948849.130:4642): apparmor="DENIED" operation="mount" class="mount" info="failed flags match" error=-13 profile="lxd-global-service_ins4-rtmc_</var/snap/lxd/common/lxd>" name="/run/systemd/mount-rootfs/proc/stat" pid=505085 comm="(ogrotate)" flags="rw, nosuid, nodev, noexec, remount, bind"
[1275128.464375] audit: type=1400 audit(1735948849.130:4643): apparmor="DENIED" operation="mount" class="mount" info="failed flags match" error=-13 profile="lxd-global-service_ins4-rtmc_</var/snap/lxd/common/lxd>" name="/run/systemd/mount-rootfs/proc/swaps" pid=505085 comm="(ogrotate)" flags="rw, nosuid, nodev, noexec, remount, bind"
[1275128.464394] audit: type=1400 audit(1735948849.130:4644): apparmor="DENIED" operation="mount" class="mount" info="failed flags match" error=-13 profile="lxd-global-service_ins4-rtmc_</var/snap/lxd/common/lxd>" name="/run/systemd/mount-rootfs/proc/uptime" pid=505085 comm="(ogrotate)" flags="rw, nosuid, nodev, noexec, remount, bind"
[1275128.464413] audit: type=1400 audit(1735948849.130:4645): apparmor="DENIED" operation="mount" class="mount" info="failed flags match" error=-13 profile="lxd-global-service_ins4-rtmc_</var/snap/lxd/common/lxd>" name="/run/systemd/mount-rootfs/proc/diskstats" pid=505085 comm="(ogrotate)" flags="rw, nosuid, nodev, noexec, remount, bind"
[1275128.464432] audit: type=1400 audit(1735948849.130:4646): apparmor="DENIED" operation="mount" class="mount" info="failed flags match" error=-13 profile="lxd-global-service_ins4-rtmc_</var/snap/lxd/common/lxd>" name="/run/systemd/mount-rootfs/proc/cpuinfo" pid=505085 comm="(ogrotate)" flags="rw, nosuid, nodev, noexec, remount, bind"
[1275197.008917] audit: type=1400 audit(1735948917.673:4647): apparmor="DENIED" operation="exec" class="file" profile="snap.microceph.osd" name="/usr/bin/sudo" pid=506927 comm="admin_socket" requested_mask="x" denied_mask="x" fsuid=0 ouid=0
[1275197.008930] audit: type=1400 audit(1735948917.673:4648): apparmor="DENIED" operation="exec" class="file" profile="snap.microceph.osd" name="/usr/bin/sudo" pid=506927 comm="admin_socket" requested_mask="x" denied_mask="x" fsuid=0 ouid=0
[1275197.314183] audit: type=1400 audit(1735948917.979:4649): apparmor="DENIED" operation="exec" class="file" profile="snap.microceph.osd" name="/usr/bin/sudo" pid=507000 comm="admin_socket" requested_mask="x" denied_mask="x" fsuid=0 ouid=0
[1275197.314224] audit: type=1400 audit(1735948917.979:4650): apparmor="DENIED" operation="exec" class="file" profile="snap.microceph.osd" name="/usr/bin/sudo" pid=507000 comm="admin_socket" requested_mask="x" denied_mask="x" fsuid=0 ouid=0
[1275198.790060] audit: type=1400 audit(1735948919.454:4651): apparmor="DENIED" operation="exec" class="file" profile="snap.microceph.mon" name="/usr/bin/sudo" pid=507014 comm="admin_socket" requested_mask="x" denied_mask="x" fsuid=0 ouid=0
[1275198.790272] audit: type=1400 audit(1735948919.455:4652): apparmor="DENIED" operation="exec" class="file" profile="snap.microceph.mon" name="/usr/bin/sudo" pid=507014 comm="admin_socket" requested_mask="x" denied_mask="x" fsuid=0 ouid=0
[1275198.837429] audit: type=1400 audit(1735948919.502:4653): apparmor="DENIED" operation="exec" class="file" profile="snap.microceph.mon" name="/usr/bin/sudo" pid=507016 comm="admin_socket" requested_mask="x" denied_mask="x" fsuid=0 ouid=0
[1275198.837457] audit: type=1400 audit(1735948919.502:4654): apparmor="DENIED" operation="exec" class="file" profile="snap.microceph.mon" name="/usr/bin/sudo" pid=507016 comm="admin_socket" requested_mask="x" denied_mask="x" fsuid=0 ouid=0
[1275926.396930] audit: type=1400 audit(1735949647.051:4655): apparmor="DENIED" operation="capable" class="cap" profile="snap.microovn.ovs-vsctl" pid=526739 comm="ovs-vsctl" capability=2  capname="dac_read_search"
[1275926.467581] audit: type=1400 audit(1735949647.122:4656): apparmor="DENIED" operation="capable" class="cap" profile="snap.microovn.ovs-vsctl" pid=526780 comm="ovs-vsctl" capability=2  capname="dac_read_search"
[1276774.009563] audit: type=1400 audit(1735950494.652:4657): apparmor="DENIED" operation="mount" class="mount" info="failed flags match" error=-13 profile="lxd-global-service_ins4-rtmc_</var/snap/lxd/common/lxd>" name="/run/systemd/mount-rootfs/proc/meminfo" pid=549605 comm="(mandb)" flags="rw, nosuid, nodev, noexec, remount, bind"
[1276774.009577] audit: type=1400 audit(1735950494.652:4658): apparmor="DENIED" operation="mount" class="mount" info="failed flags match" error=-13 profile="lxd-global-service_ins4-rtmc_</var/snap/lxd/common/lxd>" name="/run/systemd/mount-rootfs/proc/uptime" pid=549605 comm="(mandb)" flags="rw, nosuid, nodev, noexec, remount, bind"
[1276774.009583] audit: type=1400 audit(1735950494.652:4659): apparmor="DENIED" operation="mount" class="mount" info="failed flags match" error=-13 profile="lxd-global-service_ins4-rtmc_</var/snap/lxd/common/lxd>" name="/run/systemd/mount-rootfs/proc/loadavg" pid=549605 comm="(mandb)" flags="rw, nosuid, nodev, noexec, remount, bind"
[1276774.009589] audit: type=1400 audit(1735950494.652:4660): apparmor="DENIED" operation="mount" class="mount" info="failed flags match" error=-13 profile="lxd-global-service_ins4-rtmc_</var/snap/lxd/common/lxd>" name="/run/systemd/mount-rootfs/proc/" pid=549605 comm="(mandb)" flags="rw, nosuid, nodev, noexec, remount, bind"
[1276774.009611] audit: type=1400 audit(1735950494.652:4661): apparmor="DENIED" operation="mount" class="mount" info="failed flags match" error=-13 profile="lxd-global-service_ins4-rtmc_</var/snap/lxd/common/lxd>" name="/run/systemd/mount-rootfs/proc/slabinfo" pid=549605 comm="(mandb)" flags="rw, nosuid, nodev, noexec, remount, bind"
[1276774.009676] audit: type=1400 audit(1735950494.652:4662): apparmor="DENIED" operation="mount" class="mount" info="failed flags match" error=-13 profile="lxd-global-service_ins4-rtmc_</var/snap/lxd/common/lxd>" name="/run/systemd/mount-rootfs/proc/stat" pid=549605 comm="(mandb)" flags="rw, nosuid, nodev, noexec, remount, bind"
[1276774.009696] audit: type=1400 audit(1735950494.652:4663): apparmor="DENIED" operation="mount" class="mount" info="failed flags match" error=-13 profile="lxd-global-service_ins4-rtmc_</var/snap/lxd/common/lxd>" name="/run/systemd/mount-rootfs/proc/swaps" pid=549605 comm="(mandb)" flags="rw, nosuid, nodev, noexec, remount, bind"
[1276774.009715] audit: type=1400 audit(1735950494.652:4664): apparmor="DENIED" operation="mount" class="mount" info="failed flags match" error=-13 profile="lxd-global-service_ins4-rtmc_</var/snap/lxd/common/lxd>" name="/run/systemd/mount-rootfs/proc/cpuinfo" pid=549605 comm="(mandb)" flags="rw, nosuid, nodev, noexec, remount, bind"
[1276774.009734] audit: type=1400 audit(1735950494.652:4665): apparmor="DENIED" operation="mount" class="mount" info="failed flags match" error=-13 profile="lxd-global-service_ins4-rtmc_</var/snap/lxd/common/lxd>" name="/run/systemd/mount-rootfs/proc/diskstats" pid=549605 comm="(mandb)" flags="rw, nosuid, nodev, noexec, remount, bind"
[1277959.705680] tg3 0000:02:00.0 eno1: Link is down
[1277962.605689] tg3 0000:02:00.0 eno1: Link is up at 1000 Mbps, full duplex
[1277962.605717] tg3 0000:02:00.0 eno1: Flow control is off for TX and off for RX
[1277962.605726] tg3 0000:02:00.0 eno1: EEE is disabled
[1279153.087559] tap066ff7a7: left promiscuous mode
[1279153.261976] EXT4-fs (rbd1): unmounting filesystem 263f7ee9-8c36-4043-bee5-59569888c705.
[1279154.644528] audit: type=1400 audit(1735952875.254:4666): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="lxd-k8s-test_handy-ringtail_</var/snap/lxd/common/lxd>" pid=613316 comm="apparmor_parser"
[1279177.994674] tap0c888e3e: left promiscuous mode
[1279178.044022] EXT4-fs (rbd3): unmounting filesystem 263f7ee9-8c36-4043-bee5-59569888c705.
[1279179.358504] audit: type=1400 audit(1735952899.968:4667): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="lxd-k8s-test_charmed-mink_</var/snap/lxd/common/lxd>" pid=613961 comm="apparmor_parser"
[1279286.531260]  rbd0: p1 p13 p14 p15
[1279286.531711] rbd: rbd0: capacity 10737418240 features 0x1
[1279418.696431]  rbd0: p1 p13 p14 p15
[1279418.696728] rbd: rbd0: capacity 10737418240 features 0x1
[1279434.680963]  rbd0: p1 p13 p14 p15
[1279434.681313] rbd: rbd0: capacity 10737418240 features 0x1
[1280027.257671] audit: type=1400 audit(1735953747.855:4668): apparmor="DENIED" operation="capable" class="cap" profile="snap.microovn.ovs-vsctl" pid=637913 comm="ovs-vsctl" capability=2  capname="dac_read_search"
[1280027.332349] audit: type=1400 audit(1735953747.929:4669): apparmor="DENIED" operation="capable" class="cap" profile="snap.microovn.ovs-vsctl" pid=637948 comm="ovs-vsctl" capability=2  capname="dac_read_search"
[1280035.191909] audit: type=1400 audit(1735953755.790:4670): apparmor="DENIED" operation="capable" class="cap" profile="snap.microovn.ovs-vsctl" pid=638138 comm="ovs-vsctl" capability=2  capname="dac_read_search"
[1280038.523876] audit: type=1400 audit(1735953759.122:4671): apparmor="DENIED" operation="capable" class="cap" profile="snap.microceph.rbd" pid=638324 comm="rbd" capability=2  capname="dac_read_search"
[1280044.358919] audit: type=1400 audit(1735953764.957:4672): apparmor="DENIED" operation="capable" class="cap" profile="snap.microceph.rbd" pid=638512 comm="rbd" capability=2  capname="dac_read_search"
[1280634.119945] audit: type=1400 audit(1735954354.708:4673): apparmor="DENIED" operation="capable" class="cap" profile="snap.lxd.lxc" pid=654288 comm="lxc" capability=2  capname="dac_read_search"
[1280635.729631]  rbd0: p1 p13 p14 p15
[1280635.729988] rbd: rbd0: capacity 10737418240 features 0x1
  • Container log (lxc info NAME --show-log)
  • Container configuration (lxc config show NAME --expanded)
  • Main daemon log (at /var/log/lxd/lxd.log or /var/snap/lxd/common/lxd/logs/lxd.log)
time="2024-12-21T10:03:41Z" level=warning msg=" - Couldn't find the CGroup network priority controller, per-instance network priority will be ignored. Please use per-device limits.priority instead"
time="2024-12-21T10:03:41Z" level=warning msg="Dqlite: attempt 1: server 10.194.20.50:8443: no known leader"
time="2024-12-21T10:03:42Z" level=error msg="Failed initializing network" err="Failed validating: Uplink network doesn't contain \"10.194.200.0/24\" in its routes" network=default project=try-ovn-peer1
time="2024-12-21T10:04:44Z" level=error msg="Failed initializing network" err="Failed validating: Uplink network doesn't contain \"10.194.200.0/24\" in its routes" network=default project=try-ovn-peer1
time="2024-12-21T11:38:38Z" level=error msg="Error getting disk usage" err="Failed to run: rbd du --format json --id admin --cluster ceph --pool lxd container_user01-nikki_test4: exit status 2 (specified image container_user01-nikki_test4 is not found.\nrbd: du failed: (2) No such file or directory)" instance=test4 instanceType=container project=user01-nikki
time="2024-12-21T11:40:51Z" level=warning msg="Error getting disk usage" err="Failed to run: rbd du --format json --id admin --cluster ceph --pool lxd virtual-machine_devcon_ex-instance-0.block: exit status 2 (specified image virtual-machine_devcon_ex-instance-0.block is not found.\nrbd: du failed: (2) No such file or directory)" instance=ex-instance-0 instanceType=virtual-machine project=devcon
time="2024-12-21T11:41:04Z" level=warning msg="Error getting disk usage" err="Failed to run: rbd du --format json --id admin --cluster ceph --pool lxd virtual-machine_devcon_ex-instance-0.block: exit status 2 (specified image virtual-machine_devcon_ex-instance-0.block is not found.\nrbd: du failed: (2) No such file or directory)" instance=ex-instance-0 instanceType=virtual-machine project=devcon
time="2024-12-21T11:57:43Z" level=warning msg="Excluding offline member from operations list" ID=3 address="10.194.20.52:8443" lastHeartbeat="2024-12-21 11:57:21.035717815 +0000 UTC" member=adw-3
time="2024-12-21T11:58:09Z" level=warning msg="Excluding offline member from operations list" ID=3 address="10.194.20.52:8443" lastHeartbeat="2024-12-21 11:57:49.090946915 +0000 UTC" member=adw-3
time="2024-12-21T12:10:10Z" level=warning msg="Failed adding member event listener client" err="dial tcp 10.194.20.52:8443: connect: connection refused" local="10.194.20.50:8443" remote="10.194.20.52:8443"
time="2024-12-21T12:10:45Z" level=warning msg="Excluding offline member from operations list" ID=3 address="10.194.20.52:8443" lastHeartbeat="2024-12-21 12:09:50.961268269 +0000 UTC" member=adw-3
time="2024-12-21T12:10:50Z" level=warning msg="Excluding offline member from operations list" ID=3 address="10.194.20.52:8443" lastHeartbeat="2024-12-21 12:09:50.961268269 +0000 UTC" member=adw-3
time="2024-12-22T03:12:36Z" level=warning msg="Rejecting request from untrusted client" ip="10.194.20.200:50121"
time="2024-12-22T14:37:50Z" level=warning msg="Rejecting request from untrusted client" ip="10.194.20.200:51357"
time="2024-12-23T08:18:06Z" level=warning msg="Rejecting request from untrusted client" ip="10.194.20.200:51579"
time="2024-12-25T02:08:44Z" level=warning msg="Rejecting request from untrusted client" ip="10.194.20.200:53050"
time="2024-12-25T05:02:20Z" level=warning msg="Error getting disk usage" err="Failed to run: rbd du --format json --id admin --cluster ceph --pool lxd virtual-machine_k8s-test_inviting-reindeer.block: exit status 2 (rbd: du failed: (2) No such file or directory)" instance=inviting-reindeer instanceType=virtual-machine project=k8s-test
time="2024-12-25T05:10:20Z" level=warning msg="Failed getting exec control websocket reader, killing command" PID=0 err="websocket: close 1005 (no status)" instance=handy-ringtail interactive=true project=k8s-test
time="2024-12-25T05:11:04Z" level=warning msg="Failed getting exec control websocket reader, killing command" PID=0 err="websocket: close 1005 (no status)" instance=handy-ringtail interactive=true project=k8s-test
time="2024-12-25T06:15:51Z" level=warning msg="Failed getting exec control websocket reader, killing command" PID=0 err="websocket: close 1006 (abnormal closure): unexpected EOF" instance=handy-ringtail interactive=true project=k8s-test
time="2024-12-25T14:10:04Z" level=warning msg="Failed getting exec control websocket reader, killing command" PID=0 err="websocket: close 1006 (abnormal closure): unexpected EOF" instance=handy-ringtail interactive=true project=k8s-test
time="2024-12-28T22:27:01Z" level=warning msg="Transaction timed out. Retrying once" err="Failed to begin transaction: context deadline exceeded" member=1
time="2024-12-28T22:27:01Z" level=warning msg="Transaction timed out. Retrying once" err="Failed to begin transaction: context deadline exceeded" member=1
time="2024-12-31T13:14:34Z" level=warning msg="Dqlite proxy failed" err="first: remote -> local: read tcp 10.194.20.50:8443->10.194.20.52:41144: read: connection timed out" local="10.194.20.50:8443" name=dqlite remote="10.194.20.52:41144"
time="2024-12-31T13:14:34Z" level=warning msg="Dqlite proxy failed" err="first: remote -> local: read tcp 10.194.20.50:38380->10.194.20.52:8443: read: connection timed out" local="10.194.20.50:38380" name=raft remote="10.194.20.52:8443"
time="2025-01-02T02:44:41Z" level=warning msg="Failed getting exec control websocket reader, killing command" PID=0 err="read tcp 10.194.20.50:8443->10.194.20.200:63839: read: connection timed out" instance=handy-ringtail interactive=true project=k8s-test
  • Output of the client with --debug
Creating the instance
DEBUG  [2025-01-04T01:47:38Z] Connecting to a remote simplestreams server   URL="https://cloud-images.ubuntu.com/releases"
DEBUG  [2025-01-04T01:47:38Z] Connected to the websocket: ws://unix.socket/1.0/events?project=k8s-test 
DEBUG  [2025-01-04T01:47:38Z] Sending request to LXD                        etag= method=POST url="http://unix.socket/1.0/instances?project=k8s-test"
DEBUG  [2025-01-04T01:47:39Z] Got operation from LXD                       
DEBUG  [2025-01-04T01:47:39Z] 
   {
   	"id": "ec236092-86d7-462b-88e3-97a965ea141b",
   	"class": "task",
   	"description": "Creating instance",
   	"created_at": "2025-01-04T01:47:39.669244236Z",
   	"updated_at": "2025-01-04T01:47:39.669244236Z",
   	"status": "Running",
   	"status_code": 103,
   	"resources": {
   		"instances": [
   			"/1.0/instances/grand-shad?project=k8s-test"
   		]
   	},
   	"metadata": null,
   	"may_cancel": false,
   	"err": "",
   	"location": "adw-1"
   } 
DEBUG  [2025-01-04T01:47:39Z] Sending request to LXD                        etag= method=GET url="http://unix.socket/1.0/operations/ec236092-86d7-462b-88e3-97a965ea141b?project=k8s-test"
DEBUG  [2025-01-04T01:47:39Z] Got response struct from LXD                 
DEBUG  [2025-01-04T01:47:39Z] 
   {
   	"id": "ec236092-86d7-462b-88e3-97a965ea141b",
   	"class": "task",
   	"description": "Creating instance",
   	"created_at": "2025-01-04T01:47:39.669244236Z",
   	"updated_at": "2025-01-04T01:47:39.669244236Z",
   	"status": "Running",
   	"status_code": 103,
   	"resources": {
   		"instances": [
   			"/1.0/instances/grand-shad?project=k8s-test"
   		]
   	},
   	"metadata": null,
   	"may_cancel": false,
   	"err": "",
   	"location": "adw-1"
   } 
Error: Failed instance creation: Failed creating instance from image: Failed to run: rbd --id admin --cluster ceph --image-feature layering clone lxd/image_86a133f5a92a26b8c6fe9fc0f0df2cc8bc51250ffec8ed282f54c78f0f7c220b_ext4@readonly lxd/virtual-machine_k8s-test_grand-shad: exit status 2 (2025-01-04T01:47:41.476+0000 7d7ba8a00640 -1 librbd::image::OpenRequest: failed to find snapshot readonly
2025-01-04T01:47:41.476+0000 7d7b96a00640 -1 librbd::image::CloneRequest: 0x5b59f0d54f60 handle_open_parent: failed to open parent image: (2) No such file or directory
rbd: clone error: (2) No such file or directory)
  • Output of the daemon with --debug (alternatively output of lxc monitor while reproducing the issue)

tam1192 commented Jan 4, 2025

Here is some additional information.

As a tentative workaround, I tried the following:

  1. rbd snap ls lxd/image_86a133f5a92a26b8c6fe9fc0f0df2cc8bc51250ffec8ed282f54c78f0f7c220b_ext4
SNAPID  NAME                                                  SIZE     PROTECTED  TIMESTAMP               
    76  zombie_snapshot_4f30d075-b455-45a6-bd8f-293c5ae935d0  100 MiB  yes        Sun Dec 22 08:27:07 2024
  2. rbd children lxd/image_86a133f5a92a26b8c6fe9fc0f0df2cc8bc51250ffec8ed282f54c78f0f7c220b_ext4@zombie_snapshot_4f30d075-b455-45a6-bd8f-293c5ae935d0
lxd/virtual-machine_k8s-test2_measured-wahoo
  3. lxc delete measured-wahoo
Error: Failed deleting instance "measured-wahoo" in project "k8s-test2": Error deleting storage volume: Failed to delete volume: Failed to run: rbd --id admin --cluster ceph --pool lxd children --image image_86a133f5a92a26b8c6fe9fc0f0df2cc8bc51250ffec8ed282f54c78f0f7c220b.block --snap zombie_snapshot_cdfec6d0-8c1d-470e-863e-46587f55d897: exit status 2 (rbd: error opening image image_86a133f5a92a26b8c6fe9fc0f0df2cc8bc51250ffec8ed282f54c78f0f7c220b.block: (2) No such file or directory)

After I ran the delete twice, the instance disappeared without any further error.
  4. rbd snap unprotect lxd/image_86a133f5a92a26b8c6fe9fc0f0df2cc8bc51250ffec8ed282f54c78f0f7c220b_ext4@zombie_snapshot_4f30d075-b455-45a6-bd8f-293c5ae935d0

2025-01-04T01:57:42.255+0000 780b05000640 -1 librbd::SnapshotUnprotectRequest: cannot unprotect: at least 1 child(ren) [5953f5127e353] in pool 'lxd'
2025-01-04T01:57:42.257+0000 780b05a00640 -1 librbd::SnapshotUnprotectRequest: encountered error: (16) Device or resource busy
2025-01-04T01:57:42.257+0000 780b05a00640 -1 librbd::SnapshotUnprotectRequest: 0x5eb87b710790 should_complete_error: ret_val=-16
2025-01-04T01:57:42.265+0000 780b05000640 -1 librbd::SnapshotUnprotectRequest: 0x5eb87b710790 should_complete_error: ret_val=-16
rbd: unprotecting snap failed: (16) Device or resource busy

That is everything I have tried so far.
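
The check in the first step can be sketched as a small script that reads the plain-text output of `rbd snap ls` and reports whether the `readonly` snapshot exists and which snapshots carry the `zombie_snapshot_` prefix. The function names are hypothetical, and the column layout and the prefix are assumptions taken only from the output quoted in this report.

```python
# Hypothetical helpers, not part of LXD or Ceph. The table layout
# (SNAPID, NAME, ...) is assumed from the `rbd snap ls` output quoted above.

def parse_snap_ls(output):
    """Parse the plain-text table from `rbd snap ls` into a list of dicts."""
    snaps = []
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines and the header row, which does not start with a number.
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        snaps.append({"id": int(fields[0]), "name": fields[1]})
    return snaps


def diagnose(output, wanted="readonly"):
    """Report whether the wanted snapshot exists and list zombie snapshots."""
    names = [snap["name"] for snap in parse_snap_ls(output)]
    return {
        "has_wanted": wanted in names,
        "zombies": [n for n in names if n.startswith("zombie_snapshot_")],
    }


if __name__ == "__main__":
    sample = (
        "SNAPID  NAME                                                  SIZE     PROTECTED  TIMESTAMP\n"
        "    76  zombie_snapshot_4f30d075-b455-45a6-bd8f-293c5ae935d0  100 MiB  yes        Sun Dec 22 08:27:07 2024\n"
    )
    print(diagnose(sample))
```

For the image in this report, the result would show that no `readonly` snapshot is listed and that the only snapshot present is the zombie one, which matches the `failed to find snapshot readonly` message from the clone.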
