Complete Docker lockup if Ceph CLI calls are stuck (for whatever reason) #32

nul0op · 2024-11-18T01:53:23Z

Hi,

I know a Ceph cluster, and specifically the mons, should always be up, but on some occasions they're not.
For example: when we completely shut off a 3-node Ceph cluster,
and this cluster has autostart containers (restart policy = always) AND the Ceph components are also Docker containers...

This leads the whole thing to a complete failure: the Docker engine cannot start because it keeps retrying to connect to the RBD volumes, and wetopi/rbd calls the Ceph command-line tools, which are stuck (because Ceph is still not up).

"Error while checking if volume 'xxxxxxx' exists in driver 'wetopi/rbd:latest'" .. retrying .. and spending many seconds
there ... and that for every volume ...

If wetopi reports to Docker that it doesn't have the volume (because Ceph is down), I guess Docker will fail to start the container, and that's OK.

But having the wetopi calls stuck forever (again, because the Ceph CLI utils are themselves stuck forever) is a nightmare. Even a simple "docker ps" freezes...

My question: would it be possible to "fail gracefully" by putting a timeout around the Ceph tools? I see in the source code that the wrapper exists. I didn't look deeply (because it's late now :-( ); I will tomorrow... or perhaps you already know the reason.

Thanks

nul0op · 2024-11-18T02:10:53Z

It seems List() in rbd-docker.go doesn't enforce any timeout before calling the GetRbdImages() API.
I will check the Ceph API docs to see if there is something there.

nul0op changed the title ~~complete docker lookup if ceph cli are stuck (for wathever reason)~~ complete docker lockup if ceph cli are stuck (for wathever reason) Nov 18, 2024
