cpuset: no space left on device
#23405
Comments
Hi @rodrigol-chan! Sorry to hear you're running into trouble. The error you're getting here is particularly weird:
We're writing to the
It happened again just now, on a different machine.
This Nomad client configuration now looks relevant:

```hcl
client {
  gc_max_allocs           = 300
  gc_disk_usage_threshold = 80
}
```

And we currently have over 300 allocations:
Nomad seems to be keeping a lot of tmpfs mounts around even when the allocations are no longer running. I'm not sure if that's by design.
For extra context: this issue seems new with the 1.7.x upgrade. We ran this configuration on 1.6.x for about 8 months with no similar issues.
Thanks for that extra info @rodrigol-chan. Even with that large number of allocs, I'd think you'd be OK until you got to 65535 inodes. I'll dig into that a little further to see if there's something more going on.
The mounts are left in place until the allocation is GC'd on the client. We do that so that you can debug failed allocations.
The issue still happens as of 1.8.3. Is there anything we can do to help troubleshoot this?
Hi @rodrigol-chan, sorry, I haven't been able to circle back to this and I'm currently swamped trying to land some work for our 1.9 beta next week. I suspect this is platform-specific. I think you'll want to look into whether there's anything in the host configuration that could be limiting the size of those virtual FS directories.
Hi @rodrigol-chan! Just wanted to check in so you don't think I've forgotten this issue. I re-read through your initial report to see if there were any clues I missed.
Even ignoring the errors you're seeing, that's got to be a bug all by itself. These should never overlap. Even though we can't write the two files atomically, we always remove from the source first and then write to the destination. So in that tiny race you should see a missing CPU, but not one counted twice. I'll look for any other place where there's potentially a race condition that isn't correctly handled.
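The remove-first, add-second ordering described above can be modeled with a tiny sketch (plain Python, not Nomad's actual Go code; the names are illustrative). The point is that a reader racing with the move can observe a CPU in *neither* set, but never in both:

```python
def move_cpus(source: set, dest: set, cpus: set) -> None:
    """Move CPU ids between two cpuset partitions, removing from the
    source before adding to the destination. A concurrent reader may
    see the CPUs missing from both sets, but never counted twice."""
    source -= cpus  # step 1: remove from the source partition
    # <-- a reader here observes the CPUs in neither partition
    dest |= cpus    # step 2: add to the destination partition

share = {4, 5, 6, 7}
reserve = set()
move_cpus(share, reserve, {4, 5})
assert share == {6, 7} and reserve == {4, 5}
assert share.isdisjoint(reserve)  # invariant: no CPU in both sets
```

Seeing a CPU in *both* slices, as in the original report, would mean some code path writes the destination before (or without) updating the source.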
You have other allocations on the same host that do use core constraints though? If not, we're writing an empty value to the cgroup, in which case I found this Stack Exchange post which describes that scenario but has no answer. 🤦 I also managed to dig up a few old issues that may be related. Lastly, I wanted to see if I could get this error outside of Nomad by echoing a bad input to the cgroup file, but wasn't able to reproduce that same error.
I did get some interesting (but different) errors trying to write to the
One more thing I'd like you to try is the following, to make sure we've counted the cgroups correctly when trying to figure out if it's the inodes issue:
That's correct.
Just happened again:

```
# find /sys/fs/cgroup -depth -type d | wc -l
81
# find /sys/fs/cgroup/nomad.slice -depth -type d | wc -l
29
# head /sys/fs/cgroup/nomad.slice/cpuset.cpus /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus /sys/fs/cgroup/nomad.slice/share.slice/cpuset.cpus
==> /sys/fs/cgroup/nomad.slice/cpuset.cpus <==
0-31

==> /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus <==
0-3

==> /sys/fs/cgroup/nomad.slice/share.slice/cpuset.cpus <==
4-31
```

Log output:
It doesn't look like the CPUs overlapped this time. The number of dying descendants is curious; I wonder if it's related:

```
# head /sys/fs/cgroup/nomad.slice/cgroup.stat /sys/fs/cgroup/nomad.slice/reserve.slice/cgroup.stat /sys/fs/cgroup/nomad.slice/share.slice/cgroup.stat
==> /sys/fs/cgroup/nomad.slice/cgroup.stat <==
nr_descendants 28
nr_dying_descendants 2356

==> /sys/fs/cgroup/nomad.slice/reserve.slice/cgroup.stat <==
nr_descendants 1
nr_dying_descendants 78

==> /sys/fs/cgroup/nomad.slice/share.slice/cgroup.stat <==
nr_descendants 25
nr_dying_descendants 2278
```
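For reference, a "dying" descendant is a cgroup that has been deleted but whose kernel objects have not yet been released, so a large `nr_dying_descendants` can indicate cgroups pinned by lingering references. A minimal sketch for pulling these counters out of a `cgroup.stat` dump (assuming the usual `key value` line format; the helper name is mine):

```python
def parse_cgroup_stat(text: str) -> dict:
    """Parse cgroup v2 cgroup.stat contents into {field: count}."""
    return {
        key: int(value)
        for key, value in (
            line.split() for line in text.splitlines() if line.strip()
        )
    }

stat = parse_cgroup_stat("nr_descendants 28\nnr_dying_descendants 2356\n")
assert stat["nr_dying_descendants"] == 2356
```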
Can you confirm whether the
I did look at that at failure time and from memory it was at
I can't find any
I'll double-check.
Just happened again. (It has been happening strangely often lately.) Here are the values requested:

```
$ head /sys/fs/cgroup/nomad.slice/reserve.slice/cgroup.max.descendants
max
$ head /sys/fs/cgroup/nomad.slice/reserve.slice/cgroup.stat
nr_descendants 1
nr_dying_descendants 11
$ head /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus
0-3
$ head /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.mems
$
```

This doesn't look like it should be possible, though:
It might just be an artifact of how the data is collected, since I don't think it's possible to take an atomic snapshot of cgroups. All Nomad cgroups
Nomad version
Operating system and Environment details
Running Ubuntu 22.04 on Google Cloud in an n2d-standard-32 instance.
Issue
Alerts fired due to failed allocations. Upon investigation, I noticed the following log line:
Also interesting to observe is that, unlike in our other 1.7.x clients, there's overlap between the CPUs for the reserve and share slices:
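One way to check for that overlap mechanically is to expand the cpuset list format (comma-separated ranges such as `0-3,8,10-11`) into sets and intersect them. A quick sketch, using the healthy values seen later in this thread (the helper name is mine, not a Nomad function):

```python
def parse_cpuset(spec: str) -> set:
    """Expand a cpuset list string like '0-3,8' into a set of CPU ids."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus

reserve = parse_cpuset("0-3")
share = parse_cpuset("4-31")
assert reserve & share == set()  # healthy state: slices are disjoint
```

A non-empty intersection between the `reserve.slice` and `share.slice` values would confirm the overlap described above.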
Reproduction steps
Not clear how to reproduce. This happened on a single instance. All allocations that failed are from periodic jobs, running on the exec driver with no core constraints.

Expected Result
Allocations spawn successfully.
Actual Result
Allocations failed to spawn.
Nomad Client logs (if appropriate)
Nomad client configuration