cgroups v1: failed to set the cpuset cgroup for container: write /sys/fs/cgroup/cpuset/nomad/shared/cgroup.procs: invalid argument #19418
Comments
Hi @juananinca! The "invalid argument" error suggests that we're trying to write to a now-invalid PID. While restarting, there's the existing task that's been stopped, and the new task being started (both in the same allocation). Can you share the task events from Also, can you provide the client logs during the fingerprint process of startup? Specifically looking for the logs around And can you verify which cgroups version you've got here?
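For anyone who wants to answer the cgroups version question quickly, here is a minimal sketch of one common heuristic. It is not how Nomad fingerprints cgroups; the function name `detect_cgroup_version` and the default root path are assumptions made for illustration. It relies on the fact that a cgroups v2 (unified) hierarchy exposes a `cgroup.controllers` file at its root, while a v1 layout has per-controller directories such as `cpuset` (which matches the path in the error above).

```python
import os


def detect_cgroup_version(root="/sys/fs/cgroup"):
    """Guess the cgroup layout mounted at `root` (hypothetical helper).

    Returns "v2" if the unified hierarchy marker file is present,
    "v1" if a per-controller cpuset directory is present,
    and "unknown" otherwise.
    """
    # cgroup.controllers only exists at the root of a v2 hierarchy.
    if os.path.isfile(os.path.join(root, "cgroup.controllers")):
        return "v2"
    # v1 mounts each controller in its own directory, e.g. .../cpuset.
    if os.path.isdir(os.path.join(root, "cpuset")):
        return "v1"
    return "unknown"


if __name__ == "__main__":
    print(detect_cgroup_version())
```

On hosts running in hybrid mode this check on the default root would report "v1", since the v1 controllers are what is mounted directly under `/sys/fs/cgroup` there.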
Hi @tgross 😄 I'm running into the same error on one of our clusters, with a specific Docker container and I'm having a hard time narrowing down the cause of the problem. Fingerprint on startup:
Logs:
Alloc events from one of the tests:
Not sure why it is the case but if I wrap the container's command (yarn in this case) into a script and call that, I'm unable to reproduce the issue. It looks like a timing issue but it's strange as, to my understanding, the PID that is added to cgroup.procs should only be known by Nomad after its creation...
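To make the failing step concrete, here is a rough sketch of what "adding a PID to cgroup.procs" looks like from userspace, with the resulting errno reported by name. This is only an illustration under stated assumptions, not Nomad's implementation: `write_pid_to_cgroup` is a hypothetical helper, and the path is whatever `cgroup.procs` file you want to test against. An "invalid argument" failure would show up here as `EINVAL`.

```python
import errno
import os


def write_pid_to_cgroup(procs_path, pid):
    """Try to write `pid` into a cgroup.procs file (hypothetical helper).

    Returns None on success, or the symbolic errno name (e.g. "EINVAL",
    "ESRCH", "ENOENT") if the open or write fails.
    """
    try:
        # Unbuffered write so the kernel's error surfaces on write itself.
        fd = os.open(procs_path, os.O_WRONLY)
        try:
            os.write(fd, str(pid).encode("ascii"))
        finally:
            os.close(fd)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return None


if __name__ == "__main__":
    # Path taken from the error message in this issue; needs root to write.
    path = "/sys/fs/cgroup/cpuset/nomad/shared/cgroup.procs"
    print(write_pid_to_cgroup(path, os.getpid()))
```

Running something like this by hand against the PID from a failed restart could help distinguish a stale PID from a problem with the cgroup itself, though the timing window may make that hard to catch.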
Thanks @the-nando. And you're running on a 1.5.x version of Nomad as well? The cgroups code got a lot of reworking as part of 1.7, so I want to make sure we're chasing the same bug here. If so, can you reproduce the problem on 1.7.x?
What's especially strange about that is that your script is still running because it doesn't run
Sorry, I forgot to mention that this cluster is on 1.6.2-ent. I'm afraid I won't be able to test with 1.7 in the short term, as I don't have an already-upgraded cluster at hand with the same setup.
Doing a bit of issue triage cleanup. I'm going to move this onto our internal roadmapping board for follow-up, but:
If we get updated info that this is reproduced on 1.7.x+, we'll be sure to re-prioritize.
Nomad version
Nomad v1.5.6
Operating system and Environment details
OracleLinux 8.8
Issue
One specific job sometimes shows the following error when Nomad tries to restart a container:
failed to set the cpuset cgroup for container: write /sys/fs/cgroup/cpuset/nomad/shared/cgroup.procs: invalid argument
Reproduction steps
I cannot provide reproduction steps because the error does not occur every time, but here are the Nomad config and the Nomad job definition:
Nomad config:
Expected Result
The job is able to restart successfully.
Actual Result
The job shows
failed to set the cpuset cgroup for container: write /sys/fs/cgroup/cpuset/nomad/shared/cgroup.procs: invalid argument
when Nomad restarts it.
Job file (if appropriate)
Nomad Server logs (if appropriate)
Nomad Client logs (if appropriate)
There is a related open issue where a user reported something similar (#17890 (comment)), but I'm not sure whether it is the same case.