I'm attempting to use GPUs with gVisor in rootless mode - for now, the container just runs `nvidia-smi` - and I am running into issues originating from `nvidia-container-cli configure`. Various sources allude to using GPUs in rootless mode being possible in Docker and Podman, though I've also found issues like #49 where it hasn't worked with `runc`.

`nvidia-container-cli configure` is invoked in gVisor at https://github.com/google/gvisor/blob/54359c5b5fbb354f52866e0ff745b09543af2fc9/runsc/container/container.go#L2023-L2031; you can see that `--no-cgroups` is already passed.

To get some more information for debugging, I built gVisor myself and prefixed the `nvidia-container-cli configure` invocation with `/usr/bin/strace -f`.

I started with this command to run a container.

The same error was observed in #104 and resolved by making `perm_drop_privileges` return 0. I believe a simpler route is to add the `"--user=root:root"` flag when invoking `nvidia-container-cli`. I tried both, and in both cases the error then became this for me:

`[pid 351446] setns(3, CLONE_NEWNS) = -1 EPERM (Operation not permitted)`

To resolve this, I added `-m` to `unshare`, ending up with an error at a mount in one of the last steps of `configure`.

The code flow to this mount is as follows:
- `ldcache_update` in `configure`
- `change_rootfs` in `nvc_ldcache_update`
- `xmount(err, NULL, "/proc", "proc", MS_RDONLY, NULL)`
Unfortunately, I'm stuck here and out of ideas. Is there anything else I can try to make this work? Were any of my previous steps incorrect?
Additional Info
For specifics about how I set up the container, please see google/gvisor#11069.
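
For anyone reproducing this, the `strace -f` output gets long, so a small filter for failed syscalls can help spot lines like the `setns` one quoted above. The following is only a rough sketch: the names `FailedSyscall` and `parse_failed_syscall` are made up for illustration, and the pattern assumes the default one-line `strace` format (`name(args) = -1 ERRNO (message)`, optionally prefixed with `[pid N]`), not interleaved `<unfinished ...>` lines.

```python
import re
from typing import NamedTuple, Optional


class FailedSyscall(NamedTuple):
    """One failed syscall taken from a single line of strace output."""
    pid: Optional[int]   # None when strace printed no "[pid N]" prefix
    name: str            # syscall name, e.g. "setns"
    args: str            # raw argument text between the parentheses
    errno_name: str      # symbolic errno, e.g. "EPERM"
    message: str         # human-readable error text


# Assumed line shape: [pid N] name(args) = -1 ERRNO (message)
_FAILED_LINE = re.compile(
    r"^(?:\[pid\s+(?P<pid>\d+)\]\s+)?"
    r"(?P<name>\w+)\((?P<args>.*)\)\s+=\s+-1\s+"
    r"(?P<errno>E[A-Z0-9]+)\s+\((?P<msg>.*)\)$"
)


def parse_failed_syscall(line: str) -> Optional[FailedSyscall]:
    """Return a FailedSyscall for a failed call, or None for any other line."""
    match = _FAILED_LINE.match(line.strip())
    if match is None:
        return None
    pid = match.group("pid")
    return FailedSyscall(
        pid=int(pid) if pid is not None else None,
        name=match.group("name"),
        args=match.group("args"),
        errno_name=match.group("errno"),
        message=match.group("msg"),
    )


if __name__ == "__main__":
    import sys

    # Usage: pipe saved strace output in on stdin and print only the failures.
    for raw in sys.stdin:
        failed = parse_failed_syscall(raw)
        if failed is not None:
            print(failed)
```

Running it over a saved trace prints one record per failed call, which makes it easier to compare the failures before and after each change to the invocation.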