Hi! libnvidia-container currently relies on glibc internals when locating the host system's libraries, which limits its compatibility with the wider range of e.g. Linux distributions. nvidia-container-toolkit appears to provide limited support for static configuration, e.g. the `ModeCSV` used for Jetsons: https://github.com/NVIDIA/nvidia-container-toolkit/blob/a2262d00cc6d98ac2e95ae2f439e699a7d64dc17/pkg/nvcdi/lib.go#L98-L102, but many tools (e.g. Apptainer and SingularityCE) rely on libnvidia-container directly. I think it's desirable that libnvidia-container (also) support static configuration, whereby the user would specify, at build time or at runtime, a list of search paths in which to look for the userspace driver libraries.
Motivation
At a glance, the stumbling blocks seem to be as follows:
- `ldconfig` is assumed to be aware of the userspace drivers' location (e.g. through a global `/etc/ld.so.conf`, which also may not exist);
- `/etc/ld.so.cache` is assumed to exist, but that is not guaranteed: `ld.so.cache` is specific to glibc (I'm not sure an equivalent concept even exists for musl), and while it's reasonable to limit support to glibc (e.g. because NVIDIA only publishes binaries built against it), even systems that use glibc may not populate the global cache; it's safer to assume it's an optional cache for speeding up the dynamic loader;
- a specific format of `ld.so.cache` is assumed, which is a glibc internal and probably not part of its public interface; e.g. `ldcache.c` replicates the header structure (libnvidia-container/src/ldcache.c, lines 46 to 53 at 5c75904).
Inspecting the dynamic loader's search paths and inferring the host system's libraries from them seems to be a valid need, and we should probably consult glibc's (and/or other libc implementations') maintainers on how to approach it correctly. The optional `/etc/ld.so.conf` is only one of the tunables that affect `ld.so`'s behaviour; others include e.g. `LD_PRELOAD`, `LD_LIBRARY_PATH`, and `DT_RUNPATH`. Rather than try to approximate just a part of the dynamic loader's behaviour, we should probably use the loader itself. The only "public" interfaces I'm currently aware of are `dlopen()`+`dlinfo()` (which allows code execution, albeit with the same privileges the parent process already has anyway) and `ld.so --list` (which requires a test ELF binary as an argument). I think a ticket in glibc's issue tracker would be a reasonable step forward.
Cf. also apptainer/apptainer#1894, NixOS/nixpkgs#279235, NVIDIA/nvidia-container-toolkit#71
Thanks!