ldconfig-free deployment #234

Open
SomeoneSerge opened this issue Jan 7, 2024 · 0 comments

SomeoneSerge commented Jan 7, 2024

Hi! Libnvidia-container currently relies on glibc internals when locating the host system's libraries, which limits its compatibility with a wider range of Linux distributions. Nvidia-container-toolkit appears to provide limited support for static configuration, e.g. the ModeCSV used for Jetsons: https://github.com/NVIDIA/nvidia-container-toolkit/blob/a2262d00cc6d98ac2e95ae2f439e699a7d64dc17/pkg/nvcdi/lib.go#L98-L102, but many tools (e.g. Apptainer and SingularityCE) rely on libnvidia-container directly. I think it's desirable that libnvidia-container (also) support static configuration, whereby the user would specify, at build time or at runtime, a list of search paths in which to look for the userspace driver libraries.
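
For illustration, here is a minimal sketch of what the consuming side of such a static configuration could look like, assuming a colon-separated list of directories; the variable name NVC_DRIVER_SEARCH_PATH and the overall shape are hypothetical, not an existing libnvidia-container interface:

    /*
     * Hypothetical sketch: resolve a userspace driver library from a
     * user-supplied, colon-separated list of search paths instead of
     * consulting ldconfig or /etc/ld.so.cache.  NVC_DRIVER_SEARCH_PATH
     * is a made-up name for this example.
     */
    #include <limits.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    /* Fill `out` with the first matching path and return 0, or return -1. */
    static int
    resolve_static(const char *paths, const char *lib, char *out, size_t outlen)
    {
            char *copy = strdup(paths);

            if (copy == NULL)
                    return (-1);
            for (char *dir = strtok(copy, ":"); dir != NULL; dir = strtok(NULL, ":")) {
                    snprintf(out, outlen, "%s/%s", dir, lib);
                    if (access(out, F_OK) == 0) {
                            free(copy);
                            return (0);
                    }
            }
            free(copy);
            return (-1);
    }

    int
    main(void)
    {
            const char *paths = getenv("NVC_DRIVER_SEARCH_PATH"); /* hypothetical */
            char resolved[PATH_MAX];

            if (paths != NULL &&
                resolve_static(paths, "libcuda.so.1", resolved, sizeof(resolved)) == 0)
                    printf("%s\n", resolved);
            else
                    fprintf(stderr, "libcuda.so.1 not found in the configured search paths\n");
            return (0);
    }

E.g. NVC_DRIVER_SEARCH_PATH=/run/opengl-driver/lib:/usr/lib/x86_64-linux-gnu ./a.out on a NixOS-style layout. Whether the list is baked in at build time or read at runtime, the point is that resolution no longer depends on ldconfig or the glibc cache.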

Motivation

At a glance, the stumbling blocks seem to be as follows:

  • ldconfig is assumed to be aware of the userspace drivers' location (e.g. through a global /etc/ld.so.conf, which may not even exist);
  • /etc/ld.so.cache is assumed to exist, but it isn't guaranteed to; the cache is specific to glibc (I'm not sure an equivalent concept exists for musl). While it's reasonable to limit support to glibc (e.g. because NVIDIA only publishes binaries built against it), even systems that use glibc may not populate the global cache; it's safer to treat it as an optional cache that merely speeds up the dynamic loader (see the sketch after this list);
  • A specific format of ld.so.cache is assumed, which is a glibc internal and probably not part of its public interface; e.g. ldcache.c replicates the header structure:
    struct header_libc6 {
            char magic[MAGIC_LIBC6_LEN];
            char version[MAGIC_VERSION_LEN];
            uint32_t nlibs;
            uint32_t table_size;
            uint32_t unused[5];
            struct entry_libc6 libs[];
    };
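
To make the fragility concrete, below is a minimal, self-contained sketch (not the project's actual code) of what consuming the libc6 cache entails: the magic string, version, and header layout all have to be hard-coded even though glibc does not document them as stable, and the file's absence has to be treated as normal rather than as an error. The MAGIC_* values below mirror glibc's new-format cache constants; older caches prepend a legacy header in front of this one, which is exactly the kind of detail a reimplementation has to track.

    /* Sketch only: read and sanity-check the libc6 ld.so.cache header. */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define MAGIC_LIBC6   "glibc-ld.so.cache"   /* glibc's CACHEMAGIC_NEW */
    #define MAGIC_VERSION "1.1"                 /* glibc's CACHE_VERSION  */

    struct header_libc6 {
            char magic[sizeof(MAGIC_LIBC6) - 1];
            char version[sizeof(MAGIC_VERSION) - 1];
            uint32_t nlibs;
            uint32_t table_size;
            uint32_t unused[5];
    };

    int
    main(void)
    {
            struct header_libc6 h;
            FILE *f = fopen("/etc/ld.so.cache", "rb");

            /* The cache is optional: its absence is not an error. */
            if (f == NULL) {
                    fprintf(stderr, "no /etc/ld.so.cache; fall back to static search paths\n");
                    return (0);
            }
            if (fread(&h, sizeof(h), 1, f) != 1 ||
                memcmp(h.magic, MAGIC_LIBC6, sizeof(h.magic)) != 0 ||
                memcmp(h.version, MAGIC_VERSION, sizeof(h.version)) != 0) {
                    /* Legacy-format caches (or future layouts) land here. */
                    fprintf(stderr, "unrecognized ld.so.cache layout\n");
                    fclose(f);
                    return (1);
            }
            printf("%u cached libraries\n", (unsigned)h.nlibs);
            fclose(f);
            return (0);
    }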

Inspecting the dynamic loader's search paths and inferring the host system's libraries from them seems to be a valid need, and we should probably consult the glibc (and/or other libc implementations') maintainers on how to approach it correctly. The optional /etc/ld.so.conf is only one of the tunables that affect ld.so's behaviour; others include LD_PRELOAD, LD_LIBRARY_PATH, and DT_RUNPATH. Rather than try to approximate just a part of the dynamic loader's behaviour, we should probably use the loader itself. The only "public" interfaces I'm currently aware of are dlopen()+dlinfo() (which allows code execution, albeit with the same privileges the parent process already has anyway) and ld.so --list (which requires a test ELF binary as an argument). I think a ticket in glibc's issue tracker would be a reasonable step forward.
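
To make the dlopen()+dlinfo() route concrete, here is a minimal sketch that asks the loader itself where it resolved a library from; libcuda.so.1 is just a placeholder soname, and dlinfo() with RTLD_DI_LINKMAP is the documented glibc extension that exposes the link map of a loaded object:

    /* Sketch: let the dynamic loader report the path it resolved a library to. */
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <link.h>
    #include <stdio.h>

    int
    main(void)
    {
            struct link_map *map = NULL;
            void *handle = dlopen("libcuda.so.1", RTLD_LAZY | RTLD_LOCAL);

            if (handle == NULL) {
                    fprintf(stderr, "dlopen: %s\n", dlerror());
                    return (1);
            }
            if (dlinfo(handle, RTLD_DI_LINKMAP, &map) != 0) {
                    fprintf(stderr, "dlinfo: %s\n", dlerror());
                    dlclose(handle);
                    return (1);
            }
            /* l_name is the pathname the loader actually used. */
            printf("%s\n", map->l_name);
            dlclose(handle);
            return (0);
    }

The ld.so --list route gives similar information without loading the library into the inspecting process, at the cost of needing a test ELF binary that links against it (this is essentially what ldd does under the hood).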

Cf. also apptainer/apptainer#1894, NixOS/nixpkgs#279235, NVIDIA/nvidia-container-toolkit#71

Thanks!
