On COS, add NVIDIA library directory to LD configuration and update cache. #11359

Merged 1 commit on Jan 14, 2025


@copybara-service copybara-service bot commented Jan 13, 2025

On COS, add NVIDIA library directory to LD configuration and update cache.

Unlike Ubuntu VMs, where we use Docker's `--gpus` flag, COS VMs do not use
this flag and instead mount the NVIDIA library directories automatically.
However, nothing guarantees that these directories are added to the LD
config. This change fixes that. It takes advantage of the fact that all GPU
tests have the sniffer binary as their entrypoint. This slightly overloads
the role of the sniffer within the GPU test infrastructure, but the ioctl
sniffer is already deeply intertwined with LD configuration because it
overrides the `ioctl` libc function, so this doesn't seem like too big of
a stretch.

This change makes the ffmpeg test succeed with `runc` on COS, but it still
fails with gVisor (with `CUDA_ERROR_OUT_OF_MEMORY` errors). So there must be
some further gVisor-specific error.

Updates #11351
Updates #11321

@copybara-service copybara-service bot added the exported Issue was exported automatically label Jan 13, 2025
@copybara-service copybara-service bot force-pushed the test/cl715106144 branch 3 times, most recently from 2572476 to be8196a Compare January 14, 2025 01:21
@copybara-service copybara-service bot changed the title DO NOT SUBMIT: Debugging COS GPU testing pipeline On COS, add NVIDIA library directory to LD configuration and update cache. Jan 14, 2025
PiperOrigin-RevId: 715222952
@copybara-service copybara-service bot merged commit 3649ca9 into master Jan 14, 2025
@copybara-service copybara-service bot deleted the test/cl715106144 branch January 14, 2025 05:13