UCT/CUDA_IPC: Possible UB when enabling CUDA_IPC_CACHE #10346

Open
andylin-hao opened this issue Dec 2, 2024 · 0 comments

Hi,

I was testing the cuda_ipc UCT (v1.8.0) and noticed that it caches opened CUDA IPC memory handles by default, since the UCX_CUDA_IPC_CACHE option defaults to y. Although this didn't cause any apparent problems in my test environment, I'm worried that this implementation choice could lead to undefined behavior. The CUDA documentation states that "Calling cuMemFree on an exported memory region before calling cuIpcCloseMemHandle in the importing context will result in undefined behavior" (https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1ga8bd126fcff919a0c996b7640f197b79).

By instrumenting the code, I confirmed that, because of the caching mechanism, CUDA memory allocated by UCX users is indeed freed before the corresponding IPC handle is closed when running ucc_perftest's allgather/allreduce tests with two processes and the cuda_ipc UCT. The perftest's allgather/allreduce pattern is a loop that allocates a buffer, sends the buffer data to a neighbor via cuda_ipc, and frees the buffer. After the first iteration, the opened handles remain open even though the buffers they refer to have already been freed.
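To make the ordering concrete, here is a toy bookkeeping model of the loop described above. It is only a sketch, not CUDA or UCX code: the class `ToyIpcModel`, the function `run_loop`, and the buffer names are hypothetical, and it assumes that with the cache disabled the importer closes its mapping right after each transfer (I have not verified that this is exactly what UCX does).

```python
class ToyIpcModel:
    """Hypothetical bookkeeping model; it makes no CUDA or UCX calls."""

    def __init__(self, cache_enabled):
        self.cache_enabled = cache_enabled
        self.open_mappings = set()  # buffers the importer still has open
        self.violations = []        # buffers freed while still open remotely

    def importer_transfer(self, buf):
        # Stands in for cuIpcOpenMemHandle followed by the data copy.
        self.open_mappings.add(buf)
        if not self.cache_enabled:
            # Assumption: without the cache, the mapping is closed
            # (cuIpcCloseMemHandle) as soon as the transfer is done.
            self.open_mappings.discard(buf)

    def exporter_free(self, buf):
        # Stands in for cuMemFree in the exporting process.
        if buf in self.open_mappings:
            # Freed before the importer closed its mapping: the ordering
            # that the CUDA documentation calls undefined behavior.
            self.violations.append(buf)


def run_loop(cache_enabled, iterations=3):
    """Allocate, transfer, free in a loop; return the out-of-order frees."""
    model = ToyIpcModel(cache_enabled)
    for i in range(iterations):
        buf = f"buf{i}"  # stands in for a fresh cuMemAlloc result
        model.importer_transfer(buf)
        model.exporter_free(buf)
    return model.violations


if __name__ == "__main__":
    print("cache on :", run_loop(cache_enabled=True))
    print("cache off:", run_loop(cache_enabled=False))
```

In this model every free is flagged when the cache is on and none are flagged when it is off, which matches what I saw with instrumentation only in spirit; the real transport is asynchronous and may evict or invalidate cached entries in ways this sketch ignores.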

Since this hasn't caused any serious problems, I'm not marking it as a bug. But I think it's worth reporting for reference, and I'd like to hear your opinions on it.
