Hi,

I was testing the `cuda_ipc` UCT (v1.8.0) and noticed that it caches an opened `cuIpcMemHandle` by default, since the `UCX_CUDA_IPC_CACHE` option defaults to `y`. Although this hasn't caused apparent problems in my test environment, I'm worried about potential undefined behavior due to this implementation choice. The CUDA documentation states that "Calling cuMemFree on an exported memory region before calling cuIpcCloseMemHandle in the importing context will result in undefined behavior" (https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1ga8bd126fcff919a0c996b7640f197b79). By instrumenting the code I can confirm that CUDA memory allocated by UCX users is indeed freed before the IPC handle is closed, due to the caching mechanism, when running `ucc_perftest`'s allgather/allreduce tests with two processes and the `cuda_ipc` UCT. The perftest's allgather/allreduce pattern allocates a buffer, sends the buffer data to a neighbor via `cuda_ipc`, and frees the buffer, in a loop. After the first iteration, the opened handles are not closed while the allocated buffers are freed.
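For reference, here is a minimal standalone sketch of the ordering I'm describing, outside of UCX. The fork/pipe handshake is just for illustration and is not how UCX exchanges handles; the point is only that the exporter's `cuMemFree` runs while the importer's handle is still open, which is the window the caching behavior creates:

```c
/* Minimal sketch (not UCX code): exporter frees an exported buffer while the
 * importer still holds the opened IPC handle, the ordering the CUDA docs call
 * undefined behavior. The fork/pipe handshake is purely illustrative. */
#include <cuda.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

#define CHECK(call)                                              \
    do {                                                         \
        CUresult _r = (call);                                    \
        if (_r != CUDA_SUCCESS) {                                \
            const char *_msg;                                    \
            cuGetErrorString(_r, &_msg);                         \
            fprintf(stderr, "%s failed: %s\n", #call, _msg);     \
            exit(1);                                             \
        }                                                        \
    } while (0)

int main(void)
{
    int to_child[2], to_parent[2];
    pipe(to_child);
    pipe(to_parent);

    pid_t pid = fork();
    if (pid == 0) {                    /* importer ("remote" peer) */
        CUdevice dev; CUcontext ctx; CUipcMemHandle h; CUdeviceptr mapped;
        char sync;
        CHECK(cuInit(0));
        CHECK(cuDeviceGet(&dev, 0));
        CHECK(cuCtxCreate(&ctx, 0, dev));
        read(to_child[0], &h, sizeof(h));   /* receive the exported handle */
        CHECK(cuIpcOpenMemHandle(&mapped, h,
                                 CU_IPC_MEM_LAZY_ENABLE_PEER_ACCESS));
        write(to_parent[1], "o", 1);        /* tell exporter the handle is open */
        read(to_child[0], &sync, 1);        /* exporter has now freed the buffer */
        /* The handle is still open here; touching `mapped` or closing it after
         * the exporter's cuMemFree is the undefined-behavior window. */
        cuIpcCloseMemHandle(mapped);
        return 0;
    }

    /* exporter */
    CUdevice dev; CUcontext ctx; CUdeviceptr buf; CUipcMemHandle h;
    char sync;
    CHECK(cuInit(0));
    CHECK(cuDeviceGet(&dev, 0));
    CHECK(cuCtxCreate(&ctx, 0, dev));
    CHECK(cuMemAlloc(&buf, 1 << 20));
    CHECK(cuIpcGetMemHandle(&h, buf));
    write(to_child[1], &h, sizeof(h));
    read(to_parent[0], &sync, 1);           /* wait until the importer opened it */
    CHECK(cuMemFree(buf));                  /* freed before the importer closes the handle */
    write(to_child[1], "f", 1);
    waitpid(pid, NULL, 0);
    return 0;
}
```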
Since this hasn't caused any serious problems, I'm not marking it as a bug. But I think it's worth reporting anyway for reference, and I'd like to hear your opinions on it.