fix: CUDA API bottleneck on newer CUDA versions on Linux #482

Tom94 · 2025-01-03T11:57:13Z

On newer CUDA versions (Linux only), cudaGetDeviceProperties has become noticeably slower. In some downstream applications, such as instant NGP, this can make training as much as 3x slower (!) on high-end GPUs.

To bypass the issue, this PR caches the result of cudaGetDeviceProperties so that it is called only once per device.
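
For illustration, the caching idea can be sketched as below. This is a minimal sketch and not the code from this PR: `DevicePropsCache` and its members are hypothetical names, and the query function is passed in so that the example compiles without the CUDA toolkit. In a real build, the query would wrap cudaGetDeviceProperties, as shown in the trailing comment.

```cpp
#include <functional>
#include <mutex>
#include <unordered_map>
#include <utility>

// Hypothetical helper (not from the PR): memoizes one Props value per device index.
template <typename Props>
class DevicePropsCache {
public:
    using Query = std::function<Props(int)>;

    explicit DevicePropsCache(Query query) : m_query(std::move(query)) {}

    // Returns the properties for `device`. The (potentially slow) query runs
    // only the first time a given device index is requested.
    const Props& get(int device) {
        std::lock_guard<std::mutex> lock(m_mutex);
        auto it = m_cache.find(device);
        if (it == m_cache.end()) {
            it = m_cache.emplace(device, m_query(device)).first;
        }
        // References to unordered_map elements stay valid across later insertions.
        return it->second;
    }

private:
    Query m_query;
    std::mutex m_mutex;
    std::unordered_map<int, Props> m_cache;
};

// Assumed wiring with the CUDA runtime (error handling omitted):
//
//   DevicePropsCache<cudaDeviceProp> cache([](int device) {
//       cudaDeviceProp props;
//       cudaGetDeviceProperties(&props, device);
//       return props;
//   });
//   int sm_count = cache.get(0).multiProcessorCount;
```

The mutex keeps lookups safe if several threads ask for properties concurrently. The sketch assumes device properties do not change over the lifetime of the process, so entries are never invalidated.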

fix: CUDA API bottleneck on newer CUDA versions on Linux

c1423e1

Tom94 merged commit 0b85840 into master Jan 3, 2025
13 checks passed
