compatible with pytorch and cuda 12.4 #1056

collyyang520 · 2025-01-12T16:39:55Z

Hello everyone,
I am Yang Ren shu. I recently ran into a version compatibility problem. Specifically, I get this error:
RuntimeError: NVML_SUCCESS == DriverAPI::get()->nvmlInit_v2_() INTERNAL ASSERT FAILED at "../c10/cuda/CUDACachingAllocator.cpp":963, please report a bug to PyTorch.
I found related articles saying that a sudo reboot is required, but I am worried that someone else is using the machine or needs the current version. How did you resolve this? Thank you.

I found that PyTorch only supports CUDA up to 12.4, but I originally installed 12.6. I want to switch directly to 12.4 with:

wget https://developer.download.nvidia.com/compute/cuda/12.4.0/local_installers/cuda_12.4.0_550.54.14_linux.run
sudo sh cuda_12.4.0_550.54.14_linux.run

However, I am worried about affecting other people. I also tried a local install with conda install pytorch torchvision torchaudio cudatoolkit=12.4 -c pytorch, but it does not install successfully.

My concern is that I am on a shared workstation and working inside a virtual environment. I am afraid that changing the CUDA version will cause problems for other users. I only want to run PyTorch with GPU acceleration.
By the way, I also have a question about this warning:

/home/nthuuser/miniconda3/envs/earthquake/lib/python3.12/site-packages/torch/cuda/__init__.py:716: UserWarning: Can't initialize NVML
  warnings.warn("Can't initialize NVML")

Could someone help me? These two problems are really frustrating.
Thanks in advance.


Bhazantri · 2025-01-23T18:29:34Z

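CUDA Version Mismatch: PyTorch in your setup supports CUDA up to 12.4, but the workstation has CUDA 12.6 installed. This mismatch may be what triggers the RuntimeError: NVML_SUCCESS == DriverAPI::get()->nvmlInit_v2_() issue, as PyTorch requires a compatible CUDA runtime to utilize the GPU. Directly downgrading the system-wide CUDA version could impact other users on the shared workstation. Because the conda and pip builds of PyTorch bundle their own CUDA runtime, you can install a matching build inside your own environment and leave the system-wide toolkit untouched.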

NVML Initialization Failure: The warning Can't initialize NVML suggests that PyTorch cannot interface with the NVIDIA driver properly. This can stem from driver-toolkit mismatches or configuration issues, potentially leading to degraded GPU functionality like monitoring or resource allocation.
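
To narrow down where the NVML failure comes from, it can help to check whether nvidia-smi itself fails outside of PyTorch. The following is a minimal sketch, not a definitive diagnostic: the helper names classify_nvml_status and check_nvidia_smi and the returned labels are hypothetical, and the only assumption about nvidia-smi is that it prints a message such as "Failed to initialize NVML: Driver/library version mismatch" when the loaded kernel module and the user-space libraries disagree, which is the situation that a reboot or a driver module reload usually resolves.

```python
import shutil
import subprocess


def classify_nvml_status(returncode: int, output: str) -> str:
    """Map an nvidia-smi exit code and its output to a short label (hypothetical helper)."""
    text = output.lower()
    if returncode == 0:
        return "ok"
    if "driver/library version mismatch" in text:
        # The kernel module and the user-space NVIDIA libraries are out of sync.
        return "driver-library-mismatch"
    if "failed to initialize nvml" in text:
        return "nvml-init-failed"
    return "unknown-error"


def check_nvidia_smi() -> str:
    """Run nvidia-smi if it is on PATH and classify the result."""
    if shutil.which("nvidia-smi") is None:
        return "nvidia-smi-not-found"
    result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    return classify_nvml_status(result.returncode, result.stdout + result.stderr)


if __name__ == "__main__":
    print(check_nvidia_smi())
```

If this reports a driver/library mismatch, the problem sits at the driver level and reinstalling PyTorch alone is unlikely to fix it.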

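Create a separate conda environment so the system-wide CUDA installation stays untouched:
conda create -n pytorch_env python=3.12 -y
conda activate pytorch_env

Install a PyTorch build that ships its own CUDA runtime, using one of the following:
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124

Then verify the installation in Python:
import torch
print(torch.cuda.is_available()) # Expect True if the GPU is usable
print(torch.version.cuda) # Should match the installed build, e.g., 12.1 or 12.4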
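
The verification step can also be wrapped in a small script that does not crash when PyTorch is missing from the active environment. This is only a sketch: the function name torch_cuda_report and the dictionary keys are made up for illustration, while the torch attributes used (torch.__version__, torch.version.cuda, torch.cuda.is_available, torch.cuda.device_count) are standard PyTorch.

```python
import importlib.util


def torch_cuda_report() -> dict:
    """Collect basic facts about the PyTorch build in the active environment."""
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed in this environment.
        return {"torch_installed": False}

    import torch

    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version the binary was built against; None for CPU-only builds.
        "built_for_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }


if __name__ == "__main__":
    for key, value in torch_cuda_report().items():
        print(f"{key}: {value}")
```

If built_for_cuda is None, a CPU-only build was installed and the install command needs to be rerun with a CUDA-enabled build.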

Additional notes:
Use nvidia-smi to check that the NVIDIA driver supports the desired CUDA version. For instance, driver version 525.60.13 or higher is needed for CUDA 12.x.
Use CUDA_VISIBLE_DEVICES to limit your process to specific GPUs if multiple users are sharing the same hardware:
export CUDA_VISIBLE_DEVICES=0 # Exposes only GPU 0 to your process
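
Both notes can be sketched in Python as well. The helper names below (parse_version, driver_supports_cuda12, restrict_to_gpus) are hypothetical, and the minimum driver version is the 525.60.13 figure quoted above, not something the code discovers on its own. The driver version string can be read from the nvidia-smi header or with nvidia-smi --query-gpu=driver_version --format=csv,noheader.

```python
import os

# Minimum Linux driver for CUDA 12.x, as quoted in the note above.
MIN_DRIVER_FOR_CUDA_12 = (525, 60, 13)


def parse_version(version: str) -> tuple:
    """Turn a dotted version string such as '550.54.14' into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


def driver_supports_cuda12(driver_version: str) -> bool:
    """Return True if the driver version meets the CUDA 12.x minimum."""
    return parse_version(driver_version) >= MIN_DRIVER_FOR_CUDA_12


def restrict_to_gpus(indices: list) -> str:
    """Set CUDA_VISIBLE_DEVICES for this process; call before CUDA is initialized."""
    value = ",".join(str(i) for i in indices)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


if __name__ == "__main__":
    print(driver_supports_cuda12("550.54.14"))  # True
    print(restrict_to_gpus([0]))                # 0
```

Setting the variable from Python only takes effect if it happens before the first CUDA call, so in practice it belongs at the top of the script, ahead of any GPU work.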
