Hi team, I am having an issue with an RTX 4090 on Fedora 41. The GPU had been working fine since it was set up, until a container running an embedding model on the GPU for document inference ran into the errors shown below. Fan speed was quite high, but the temperature did not exceed 65 °C (this system has only reached that temperature at the time of the issue; it normally sits at about 24 °C).
The container runs a small embedding model that embeds documents into a vector database. The same type of workload runs normally on a T4 or an A10G.
No monitor attached.
root@fedora41:~# nvidia-smi
Unable to determine the device handle for GPU0000:01:00.0: Unknown Error
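To match this error against other logs, the PCI bus ID can be pulled out of the message. This is only a minimal sketch: `extract_bus_id` is a hypothetical helper name, and the pattern is an assumption based on the single error line above, not on any documented nvidia-smi output format.

```python
import re

# Assumed shape: "GPU" followed directly by domain:bus:device.function,
# as in the error line above. Other nvidia-smi messages may differ.
BUS_ID_RE = re.compile(
    r"GPU\s*([0-9a-fA-F]{4,8}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-9a-fA-F])"
)

def extract_bus_id(smi_output):
    """Return the first PCI bus ID found in the text, or None."""
    m = BUS_ID_RE.search(smi_output)
    return m.group(1) if m else None

error_line = "Unable to determine the device handle for GPU0000:01:00.0: Unknown Error"
bus_id = extract_bus_id(error_line)
print(bus_id)
```

The extracted ID can then be used to look for the same device in the kernel log or in the attached bug report.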
user@fedora41:~$ nvidia-ctk cdi generate --device-name-strategy=uuid --output cdi-spec.yaml
INFO[0000] Using /usr/lib64/libnvidia-ml.so.565.77
INFO[0000] Using /usr/lib64/libnvidia-sandboxutils.so.565.77
INFO[0000] Auto-detected mode as 'nvml'
INFO[0000] Using driver version 565.77
WARN[0000] Could not locate /dev/nvidia-modeset: pattern /dev/nvidia-modeset not found
INFO[0000] Selecting /dev/nvidia-uvm-tools as /dev/nvidia-uvm-tools
INFO[0000] Selecting /dev/nvidia-uvm as /dev/nvidia-uvm
INFO[0000] Selecting /dev/nvidiactl as /dev/nvidiactl
INFO[0000] Selecting /usr/lib64/libnvidia-egl-gbm.so.1.1.2 as /usr/lib64/libnvidia-egl-gbm.so.1.1.2
INFO[0000] Selecting /usr/lib64/libnvidia-egl-wayland.so.1.1.17 as /usr/lib64/libnvidia-egl-wayland.so.1.1.17
INFO[0000] Selecting /usr/lib64/libnvidia-allocator.so.565.77 as /usr/lib64/libnvidia-allocator.so.565.77
WARN[0000] Could not locate libnvidia-vulkan-producer.so.565.77: pattern libnvidia-vulkan-producer.so.565.77 not found
INFO[0000] Selecting /usr/lib64/xorg/modules/drivers/nvidia_drv.so as /usr/lib64/xorg/modules/drivers/nvidia_drv.so
INFO[0000] Selecting /usr/lib64/xorg/modules/extensions/libglxserver_nvidia.so.565.77 as /usr/lib64/xorg/modules/extensions/libglxserver_nvidia.so.565.77
INFO[0000] Selecting /usr/share/glvnd/egl_vendor.d/10_nvidia.json as /usr/share/glvnd/egl_vendor.d/10_nvidia.json
INFO[0000] Selecting /usr/share/egl/egl_external_platform.d/15_nvidia_gbm.json as /usr/share/egl/egl_external_platform.d/15_nvidia_gbm.json
INFO[0000] Selecting /usr/share/egl/egl_external_platform.d/10_nvidia_wayland.json as /usr/share/egl/egl_external_platform.d/10_nvidia_wayland.json
INFO[0000] Selecting /usr/share/nvidia/nvoptix.bin as /usr/share/nvidia/nvoptix.bin
WARN[0000] Could not locate X11/xorg.conf.d/10-nvidia.conf: pattern X11/xorg.conf.d/10-nvidia.conf not found
INFO[0000] Selecting /usr/share/X11/xorg.conf.d/nvidia-drm-outputclass.conf as /usr/share/X11/xorg.conf.d/nvidia-drm-outputclass.conf
INFO[0000] Selecting /etc/vulkan/icd.d/nvidia_icd.json as /etc/vulkan/icd.d/nvidia_icd.json
WARN[0000] Could not locate vulkan/icd.d/nvidia_layers.json: pattern vulkan/icd.d/nvidia_layers.json not found
INFO[0000] Selecting /etc/vulkan/implicit_layer.d/nvidia_layers.json as /etc/vulkan/implicit_layer.d/nvidia_layers.json
INFO[0000] Selecting /usr/lib64/libEGL_nvidia.so.565.77 as /usr/lib64/libEGL_nvidia.so.565.77
INFO[0000] Selecting /usr/lib64/libGLESv1_CM_nvidia.so.565.77 as /usr/lib64/libGLESv1_CM_nvidia.so.565.77
INFO[0000] Selecting /usr/lib64/libGLESv2_nvidia.so.565.77 as /usr/lib64/libGLESv2_nvidia.so.565.77
INFO[0000] Selecting /usr/lib64/libGLX_nvidia.so.565.77 as /usr/lib64/libGLX_nvidia.so.565.77
INFO[0000] Selecting /usr/lib64/libcuda.so.565.77 as /usr/lib64/libcuda.so.565.77
INFO[0000] Selecting /usr/lib64/libcudadebugger.so.565.77 as /usr/lib64/libcudadebugger.so.565.77
INFO[0000] Selecting /usr/lib64/libnvcuvid.so.565.77 as /usr/lib64/libnvcuvid.so.565.77
INFO[0000] Selecting /usr/lib64/libnvidia-allocator.so.565.77 as /usr/lib64/libnvidia-allocator.so.565.77
INFO[0000] Selecting /usr/lib64/libnvidia-cfg.so.565.77 as /usr/lib64/libnvidia-cfg.so.565.77
INFO[0000] Selecting /usr/lib64/libnvidia-eglcore.so.565.77 as /usr/lib64/libnvidia-eglcore.so.565.77
INFO[0000] Selecting /usr/lib64/libnvidia-encode.so.565.77 as /usr/lib64/libnvidia-encode.so.565.77
INFO[0000] Selecting /usr/lib64/libnvidia-fbc.so.565.77 as /usr/lib64/libnvidia-fbc.so.565.77
INFO[0000] Selecting /usr/lib64/libnvidia-glcore.so.565.77 as /usr/lib64/libnvidia-glcore.so.565.77
INFO[0000] Selecting /usr/lib64/libnvidia-glsi.so.565.77 as /usr/lib64/libnvidia-glsi.so.565.77
INFO[0000] Selecting /usr/lib64/libnvidia-glvkspirv.so.565.77 as /usr/lib64/libnvidia-glvkspirv.so.565.77
INFO[0000] Selecting /usr/lib64/libnvidia-gpucomp.so.565.77 as /usr/lib64/libnvidia-gpucomp.so.565.77
INFO[0000] Selecting /usr/lib64/libnvidia-gtk2.so.565.77 as /usr/lib64/libnvidia-gtk2.so.565.77
INFO[0000] Selecting /usr/lib64/libnvidia-gtk3.so.565.77 as /usr/lib64/libnvidia-gtk3.so.565.77
INFO[0000] Selecting /usr/lib64/libnvidia-ml.so.565.77 as /usr/lib64/libnvidia-ml.so.565.77
INFO[0000] Selecting /usr/lib64/libnvidia-ngx.so.565.77 as /usr/lib64/libnvidia-ngx.so.565.77
INFO[0000] Selecting /usr/lib64/libnvidia-nvvm.so.565.77 as /usr/lib64/libnvidia-nvvm.so.565.77
INFO[0000] Selecting /usr/lib64/libnvidia-opencl.so.565.77 as /usr/lib64/libnvidia-opencl.so.565.77
INFO[0000] Selecting /usr/lib64/libnvidia-opticalflow.so.565.77 as /usr/lib64/libnvidia-opticalflow.so.565.77
INFO[0000] Selecting /usr/lib64/libnvidia-pkcs11-openssl3.so.565.77 as /usr/lib64/libnvidia-pkcs11-openssl3.so.565.77
INFO[0000] Selecting /usr/lib64/libnvidia-pkcs11.so.565.77 as /usr/lib64/libnvidia-pkcs11.so.565.77
INFO[0000] Selecting /usr/lib64/libnvidia-ptxjitcompiler.so.565.77 as /usr/lib64/libnvidia-ptxjitcompiler.so.565.77
INFO[0000] Selecting /usr/lib64/libnvidia-rtcore.so.565.77 as /usr/lib64/libnvidia-rtcore.so.565.77
INFO[0000] Selecting /usr/lib64/libnvidia-sandboxutils.so.565.77 as /usr/lib64/libnvidia-sandboxutils.so.565.77
INFO[0000] Selecting /usr/lib64/libnvidia-tls.so.565.77 as /usr/lib64/libnvidia-tls.so.565.77
INFO[0000] Selecting /usr/lib64/libnvidia-vksc-core.so.565.77 as /usr/lib64/libnvidia-vksc-core.so.565.77
INFO[0000] Selecting /usr/lib64/libnvidia-wayland-client.so.565.77 as /usr/lib64/libnvidia-wayland-client.so.565.77
INFO[0000] Selecting /usr/lib64/libnvoptix.so.565.77 as /usr/lib64/libnvoptix.so.565.77
INFO[0000] Selecting /usr/lib64/vdpau/libvdpau_nvidia.so.565.77 as /usr/lib64/vdpau/libvdpau_nvidia.so.565.77
WARN[0000] Could not locate /nvidia-persistenced/socket: pattern /nvidia-persistenced/socket not found
WARN[0000] Could not locate /nvidia-fabricmanager/socket: pattern /nvidia-fabricmanager/socket not found
WARN[0000] Could not locate /tmp/nvidia-mps: pattern /tmp/nvidia-mps not found
INFO[0000] Selecting /lib/firmware/nvidia/565.77/gsp_ga10x.bin as /lib/firmware/nvidia/565.77/gsp_ga10x.bin
INFO[0000] Selecting /lib/firmware/nvidia/565.77/gsp_tu10x.bin as /lib/firmware/nvidia/565.77/gsp_tu10x.bin
INFO[0000] Selecting /usr/bin/nvidia-smi as /usr/bin/nvidia-smi
INFO[0000] Selecting /usr/bin/nvidia-debugdump as /usr/bin/nvidia-debugdump
INFO[0000] Selecting /usr/bin/nvidia-persistenced as /usr/bin/nvidia-persistenced
INFO[0000] Selecting /usr/bin/nvidia-cuda-mps-control as /usr/bin/nvidia-cuda-mps-control
INFO[0000] Selecting /usr/bin/nvidia-cuda-mps-server as /usr/bin/nvidia-cuda-mps-server
INFO[0000] Generated CDI spec with version 0.8.0
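Most of the log above is INFO lines, so the warnings are easy to miss. A minimal sketch for pulling them out, assuming the log has been saved as text; `summarize_ctk_log` is a hypothetical helper, not part of nvidia-ctk, and the sample lines are copied from the log above.

```python
import re
from collections import defaultdict

# Matches lines shaped like "WARN[0000] Could not locate ..."
LINE_RE = re.compile(r"^(?P<level>[A-Z]+)\[(?P<ts>\d+)\]\s+(?P<msg>.*)$")

def summarize_ctk_log(text):
    """Group log messages by level; lines that do not match are skipped."""
    by_level = defaultdict(list)
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            by_level[m.group("level")].append(m.group("msg"))
    return dict(by_level)

sample = """INFO[0000] Using driver version 565.77
WARN[0000] Could not locate /dev/nvidia-modeset: pattern /dev/nvidia-modeset not found
INFO[0000] Selecting /dev/nvidia-uvm as /dev/nvidia-uvm
WARN[0000] Could not locate /tmp/nvidia-mps: pattern /tmp/nvidia-mps not found
"""

summary = summarize_ctk_log(sample)
for msg in summary.get("WARN", []):
    print(msg)
```

Run against the full log, this should list only the "Could not locate" entries, which makes it easier to see which files and sockets were missing when the spec was generated.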
root@fedora41:~# nvidia-debugdump --dumpall
/etc/modprobe.d# cat nvidia.conf
options nvidia NVreg_PreserveVideoMemoryAllocations=1
options nvidia-drm modeset=1 fbdev=1
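To make it clear which module each option applies to, the two lines above can be parsed into a per-module mapping. A minimal sketch: `parse_modprobe_options` is a hypothetical helper that only handles `options` lines and ignores every other modprobe.d directive.

```python
def parse_modprobe_options(text):
    """Return {module: {option: value}} for 'options' lines only."""
    opts = {}
    for raw in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        parts = line.split()
        if len(parts) < 2 or parts[0] != "options":
            continue
        module_opts = opts.setdefault(parts[1], {})
        for token in parts[2:]:
            key, _, value = token.partition("=")
            module_opts[key] = value
    return opts

conf_text = """options nvidia NVreg_PreserveVideoMemoryAllocations=1
options nvidia-drm modeset=1 fbdev=1
"""

config = parse_modprobe_options(conf_text)
print(config)
```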
nvidia-bug-report.log.gz (608.3 KB)
cdi-spec.yaml.tgz
user@fedora41:~$ nvidia-ctk cdi generate --device-name-strategy=uuid --output cdi-spec.yaml
Thanks!