RedHat9.2 exec k8s-driver-manager error #37
Comments
@cdesiniotis Have you seen this error?
@lengrongfu I am not familiar with it. It is recommended to blacklist nouveau, as it can conflict with the NVIDIA driver.
I am using gpu-operator to install the driver. Do I need to manually add nouveau to the blacklist before installing gpu-operator? The k8s-driver-manager pod executes k8s-driver-manager/driver-manager, line 494 in 659892a.
This is not a required prerequisite, but because you are seeing errors from nouveau, I recommended that you try blacklisting it. As you pointed out, we do take care of unloading the module.
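To check whether nouveau is still loaded on a node before the driver pod starts, one option is to look for it in /proc/modules. The following is a minimal sketch, not part of k8s-driver-manager; the function names are hypothetical, and it only assumes the standard /proc/modules layout, where the module name is the first field of each line.

```python
from pathlib import Path


def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            # The first whitespace-separated field is the module name.
            names.add(fields[0])
    return names


def is_module_loaded(name, proc_modules_text):
    """Return True if the named module appears in the given text."""
    return name in loaded_modules(proc_modules_text)


if __name__ == "__main__":
    modules_file = Path("/proc/modules")
    if modules_file.exists():
        text = modules_file.read_text()
        print("nouveau loaded:", is_module_loaded("nouveau", text))
```

The parsing is kept separate from the file read so it can be run against saved output from another node.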
OK, thanks. After I blacklisted nouveau,
@cdesiniotis Let's discuss whether it is possible to develop a new feature to add an option to k8s-driver-manager to perform the operation of
Since blacklisting would require updating the initramfs and rebooting the node, it is not something we would be open to adding to this component. This should be done during infrastructure provisioning.
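For reference, the provisioning-time steps can be sketched as below. This is an illustration only, not something k8s-driver-manager does. It assumes a dracut-based system such as RHEL 9; the file name blacklist-nouveau.conf and the helper names are hypothetical choices, and the sketch only prints the plan rather than writing files or rebooting.

```python
def render_blacklist_conf(module="nouveau"):
    """Return modprobe.d contents that blacklist the given module."""
    return f"blacklist {module}\noptions {module} modeset=0\n"


def provisioning_plan(module="nouveau", conf_dir="/etc/modprobe.d"):
    """Describe the blacklist steps without executing any of them."""
    return {
        # Assumed file name; any *.conf file under modprobe.d is read.
        "path": f"{conf_dir}/blacklist-{module}.conf",
        "contents": render_blacklist_conf(module),
        # Rebuild the initramfs so the blacklist applies at early boot,
        # then reboot so the module is never loaded.
        "commands": [["dracut", "--force"], ["reboot"]],
    }


if __name__ == "__main__":
    plan = provisioning_plan()
    print("write", plan["path"])
    print(plan["contents"], end="")
    for command in plan["commands"]:
        print("run:", " ".join(command))
```

Because the last two steps rebuild the initramfs and reboot the node, they belong in node provisioning rather than in a pod that runs on a live cluster node.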
I use gpu-operator:v23.9.0 to install the NVIDIA GPU driver, but after the nvidia-driver-daemonset pod starts, the machine's kernel crashes. The GPU card is a Tesla P4. OS info: Red Hat 9.2, kernel version 5.14.0-284.11.1.el9_2.x86_64. The machine has the nouveau driver installed, and when I used the dmesg command to look at the kernel log, I found many errors about nouveau.