From 3dd114c7fe6e21c6fbd4bc6cb0ca12049da560ff Mon Sep 17 00:00:00 2001
From: Mike McKiernan
Date: Mon, 23 Sep 2024 11:28:41 -0400
Subject: [PATCH 1/3] Describe pre-installed experience

Signed-off-by: Mike McKiernan
---
 gpu-operator/getting-started.rst | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/gpu-operator/getting-started.rst b/gpu-operator/getting-started.rst
index dbdb53c29..47e70b250 100644
--- a/gpu-operator/getting-started.rst
+++ b/gpu-operator/getting-started.rst
@@ -347,6 +347,12 @@ In this scenario, the NVIDIA GPU driver is already installed on the worker nodes
      nvidia/gpu-operator \
      --set driver.enabled=false
 
+The preceding command prevents the Operator from installing the GPU driver on any nodes in the cluster.
+If any nodes in the cluster have the GPU driver pre-installed, the GPU driver pod detects the kernel and exits.
+The Operator proceeds to start other pods, such as the container toolkit pod.
+
+If all the nodes in the cluster have the GPU driver pre-installed, the Operator detects that all GPU driver pods exited and stops the GPU driver daemon set,
+regardless of the ``driver.enabled`` value.
 
 .. _preinstalled-drivers-and-toolkit:
 

From 2e9d55871ae1e3e9b419e2cae1ba7215becca311 Mon Sep 17 00:00:00 2001
From: Mike McKiernan
Date: Tue, 24 Sep 2024 11:17:48 -0400
Subject: [PATCH 2/3] Review comments

- The Operator labels nodes with a pre-installed driver.
- Remove mention of stopping the daemon set.

Signed-off-by: Mike McKiernan
---
 gpu-operator/getting-started.rst | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/gpu-operator/getting-started.rst b/gpu-operator/getting-started.rst
index 47e70b250..fc30d5796 100644
--- a/gpu-operator/getting-started.rst
+++ b/gpu-operator/getting-started.rst
@@ -348,11 +348,9 @@ In this scenario, the NVIDIA GPU driver is already installed on the worker nodes
      --set driver.enabled=false
 
 The preceding command prevents the Operator from installing the GPU driver on any nodes in the cluster.
-If any nodes in the cluster have the GPU driver pre-installed, the GPU driver pod detects the kernel and exits.
-The Operator proceeds to start other pods, such as the container toolkit pod.
 
-If all the nodes in the cluster have the GPU driver pre-installed, the Operator detects that all GPU driver pods exited and stops the GPU driver daemon set,
-regardless of the ``driver.enabled`` value.
+If you do not specify the ``driver.enabled=false`` argument and nodes in the cluster have a pre-installed GPU driver, the GPU driver pod detects the driver kernel module and exits.
+The Operator labels the nodes with ``nvidia.com/gpu.deploy.driver=preinstalled`` and proceeds to start other pods, such as the container toolkit pod.
 
 .. _preinstalled-drivers-and-toolkit:
 

From 45348283d457f234210b3a1449060468bc79bc91 Mon Sep 17 00:00:00 2001
From: Mike McKiernan
Date: Wed, 25 Sep 2024 14:41:43 -0400
Subject: [PATCH 3/3] More detail from Chris

Signed-off-by: Mike McKiernan
---
 gpu-operator/getting-started.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gpu-operator/getting-started.rst b/gpu-operator/getting-started.rst
index fc30d5796..3cc518389 100644
--- a/gpu-operator/getting-started.rst
+++ b/gpu-operator/getting-started.rst
@@ -349,8 +349,8 @@ In this scenario, the NVIDIA GPU driver is already installed on the worker nodes
 
 The preceding command prevents the Operator from installing the GPU driver on any nodes in the cluster.
 
-If you do not specify the ``driver.enabled=false`` argument and nodes in the cluster have a pre-installed GPU driver, the GPU driver pod detects the driver kernel module and exits.
-The Operator labels the nodes with ``nvidia.com/gpu.deploy.driver=preinstalled`` and proceeds to start other pods, such as the container toolkit pod.
+If you do not specify the ``driver.enabled=false`` argument and nodes in the cluster have a pre-installed GPU driver, the init container in the driver pod detects that the driver is preinstalled and labels the node so that the driver pod is terminated and does not get re-scheduled on to the node.
+The Operator proceeds to start other pods, such as the container toolkit pod.
 
 .. _preinstalled-drivers-and-toolkit: