
Commit

Merge branch 'clarify-kata' into 'master'
Clarify labeling nodes

See merge request nvidia/cloud-native/cnt-docs!330
mikemckiernan committed Sep 28, 2023
2 parents 5795e0c + bd90165 commit 208b01a
Showing 2 changed files with 63 additions and 27 deletions.
48 changes: 34 additions & 14 deletions gpu-operator/gpu-operator-confidential-containers.rst
@@ -258,17 +258,6 @@ Installing and configuring your cluster to support the NVIDIA GPU Operator with
After installation, you can change the confidential computing mode and run a sample workload.


***************************************
Label Nodes for Confidential Containers
***************************************

Label the nodes to run Kata Containers and configure for confidential containers:

.. code-block:: console

   $ kubectl label node <node-name> nvidia.com/gpu.workload.config=vm-passthrough
.. start-install-coco-operator
********************************************
@@ -348,6 +337,10 @@ Perform the following steps to install and verify the Confidential Containers Op
nvidia.com/gpu.workload.config: "vm-passthrough"
...
.. tip::

   Label the nodes with ``vm-passthrough`` when you install the NVIDIA GPU Operator.

#. Apply the modified manifests:

.. code-block:: console
@@ -392,6 +385,14 @@ Procedure

Perform the following steps to install the Operator for use with confidential containers:

#. Label the nodes to run virtual machines in containers.
   Label only the nodes that you want to run with confidential containers.

   .. code-block:: console

      $ kubectl label node <node-name> nvidia.com/gpu.workload.config=vm-passthrough
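      # To undo the change later, remove the label with a trailing hyphen (standard kubectl syntax; node name is a placeholder):
      $ kubectl label node <node-name> nvidia.com/gpu.workload.config-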
#. Add and update the NVIDIA Helm repository:

.. code-block:: console
@@ -472,7 +473,7 @@ Verification
nvidia nvidia 97s
#. (Optional) If you have host access to the worker node, you can perform the following steps:
#. Optional: If you have host access to the worker node, you can perform the following steps:

#. Confirm that the host uses the ``vfio-pci`` device driver for GPUs:

@@ -639,6 +640,25 @@ A pod specification for confidential computing requires the following:
Refer to :ref:`About the Pod Annotation` for information about the pod annotation.
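
As a rough orientation, a minimal sketch of such a pod specification might look like the following. The runtime class name, memory annotation value, image, and resource name are assumptions for illustration only; the ``cdi.k8s.io/gpu`` annotation is the one described in :ref:`About the Pod Annotation`.

.. code-block:: yaml

   # Minimal sketch only; the runtime class name, memory value, image, and
   # resource name below are assumptions, not values confirmed by this page.
   apiVersion: v1
   kind: Pod
   metadata:
     name: cuda-vectoradd-coco
     annotations:
       cdi.k8s.io/gpu: "nvidia.com/pgpu=0"
       io.katacontainers.config.hypervisor.default_memory: "16384"
   spec:
     runtimeClassName: kata-qemu-nvidia-gpu-snp
     restartPolicy: OnFailure
     containers:
       - name: cuda-vectoradd
         image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04
         resources:
           limits:
             nvidia.com/pgpu: 1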

Troubleshooting Workloads
=========================

If the sample workload does not run, confirm that you labeled nodes to run virtual machines in containers:

.. code-block:: console

   $ kubectl get nodes -l nvidia.com/gpu.workload.config=vm-passthrough

*Example Output*

.. code-block:: output

   NAME            STATUS   ROLES    AGE   VERSION
   kata-worker-1   Ready    <none>   10d   v1.27.3
   kata-worker-2   Ready    <none>   10d   v1.27.3
   kata-worker-3   Ready    <none>   10d   v1.27.3
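
If a node that you expect to run these workloads is missing from the output, apply the label that is described in the installation procedure (the node name is a placeholder):

.. code-block:: console

   $ kubectl label node <node-name> nvidia.com/gpu.workload.config=vm-passthrough
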
***********
Attestation
@@ -761,8 +781,8 @@ After you access the VM, you can run the following commands to verify that attes
True
Troubleshooting
===============
Troubleshooting Attestation
===========================

To troubleshoot attestation failures, access the VM and view the logs in the ``/var/log/`` directory.
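
As a starting point, a review of those logs might look like the following; the exact log file names depend on the guest image, so the file name is a placeholder:

.. code-block:: console

   $ sudo ls -lt /var/log/
   $ sudo tail -n 100 /var/log/<log-file>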

42 changes: 29 additions & 13 deletions gpu-operator/gpu-operator-kata.rst
@@ -237,17 +237,6 @@ Installing and configuring your cluster to support the NVIDIA GPU Operator with
After installation, you can run a sample workload.


*******************************
Label Nodes for Kata Containers
*******************************

Label the nodes to run Kata Containers:

.. code-block:: console

   $ kubectl label node <node-name> nvidia.com/gpu.workload.config=vm-passthrough
.. include:: gpu-operator-confidential-containers.rst
   :start-after: start-install-coco-operator
   :end-before: end-install-coco-operator
@@ -262,6 +251,13 @@ Procedure

Perform the following steps to install the Operator for use with Kata Containers:

#. Label the nodes to run virtual machines in containers.
   Label only the nodes that you want to run with Kata Containers.

   .. code-block:: console

      $ kubectl label node <node-name> nvidia.com/gpu.workload.config=vm-passthrough
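      # Several nodes can be labeled in a single command if needed; the node names are placeholders:
      $ kubectl label node <node-name-1> <node-name-2> nvidia.com/gpu.workload.config=vm-passthrough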
#. Add and update the NVIDIA Helm repository:

.. code-block:: console
@@ -338,7 +334,7 @@ Verification
nvidia nvidia 97s
#. (Optional) If you have host access to the worker node, you can perform the following steps:
#. Optional: If you have host access to the worker node, you can perform the following steps:

#. Confirm that the host uses the ``vfio-pci`` device driver for GPUs:

@@ -450,8 +446,28 @@ A pod specification for a Kata container requires the following:
$ kubectl delete -f cuda-vectoradd-kata.yaml
Troubleshooting Workloads
=========================

If the sample workload does not run, confirm that you labeled nodes to run virtual machines in containers:

.. code-block:: console

   $ kubectl get nodes -l nvidia.com/gpu.workload.config=vm-passthrough

*Example Output*

.. code-block:: output

   NAME            STATUS   ROLES    AGE   VERSION
   kata-worker-1   Ready    <none>   10d   v1.27.3
   kata-worker-2   Ready    <none>   10d   v1.27.3
   kata-worker-3   Ready    <none>   10d   v1.27.3
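
If the nodes are labeled but the pod still does not start, check that a passthrough GPU resource is allocatable on the node. The resource name ``nvidia.com/pgpu`` is inferred from the pod annotation described below and can differ in your cluster; the node name is a placeholder:

.. code-block:: console

   $ kubectl get node <node-name> -o jsonpath='{.status.allocatable}'
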
************************
About the Pod Annotation
========================
************************

The ``cdi.k8s.io/gpu: "nvidia.com/pgpu=0"`` annotation is used when the pod sandbox is created.
The annotation ensures that the virtual machine created by the Kata runtime is created with
