
Commit

Merge branch 'clarify-kata' into 'master'
Clarify labeling nodes

See merge request nvidia/cloud-native/cnt-docs!330
mikemckiernan committed Sep 28, 2023
2 parents 5795e0c + bd90165 commit 208b01a
Showing 2 changed files with 63 additions and 27 deletions.
48 changes: 34 additions & 14 deletions gpu-operator/gpu-operator-confidential-containers.rst
@@ -258,17 +258,6 @@ Installing and configuring your cluster to support the NVIDIA GPU Operator with
After installation, you can change the confidential computing mode and run a sample workload.


***************************************
Label Nodes for Confidential Containers
***************************************

Label the nodes to run Kata Containers and configure for confidential containers:

.. code-block:: console

   $ kubectl label node <node-name> nvidia.com/gpu.workload.config=vm-passthrough
.. start-install-coco-operator
********************************************
@@ -348,6 +337,10 @@ Perform the following steps to install and verify the Confidential Containers Op
nvidia.com/gpu.workload.config: "vm-passthrough"
...
.. tip::

   Label the nodes with ``vm-passthrough`` when you install the NVIDIA GPU Operator.

#. Apply the modified manifests:

.. code-block:: console
@@ -392,6 +385,14 @@ Procedure

Perform the following steps to install the Operator for use with confidential containers:

#. Label the nodes to run virtual machines in containers.
   Label only the nodes that you want to run with confidential containers.

   .. code-block:: console

      $ kubectl label node <node-name> nvidia.com/gpu.workload.config=vm-passthrough
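      # To undo the change later, remove the label with a trailing hyphen (standard kubectl syntax; node name is a placeholder):
      $ kubectl label node <node-name> nvidia.com/gpu.workload.config-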
#. Add and update the NVIDIA Helm repository:

.. code-block:: console
@@ -472,7 +473,7 @@ Verification
nvidia nvidia 97s
#. (Optional) If you have host access to the worker node, you can perform the following steps:
#. Optional: If you have host access to the worker node, you can perform the following steps:

#. Confirm that the host uses the ``vfio-pci`` device driver for GPUs:

@@ -639,6 +640,25 @@ A pod specification for confidential computing requires the following:
Refer to :ref:`About the Pod Annotation` for information about the pod annotation.
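
As a rough orientation, a minimal sketch of such a pod specification might look like the following. The runtime class name, memory annotation value, image, and resource name are assumptions for illustration only; the ``cdi.k8s.io/gpu`` annotation is the one described in :ref:`About the Pod Annotation`.

.. code-block:: yaml

   # Minimal sketch only; the runtime class name, memory value, image, and
   # resource name below are assumptions, not values confirmed by this page.
   apiVersion: v1
   kind: Pod
   metadata:
     name: cuda-vectoradd-coco
     annotations:
       cdi.k8s.io/gpu: "nvidia.com/pgpu=0"
       io.katacontainers.config.hypervisor.default_memory: "16384"
   spec:
     runtimeClassName: kata-qemu-nvidia-gpu-snp
     restartPolicy: OnFailure
     containers:
       - name: cuda-vectoradd
         image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04
         resources:
           limits:
             nvidia.com/pgpu: 1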

Troubleshooting Workloads
=========================

If the sample workload does not run, confirm that you labeled nodes to run virtual machines in containers:

.. code-block:: console

   $ kubectl get nodes -l nvidia.com/gpu.workload.config=vm-passthrough

*Example Output*

.. code-block:: output

   NAME            STATUS   ROLES    AGE   VERSION
   kata-worker-1   Ready    <none>   10d   v1.27.3
   kata-worker-2   Ready    <none>   10d   v1.27.3
   kata-worker-3   Ready    <none>   10d   v1.27.3
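
If a node that you expect to run these workloads is missing from the output, apply the label that is described in the installation procedure (the node name is a placeholder):

.. code-block:: console

   $ kubectl label node <node-name> nvidia.com/gpu.workload.config=vm-passthrough
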
***********
Attestation
@@ -761,8 +781,8 @@ After you access the VM, you can run the following commands to verify that attes
True
Troubleshooting
===============
Troubleshooting Attestation
===========================

To troubleshoot attestation failures, access the VM and view the logs in the ``/var/log/`` directory.
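
As a starting point, a review of those logs might look like the following; the exact log file names depend on the guest image, so the file name is a placeholder:

.. code-block:: console

   $ sudo ls -lt /var/log/
   $ sudo tail -n 100 /var/log/<log-file>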

42 changes: 29 additions & 13 deletions gpu-operator/gpu-operator-kata.rst
@@ -237,17 +237,6 @@ Installing and configuring your cluster to support the NVIDIA GPU Operator with
After installation, you can run a sample workload.


*******************************
Label Nodes for Kata Containers
*******************************

Label the nodes to run Kata Containers:

.. code-block:: console

   $ kubectl label node <node-name> nvidia.com/gpu.workload.config=vm-passthrough
.. include:: gpu-operator-confidential-containers.rst
   :start-after: start-install-coco-operator
   :end-before: end-install-coco-operator
@@ -262,6 +251,13 @@ Procedure

Perform the following steps to install the Operator for use with Kata Containers:

#. Label the nodes to run virtual machines in containers.
   Label only the nodes that you want to run with Kata Containers.

   .. code-block:: console

      $ kubectl label node <node-name> nvidia.com/gpu.workload.config=vm-passthrough
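      # Several nodes can be labeled in a single command if needed; the node names are placeholders:
      $ kubectl label node <node-name-1> <node-name-2> nvidia.com/gpu.workload.config=vm-passthrough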
#. Add and update the NVIDIA Helm repository:

.. code-block:: console
@@ -338,7 +334,7 @@ Verification
nvidia nvidia 97s
#. (Optional) If you have host access to the worker node, you can perform the following steps:
#. Optional: If you have host access to the worker node, you can perform the following steps:

#. Confirm that the host uses the ``vfio-pci`` device driver for GPUs:

@@ -450,8 +446,28 @@ A pod specification for a Kata container requires the following:
$ kubectl delete -f cuda-vectoradd-kata.yaml
Troubleshooting Workloads
=========================

If the sample workload does not run, confirm that you labeled nodes to run virtual machines in containers:

.. code-block:: console

   $ kubectl get nodes -l nvidia.com/gpu.workload.config=vm-passthrough

*Example Output*

.. code-block:: output

   NAME            STATUS   ROLES    AGE   VERSION
   kata-worker-1   Ready    <none>   10d   v1.27.3
   kata-worker-2   Ready    <none>   10d   v1.27.3
   kata-worker-3   Ready    <none>   10d   v1.27.3
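
If the nodes are labeled but the pod still does not start, check that a passthrough GPU resource is allocatable on the node. The resource name ``nvidia.com/pgpu`` is inferred from the pod annotation described below and can differ in your cluster; the node name is a placeholder:

.. code-block:: console

   $ kubectl get node <node-name> -o jsonpath='{.status.allocatable}'
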
************************
About the Pod Annotation
========================
************************

The ``cdi.k8s.io/gpu: "nvidia.com/pgpu=0"`` annotation is used when the pod sandbox is created.
The annotation ensures that the virtual machine created by the Kata runtime is created with
