Merge branch 'supp-matrix-23.9.1' into 'master'
Operand versions for 23.9.1

See merge request nvidia/cloud-native/cnt-docs!349
mikemckiernan committed Dec 8, 2023
2 parents fcb6bfc + d832b9a commit 235a3b9
Showing 2 changed files with 101 additions and 129 deletions.
168 changes: 65 additions & 103 deletions gpu-operator/life-cycle-policy.rst
@@ -95,106 +95,68 @@ The following table shows the operands and default operand versions that corresp
When post-release testing confirms support for newer versions of operands, these updates are identified as *recommended updates* to a GPU Operator version.
Refer to :ref:`Upgrading the GPU Operator` for more information.
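As a hedged illustration of applying a recommended update, the following sketch builds a ``helm upgrade`` argument list that pins the driver operand to a specific version. The release name ``gpu-operator``, the namespace, the chart reference ``nvidia/gpu-operator``, and the ``driver.version`` chart value are assumptions made for illustration and are not taken from this page; confirm them against your own deployment before running the printed command.

```python
import shlex


def build_driver_upgrade_command(driver_version,
                                 release="gpu-operator",        # hypothetical release name
                                 namespace="gpu-operator",      # hypothetical namespace
                                 chart="nvidia/gpu-operator"):  # assumed chart reference
    """Return the argv list for a helm upgrade that pins the driver version."""
    return [
        "helm", "upgrade", release, chart,
        "--namespace", namespace,
        "--reuse-values",
        # Assumption: the chart exposes the driver version as `driver.version`.
        "--set", f"driver.version={driver_version}",
    ]


command = build_driver_upgrade_command("535.129.03")
# Print the command for review instead of executing it.
print(shlex.join(command))
```

The sketch only prints the command so that it can be reviewed first; to run it, the same list can be passed to ``subprocess.run``.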

.. list-table::
:header-rows: 1
:align: center

* - Release
- | NVIDIA
| GPU
| Driver
- | NVIDIA Driver
| Manager for K8s
- | NVIDIA
| Container
| Toolkit
- | NVIDIA Kubernetes
| Device Plugin
- DCGM Exporter
- | Node Feature
| Discovery
- | NVIDIA GPU Feature
| Discovery for Kubernetes
- | NVIDIA MIG Manager
| for Kubernetes
- DCGM
- | Validator for
| NVIDIA GPU Operator
- | NVIDIA KubeVirt
| GPU Device Plugin
- | NVIDIA vGPU
| Device Manager
- NVIDIA GDS Driver
- | NVIDIA Kata Manager
| for Kubernetes
- | NVIDIA Confidential
| Computing Manager
| for Kubernetes
* - v23.9.0
- | `535.129.03 <https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-535-129-03/index.html>`_ (recommended),
| `535.104.12 <https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-535-104-12/index.html>`_ (default),
| `525.147.05 <https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-525-147-05/index.html>`_,
| `470.223.02 <https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-470-223-02/index.html>`_
- `v0.6.4 <https://ngc.nvidia.com/catalog/containers/nvidia:cloud-native:k8s-driver-manager>`_
- `1.14.3 <https://github.com/NVIDIA/nvidia-container-toolkit/releases>`_
- `0.14.2 <https://github.com/NVIDIA/k8s-device-plugin/releases>`_
- `3.2.6-3.1.9 <https://github.com/NVIDIA/gpu-monitoring-tools/releases>`_
- v0.14.2
- `0.8.2 <https://github.com/NVIDIA/gpu-feature-discovery/releases>`_
- `0.5.5 <https://github.com/NVIDIA/mig-parted/tree/main/deployments/gpu-operator>`_
- `3.2.6-1 <https://docs.nvidia.com/datacenter/dcgm/latest/release-notes/changelog.html>`_,
- v23.9.0
- `v1.2.3 <https://github.com/NVIDIA/kubevirt-gpu-device-plugin>`_
- v0.2.4
- `2.16.1 <https://github.com/NVIDIA/gds-nvidia-fs/releases>`_
- v0.1.2
- v0.1.1

* - v23.6.1
- | `535.129.03 <https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-535-129-03/index.html>`_ (recommended),
| `535.104.05 <https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-535-104-05/index.html>`_ (default),
| `525.147.05 <https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-525-147-05/index.html>`_,
| `470.223.02 <https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-470-223-02/index.html>`_
- `v0.6.2 <https://ngc.nvidia.com/catalog/containers/nvidia:cloud-native:k8s-driver-manager>`_
- `1.13.4 <https://github.com/NVIDIA/nvidia-container-toolkit/releases>`_
- `0.14.1 <https://github.com/NVIDIA/k8s-device-plugin/releases>`_
- `3.1.8-3.1.5 <https://github.com/NVIDIA/gpu-monitoring-tools/releases>`_
- v0.13.1
- `0.8.1 <https://github.com/NVIDIA/gpu-feature-discovery/releases>`_
- `0.5.3 <https://github.com/NVIDIA/mig-parted/tree/main/deployments/gpu-operator>`_
- | `3.1.8-1 <https://docs.nvidia.com/datacenter/dcgm/latest/release-notes/changelog.html>`_ (default),
- v23.6.1
- `v1.2.2 <https://github.com/NVIDIA/kubevirt-gpu-device-plugin>`_
- v0.2.3
- `2.16.1 <https://github.com/NVIDIA/gds-nvidia-fs/releases>`_
- v0.1.0
- v0.1.0

* - v23.6.0
- | `535.129.03 <https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-535-129-03/index.html>`_ (recommended),
| `535.86.10 <https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-535-86-10/index.html>`_ (default),
| `525.147.05 <https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-525-147-05/index.html>`_,
| `470.223.02 <https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-470-223-02/index.html>`_
- `v0.6.2 <https://ngc.nvidia.com/catalog/containers/nvidia:cloud-native:k8s-driver-manager>`_
- `1.13.4 <https://github.com/NVIDIA/nvidia-container-toolkit/releases>`_
- `0.14.1 <https://github.com/NVIDIA/k8s-device-plugin/releases>`_
- `3.1.8-3.1.5 <https://github.com/NVIDIA/gpu-monitoring-tools/releases>`_
- v0.13.1
- `0.8.1 <https://github.com/NVIDIA/gpu-feature-discovery/releases>`_
- `0.5.3 <https://github.com/NVIDIA/mig-parted/tree/main/deployments/gpu-operator>`_
- | `3.1.8-1 <https://docs.nvidia.com/datacenter/dcgm/latest/release-notes/changelog.html>`_ (default),
- v23.6.0
- `v1.2.2 <https://github.com/NVIDIA/kubevirt-gpu-device-plugin>`_
- v0.2.3
- `2.16.1 <https://github.com/NVIDIA/gds-nvidia-fs/releases>`_
- v0.1.0
- v0.1.0

.. note::

- The driver version can differ with NVIDIA vGPU because it depends on the driver
version that you download from the `NVIDIA vGPU Software Portal <https://nvid.nvidia.com/dashboard/#/dashboard>`_.
- The GPU Operator is supported on all active NVIDIA datacenter production drivers.
Refer to `Supported Drivers and CUDA Toolkit Versions <https://docs.nvidia.com/datacenter/tesla/drivers/index.html#cuda-drivers>`_
for more information.
.. list-table::
:header-rows: 1

* - Component
- Version

* - NVIDIA GPU Operator
- v23.9.1

* - NVIDIA GPU Driver
- | `535.129.03 <https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-535-129-03/index.html>`_ (default),
| `525.147.05 <https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-525-147-05/index.html>`_,
| `470.223.02 <https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-470-223-02/index.html>`_
* - NVIDIA Driver Manager for K8s
- `v0.6.5 <https://ngc.nvidia.com/catalog/containers/nvidia:cloud-native:k8s-driver-manager>`_

* - NVIDIA Container Toolkit
- `1.14.3 <https://github.com/NVIDIA/nvidia-container-toolkit/releases>`_

* - NVIDIA Kubernetes Device Plugin
- `0.14.3 <https://github.com/NVIDIA/k8s-device-plugin/releases>`_

* - DCGM Exporter
- `3.3.0-3.2.0 <https://github.com/NVIDIA/gpu-monitoring-tools/releases>`_

* - Node Feature Discovery
- v0.14.2

* - | NVIDIA GPU Feature Discovery
| for Kubernetes
- `0.8.2 <https://github.com/NVIDIA/gpu-feature-discovery/releases>`_

* - NVIDIA MIG Manager for Kubernetes
- `0.5.5 <https://github.com/NVIDIA/mig-parted/tree/main/deployments/gpu-operator>`_

* - DCGM
- `3.3.0-1 <https://docs.nvidia.com/datacenter/dcgm/latest/release-notes/changelog.html>`_

* - Validator for NVIDIA GPU Operator
- v23.9.1

* - NVIDIA KubeVirt GPU Device Plugin
- `v1.2.4 <https://github.com/NVIDIA/kubevirt-gpu-device-plugin>`_

* - NVIDIA vGPU Device Manager
- v0.2.4

* - NVIDIA GDS Driver
- `2.17.5 <https://github.com/NVIDIA/gds-nvidia-fs/releases>`_

* - NVIDIA Kata Manager for Kubernetes
- v0.1.2

* - | NVIDIA Confidential Computing
| Manager for Kubernetes
- v0.1.1

.. note::

- The driver version can differ with NVIDIA vGPU because it depends on the driver
version that you download from the `NVIDIA vGPU Software Portal <https://nvid.nvidia.com/dashboard/#/dashboard>`_.
- The GPU Operator is supported on all active NVIDIA datacenter production drivers.
Refer to `Supported Drivers and CUDA Toolkit Versions <https://docs.nvidia.com/datacenter/tesla/drivers/index.html#cuda-drivers>`_
for more information.
62 changes: 36 additions & 26 deletions gpu-operator/platform-support.rst
@@ -34,14 +34,27 @@ Platform Support

.. include:: life-cycle-policy.rst

Supported NVIDIA GPUs and Systems
---------------------------------
.. _supported-nvidia-gpus-and-systems:

Supported NVIDIA Data Center GPUs and Systems
---------------------------------------------

The following NVIDIA data center GPUs are supported on x86 based platforms:

.. tab-set::

.. tab-item:: Data Center A, H and L-series Products
.. tab-item:: GH-series Products

.. list-table::
:header-rows: 1

* - Product
- Architecture

* - NVIDIA GH200
- NVIDIA Grace Hopper

.. tab-item:: A, H and L-series Products

+-------------------------+---------------------------+
| Product | Architecture |
@@ -90,7 +103,7 @@ The following NVIDIA data center GPUs are supported on x86 based platforms:
* Hopper (H100) GPU is only supported on x86 servers.
* The GPU Operator supports DGX A100 with DGX OS 5.1+ and Red Hat OpenShift using Red Hat Core OS. For installation instructions, see :ref:`here <preinstalled-drivers-and-toolkit>` for DGX OS 5.1+ and :ref:`here <openshift-introduction>` for Red Hat OpenShift.

.. tab-item:: Data Center D, T and V-series Products
.. tab-item:: D, T and V-series Products

+-----------------------+------------------------+
| Product | Architecture |
@@ -106,7 +119,7 @@ The following NVIDIA data center GPUs are supported on x86 based platforms:
| NVIDIA P4 | Pascal |
+-----------------------+------------------------+

.. tab-item:: Data Center RTX / T-series Products
.. tab-item:: RTX / T-series Products

+-------------------------+------------------------+
| Product | Architecture |
@@ -244,29 +257,21 @@ The GPU Operator has been validated in the following scenarios:
| MicroK8s
* - Ubuntu 20.04 LTS
- 1.25---1.28
- 1.22---1.28
-
- 7.0 U3c, 8.0 U2
- 1.25---1.28
- 1.22---1.28
-
-

* - Ubuntu 22.04 LTS
- 1.25---1.28
- 1.22---1.28
-
-
-
-
- 1.26

* - CentOS 7
- 1.25---1.28
-
-
-
-
-

* - Red Hat Core OS
-
- | 4.9---4.14
@@ -279,10 +284,10 @@ The GPU Operator has been validated in the following scenarios:
| Enterprise
| Linux 8.4,
| 8.6---8.9
- 1.25---1.28
- 1.22---1.28
-
-
- 1.25---1.28
- 1.22---1.28
-
-

@@ -407,8 +412,8 @@ Operating System Kubernetes KubeVirt OpenShift Virtual
================ =========== ============= ========= ============= ========
Ubuntu 20.04 LTS 1.22---1.28 0.36+ 0.59.1+
Ubuntu 22.04 LTS 1.22---1.28 0.36+ 0.59.1+
Red Hat Core OS 4.11, 4.12, 4.13
4.13
Red Hat Core OS 4.11---4.14 4.13,
4.14
================ =========== ============= ========= ============= ========
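The version ranges in these tables use the ``low---high`` notation, for example ``1.22---1.28``. As a minimal sketch, assuming that both ends of a range are inclusive and that only the major and minor numbers matter, the following hypothetical helpers check whether a cluster version falls inside a listed range. The function names are illustrative and are not part of any NVIDIA tool.

```python
def parse_version(text):
    """Convert '1.27' or '1.27.3' into a (major, minor) tuple of integers."""
    parts = text.strip().split(".")
    return int(parts[0]), int(parts[1])


def parse_range(text):
    """Split a range such as '1.22---1.28' into its low and high (major, minor) tuples."""
    low, high = text.split("---")
    return parse_version(low), parse_version(high)


def is_in_range(version, supported):
    """Return True when version lies within the inclusive range given by supported."""
    low, high = parse_range(supported)
    return low <= parse_version(version) <= high


# Example: compare a few cluster versions with ranges copied from the tables.
for candidate, listed in [("1.27", "1.22---1.28"), ("1.29", "1.22---1.28"), ("4.14", "4.11---4.14")]:
    print(candidate, listed, is_in_range(candidate, listed))
```

Tuples are compared element by element, so ``(1, 9)`` sorts before ``(1, 22)``, which a plain string comparison would get wrong.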

You can run GPU passthrough and NVIDIA vGPU in the same cluster as long as you use
@@ -426,9 +431,8 @@ Support for GPUDirect RDMA

The following operating systems and NVIDIA GPU drivers are supported with GPUDirect RDMA:

- Ubuntu 20.04 and 22.04 LTS with Network Operator 23.7.0
- Red Hat OpenShift 4.9 and higher with Network Operator 23.7.0
- CentOS 7 with MOFED installed on the node
- Ubuntu 20.04 and 22.04 LTS with Network Operator 23.10.0
- Red Hat OpenShift 4.9 and higher with Network Operator 23.10.0

For information about configuring GPUDirect RDMA, refer to :doc:`gpu-operator-rdma`.

@@ -438,13 +442,19 @@ Support for GPUDirect Storage

The following operating systems and NVIDIA GPU drivers are supported with GPUDirect Storage:

- Ubuntu 20.04 and 22.04 LTS with Network Operator 23.7.0
- Ubuntu 20.04 and 22.04 LTS with Network Operator 23.10.0
- Red Hat OpenShift Container Platform 4.11 and higher

.. note::

Not supported with secure boot.
Supported storage types are local NVMe and remote NFS.
Version v2.17.5 and higher of the NVIDIA GPUDirect Storage kernel driver, ``nvidia-fs``,
requires the NVIDIA open kernel modules.
You can install the open kernel modules by specifying the ``driver.useOpenKernelModules=true``
argument to the ``helm`` command.
Refer to :ref:`chart customization options` for more information.

Not supported with secure boot.
Supported storage types are local NVMe and remote NFS.
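The ``helm`` argument described in the note can be sketched as follows. This is an illustration under stated assumptions rather than the documented installation procedure: the release name, the namespace, the chart reference ``nvidia/gpu-operator``, and the ``gds.enabled`` value are assumptions, and only ``driver.useOpenKernelModules=true`` comes from the note itself.

```python
import shlex


def build_gds_install_command(release="gpu-operator",        # hypothetical release name
                              namespace="gpu-operator",      # hypothetical namespace
                              chart="nvidia/gpu-operator"):  # assumed chart reference
    """Return the argv list for a helm install that requests the open kernel modules."""
    return [
        "helm", "install", release, chart,
        "--namespace", namespace, "--create-namespace",
        # From the note: install the NVIDIA open kernel modules.
        "--set", "driver.useOpenKernelModules=true",
        # Assumption: GPUDirect Storage is switched on with `gds.enabled`.
        "--set", "gds.enabled=true",
    ]


# Print the command for review instead of executing it.
print(shlex.join(build_gds_install_command()))
```

Refer to the chart customization options for the authoritative list of values before running the printed command.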

Additional Supported Container Management Tools
-----------------------------------------------