Add support in Helm and docs
Signed-off-by: Yury Kulazhenkov <[email protected]>
ykulazhenkov committed Sep 20, 2023
1 parent af9253d commit 5054f10
Showing 7 changed files with 117 additions and 38 deletions.
48 changes: 31 additions & 17 deletions deployment/network-operator/README.md
@@ -373,23 +373,37 @@ imagePullSecrets:

#### Mellanox OFED driver

| Name | Type | Default | description |
| ---- | ---- | ------- |---------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `ofedDriver.deploy` | bool | `false` | deploy Mellanox OFED driver container |
| `ofedDriver.repository` | string | `mellanox` | Mellanox OFED driver image repository |
| `ofedDriver.image` | string | `mofed` | Mellanox OFED driver image name |
| `ofedDriver.version` | string | `5.9-0.5.6.0` | Mellanox OFED driver version |
| `ofedDriver.imagePullSecrets` | list | `[]` | An optional list of references to secrets to use for pulling any of the Mellanox OFED driver image |
| `ofedDriver.env` | list | `[]` | An optional list of [environment variables](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#envvar-v1-core) passed to the Mellanox OFED driver image |
| `ofedDriver.repoConfig.name` | string | `` | Private mirror repository configuration configMap name |
| `ofedDriver.certConfig.name` | string | `` | Custom TLS key/certificate configuration configMap name |
| `ofedDriver.terminationGracePeriodSeconds` | int | 300 | Mellanox OFED termination grace periods in seconds|
| `ofedDriver.startupProbe.initialDelaySeconds` | int | 10 | Mellanox OFED startup probe initial delay |
| `ofedDriver.startupProbe.periodSeconds` | int | 20 | Mellanox OFED startup probe interval |
| `ofedDriver.livenessProbe.initialDelaySeconds` | int | 30 | Mellanox OFED liveness probe initial delay |
| `ofedDriver.livenessProbe.periodSeconds` | int | 30 | Mellanox OFED liveness probe interval |
| `ofedDriver.readinessProbe.initialDelaySeconds` | int | 10 | Mellanox OFED readiness probe initial delay |
| `ofedDriver.readinessProbe.periodSeconds` | int | 30 | Mellanox OFED readiness probe interval |
| Name | Type | Default | description |
| -------------------------------------------------- | -------- | ----------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `ofedDriver.deploy` | bool | `false` | deploy Mellanox OFED driver container |
| `ofedDriver.repository` | string | `mellanox` | Mellanox OFED driver image repository |
| `ofedDriver.image` | string | `mofed` | Mellanox OFED driver image name |
| `ofedDriver.version` | string | `5.9-0.5.6.0` | Mellanox OFED driver version |
| `ofedDriver.initContainer.enable` | bool | `true` | deploy init container |
| `ofedDriver.initContainer.repository` | string | `ghcr.io/mellanox` | init container image repository |
| `ofedDriver.initContainer.image` | string | `network-operator-init-container` | init container image name |
| `ofedDriver.initContainer.version` | string | `v0.0.1` | init container image version |
| `ofedDriver.imagePullSecrets` | list | `[]` | An optional list of references to secrets to use for pulling any of the Mellanox OFED driver image |
| `ofedDriver.env` | list | `[]` | An optional list of [environment variables](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#envvar-v1-core) passed to the Mellanox OFED driver image |
| `ofedDriver.repoConfig.name` | string | `` | Private mirror repository configuration configMap name |
| `ofedDriver.certConfig.name` | string | `` | Custom TLS key/certificate configuration configMap name |
| `ofedDriver.terminationGracePeriodSeconds` | int | 300 | Mellanox OFED termination grace periods in seconds |
| `ofedDriver.startupProbe.initialDelaySeconds` | int | 10 | Mellanox OFED startup probe initial delay |
| `ofedDriver.startupProbe.periodSeconds` | int | 20 | Mellanox OFED startup probe interval |
| `ofedDriver.livenessProbe.initialDelaySeconds` | int | 30 | Mellanox OFED liveness probe initial delay |
| `ofedDriver.livenessProbe.periodSeconds` | int | 30 | Mellanox OFED liveness probe interval |
| `ofedDriver.readinessProbe.initialDelaySeconds` | int | 10 | Mellanox OFED readiness probe initial delay |
| `ofedDriver.readinessProbe.periodSeconds` | int | 30 | Mellanox OFED readiness probe interval |
| `ofedDriver.upgradePolicy.autoUpgrade` | bool | `false` | global switch for automatic upgrade feature |
| `ofedDriver.upgradePolicy.maxParallelUpgrades` | int | 1 | how many nodes can be upgraded in parallel, 0 means no limit, all nodes will be upgraded in parallel |
| `ofedDriver.upgradePolicy.safeLoad` | bool | `false` | cordon and drain (if enabled) a node before loading the driver on it |
| `ofedDriver.upgradePolicy.drain.enable` | bool | `true` | drain a node before the driver restart |
| `ofedDriver.upgradePolicy.drain.force` | bool | `false` | use force drain (check `kubectl drain` doc for details) |
| `ofedDriver.upgradePolicy.drain.podSelector` | string | "" | drain only pods matching this selector |
| `ofedDriver.upgradePolicy.drain.timeoutSeconds` | int | 300 | timeout for drain operation |
| `ofedDriver.upgradePolicy.drain.deleteEmptyDir` | bool | `false` | continue even if there are pods using emptyDir |
| `ofedDriver.upgradePolicy.waitForCompletion.podSelector` | string | not set | specifies a label selector for the pods to wait for completion before starting the driver upgrade |
| `ofedDriver.upgradePolicy.waitForCompletion.timeoutSeconds` | int | not set | specify the length of time in seconds to wait before giving up for workload to finish, zero means infinite |
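
As an illustration of how the upgrade-related parameters in the table fit together, here is a minimal sketch of a Helm values override. It assumes the key layout shown in the chart's `values.yaml`; the `app=myapp` selector and the timeout values are only example settings, not recommendations.

```yaml
# Sketch of a values override file (for example, passed to Helm with -f).
# Keys mirror the table above; the concrete values are illustrative.
ofedDriver:
  deploy: true
  upgradePolicy:
    # global switch; when false the remaining upgrade options are ignored
    autoUpgrade: true
    # upgrade one node at a time (0 would mean no limit)
    maxParallelUpgrades: 1
    # cordon and drain (if enabled) a node before loading the driver on it
    safeLoad: false
    drain:
      enable: true
      force: false
      podSelector: ""
      timeoutSeconds: 300
      deleteEmptyDir: false
    waitForCompletion:
      # example selector: wait for pods with this label to finish first
      podSelector: "app=myapp"
      # give up waiting after 300 seconds; zero means wait indefinitely
      timeoutSeconds: 300
```

Leaving `waitForCompletion` out entirely should keep both of its fields unset, which matches the "not set" defaults listed in the table.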


#### RDMA Device Plugin

@@ -32,6 +32,13 @@ spec:
image: {{ .Values.ofedDriver.image }}
repository: {{ .Values.ofedDriver.repository }}
version: {{ .Values.ofedDriver.version }}
{{- if .Values.ofedDriver.initContainer }}
initContainer:
enable: {{ .Values.ofedDriver.initContainer.enable }}
repository: {{ .Values.ofedDriver.initContainer.repository }}
image: {{ .Values.ofedDriver.initContainer.image }}
version: {{ .Values.ofedDriver.initContainer.version }}
{{- end }}
{{- if .Values.ofedDriver.env }}
env:
{{ toYaml .Values.ofedDriver.env | nindent 6 }}
@@ -59,12 +66,20 @@ spec:
upgradePolicy:
autoUpgrade: {{ .Values.ofedDriver.upgradePolicy.autoUpgrade | default false }}
maxParallelUpgrades: {{ .Values.ofedDriver.upgradePolicy.maxParallelUpgrades | default 0 }}
safeLoad: {{ .Values.ofedDriver.upgradePolicy.safeLoad | default false }}
{{- if .Values.ofedDriver.upgradePolicy.drain }}
drain:
enable: {{ .Values.ofedDriver.upgradePolicy.drain.enable | default true }}
force: {{ .Values.ofedDriver.upgradePolicy.drain.force | default false }}
podSelector: {{ .Values.ofedDriver.upgradePolicy.drain.podSelector | quote }}
timeoutSeconds: {{ .Values.ofedDriver.upgradePolicy.drain.timeoutSeconds }}
deleteEmptyDir: {{ .Values.ofedDriver.upgradePolicy.drain.deleteEmptyDir | default false}}
{{- end }}
{{- if .Values.ofedDriver.upgradePolicy.waitForCompletion }}
waitForCompletion:
podSelector: {{ .Values.ofedDriver.upgradePolicy.waitForCompletion.podSelector | default ""}}
timeoutSeconds: {{ .Values.ofedDriver.upgradePolicy.waitForCompletion.timeoutSeconds | default 0 }}
{{- end }}
{{- end }}
{{- end }}
{{- if .Values.rdmaSharedDevicePlugin.deploy }}
13 changes: 12 additions & 1 deletion deployment/network-operator/values.yaml
@@ -154,6 +154,11 @@ ofedDriver:
image: mofed
repository: nvcr.io/nvidia/mellanox
version: 23.07-0.5.0.0
initContainer:
enable: true
repository: ghcr.io/mellanox
image: network-operator-init-container
version: v0.0.1
# imagePullSecrets: []
# env, if defined will pass environment variables to the OFED container
# env:
@@ -166,7 +171,6 @@ ofedDriver:
# Custom ssl key/certificate configuration
certConfig:
name: ""

startupProbe:
initialDelaySeconds: 10
periodSeconds: 20
@@ -183,6 +187,8 @@ ofedDriver:
# how many nodes can be upgraded in parallel (default: 1)
# 0 means no limit, all nodes will be upgraded in parallel
maxParallelUpgrades: 1
# cordon and drain (if enabled) a node before loading the driver on it
safeLoad: false
# options for node drain (`kubectl drain`) before the driver reload
# if auto upgrade is enabled but drain.enable is false,
# then driver POD will be reloaded immediately without
@@ -194,6 +200,11 @@ ofedDriver:
# It's recommended to set a timeout to avoid infinite drain in case non-fatal error keeps happening on retries
timeoutSeconds: 300
deleteEmptyDir: false
waitForCompletion:
# specifies a label selector for the pods to wait for completion
# podSelector: "app=myapp"
# specify the length of time in seconds to wait before giving up for workload to finish, zero means infinite
# timeoutSeconds: 300

rdmaSharedDevicePlugin:
deploy: true
32 changes: 27 additions & 5 deletions docs/automatic-ofed-upgrade.md
@@ -7,7 +7,7 @@ It is possible to do a driver upgrade manually by following the [manual upgrade
This document describes the automatic upgrade flow for the containerized OFED driver.

### Upgrade NVIDIA Mellanox OFED automatically
* Enable automatic MOFED upgrade, define UpgradePolicy section for ofedDriver in the [NicClusterPolicy spec:
* Enable automatic MOFED upgrade, define UpgradePolicy section for ofedDriver in the NicClusterPolicy spec:
```
apiVersion: mellanox.com/v1alpha1
kind: NicClusterPolicy
@@ -21,11 +21,13 @@ spec:
version: 5.6-1.0.3.3
upgradePolicy:
# autoUpgrade is a global switch for automatic upgrade feature
# if set to false all other options are ignored
# if set to false all other options are ignored
autoUpgrade: true
# maxParallelUpgrades indicates how many nodes can be upgraded in parallel
# 0 means no limit, all nodes will be upgraded in parallel
# 0 means no limit, all nodes will be upgraded in parallel
maxParallelUpgrades: 0
# cordon and drain (if enabled) a node before loading the driver on it
safeLoad: false
# describes the configuration for waiting on job completions
waitForCompletion:
# specifies a label selector for the pods to wait for completion
@@ -49,11 +51,31 @@ spec:
```
* Change ofedDriver version in the NicClusterPolicy
* To check if upgrade is finished, query the status of `state-OFED` in the [NicClusterPolicy status](https://github.com/Mellanox/network-operator#nicclusterpolicy-status)
* To track each node's upgrade status separately, run `kubectl describe node <node_name> | grep nvidia.com/ofed-upgrade-state`. See [Node upgrade states](#node-upgrade-states) section describing each state.
* To track each node's upgrade status separately, run `kubectl describe node <node_name> | grep nvidia.com/ofed-driver-upgrade-state`. See [Node upgrade states](#node-upgrade-states) section describing each state.

### Safe driver loading

This feature is enabled or disabled with the `ofedDriver.upgradePolicy.safeLoad` option.

On Node startup, the OFED container takes some time to compile and load the driver.
During that time, workloads might get scheduled on that Node.
When OFED is loaded, all existing PODs that use NVIDIA NICs will lose their network interfaces.
Some such PODs might silently fail or hang.
To avoid this, the Node should be cordoned and drained before the OFED container loads the driver,
so that all workloads are rescheduled.
The Node should be un-cordoned once the driver is ready on it.

The safe driver loading feature is implemented as a part of the upgrade flow,
meaning safe driver loading is a special scenario of the upgrade procedure,
where we upgrade from the inbox driver to the containerized OFED.

When this feature is enabled, the initial OFED driver rollout on a large cluster can take a long time.
To speed up the rollout, the initial deployment can be done with the safe driver loading feature disabled,
and the feature can be enabled later by updating the NicClusterPolicy CR.
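
As a rough sketch of that later update, the NicClusterPolicy could be edited as below. The `metadata.name` value is an assumption (it is not shown in this document), and the image, repository, and version are simply the chart defaults from `values.yaml`. Because `autoUpgrade` is described as a global switch for the upgrade policy, the sketch assumes it must be `true` for `safeLoad` to take effect.

```yaml
apiVersion: mellanox.com/v1alpha1
kind: NicClusterPolicy
metadata:
  # assumed resource name, for illustration only
  name: nic-cluster-policy
spec:
  ofedDriver:
    image: mofed
    repository: nvcr.io/nvidia/mellanox
    version: 23.07-0.5.0.0
    upgradePolicy:
      # assumed to be required, since all other options are ignored when false
      autoUpgrade: true
      maxParallelUpgrades: 1
      # cordon and drain (if enabled) a node before loading the driver on it
      safeLoad: true
      drain:
        enable: true
        force: false
        timeoutSeconds: 300
        deleteEmptyDir: false
```

With `drain.enable: true`, nodes are drained as well as cordoned before the driver loads; with it set to `false`, the description of `safeLoad` suggests only the cordon step applies.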

### Details
#### Node upgrade states
Each node's upgrade status is reflected in its `nvidia.com/ofed-upgrade-state` label. This label can have the following values:
Each node's upgrade status is reflected in its `nvidia.com/ofed-driver-upgrade-state` label. This label can have the following values:
* Unknown (empty): node has this state when the upgrade flow is disabled or the node hasn't been processed yet
* `upgrade-done` is set when OFED POD is up to date and running on the node, the node is schedulable
UpgradeStateDone = "upgrade-done"