Commit 2f9cc5e: Expand 2nd section up to nodeSelector key

Signed-off-by: davidmirror-ops <[email protected]>
davidmirror-ops committed Jun 25, 2024
1 parent fe51590 commit 2f9cc5e
Showing 1 changed file with 79 additions and 3 deletions: docs/user_guide/productionizing/configuring_access_to_gpus.md
The goal here is to make a simple request of any available GPU device(s).

![](https://raw.githubusercontent.com/flyteorg/static-resources/main/flyte/deployment/gpus/generic_gpu_access.png)

When this task is evaluated, `flytepropeller` injects a [toleration](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/) in the pod spec:

```yaml
tolerations:
- key: nvidia.com/gpu
  operator: Exists
  effect: NoSchedule
```
The Kubernetes scheduler will admit the pod if there are worker nodes with matching taints and available resources in the cluster.
The `nvidia.com/gpu` resource key name is not arbitrary. It corresponds to the [Extended Resource](https://kubernetes.io/docs/tasks/administer-cluster/extended-resource-node/) that the Kubernetes worker nodes advertise to the API server through the [device plugin](https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/#using-device-plugins).
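You can confirm that a worker node advertises this Extended Resource by inspecting its allocatable resources; a sketch of the check (the node name is a placeholder) could be:

```shell
# Prints the number of GPUs the node advertises to the API server
kubectl get node <your-gpu-node> -o jsonpath='{.status.allocatable.nvidia\.com/gpu}'
```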


## Requesting a specific GPU device

Example:
```python
import torch
from flytekit import ImageSpec, Resources, task
from flytekit.extras.accelerators import V100

image = ImageSpec(
    base_image="ghcr.io/flyteorg/flytekit:py3.10-1.10.2",
    name="pytorch",
    python_version="3.10",
    packages=["torch"],
    builder="envd",
    registry="<YOUR_CONTAINER_REGISTRY>",
)

@task(
    requests=Resources(gpu="1"),
    accelerator=V100,  # NVIDIA Tesla V100
    container_image=image,
)
def gpu_available() -> bool:
    return torch.cuda.is_available()
```
Leveraging flytekit's accelerator support, you can specify the target GPU device directly in the task decorator.

### How it works

When this task is evaluated, `flytepropeller` injects both a toleration and a nodeSelector for a more flexible scheduling configuration.

An example pod spec on GKE would include the following:

```yaml
apiVersion: v1
kind: Pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: cloud.google.com/gke-accelerator
            operator: In
            values:
            - nvidia-tesla-v100
  containers:
  - resources:
      limits:
        nvidia.com/gpu: 1
  tolerations:
  - key: nvidia.com/gpu # auto
    operator: Equal
    value: present
    effect: NoSchedule
  - key: cloud.google.com/gke-accelerator
    operator: Equal
    value: nvidia-tesla-v100
    effect: NoSchedule
```
### Configuring the nodeSelector
The `key` that the injected node selector uses corresponds to an arbitrary label that your Kubernetes worker nodes should carry. In the above example it's `cloud.google.com/gke-accelerator`, but depending on your cloud provider it could be any other value. You can inform Flyte about the label your worker nodes use by adjusting the Helm values:

**flyte-core**
```yaml
configmap:
  k8s:
    plugins:
      k8s:
        gpu-device-node-label: "cloud.google.com/gke-accelerator" # change to match your nodes' config
```
**flyte-binary**
```yaml
configuration:
  inline:
    plugins:
      k8s:
        gpu-device-node-label: "cloud.google.com/gke-accelerator" # change to match your nodes' config
```
While the `key` is arbitrary, the `value` is not. flytekit has a set of [predefined](https://docs.flyte.org/en/latest/api/flytekit/extras.accelerators.html#predefined-accelerator-constants) constants, and your node label has to use one of those values.
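On managed services like GKE this label is applied automatically; on a self-managed cluster you might apply it yourself. A sketch (the node name is a placeholder, and the value is one of flytekit's predefined constants):

```shell
# Label a worker node so the injected node selector can match it
kubectl label node <your-gpu-node> cloud.google.com/gke-accelerator=nvidia-tesla-v100
```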

## Requesting a GPU partition

Flyte
