Merge pull request #141 from DimmestP/gpu-lesson-1-job-only

Refine job-only gpuservice lesson 1

agngrant authored Apr 9, 2024 · 2 parents f2ab068 + a39a975 · commit b352e8e

docs/services/gpuservice/training/L1_getting_started.md

# Getting started with Kubernetes

## Requirements

In order to follow this tutorial on the EIDF GPU Cluster you will need to have:

- An account on the EIDF Portal.

- An active EIDF Project on the Portal with access to the EIDF GPU Service.

- The EIDF GPU Service Kubernetes namespace associated with the project, e.g. eidf001ns.

- The EIDF GPU Service queue name associated with the project, e.g. eidf001ns-user-queue.

- The kubeconfig file downloaded to a Project VM, along with the kubectl command line tool used to interact with the K8s API.

!!! Important "Downloading the kubeconfig file and kubectl"

Project Leads should use the 'Download kubeconfig' button on the EIDF Portal to complete this step to ensure the correct kubeconfig file and kubectl version are installed.
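
A quick way to confirm the setup works is to point `kubectl` at the downloaded file and list the pods in your project namespace. This is a sketch only — the kubeconfig path `~/kubeconfig` is an assumption; use wherever the downloaded file was saved.

```bash
# Point kubectl at the kubeconfig downloaded from the EIDF Portal (path assumed),
# then check that the cluster responds for your project namespace.
export KUBECONFIG=~/kubeconfig
kubectl -n <project-namespace> get pods
```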

## Introduction

Kubernetes (K8s) is a container orchestration system, originally developed by Google, for the deployment, scaling, and management of containerised applications.
An overview of the key components of a K8s container can be seen on the Kubernetes docs website.

The primary component of a K8s cluster is a pod.

A pod is a set of one or more Docker containers (and their storage volumes) that share resources.

It is the EIDF GPU Cluster policy that all pods should be wrapped within a K8s [job](https://kubernetes.io/docs/concepts/workloads/controllers/job/).

This allows GPU/CPU/Memory resource requests to be managed by the cluster queue management system, Kueue.

Pods which attempt to bypass the queue mechanism will affect the experience of other project users.

Any pods not associated with a job (or other K8s object) are at risk of being deleted without notice.

K8s jobs also provide additional functionality such as parallelism (described later in this tutorial).

Users define the resource requirements of a pod (i.e. number/type of GPU) and the containers/code to be run in the pod by defining a template within a job manifest file written in yaml.

The job yaml file is sent to the cluster using the K8s API and is assigned to an appropriate node to be run.

A node is a part of the cluster such as a physical or virtual host which exposes CPU, Memory and GPUs.

Users interact with the K8s API using `kubectl` (short for Kubernetes control) commands.

Some of the kubectl commands are restricted on the EIDF cluster in order to ensure project details are not shared across namespaces.

!!! important "Ensure kubectl is interacting with your project namespace."

You will need to pass the name of your project namespace to `kubectl` in order for it to have permission to interact with the cluster.

`kubectl` will attempt to interact with the `default` namespace which will return a permissions error if it is not told otherwise.

`kubectl -n <project-namespace> <command>` will tell kubectl to pass the commands to the correct namespace.

Useful commands are:

- `kubectl -n <project-namespace> create -f <job definition yaml>`: Create a new job with requested resources. Returns an error if a job with the same name already exists.
- `kubectl -n <project-namespace> apply -f <job definition yaml>`: Create a new job with requested resources. If a job with the same name already exists it updates that job with the new resource/container requirements outlined in the yaml.
- `kubectl -n <project-namespace> delete pod <pod name>`: Delete a pod from the cluster.
- `kubectl -n <project-namespace> get pods`: Summarise all pods the namespace has active (or pending).
- `kubectl -n <project-namespace> describe pods`: Verbose description of all pods the namespace has active (or pending).
- `kubectl -n <project-namespace> describe pod <pod name>`: Verbose summary of the specified pod.
- `kubectl -n <project-namespace> logs <pod name>`: Retrieve the log files associated with a running pod.
- `kubectl -n <project-namespace> get jobs`: List all jobs the namespace has active (or pending).
- `kubectl -n <project-namespace> describe job <job name>`: Verbose summary of the specified job.
- `kubectl -n <project-namespace> delete job <job name>`: Delete a job from the cluster.
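
For example, a typical session might look like the following sketch, where `myjob.yml`, `<pod name>` and `<job name>` are placeholders:

```bash
# Submit a job, watch its pods, read the logs of one pod, then clean up.
kubectl -n <project-namespace> create -f myjob.yml
kubectl -n <project-namespace> get jobs
kubectl -n <project-namespace> get pods
kubectl -n <project-namespace> logs <pod name>
kubectl -n <project-namespace> delete job <job name>
```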

## Creating your first pod template within a job yaml file

To access the GPUs on the service, it is recommended to start with one of the prebuilt container images provided by Nvidia; these images are intended to perform different tasks using Nvidia GPUs.

The list of Nvidia images is available on their [website](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/k8s/containers/cuda-sample/tags).

The following example uses their CUDA sample code simulating nbody interactions.

``` yaml
apiVersion: batch/v1
kind: Job
metadata:
  generateName: jobtest-
  labels:
    kueue.x-k8s.io/queue-name: <project-namespace>-user-queue
spec:
  completions: 1
  template:
    metadata:
      name: job-test
    spec:
      containers:
      - name: cudasample
        image: nvcr.io/nvidia/k8s/cuda-sample:nbody-cuda11.7.1
        args: ["-benchmark", "-numbodies=512000", "-fp64", "-fullscreen"]
        resources:
          requests:
            cpu: 2
            memory: '1Gi'
          limits:
            cpu: 2
            memory: '4Gi'
            nvidia.com/gpu: 1
      restartPolicy: Never
```

The pod resources are defined under the `resources` tags using the `requests` and `limits` tags.

Resources defined under the `requests` tags are the reserved resources required for the pod to be scheduled.

If a pod is assigned to a node with unused resources then it may burst up to use resources beyond those requested.

This may allow the task within the pod to run faster, but it will also throttle back down when further pods are scheduled to the node.

The `limits` tag specifies the maximum resources that can be assigned to a pod.

The EIDF GPU Service requires all pods to have `requests` and `limits` tags for CPU and memory defined in order to be accepted.

GPU resource requests are optional and only an entry under the `limits` tag is needed to specify the use of a GPU, `nvidia.com/gpu: 1`. Without this no GPU will be available to the pod.
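
Putting these rules together, a minimal sketch of a valid `resources` block (values are illustrative, taken from the example above) is:

``` yaml
resources:
  requests:             # reserved for scheduling; CPU and memory entries are required
    cpu: 2
    memory: '1Gi'
  limits:               # hard ceiling on what the pod may use; CPU and memory entries are required
    cpu: 2
    memory: '4Gi'
    nvidia.com/gpu: 1   # a GPU is requested under limits only
```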

The label `kueue.x-k8s.io/queue-name` specifies the queue you are submitting your job to. This is part of the Kueue system in operation on the service to allow for improved resource management for users.

## Submitting your first job

1. Open an editor of your choice and create the file test_NBody.yml
1. Copy the above job yaml into the file, filling in `<project-namespace>-user-queue`, e.g. eidf001ns-user-queue.
1. Save the file and exit the editor
1. Run `kubectl -n <project-namespace> create -f test_NBody.yml`
1. This will output something like:

``` bash
job.batch/jobtest-b92qg created
```

The five character code appended to the job name, i.e. `b92qg`, is randomly generated and will differ from your run.

1. Run `kubectl -n <project-namespace> get jobs`
1. This will output something like:

```bash
NAME            COMPLETIONS   DURATION   AGE
jobtest-kw22k   1/1           48s        29m
jobtest-b92qg   1/1           48s        29m
```

There may be more than one entry as this displays all the jobs in the current namespace, starting with their name, number of completions against required completions, duration and age.

1. Inspect your job further using the command `kubectl -n <project-namespace> describe job jobtest-b92qg`, updating the job name with your five character code.
1. This will output something like:

```bash
Normal Completed 7m12s job-controller Job completed
```

1. Run `kubectl -n <project-namespace> get pods`
1. This will output something like:

``` bash
NAME                  READY   STATUS      RESTARTS   AGE
jobtest-b92qg-lh64s   0/1     Completed   0          11m
jobtest-b92qg-lvmrf   0/1     Completed   0          10m
jobtest-b92qg-xhvdm   0/1     Completed   0          10m
jobtest-d45sr-8tf4d   0/1     Completed   0          22h
jobtest-d45sr-jjhgg   0/1     Completed   0          22h
jobtest-d45sr-n5w6c   0/1     Completed   0          22h
jobtest-d45sr-v9p4j   0/1     Completed   0          22h
jobtest-d45sr-xgq5s   0/1     Completed   0          22h
jobtest-kwmwk-cgwmf   0/1     Completed   0          33m
jobtest-kwmwk-mttdw   0/1     Completed   0          33m
jobtest-kwmwk-r2q9h   0/1     Completed   0          33m
```

Again, there may be more than one entry as this displays all the pods in the current namespace.
Also, each pod within a job is given another unique five character code appended to the job name.

1. View the logs of a pod from the job you ran with `kubectl -n <project-namespace> logs jobtest-b92qg-lh64s` - again, update with your run's pod and job five character codes.
1. This will output something like:

``` bash
= 7439.679 double-precision GFLOP/s at 30 flops per interaction
```

1. Delete your job with `kubectl -n <project-namespace> delete job jobtest-b92qg` - this will delete the associated pods as well.

## Specifying GPU requirements

If you create multiple jobs with the same definition file and compare their log files, you may notice that the CUDA device differs from `Compute 8.0 CUDA device: [NVIDIA A100-SXM4-40GB]`.

The GPU Operator on K8s is allocating the pod to the first node with a GPU free that matches the other resource specifications irrespective of the type of GPU present on the node.

The GPU resource requests can be made more specific by adding the type of GPU product the pod template is requesting to the node selector:

- `nvidia.com/gpu.product: 'NVIDIA-A100-SXM4-80GB'`
- `nvidia.com/gpu.product: 'NVIDIA-A100-SXM4-40GB'`
- `nvidia.com/gpu.product: 'NVIDIA-A100-SXM4-40GB-MIG-3g.20gb'`
- `nvidia.com/gpu.product: 'NVIDIA-A100-SXM4-40GB-MIG-1g.5gb'`
- `nvidia.com/gpu.product: 'NVIDIA-H100-80GB-HBM3'`

### Example yaml file with GPU type specified

The `nodeSelector:` key at the bottom of the pod template states the pod should be run on a node with a 1g.5gb MIG GPU.

!!! important "Exact GPU product names only"

K8s will fail to assign the pod if you misspell the GPU type.

Be especially careful when requesting a full 80GB or 40GB A100 GPU, as attempting to load a GPU with more data than its memory can hold can have unexpected consequences.

``` yaml
apiVersion: batch/v1
kind: Job
metadata:
  generateName: jobtest-
  labels:
    kueue.x-k8s.io/queue-name: <project-namespace>-user-queue
spec:
  completions: 1
  template:
    # ... pod template spec (see the sketch below)
```
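
A minimal sketch of how the `nodeSelector` slots into the pod template spec, assuming the same CUDA sample container and resources as the first example, is:

``` yaml
    spec:
      containers:
      - name: cudasample
        image: nvcr.io/nvidia/k8s/cuda-sample:nbody-cuda11.7.1
        args: ["-benchmark", "-numbodies=512000", "-fp64", "-fullscreen"]
        resources:
          requests:
            cpu: 2
            memory: '1Gi'
          limits:
            cpu: 2
            memory: '4Gi'
            nvidia.com/gpu: 1
      # nodeSelector pins the pod to nodes exposing the requested GPU product
      nodeSelector:
        nvidia.com/gpu.product: 'NVIDIA-A100-SXM4-40GB-MIG-1g.5gb'
      restartPolicy: Never
```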

## Running multiple pods with K8s jobs

Wrapping a pod within a job provides additional functionality on top of accessing the queuing system.

Firstly, the restartPolicy within a job enables the self-healing mechanism within K8s so that if a node dies with the job's pod on it then the job will find a new node to automatically restart the pod.

Jobs also allow users to define multiple pods that can run in parallel or series and will continue to spawn pods until a specific number of pods successfully terminate.

See below for an example K8s job that requires three pods to successfully complete the example CUDA code before the job itself ends.

``` yaml
apiVersion: batch/v1
kind: Job
metadata:
  generateName: jobtest-
  labels:
    kueue.x-k8s.io/queue-name: <project-namespace>-user-queue
spec:
  completions: 3
  parallelism: 1
  template:
    spec:
      containers:
      - name: cudasample
        image: nvcr.io/nvidia/k8s/cuda-sample:nbody-cuda11.7.1
        args: ["-benchmark", "-numbodies=512000", "-fp64", "-fullscreen"]
        resources:
          requests:
            cpu: 2
            memory: '1Gi'
          limits:
            cpu: 2
            memory: '4Gi'
            nvidia.com/gpu: 1
      restartPolicy: Never
```

## Change the default kubectl namespace in the project kubeconfig file

Passing the `-n <project-namespace>` flag every time you want to interact with the cluster can be cumbersome.

You can alter the kubeconfig on your VM to send commands to your project namespace by default.

Only users with sudo privileges can change the root kubectl config file.

1. Open the command line on your EIDF VM with access to the EIDF GPU Service.

1. Open the root kubeconfig file with sudo privileges.

```bash
sudo nano /kubernetes/config
```

1. Add the namespace line with your project's Kubernetes namespace to the "eidf-general-prod" context entry in your copy of the config file.

```txt
*** MORE CONFIG ***
contexts:
- name: "eidf-general-prod"
  context:
    user: "eidf-general-prod"
    namespace: "<project-namespace>" # INSERT LINE
    cluster: "eidf-general-prod"
*** MORE CONFIG ***
```

1. Check kubectl connects to the cluster. If this does not work, delete and re-download the kubeconfig file using the button on the project page of the EIDF Portal.

```bash
kubectl get pods
```
