Skip to content

Commit

Permalink
pytorchjob-generator: make namespace an optional value (#115)
Browse files Browse the repository at this point in the history
  • Loading branch information
dgrove-oss authored Dec 19, 2024
1 parent 58b9291 commit 956e682
Show file tree
Hide file tree
Showing 9 changed files with 40 additions and 68 deletions.
3 changes: 1 addition & 2 deletions tools/pytorchjob-generator/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,6 @@ mlbatch/pytorchjob-generator 1.1.5 v1beta2 An AppWrapper generator f
Create a `settings.yaml` file with the settings for the PyTorch job, for
example:
```yaml
namespace: my-namespace # namespace to deploy to (required)
jobName: my-job # name of the generated AppWrapper and PyTorchJob objects (required)
queueName: default-queue # local queue to submit to (default: default-queue)

Expand Down Expand Up @@ -69,5 +68,5 @@ helm template -f settings.yaml mlbatch/pytorchjob-generator | tee generated.yaml
To remove the PyTorch job from the cluster, delete the generated `AppWrapper`
object:
```sh
oc delete appwrapper -n my-namespace my-job
oc delete appwrapper my-job
```
4 changes: 2 additions & 2 deletions tools/pytorchjob-generator/chart/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,10 +15,10 @@ customize the Jobs generated by the tool.

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| namespace | string | must be provided by user | The Kubernetes namespace in which the Job will run. |
| jobName | string | must be provided by user | Name of the Job. Will be the name of the AppWrapper and the PyTorchJob. |
| namespace | string | `nil` | Namespace in which to run the Job. If unspecified, the namespace will be inferred using normal Helm/Kubernetes mechanisms when the Job is submitted. |
| queueName | string | `"default-queue"` | Name of the local queue to which the Job will be submitted. |
| priority | string | `"default-priority"` | Type of priority for the job (choose from: "default-priority", "low-priority" or "high-priority"). WARNING: "high-priority" jobs need to be approved (We're watching you...)! |
| priority | string | `"default-priority"` | Type of priority for the job (choose from: "default-priority", "low-priority" or "high-priority"). |
| customLabels | array | `nil` | Optional array of custom labels to add to all the resources created by the Job (the PyTorchJob, the PodGroup, and the AppWrapper). |
| containerImage | string | must be provided by the user | Image used for creating the Job's containers (needs to have all the applications your job may need) |
| imagePullSecrets | array | `nil` | List of image-pull-secrets to be used for pulling containerImages |
Expand Down
28 changes: 15 additions & 13 deletions tools/pytorchjob-generator/chart/templates/_helpers.tpl
Original file line number Diff line number Diff line change
Expand Up @@ -10,20 +10,22 @@


{{- define "mlbatch.container.metadata" }}
namespace: {{ .Values.namespace }}
{{- if or .Values.customLabels .Values.autopilotHealthChecks }}
labels:
{{- include "mlbatch.customLabels" . | indent 4 }}
{{- if .Values.autopilotHealthChecks }}
autopilot: ""
{{- range $healthcheck := .Values.autopilotHealthChecks }}
{{ $healthcheck }}: ""
{{- end }}
{{- if or .Values.customLabels .Values.autopilotHealthChecks .Values.multiNicNetworkName }}
metadata:
{{- if or .Values.customLabels .Values.autopilotHealthChecks }}
labels:
{{- include "mlbatch.customLabels" . | indent 8 }}
{{- if .Values.autopilotHealthChecks }}
autopilot: ""
{{- range $healthcheck := .Values.autopilotHealthChecks }}
{{ $healthcheck }}: ""
{{- end }}
{{- end }}
{{- end }}
{{- if .Values.multiNicNetworkName }}
annotations:
k8s.v1.cni.cncf.io/networks: {{ .Values.multiNicNetworkName }}
{{- end }}
{{- end }}
{{- if .Values.multiNicNetworkName }}
annotations:
k8s.v1.cni.cncf.io/networks: {{ .Values.multiNicNetworkName }}
{{- end }}
{{- end -}}

Expand Down
11 changes: 5 additions & 6 deletions tools/pytorchjob-generator/chart/templates/appwrapper.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -52,7 +52,9 @@ apiVersion: workload.codeflare.dev/v1beta2
kind: AppWrapper
metadata:
name: {{ .Values.jobName }}
namespace: {{ required "Please specify a 'namespace' in the user file" .Values.namespace }}
{{- if .Values.namespace }}
namespace: {{ .Values.namespace }}
{{- end }}
annotations:
workload.codeflare.dev.mlbatch/pytorchGeneratorVersion: "{{ .Chart.Version }}"
{{- if .Values.admissionGracePeriodDuration }}
Expand Down Expand Up @@ -90,7 +92,6 @@ spec:
kind: "PyTorchJob"
metadata:
name: {{ .Values.jobName }}
namespace: {{ .Values.namespace }}
{{- if .Values.customLabels }}
labels:
{{- include "mlbatch.customLabels" . | indent 26 }}
Expand All @@ -101,8 +102,7 @@ spec:
replicas: 1
restartPolicy: {{ .Values.restartPolicy | default "Never" }}
template:
metadata:
{{- include "mlbatch.container.metadata" . | indent 38 }}
{{- include "mlbatch.container.metadata" . | indent 34 }}
spec:
{{- if .Values.serviceAccountName }}
serviceAccountName: {{ .Values.serviceAccountName }}
Expand All @@ -125,8 +125,7 @@ spec:
replicas: {{ sub .Values.numPods 1 }}
restartPolicy: {{ .Values.restartPolicy | default "Never" }}
template:
metadata:
{{- include "mlbatch.container.metadata" . | indent 38 }}
{{- include "mlbatch.container.metadata" . | indent 34 }}
spec:
{{- if .Values.serviceAccountName }}
serviceAccountName: {{ .Values.serviceAccountName }}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -16,15 +16,12 @@ Adding Volume Mounts:
kind: PyTorchJob
metadata:
name: my-job
namespace: my-namespace
spec:
pytorchReplicaSpecs:
Master:
replicas: 1
restartPolicy: Never
template:
metadata:
namespace: my-namespace
spec:
affinity:
nodeAffinity:
Expand Down Expand Up @@ -93,8 +90,6 @@ Adding Volume Mounts:
replicas: 3
restartPolicy: Never
template:
metadata:
namespace: my-namespace
spec:
affinity:
nodeAffinity:
Expand Down Expand Up @@ -177,15 +172,12 @@ Adding initContainers:
kind: PyTorchJob
metadata:
name: my-job
namespace: my-namespace
spec:
pytorchReplicaSpecs:
Master:
replicas: 1
restartPolicy: Never
template:
metadata:
namespace: my-namespace
spec:
affinity:
nodeAffinity:
Expand Down Expand Up @@ -257,8 +249,6 @@ Adding initContainers:
replicas: 3
restartPolicy: Never
template:
metadata:
namespace: my-namespace
spec:
affinity:
nodeAffinity:
Expand Down Expand Up @@ -344,15 +334,12 @@ AppWrapper metadata should match snapshot:
kind: PyTorchJob
metadata:
name: my-job
namespace: my-namespace
spec:
pytorchReplicaSpecs:
Master:
replicas: 1
restartPolicy: Never
template:
metadata:
namespace: my-namespace
spec:
affinity:
nodeAffinity:
Expand Down Expand Up @@ -411,8 +398,6 @@ AppWrapper metadata should match snapshot:
replicas: 3
restartPolicy: Never
template:
metadata:
namespace: my-namespace
spec:
affinity:
nodeAffinity:
Expand Down Expand Up @@ -485,15 +470,12 @@ AppWrapper spec should match snapshot:
kind: PyTorchJob
metadata:
name: my-job
namespace: my-namespace
spec:
pytorchReplicaSpecs:
Master:
replicas: 1
restartPolicy: Never
template:
metadata:
namespace: my-namespace
spec:
affinity:
nodeAffinity:
Expand Down Expand Up @@ -552,8 +534,6 @@ AppWrapper spec should match snapshot:
replicas: 3
restartPolicy: Never
template:
metadata:
namespace: my-namespace
spec:
affinity:
nodeAffinity:
Expand Down Expand Up @@ -626,15 +606,12 @@ Enabling NVMe:
kind: PyTorchJob
metadata:
name: my-job
namespace: my-namespace
spec:
pytorchReplicaSpecs:
Master:
replicas: 1
restartPolicy: Never
template:
metadata:
namespace: my-namespace
spec:
affinity:
nodeAffinity:
Expand Down Expand Up @@ -708,8 +685,6 @@ Enabling NVMe:
replicas: 3
restartPolicy: Never
template:
metadata:
namespace: my-namespace
spec:
affinity:
nodeAffinity:
Expand Down Expand Up @@ -797,7 +772,6 @@ Enabling RoCE GDR:
kind: PyTorchJob
metadata:
name: my-job
namespace: my-namespace
spec:
pytorchReplicaSpecs:
Master:
Expand All @@ -807,7 +781,6 @@ Enabling RoCE GDR:
metadata:
annotations:
k8s.v1.cni.cncf.io/networks: multi-nic-cni-operator-ipvlanl3
namespace: my-namespace
spec:
affinity:
nodeAffinity:
Expand Down Expand Up @@ -883,7 +856,6 @@ Enabling RoCE GDR:
metadata:
annotations:
k8s.v1.cni.cncf.io/networks: multi-nic-cni-operator-ipvlanl3
namespace: my-namespace
spec:
affinity:
nodeAffinity:
Expand Down Expand Up @@ -970,7 +942,6 @@ Enabling all advanced features at once:
kind: PyTorchJob
metadata:
name: my-job
namespace: my-namespace
spec:
pytorchReplicaSpecs:
Master:
Expand All @@ -980,7 +951,6 @@ Enabling all advanced features at once:
metadata:
annotations:
k8s.v1.cni.cncf.io/networks: multi-nic-cni-operator-ipvlanl3
namespace: my-namespace
spec:
affinity:
nodeAffinity:
Expand Down Expand Up @@ -1108,7 +1078,6 @@ Enabling all advanced features at once:
metadata:
annotations:
k8s.v1.cni.cncf.io/networks: multi-nic-cni-operator-ipvlanl3
namespace: my-namespace
spec:
affinity:
nodeAffinity:
Expand Down Expand Up @@ -1247,15 +1216,12 @@ Enabling sshGitConfig injects the envvars, volumes, and volumeMounts:
kind: PyTorchJob
metadata:
name: my-job
namespace: my-namespace
spec:
pytorchReplicaSpecs:
Master:
replicas: 1
restartPolicy: Never
template:
metadata:
namespace: my-namespace
spec:
affinity:
nodeAffinity:
Expand Down Expand Up @@ -1328,8 +1294,6 @@ Enabling sshGitConfig injects the envvars, volumes, and volumeMounts:
replicas: 3
restartPolicy: Never
template:
metadata:
namespace: my-namespace
spec:
affinity:
nodeAffinity:
Expand Down
8 changes: 8 additions & 0 deletions tools/pytorchjob-generator/chart/tests/helloworld_test.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -78,6 +78,14 @@ tests:
- notExists:
path: metadata.labels

- it: namespace can be set
set:
namespace: testing-ns
asserts:
- equal:
path: metadata.namespace
value: testing-ns

- it: Enabling sshGitConfig injects the envvars, volumes, and volumeMounts
set:
sshGitCloneConfig.secretName: my-git-secret
Expand Down
6 changes: 4 additions & 2 deletions tools/pytorchjob-generator/chart/values.schema.json
Original file line number Diff line number Diff line change
Expand Up @@ -2,14 +2,16 @@
"$schema": "https://json-schema.org/draft/2020-12/schema#",
"type": "object",
"required": [
"namespace",
"jobName",
"containerImage"
],
"additionalProperties": false,
"properties": {
"namespace": { "$ref": "#/$defs/rfc1123Label" },
"jobName": { "type": "string" },
"namespace": { "oneOf": [
{ "type": "null" },
{ "$ref": "#/$defs/rfc1123Label" }
]},
"queueName": { "oneOf": [
{ "type": "null" },
{ "$ref": "#/$defs/rfc1123Label" }
Expand Down
11 changes: 5 additions & 6 deletions tools/pytorchjob-generator/chart/values.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -2,21 +2,20 @@
# Job Metadata
####################

# -- (string) The Kubernetes namespace in which the Job will run.
# @default -- must be provided by user
# @section -- Job Metadata
namespace:

# -- (string) Name of the Job. Will be the name of the AppWrapper and the PyTorchJob.
# @default -- must be provided by user
# @section -- Job Metadata
jobName:

# -- (string) Namespace in which to run the Job. If unspecified, the namespace will be inferred using normal Helm/Kubernetes mechanisms when the Job is submitted.
# @section -- Job Metadata
namespace:

# -- (string) Name of the local queue to which the Job will be submitted.
# @section -- Job Metadata
queueName: "default-queue"

# -- (string) Type of priority for the job (choose from: "default-priority", "low-priority" or "high-priority"). WARNING: "high-priority" jobs need to be approved (We're watching you...)!
# -- (string) Type of priority for the job (choose from: "default-priority", "low-priority" or "high-priority").
# @section -- Job Metadata
priority: "default-priority"

Expand Down
Original file line number Diff line number Diff line change
@@ -1,4 +1,3 @@
namespace: my-namespace # namespace to deploy to (required)
jobName: my-job # name of the generated AppWrapper and PyTorchJob objects (required)
queueName: default-queue # local queue to submit to (default: default-queue)

Expand Down

0 comments on commit 956e682

Please sign in to comment.