Update MLflow Helm Chart (following bitnami): (#37)

* Update MLflow Helm Chart (following bitnami):
  - MLflow version 2.16
  - `postgresql.enabled=false` by default (recommended to use CSC PUKKI)
  - `minio.enabled=false` by default (recommended to use CSC ALLAS)
  - Edit README
  - EDIT NOTES: `oc` instead of `kubectl`
  - `compatibility.openshift.adaptSecurityContext=auto`. It won't apply the different `SecurityContext`

* Update charts/mlflow/README.md

  Co-authored-by: Alvaro Gonzalez <[email protected]>

* Update README and NOTES.txt following suggestions

---------

Co-authored-by: Alvaro Gonzalez <[email protected]>
CSCfi · Sep 27, 2024 · ab196a9
1 parent d0b3840
Showing 29 changed files with 738 additions and 382 deletions.
diff --git a/.gitignore b/.gitignore
@@ -1,2 +1,3 @@
 .DS_Store
-Chart.lock
+Chart.lock
+*.tgz
diff --git a/charts/mlflow/Chart.yaml b/charts/mlflow/Chart.yaml
@@ -1,34 +1,33 @@
-# Copyright VMware, Inc.
+# Copyright Broadcom, Inc. All Rights Reserved.
 # SPDX-License-Identifier: APACHE-2.0
 
 annotations:
   category: MachineLearning
   licenses: Apache-2.0
   images: |
     - name: git
-      image: docker.io/bitnami/git:2.43.0-debian-11-r1
+      image: docker.io/bitnami/git:2.46.1-debian-12-r1
     - name: mlflow
-      image: docker.io/bitnami/mlflow:2.9.2-debian-11-r0
+      image: docker.io/bitnami/mlflow:2.16.2-debian-12-r3
     - name: os-shell
-      image: docker.io/bitnami/os-shell:11-debian-11-r92
+      image: docker.io/bitnami/os-shell:12-debian-12-r30
 apiVersion: v2
-appVersion: 2.9.2
+appVersion: 2.16.2
 dependencies:
 - condition: minio.enabled
   name: minio
   repository: oci://registry-1.docker.io/bitnamicharts
-  version: 12.x.x
+  version: 14.x.x
 - condition: postgresql.enabled
   name: postgresql
   repository: oci://registry-1.docker.io/bitnamicharts
-  version: 13.2.28
+  version: 15.x.x
 - name: common
   repository: oci://registry-1.docker.io/bitnamicharts
   tags:
   - bitnami-common
   version: 2.x.x
 description: MLflow is an open-source platform designed to manage the end-to-end machine learning lifecycle. It allows you to track experiments, package code into reproducible runs, and share and deploy models.
-  Link to the repo https://github.com/CSCfi/helm-charts
 home: https://bitnami.com
 icon: https://bitnami.com/assets/stacks/mlflow/img/mlflow-stack-220x234.png
 keywords:
@@ -38,10 +37,11 @@ keywords:
 - machine
 - learning
 maintainers:
-- name: VMware, Inc.
+- name: Broadcom, Inc. All Rights Reserved.
   url: https://github.com/bitnami/charts
 name: mlflow
 sources:
+- https://github.com/bitnami/charts/tree/main/bitnami/mlflow
 - https://github.com/bitnami/containers/tree/main/bitnami/mlflow
 - https://github.com/mlflow/mlflow
-version: 0.4.0
+version: 1.5.7
diff --git a/charts/mlflow/README.md b/charts/mlflow/README.md
@@ -5,10 +5,15 @@
 ## Introduction
 This Helm chart deploys MLflow on Rahti2.
 
+It is highly recommended to use the Helm CLI instead of the WebUI of Rahti2. If so, you can clone the GitHub repository from [here](https://github.com/CSCfi/helm-charts).  
+Helm CLI allows you:
+- to download the necessary dependencies in order to run the chart, if you decide to run PostgreSQL and MinIO in Rahti2.
+- to set the necessary values (see command below), if you decide to run a PostgreSQL instance externally and to use an external S3 service.
+
 ## Test and Deploy
 Different steps are necessary to deploy this Helm Chart to Rahti2:  
 
-1. If you want to use CSC external S3 service (Allas), be sure to create Allas credentials.  
+1. By default, this Helm Chart will use the CSC S3 service Allas. Be sure to create Allas credentials.  
    You can achieve this by [sourcing](https://docs.csc.fi/cloud/pouta/install-client/#configure-your-terminal-environment-for-openstack) your cPouta project and then type this command:  
 
      ```sh
@@ -19,36 +24,73 @@ Different steps are necessary to deploy this Helm Chart to Rahti2:
 
    You can also use another external S3 service instead of Allas.
 
-2. Deploy MLflow:
+2. By default, it also uses our CSC database service named [Pukki](https://pukki.dbaas.csc.fi). Be sure to have a database created on this service.  
+During the process of creation of database, it will ask you the `Allowed CIDRs`. Rahti2 has a common egress IP which is `86.50.229.150`. If you want a dedicated egress IP, you can send a ticket to [[email protected]](mailto:[email protected]). More information [here](https://docs.csc.fi/cloud/rahti2/networking/#egress-ips).
+
+   A database named `mlflow_auth` must be created when launching your instance. This database is needed for the auth module (only if `tracking.auth.enabled=true` which is the case by default).
+
+3. Deploy MLflow:
 
      ```sh
-     helm install mlflow . --set externalS3.accessKeyID={ACCESS_KEY} --set externalS3.accessKeySecret={SECRET_KEY} --set externalS3.bucket=mlflow
+     helm install mlflow . --set externalS3.accessKeyID={ACCESS_KEY} \
+     --set externalS3.accessKeySecret={SECRET_KEY} \
+     --set externalS3.bucket={BUCKET_NAME} \
+     --set externalDatabase.host={DB_PUBLIC_IP} \
+     --set externalDatabase.user={DB_USER} \
+     --set externalDatabase.password={DB_PASSWORD} \
+     --set externalDatabase.database={DB_NAME}
      ```
 
    _Replace {ACCESS_KEY} by the access key previously created_  
    _Replace {SECRET_KEY} by the secret key previously created_  
    _Replace {BUCKET_NAME} by the name of the bucket previously created_  
+   _Replace {DB_PUBLIC_IP} by the public IP of your database created on Pukki_  
+   _Replace {DB_USER} by the user created on Pukki_  
+   _Replace {DB_NAME} by the database created on Pukki_ 
 
    Alternatively, you can edit the `values.yaml`:
 
      ```yaml
+     [...]
+     externalDatabase:
+       host: ''
+       user: ''
+       database: ''
+       password: ''
+     [...]
      externalS3:
        accessKeyID: ''
        accessKeySecret: ''
-       bucket: 'mlflow'
+       bucket: ''
      ```
 
-To access MLflow tracking webpage, run this command to retrieve `user` password:  
+After the deployment, the Web URL will be displayed in the NOTES. To access MLflow tracking webpage, run this command to retrieve `user` password:  
 ```sh
-echo Password: $(oc get secret --namespace {YOUR_NAMESPACE} mlflow-tracking -o jsonpath="{.data.admin-password }" | base64 -d)
+echo Password: $(oc get secret --namespace {YOUR_NAMESPACE} mlflow-tracking -o jsonpath="{ .data.admin-password }" | base64 -d)
 ```
 _Replace {YOUR_NAMESPACE} by the name of your project in Rahti_   
 
-You can edit the `config.yaml`. Instead of deleting your deployment and recreating a new one, Helm lets you `upgrade` your release. Use this command:  
+You can edit the `values.yaml`. Instead of deleting your deployment and recreating a new one, Helm lets you `upgrade` your release. Use this command:  
 ```sh
 helm upgrade mlflow . --set externalS3.accessKeyID={ACCESS_KEY} --set externalS3.accessKeySecret={SECRET_KEY} --set externalS3.bucket={BUCKET_NAME}
 ```
 
+## NOTES
+You can use this template by deploying PostgreSQL and MINIO in Rahti2. You can enable these parameters by editing the `values.yaml`:
+```yaml
+[...]
+postgresql:
+  enabled: true
+[...]
+minio:
+  enabled: true
+```
+
+**It is highly recommended to use our other services (Pukki and Allas) in a production environment.**
+
+If, for some reasons, the Rahti2 node crashes while you have PostgreSQL and MinIO running, it can cause disruptions and corruption in your database.  
+Pukki also has automatic backups for your databases.
+
 ## Project status
 
 ## Links
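
Note on the README change above: the "edit the `values.yaml`" route can also be followed with a separate override file, which leaves the chart's own `values.yaml` untouched. The sketch below is only an illustration under assumptions. The file name `my-values.yaml` is hypothetical, the keys are the ones shown in the README diff, and the `{...}` placeholders must be replaced with real values. The script prints the Helm command instead of running it.

```sh
#!/bin/sh
# Sketch only: collect the external database and S3 settings from the README
# into an override file. The file name is a hypothetical choice.
set -eu

VALUES_FILE="${VALUES_FILE:-my-values.yaml}"

# Quoted heredoc delimiter: the {PLACEHOLDER} strings are written literally.
cat > "$VALUES_FILE" <<'EOF'
externalDatabase:
  host: '{DB_PUBLIC_IP}'
  user: '{DB_USER}'
  password: '{DB_PASSWORD}'
  database: '{DB_NAME}'
externalS3:
  accessKeyID: '{ACCESS_KEY}'
  accessKeySecret: '{SECRET_KEY}'
  bucket: '{BUCKET_NAME}'
EOF

# Print the command rather than executing it, so the sketch has no side effects.
echo "helm upgrade --install mlflow . -f $VALUES_FILE"
```

Keeping credentials in a file rather than in `--set` flags also keeps them out of the shell history, provided the file itself is not committed.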

diff --git a/charts/mlflow/templates/NOTES.txt b/charts/mlflow/templates/NOTES.txt
@@ -12,11 +12,11 @@ The chart has been deployed in diagnostic mode. All probes have been disabled an
 
 Get the list of pods by executing:
 
-  kubectl get pods --namespace {{ include "common.names.namespace" . | quote }} -l app.kubernetes.io/instance={{ .Release.Name }}
+  oc get pods --namespace {{ include "common.names.namespace" . | quote }} -l app.kubernetes.io/instance={{ .Release.Name }}
 
 Access the pod you want to debug by executing
 
-  kubectl exec --namespace {{ include "common.names.namespace" . | quote }} -ti <NAME OF THE POD> -- bash
+  oc exec --namespace {{ include "common.names.namespace" . | quote }} -ti <NAME OF THE POD> -- bash
 
 {{- else }}
 
@@ -27,19 +27,19 @@ The following command will be executed:
   {{- include "common.tplvalues.render" (dict "value" .Values.run.source.launchCommand "context" $) | nindent 2 }}
 
 You can see the logs of each running node with:
-    kubectl logs [POD_NAME]
+    oc logs [POD_NAME]
 
 and the list of pods:
-    kubectl get pods --namespace {{ include "common.names.namespace" . }} -l "app.kubernetes.io/name={{ include "common.names.name" . }},app.kubernetes.io/instance={{ .Release.Name }}"
+    oc get pods --namespace {{ include "common.names.namespace" . }} -l "app.kubernetes.io/name={{ include "common.names.name" . }},app.kubernetes.io/instance={{ .Release.Name }}"
 {{- else }}
 You didn't specify any entrypoint to your code.
 To run it, you can either deploy again using the `source.launchCommand` option to specify your entrypoint, or execute it manually by jumping into the pods:
 
 1. Get the running pods
-    kubectl get pods --namespace {{ include "common.names.namespace" . }} -l "app.kubernetes.io/name={{ include "common.names.name" . }},app.kubernetes.io/instance={{ .Release.Name }}"
+    oc get pods --namespace {{ include "common.names.namespace" . }} -l "app.kubernetes.io/name={{ include "common.names.name" . }},app.kubernetes.io/instance={{ .Release.Name }}"
 
 2. Get into a pod
-    kubectl exec -ti [POD_NAME] bash
+    oc exec -ti [POD_NAME] bash
 
 3. Execute your script as you would normally do.
 {{- end }}
@@ -68,21 +68,21 @@ To access your MLflow site from outside the cluster follow the steps below:
 
 {{- if contains "NodePort" .Values.tracking.service.type }}
 
-   export NODE_PORT=$(kubectl get --namespace {{ .Release.Namespace }} -o jsonpath="{.spec.ports[0].nodePort}" services {{ include "mlflow.v0.tracking.fullname" . }})
-   export NODE_IP=$(kubectl get nodes --namespace {{ .Release.Namespace }} -o jsonpath="{.items[0].status.addresses[0].address}")
+   export NODE_PORT=$(oc get --namespace {{ .Release.Namespace }} -o jsonpath="{.spec.ports[0].nodePort}" services {{ include "mlflow.v0.tracking.fullname" . }})
+   export NODE_IP=$(oc get nodes --namespace {{ .Release.Namespace }} -o jsonpath="{.items[0].status.addresses[0].address}")
    echo "MLflow URL: {{ include "mlflow.v0.tracking.protocol" . }}://$NODE_IP:$NODE_PORT/"
 
 {{- else if contains "LoadBalancer" .Values.tracking.service.type }}
 
   NOTE: It may take a few minutes for the LoadBalancer IP to be available.
-        Watch the status with: 'kubectl get svc --namespace {{ .Release.Namespace }} -w {{ include "mlflow.v0.tracking.fullname" . }}'
+        Watch the status with: 'oc get svc --namespace {{ .Release.Namespace }} -w {{ include "mlflow.v0.tracking.fullname" . }}'
 
-   export SERVICE_IP=$(kubectl get svc --namespace {{ .Release.Namespace }} {{ include "mlflow.v0.tracking.fullname" . }} --template "{{ "{{ range (index .status.loadBalancer.ingress 0) }}{{ . }}{{ end }}" }}")
+   export SERVICE_IP=$(oc get svc --namespace {{ .Release.Namespace }} {{ include "mlflow.v0.tracking.fullname" . }} --template "{{ "{{ range (index .status.loadBalancer.ingress 0) }}{{ . }}{{ end }}" }}")
    echo "MLflow URL: {{ include "mlflow.v0.tracking.protocol" . }}://$SERVICE_IP{{- if ne $port "80" }}:{{ include "mlflow.v0.tracking.port" . }}{{ end }}/"
 
 {{- else if contains "ClusterIP"  .Values.tracking.service.type }}
 
-   kubectl port-forward --namespace {{ .Release.Namespace }} svc/{{ include "mlflow.v0.tracking.fullname" . }} {{ include "mlflow.v0.tracking.port" . }}:{{ include "mlflow.v0.tracking.port" . }} &
+   oc port-forward --namespace {{ .Release.Namespace }} svc/{{ include "mlflow.v0.tracking.fullname" . }} {{ include "mlflow.v0.tracking.port" . }}:{{ include "mlflow.v0.tracking.port" . }} &
    echo "MLflow URL: {{ include "mlflow.v0.tracking.protocol" . }}://127.0.0.1{{- if ne $port "80" }}:{{ include "mlflow.v0.tracking.port" . }}{{ end }}//"
 
 
@@ -100,8 +100,8 @@ To access your MLflow site from outside the cluster follow the steps below:
 {{- if .Values.tracking.enabled }}
 3. Login with the following credentials below to see your blog:
 
-  echo Username: $(kubectl get secret --namespace {{ .Release.Namespace }} {{ include "mlflow.v0.tracking.fullname" . }} -o jsonpath="{ .data.{{ include "mlflow.v0.tracking.userKey" . }} }" | base64 -d)
-  echo Password: $(kubectl get secret --namespace {{ .Release.Namespace }} {{ include "mlflow.v0.tracking.fullname" . }} -o jsonpath="{.data.{{ include "mlflow.v0.tracking.passwordKey" . }} }" | base64 -d)
+  echo Username: $(oc get secret --namespace {{ .Release.Namespace }} {{ include "mlflow.v0.tracking.fullname" . }} -o jsonpath="{ .data.{{ include "mlflow.v0.tracking.userKey" . }} }" | base64 -d)
+  echo Password: $(oc get secret --namespace {{ .Release.Namespace }} {{ include "mlflow.v0.tracking.fullname" . }} -o jsonpath="{ .data.{{ include "mlflow.v0.tracking.passwordKey" . }} }" | base64 -d)
 {{- end }}
 {{- end }}
 {{- end }}
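
Note on the credential commands above: values under `.data` in a Kubernetes Secret are base64-encoded, which is why both `oc get secret` lines pipe into `base64 -d`. The following minimal sketch shows that round trip locally, without a cluster. The sample value `example-password` and the helper name `decode_secret_field` are hypothetical and not part of the chart.

```sh
#!/bin/sh
# Sketch only: reproduce the encode/decode round trip that the NOTES.txt
# commands rely on, using a made-up value instead of a real Secret.
set -eu

# Hypothetical helper: decode one base64-encoded Secret field read from stdin.
decode_secret_field() {
  base64 -d
}

# Stand-in for the string a jsonpath query on .data would return.
encoded=$(printf '%s' 'example-password' | base64)

decoded=$(printf '%s' "$encoded" | decode_secret_field)
echo "Password: $decoded"
```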