diff --git a/SETUP.md b/SETUP.md index dc67e53..549d67f 100644 --- a/SETUP.md +++ b/SETUP.md @@ -10,8 +10,8 @@ defines a `mlbatch-edit` role which enforces these restrictions and will be used in the setup process for each team of MLBatch users that is onboarded. -This setup has been developed on OpenShift 4.14 and Kubernetes 1.27 and -is intended to support OpenShift 4.14 and up and/or Kubernetes 1.27 and up. +This setup has been developed on Red Hat OpenShift 4.14 and Kubernetes 1.27 and +is intended to support Red Hat OpenShift 4.14 and up and/or Kubernetes 1.27 and up. To start with, recursively clone and enter this repository: ```sh @@ -24,41 +24,46 @@ one for each base platform. ## OpenShift AI +We recommend using the most recent ***stable*** release of +Red Hat OpenShift AI as the base platform for MLBatch. Please see +[Red Hat OpenShift AI Self-Managed Life Cycle](https://access.redhat.com/support/policy/updates/rhoai-sm/lifecycle) +for the life cycle dates of currently supported ***stable*** and ***fast*** releases. 
+ Instructions are provided for the following OpenShift AI ***stable*** releases: -+ OpenShift AI 2.10 - + [RHOAI 2.10 Cluster Setup](./setup.RHOAI-v2.10/CLUSTER-SETUP.md) - + [RHOAI 2.10 Team Setup](./setup.RHOAI-v2.10/TEAM-SETUP.md) - + [RHOAI 2.10 Uninstall](./setup.RHOAI-v2.10/UNINSTALL.md) -+ OpenShift AI 2.13 ++ Red Hat OpenShift AI 2.13 + [RHOAI 2.13 Cluster Setup](./setup.RHOAI-v2.13/CLUSTER-SETUP.md) + [RHOAI 2.13 Team Setup](./setup.RHOAI-v2.13/TEAM-SETUP.md) + [UPGRADING from RHOAI 2.10](./setup.RHOAI-v2.13/UPGRADE-STABLE.md) + [UPGRADING from RHOAI 2.12](./setup.RHOAI-v2.13/UPGRADE-FAST.md) + [RHOAI 2.13 Uninstall](./setup.RHOAI-v2.13/UNINSTALL.md) ++ Red Hat OpenShift AI 2.10 + + [RHOAI 2.10 Cluster Setup](./setup.RHOAI-v2.10/CLUSTER-SETUP.md) + + [RHOAI 2.10 Team Setup](./setup.RHOAI-v2.10/TEAM-SETUP.md) + + [RHOAI 2.10 Uninstall](./setup.RHOAI-v2.10/UNINSTALL.md) Instructions are provided for the following OpenShift AI ***fast*** releases: -+ OpenShift AI 2.11 - + [RHOAI 2.11 Cluster Setup](./setup.RHOAI-v2.11/CLUSTER-SETUP.md) - + [RHOAI 2.11 Team Setup](./setup.RHOAI-v2.11/TEAM-SETUP.md) - + [UPGRADING from RHOAI 2.10](./setup.RHOAI-v2.11/UPGRADE.md) - + [RHOAI 2.11 Uninstall](./setup.RHOAI-v2.11/UNINSTALL.md) -+ OpenShift AI 2.12 ++ Red Hat OpenShift AI 2.12 + [RHOAI 2.12 Cluster Setup](./setup.RHOAI-v2.12/CLUSTER-SETUP.md) + [RHOAI 2.12 Team Setup](./setup.RHOAI-v2.12/TEAM-SETUP.md) + [UPGRADING from RHOAI 2.11](./setup.RHOAI-v2.12/UPGRADE.md) + [RHOAI 2.12 Uninstall](./setup.RHOAI-v2.12/UNINSTALL.md) ++ Red Hat OpenShift AI 2.11 + + [RHOAI 2.11 Cluster Setup](./setup.RHOAI-v2.11/CLUSTER-SETUP.md) + + [RHOAI 2.11 Team Setup](./setup.RHOAI-v2.11/TEAM-SETUP.md) + + [UPGRADING from RHOAI 2.10](./setup.RHOAI-v2.11/UPGRADE.md) + + [RHOAI 2.11 Uninstall](./setup.RHOAI-v2.11/UNINSTALL.md) ## Kubernetes -MLBatch can be installed on any Kubernetes cluster version 1.27 or later -by following these instructions: - + [Kubernetes Cluster 
Setup](./setup.k8s-v1.27/CLUSTER-SETUP.md) - + [Kubternets Team Setup](./setup.k8s-v1.27/TEAM-SETUP.md) - + [Kubernetes Uninstall](setup.k8s-v1.27/UNINSTALL.md) - On Kubernetes version 1.30 and later, an enhanced user experience is available by using ValidatingAdmissionPolicies to streamline quota enforcement. Follow these instructions when installing on 1.30+ clusters: + [Kubernetes 1.30+ Cluster Setup](./setup.k8s-v1.30/CLUSTER-SETUP.md) + [Kubernetes 1.30+ Team Setup](./setup.k8s-v1.30/TEAM-SETUP.md) + [Kubernetes 1.30+ Uninstall](setup.k8s-v1.30/UNINSTALL.md) + +MLBatch can be installed on any Kubernetes cluster version 1.27 or later +by following these instructions: + + [Kubernetes Cluster Setup](./setup.k8s-v1.27/CLUSTER-SETUP.md) + + [Kubernetes Team Setup](./setup.k8s-v1.27/TEAM-SETUP.md) + + [Kubernetes Uninstall](setup.k8s-v1.27/UNINSTALL.md) diff --git a/setup.RHOAI-v2.10/CLUSTER-SETUP.md b/setup.RHOAI-v2.10/CLUSTER-SETUP.md index 81aeed7..97c9168 100644 --- a/setup.RHOAI-v2.10/CLUSTER-SETUP.md +++ b/setup.RHOAI-v2.10/CLUSTER-SETUP.md @@ -1,10 +1,10 @@ # Cluster Setup -The cluster setup installs OpenShift AI and Coscheduler, configures Kueue, +The cluster setup installs Red Hat OpenShift AI and Coscheduler, configures Kueue, cluster roles, and priority classes. If MLBatch is deployed on a cluster that used to run earlier versions of ODH, -[MCAD](https://github.com/project-codeflare/mcad), OpenShift AI, or Coscheduler, +[MCAD](https://github.com/project-codeflare/mcad), Red Hat OpenShift AI, or Coscheduler, make sure to scrub traces of these installations. In particular, make sure to delete the following custom resource definitions (CRD) if present on the cluster. 
Make sure to delete all instances prior to deleting the CRDs: @@ -39,9 +39,9 @@ oc patch deployment -n scheduler-plugins --type=json --patch-file setup.RHOAI-v2 oc patch deployment -n scheduler-plugins --type=json --patch-file setup.RHOAI-v2.10/coscheduler-priority-patch.yaml scheduler-plugins-scheduler ``` -## OpenShift AI +## Red Hat OpenShift AI -Create the OpenShift AI subscription: +Create the Red Hat OpenShift AI subscription: ```sh oc apply -f setup.RHOAI-v2.10/mlbatch-subscription.yaml ```` @@ -66,11 +66,11 @@ Create Data Science Cluster: ```sh oc apply -f setup.RHOAI-v2.10/mlbatch-dsc.yaml ``` -The provided DSCI and DSC are intended to install a minimal set of OpenShift +The provided DSCI and DSC are intended to install a minimal set of Red Hat OpenShift AI managed components: `codeflare`, `kueue`, `ray`, and `trainingoperator`. The remaining components such as `dashboard` can be optionally enabled. -The configuration of the managed components differs from the default OpenShift +The configuration of the managed components differs from the default Red Hat OpenShift AI configuration as follows: - Kubeflow Training Operator: - `gang-scheduler-name` is set to `scheduler-plugins-scheduler`, @@ -88,7 +88,7 @@ AI configuration as follows: - pod priorities, resource requests and limits have been adjusted. To work around https://issues.redhat.com/browse/RHOAIENG-7887 (a race condition -in OpenShift AI installation), do a rolling restart of the Kueue manager. +in Red Hat OpenShift AI installation), do a rolling restart of the Kueue manager. 
```sh oc rollout restart deployment/kueue-controller-manager -n redhat-ods-applications ``` diff --git a/setup.RHOAI-v2.11/CLUSTER-SETUP.md b/setup.RHOAI-v2.11/CLUSTER-SETUP.md index 435dced..7a04b50 100644 --- a/setup.RHOAI-v2.11/CLUSTER-SETUP.md +++ b/setup.RHOAI-v2.11/CLUSTER-SETUP.md @@ -1,10 +1,10 @@ # Cluster Setup -The cluster setup installs OpenShift AI and Coscheduler, configures Kueue, +The cluster setup installs Red Hat OpenShift AI and Coscheduler, configures Kueue, cluster roles, and priority classes. If MLBatch is deployed on a cluster that used to run earlier versions of ODH, -[MCAD](https://github.com/project-codeflare/mcad), OpenShift AI, or Coscheduler, +[MCAD](https://github.com/project-codeflare/mcad), Red Hat OpenShift AI, or Coscheduler, make sure to scrub traces of these installations. In particular, make sure to delete the following custom resource definitions (CRD) if present on the cluster. Make sure to delete all instances prior to deleting the CRDs: @@ -39,9 +39,9 @@ oc patch deployment -n scheduler-plugins --type=json --patch-file setup.RHOAI-v2 oc patch deployment -n scheduler-plugins --type=json --patch-file setup.RHOAI-v2.11/coscheduler-priority-patch.yaml scheduler-plugins-scheduler ``` -## OpenShift AI +## Red Hat OpenShift AI -Create the OpenShift AI subscription: +Create the Red Hat OpenShift AI subscription: ```sh oc apply -f setup.RHOAI-v2.11/mlbatch-subscription.yaml ```` @@ -66,11 +66,11 @@ Create Data Science Cluster: ```sh oc apply -f setup.RHOAI-v2.11/mlbatch-dsc.yaml ``` -The provided DSCI and DSC are intended to install a minimal set of OpenShift +The provided DSCI and DSC are intended to install a minimal set of Red Hat OpenShift AI managed components: `codeflare`, `kueue`, `ray`, and `trainingoperator`. The remaining components such as `dashboard` can be optionally enabled. 
-The configuration of the managed components differs from the default OpenShift +The configuration of the managed components differs from the default Red Hat OpenShift AI configuration as follows: - Kubeflow Training Operator: - `gang-scheduler-name` is set to `scheduler-plugins-scheduler`, @@ -88,7 +88,7 @@ AI configuration as follows: - pod priorities, resource requests and limits have been adjusted. To work around https://issues.redhat.com/browse/RHOAIENG-7887 (a race condition -in OpenShift AI installation), do a rolling restart of the Kueue manager. +in Red Hat OpenShift AI installation), do a rolling restart of the Kueue manager. ```sh oc rollout restart deployment/kueue-controller-manager -n redhat-ods-applications ``` diff --git a/setup.RHOAI-v2.12/CLUSTER-SETUP.md b/setup.RHOAI-v2.12/CLUSTER-SETUP.md index d483c39..ecbc051 100644 --- a/setup.RHOAI-v2.12/CLUSTER-SETUP.md +++ b/setup.RHOAI-v2.12/CLUSTER-SETUP.md @@ -1,10 +1,10 @@ # Cluster Setup -The cluster setup installs OpenShift AI and Coscheduler, configures Kueue, +The cluster setup installs Red Hat OpenShift AI and Coscheduler, configures Kueue, cluster roles, and priority classes. If MLBatch is deployed on a cluster that used to run earlier versions of ODH, -[MCAD](https://github.com/project-codeflare/mcad), OpenShift AI, or Coscheduler, +[MCAD](https://github.com/project-codeflare/mcad), Red Hat OpenShift AI, or Coscheduler, make sure to scrub traces of these installations. In particular, make sure to delete the following custom resource definitions (CRD) if present on the cluster. 
Make sure to delete all instances prior to deleting the CRDs: @@ -39,9 +39,9 @@ oc patch deployment -n scheduler-plugins --type=json --patch-file setup.RHOAI-v2 oc patch deployment -n scheduler-plugins --type=json --patch-file setup.RHOAI-v2.12/coscheduler-priority-patch.yaml scheduler-plugins-scheduler ``` -## OpenShift AI +## Red Hat OpenShift AI -Create the OpenShift AI subscription: +Create the Red Hat OpenShift AI subscription: ```sh oc apply -f setup.RHOAI-v2.12/mlbatch-subscription.yaml ```` @@ -66,11 +66,11 @@ Create Data Science Cluster: ```sh oc apply -f setup.RHOAI-v2.12/mlbatch-dsc.yaml ``` -The provided DSCI and DSC are intended to install a minimal set of OpenShift +The provided DSCI and DSC are intended to install a minimal set of Red Hat OpenShift AI managed components: `codeflare`, `kueue`, `ray`, and `trainingoperator`. The remaining components such as `dashboard` can be optionally enabled. -The configuration of the managed components differs from the default OpenShift +The configuration of the managed components differs from the default Red Hat OpenShift AI configuration as follows: - Kubeflow Training Operator: - `gang-scheduler-name` is set to `scheduler-plugins-scheduler`, @@ -88,7 +88,7 @@ AI configuration as follows: - pod priorities, resource requests and limits have been adjusted. To work around https://issues.redhat.com/browse/RHOAIENG-7887 (a race condition -in OpenShift AI installation), do a rolling restart of the Kueue manager. +in Red Hat OpenShift AI installation), do a rolling restart of the Kueue manager. 
```sh oc rollout restart deployment/kueue-controller-manager -n redhat-ods-applications ``` diff --git a/setup.RHOAI-v2.13/CLUSTER-SETUP.md b/setup.RHOAI-v2.13/CLUSTER-SETUP.md index 33f73c2..1a2bb94 100644 --- a/setup.RHOAI-v2.13/CLUSTER-SETUP.md +++ b/setup.RHOAI-v2.13/CLUSTER-SETUP.md @@ -1,10 +1,10 @@ # Cluster Setup -The cluster setup installs OpenShift AI and Coscheduler, configures Kueue, +The cluster setup installs Red Hat OpenShift AI and Coscheduler, configures Kueue, cluster roles, and priority classes. If MLBatch is deployed on a cluster that used to run earlier versions of ODH, -[MCAD](https://github.com/project-codeflare/mcad), OpenShift AI, or Coscheduler, +[MCAD](https://github.com/project-codeflare/mcad), Red Hat OpenShift AI, or Coscheduler, make sure to scrub traces of these installations. In particular, make sure to delete the following custom resource definitions (CRD) if present on the cluster. Make sure to delete all instances prior to deleting the CRDs: @@ -39,9 +39,9 @@ oc patch deployment -n scheduler-plugins --type=json --patch-file setup.RHOAI-v2 oc patch deployment -n scheduler-plugins --type=json --patch-file setup.RHOAI-v2.13/coscheduler-priority-patch.yaml scheduler-plugins-scheduler ``` -## OpenShift AI +## Red Hat OpenShift AI -Create the OpenShift AI subscription: +Create the Red Hat OpenShift AI subscription: ```sh oc apply -f setup.RHOAI-v2.13/mlbatch-subscription.yaml ```` @@ -66,11 +66,11 @@ Create Data Science Cluster: ```sh oc apply -f setup.RHOAI-v2.13/mlbatch-dsc.yaml ``` -The provided DSCI and DSC are intended to install a minimal set of OpenShift +The provided DSCI and DSC are intended to install a minimal set of Red Hat OpenShift AI managed components: `codeflare`, `kueue`, `ray`, and `trainingoperator`. The remaining components such as `dashboard` can be optionally enabled. 
-The configuration of the managed components differs from the default OpenShift +The configuration of the managed components differs from the default Red Hat OpenShift AI configuration as follows: - Kubeflow Training Operator: - `gang-scheduler-name` is set to `scheduler-plugins-scheduler`, @@ -88,7 +88,7 @@ AI configuration as follows: - pod priorities, resource requests and limits have been adjusted. To work around https://issues.redhat.com/browse/RHOAIENG-7887 (a race condition -in OpenShift AI installation), do a rolling restart of the Kueue manager. +in Red Hat OpenShift AI installation), do a rolling restart of the Kueue manager. ```sh oc rollout restart deployment/kueue-controller-manager -n redhat-ods-applications ``` diff --git a/setup.tmpl/CLUSTER-SETUP.md.tmpl b/setup.tmpl/CLUSTER-SETUP.md.tmpl index dc8a42c..0817b07 100644 --- a/setup.tmpl/CLUSTER-SETUP.md.tmpl +++ b/setup.tmpl/CLUSTER-SETUP.md.tmpl @@ -1,11 +1,11 @@ # Cluster Setup {{ if .OPENSHIFT -}} -The cluster setup installs OpenShift AI and Coscheduler, configures Kueue, +The cluster setup installs Red Hat OpenShift AI and Coscheduler, configures Kueue, cluster roles, and priority classes. If MLBatch is deployed on a cluster that used to run earlier versions of ODH, -[MCAD](https://github.com/project-codeflare/mcad), OpenShift AI, or Coscheduler, +[MCAD](https://github.com/project-codeflare/mcad), Red Hat OpenShift AI, or Coscheduler, make sure to scrub traces of these installations. In particular, make sure to delete the following custom resource definitions (CRD) if present on the cluster. 
Make sure to delete all instances prior to deleting the CRDs: @@ -65,9 +65,9 @@ Patch Coscheduler pod priorities: ``` {{ if .OPENSHIFT -}} -## OpenShift AI +## Red Hat OpenShift AI -Create the OpenShift AI subscription: +Create the Red Hat OpenShift AI subscription: ```sh {{ .KUBECTL }} apply -f setup.{{ .VERSION }}/mlbatch-subscription.yaml ```` @@ -92,11 +92,11 @@ Create Data Science Cluster: ```sh {{ .KUBECTL }} apply -f setup.{{ .VERSION }}/mlbatch-dsc.yaml ``` -The provided DSCI and DSC are intended to install a minimal set of OpenShift +The provided DSCI and DSC are intended to install a minimal set of Red Hat OpenShift AI managed components: `codeflare`, `kueue`, `ray`, and `trainingoperator`. The remaining components such as `dashboard` can be optionally enabled. -The configuration of the managed components differs from the default OpenShift +The configuration of the managed components differs from the default Red Hat OpenShift AI configuration as follows: - Kubeflow Training Operator: - `gang-scheduler-name` is set to `scheduler-plugins-scheduler`, @@ -114,7 +114,7 @@ AI configuration as follows: - pod priorities, resource requests and limits have been adjusted. To work around https://issues.redhat.com/browse/RHOAIENG-7887 (a race condition -in OpenShift AI installation), do a rolling restart of the Kueue manager. +in Red Hat OpenShift AI installation), do a rolling restart of the Kueue manager. ```sh {{ .KUBECTL }} rollout restart deployment/kueue-controller-manager -n redhat-ods-applications ```