Altered demo notebooks for Kueue default & mTLS default #486

Merged
merged 12 commits into from
Apr 25, 2024
14 changes: 9 additions & 5 deletions demo-notebooks/additional-demos/hf_interactive.ipynb
@@ -68,10 +68,12 @@
"id": "bc27f84c",
"metadata": {},
"source": [
"Here, we want to define our cluster by specifying the resources we require for our batch workload. Below, we define our cluster object (which generates a corresponding AppWrapper).\n",
"Here, we want to define our cluster by specifying the resources we require for our batch workload. Below, we define our cluster object (which generates a corresponding Ray Cluster).\n",
"\n",
"NOTE: We must specify the `image` which will be used in our RayCluster, we recommend you bring your own image which suits your purposes. \n",
"The example here is a community image."
"The example here is a community image.\n",
"\n",
"NOTE: By default the SDK uses Kueue as its scheduling solution. To use MCAD, set the `mcad=True` option in `ClusterConfiguration`."
]
},
{
@@ -89,7 +91,8 @@
}
],
"source": [
"# Create our cluster and submit appwrapper\n",
"# Create our cluster and submit\n",
"# The SDK will try to find the name of your default local queue based on the annotation \"kueue.x-k8s.io/default-queue\": \"true\"\n",
"cluster = Cluster(ClusterConfiguration(name='hfgputest', \n",
" namespace=\"default\",\n",
" num_workers=1,\n",
@@ -99,15 +102,16 @@
" max_memory=16, \n",
" num_gpus=4,\n",
" image=\"quay.io/project-codeflare/ray:latest-py39-cu118\",\n",
" instascale=True, machine_types=[\"m5.xlarge\", \"p3.8xlarge\"]))"
" # local_queue=\"local-queue-name\" # Specify the local queue manually\n",
" ))"
]
},
{
"cell_type": "markdown",
"id": "12eef53c",
"metadata": {},
"source": [
"Next, we want to bring our cluster up, so we call the `up()` function below to submit our cluster AppWrapper yaml onto the MCAD queue, and begin the process of obtaining our resource cluster."
"Next, we want to bring our cluster up, so we call the `up()` function below to submit our Ray Cluster onto the queue, and begin the process of obtaining our resource cluster."
]
},
{
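The updated cell comment says the SDK tries to find your default local queue from the annotation `"kueue.x-k8s.io/default-queue": "true"`. A minimal sketch of that lookup, assuming the LocalQueue objects have already been fetched as plain Kubernetes-style dicts (`metadata.name`, `metadata.annotations`). `find_default_local_queue` is a hypothetical name; this is not the SDK's own implementation.

```python
from typing import Any, Optional

# Annotation named in the notebook comment; Kubernetes annotation values are strings.
DEFAULT_QUEUE_ANNOTATION = "kueue.x-k8s.io/default-queue"


def find_default_local_queue(local_queues: list[dict[str, Any]]) -> Optional[str]:
    """Return the name of the first LocalQueue annotated as the default, else None.

    Hypothetical helper: each item is assumed to be a LocalQueue object
    represented as a dict with Kubernetes-style ``metadata``.
    """
    for lq in local_queues:
        metadata = lq.get("metadata", {})
        annotations = metadata.get("annotations") or {}
        # Compare against the string "true", not the boolean True.
        if annotations.get(DEFAULT_QUEUE_ANNOTATION) == "true":
            return metadata.get("name")
    return None


queues = [
    {"metadata": {"name": "team-a-queue", "annotations": {}}},
    {"metadata": {"name": "default-queue",
                  "annotations": {DEFAULT_QUEUE_ANNOTATION: "true"}}},
]
print(find_default_local_queue(queues))  # default-queue
```

If no queue carries the annotation, the sketch returns `None`, which is the situation where the commented-out `local_queue="local-queue-name"` option would need to be set by hand.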
13 changes: 5 additions & 8 deletions demo-notebooks/additional-demos/local_interactive.ipynb
@@ -49,12 +49,10 @@
"outputs": [],
"source": [
"# Create our cluster and submit appwrapper\n",
"namespace = \"default\"\n",
"namespace = \"default\" # Update to your namespace\n",
"cluster_name = \"hfgputest-1\"\n",
"local_interactive = True\n",
"\n",
"cluster = Cluster(ClusterConfiguration(local_interactive=local_interactive,\n",
" namespace=namespace,\n",
"cluster = Cluster(ClusterConfiguration(namespace=namespace,\n",
" name=cluster_name,\n",
" num_workers=1,\n",
" min_cpus=1,\n",
@@ -117,9 +115,8 @@
"source": [
"from codeflare_sdk import generate_cert\n",
"\n",
"if local_interactive:\n",
" generate_cert.generate_tls_cert(cluster_name, namespace)\n",
" generate_cert.export_env(cluster_name, namespace)"
"generate_cert.generate_tls_cert(cluster_name, namespace)\n",
"generate_cert.export_env(cluster_name, namespace)"
]
},
{
@@ -339,7 +336,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.17"
"version": "3.9.18"
},
"vscode": {
"interpreter": {
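With `local_interactive` removed, this notebook now always calls `generate_tls_cert` and `export_env` before connecting. Ray enables mTLS for its gRPC channels from the environment variables `RAY_USE_TLS`, `RAY_TLS_SERVER_CERT`, `RAY_TLS_SERVER_KEY` and `RAY_TLS_CA_CERT`, so a sketch of what an export step like this might set is shown below. The per-cluster directory `tls-<cluster>-<namespace>` and the file names `tls.crt`, `tls.key` and `ca.crt` are assumptions for illustration, not taken from the SDK.

```python
import os
from pathlib import Path


def tls_env(cluster_name: str, namespace: str, base_dir: str = ".") -> dict[str, str]:
    """Build Ray's TLS environment variables for one cluster.

    The ``tls-<cluster>-<namespace>`` directory and the file names inside it
    are assumptions made for this sketch.
    """
    cert_dir = Path(base_dir) / f"tls-{cluster_name}-{namespace}"
    return {
        "RAY_USE_TLS": "1",  # turn TLS on for Ray gRPC connections
        "RAY_TLS_SERVER_CERT": str(cert_dir / "tls.crt"),
        "RAY_TLS_SERVER_KEY": str(cert_dir / "tls.key"),
        "RAY_TLS_CA_CERT": str(cert_dir / "ca.crt"),
    }


def export_tls_env(cluster_name: str, namespace: str, base_dir: str = ".") -> None:
    """Set the variables in this process so a later Ray client connection uses TLS."""
    os.environ.update(tls_env(cluster_name, namespace, base_dir))


export_tls_env("hfgputest-1", "default")
print(os.environ["RAY_TLS_CA_CERT"])
```

Because the variables are read by the Ray client in the same process, the export has to run in the notebook kernel before the client connects, which is why the cert cell precedes the connection cell.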
@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"In this third demo we will go over the basics of the Ray Job Submission Client in the SDK"
"In this demo we will go over the basics of the RayJobClient in the SDK"
]
},
{
@@ -45,7 +45,7 @@
"# Create and configure our cluster object\n",
"cluster = Cluster(ClusterConfiguration(\n",
" name='jobtest',\n",
" namespace='default',\n",
" namespace='default', # Update to your namespace\n",
" num_workers=2,\n",
" min_cpus=1,\n",
" max_cpus=1,\n",
15 changes: 9 additions & 6 deletions demo-notebooks/guided-demos/0_basic_ray.ipynb
@@ -45,10 +45,12 @@
"id": "bc27f84c",
"metadata": {},
"source": [
"Here, we want to define our cluster by specifying the resources we require for our batch workload. Below, we define our cluster object (which generates a corresponding AppWrapper).\n",
"Here, we want to define our cluster by specifying the resources we require for our batch workload. Below, we define our cluster object (which generates a corresponding RayCluster).\n",
"\n",
"NOTE: We must specify the `image` which will be used in our RayCluster, we recommend you bring your own image which suits your purposes. \n",
"The example here is a community image."
"The example here is a community image.\n",
"\n",
"NOTE: By default the SDK uses Kueue as its scheduling solution. To use MCAD, set the `mcad=True` option in `ClusterConfiguration`."
]
},
{
@@ -58,18 +60,19 @@
"metadata": {},
"outputs": [],
"source": [
"# Create and configure our cluster object (and appwrapper)\n",
"# Create and configure our cluster object\n",
"# The SDK will try to find the name of your default local queue based on the annotation \"kueue.x-k8s.io/default-queue\": \"true\"\n",
"cluster = Cluster(ClusterConfiguration(\n",
" name='raytest',\n",
" namespace='default',\n",
" namespace='default', # Update to your namespace\n",
" num_workers=2,\n",
" min_cpus=1,\n",
" max_cpus=1,\n",
" min_memory=4,\n",
" max_memory=4,\n",
" num_gpus=0,\n",
" image=\"quay.io/project-codeflare/ray:latest-py39-cu118\",\n",
" instascale=False\n",
" # local_queue=\"local-queue-name\" # Specify the local queue manually\n",
"))"
]
},
@@ -78,7 +81,7 @@
"id": "12eef53c",
"metadata": {},
"source": [
"Next, we want to bring our cluster up, so we call the `up()` function below to submit our cluster AppWrapper yaml onto the MCAD queue, and begin the process of obtaining our resource cluster."
"Next, we want to bring our cluster up, so we call the `up()` function below to submit our Ray Cluster onto the queue, and begin the process of obtaining our resource cluster."
]
},
{
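In the updated `0_basic_ray` cell, an explicit `local_queue` (left commented out) takes precedence over the default queue the SDK discovers. Kueue ties a workload to a LocalQueue through the `kueue.x-k8s.io/queue-name` label. A sketch of that precedence, assuming the discovered default queue name is passed in; `resolve_queue_labels` is a hypothetical helper, and whether the SDK applies the label in exactly this way is an assumption. Raising when no queue can be found is a choice made for this sketch.

```python
from typing import Optional

# Label Kueue uses to associate a workload with a LocalQueue.
QUEUE_NAME_LABEL = "kueue.x-k8s.io/queue-name"


def resolve_queue_labels(local_queue: Optional[str],
                         default_queue: Optional[str]) -> dict[str, str]:
    """Pick the LocalQueue for a RayCluster and return the Kueue label for it.

    An explicitly configured queue wins; otherwise fall back to the queue
    discovered via the default-queue annotation. Raises if neither exists.
    """
    queue = local_queue or default_queue
    if queue is None:
        raise ValueError(
            "No local queue given and no LocalQueue is annotated as the default"
        )
    return {QUEUE_NAME_LABEL: queue}


print(resolve_queue_labels(None, "default-queue"))
# {'kueue.x-k8s.io/queue-name': 'default-queue'}
print(resolve_queue_labels("local-queue-name", "default-queue"))
# {'kueue.x-k8s.io/queue-name': 'local-queue-name'}
```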
11 changes: 8 additions & 3 deletions demo-notebooks/guided-demos/1_basic_instascale.ipynb
@@ -5,7 +5,9 @@
"id": "9865ee8c",
"metadata": {},
"source": [
"In this second notebook, we will go over the basics of using InstaScale to scale up/down necessary resources that are not currently available on your OpenShift Cluster (in cloud environments)."
"In this second notebook, we will go over the basics of using InstaScale to scale up/down necessary resources that are not currently available on your OpenShift Cluster (in cloud environments).\n",
"\n",
"NOTE: The InstaScale and MCAD components are in Tech Preview"
]
},
{
@@ -45,7 +47,9 @@
"This time, we are working in a cloud environment, and our OpenShift cluster does not have the resources needed for our desired workloads. We will use InstaScale to dynamically scale-up guaranteed resources based on our request (that will also automatically scale-down when we are finished working):\n",
"\n",
"NOTE: We must specify the `image` which will be used in our RayCluster, we recommend you bring your own image which suits your purposes. \n",
"The example here is a community image."
"The example here is a community image.\n",
"\n",
"NOTE: This specific demo requires MCAD and InstaScale to be enabled on the Cluster"
]
},
{
@@ -58,14 +62,15 @@
"# Create and configure our cluster object (and appwrapper)\n",
"cluster = Cluster(ClusterConfiguration(\n",
" name='instascaletest',\n",
" namespace='default',\n",
" namespace='default', # Update to your namespace\n",
" num_workers=2,\n",
" min_cpus=2,\n",
" max_cpus=2,\n",
" min_memory=8,\n",
" max_memory=8,\n",
" num_gpus=1,\n",
" image=\"quay.io/project-codeflare/ray:latest-py39-cu118\",\n",
" mcad=True, # Enable MCAD\n",
" instascale=True, # InstaScale now enabled, will scale OCP cluster to guarantee resource request\n",
" machine_types=[\"m5.xlarge\", \"g4dn.xlarge\"] # Head, worker AWS machine types desired\n",
"))"
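The InstaScale notebook now notes that it requires MCAD and InstaScale to be enabled, and its cell sets `mcad=True` next to `instascale=True`, with `machine_types` commented as the head and worker AWS machine types. A pre-flight check expressing those constraints might look like the sketch below. It assumes, as the note implies, that InstaScale only works together with MCAD, and that `machine_types` is ordered `[head, worker]`; `check_instascale_options` is hypothetical and not part of the SDK.

```python
from typing import Optional, Tuple


def check_instascale_options(mcad: bool, instascale: bool,
                             machine_types: list[str]) -> Optional[Tuple[str, str]]:
    """Validate InstaScale settings and return (head_type, worker_type).

    Returns None when InstaScale is disabled. Assumes InstaScale only works
    through MCAD, and that machine_types is ordered [head, worker].
    """
    if not instascale:
        return None
    if not mcad:
        raise ValueError("instascale=True requires mcad=True in this setup")
    if len(machine_types) != 2:
        raise ValueError("machine_types should list exactly a head and a worker type")
    head_type, worker_type = machine_types
    return head_type, worker_type


print(check_instascale_options(True, True, ["m5.xlarge", "g4dn.xlarge"]))
# ('m5.xlarge', 'g4dn.xlarge')
```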
@@ -14,7 +14,7 @@
"outputs": [],
"source": [
"# Import pieces from codeflare-sdk\n",
"from codeflare_sdk import Cluster, ClusterConfiguration, TokenAuthentication, RayJobClient"
"from codeflare_sdk import Cluster, ClusterConfiguration, TokenAuthentication"
]
},
{
@@ -27,9 +27,8 @@
"# IF unused, SDK will automatically check for default kubeconfig, then in-cluster config\n",
"# KubeConfigFileAuthentication can also be used to specify kubeconfig path manually\n",
"\n",
"auth_token = \"XXXXX\" # The auth_token is used later for the RayJobClient\n",
"auth = TokenAuthentication(\n",
" token = auth_token,\n",
" token = \"XXXXX\",\n",
" server = \"XXXXX\",\n",
" skip_tls=False\n",
")\n",
@@ -45,7 +44,7 @@
"# Create and configure our cluster object\n",
"cluster = Cluster(ClusterConfiguration(\n",
" name='jobtest',\n",
" namespace='default',\n",
" namespace='default', # Update to your namespace\n",
" num_workers=2,\n",
" min_cpus=1,\n",
" max_cpus=1,\n",
@@ -80,14 +79,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Ray Job Submission - Authorized Ray Cluster"
"### Ray Job Submission"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Submit a job using an authorized Ray dashboard and the Job Submission Client\n",
"* Initialise the Cluster Job Client \n",
"* Provide an entrypoint command directed to your job script\n",
"* Set up your runtime environment"
]
@@ -98,16 +97,11 @@
"metadata": {},
"outputs": [],
"source": [
"# Gather the dashboard URL\n",
"ray_dashboard = cluster.cluster_dashboard_uri()\n",
"\n",
"# Create the header for passing your bearer token\n",
"header = {\n",
" 'Authorization': f'Bearer {auth_token}'\n",
"}\n",
"\n",
"# Initialize the RayJobClient\n",
"client = RayJobClient(address=ray_dashboard, headers=header, verify=True)"
"# Initialize the Job Submission Client\n",
"\"\"\"\n",
"The SDK will automatically gather the dashboard address and authenticate using the Ray Job Submission Client\n",
"\"\"\"\n",
"client = cluster.job_client"
]
},
{
@@ -116,7 +110,7 @@
"metadata": {},
"outputs": [],
"source": [
"# Submit an example mnist job using the RayJobClient\n",
"# Submit an example mnist job using the Job Submission Client\n",
"submission_id = client.submit_job(\n",
" entrypoint=\"python mnist.py\",\n",
" runtime_env={\"working_dir\": \"./\",\"pip\": \"requirements.txt\"},\n",
@@ -186,60 +180,6 @@
"client.delete_job(submission_id)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Unauthorized Ray Cluster with the Ray Job Client"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\"\"\"\n",
"Initialise the RayJobClient with the Ray Dashboard\n",
"\"\"\"\n",
"ray_dashboard = cluster.cluster_dashboard_uri()\n",
"client = RayJobClient(address=ray_dashboard, verify=False)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Submit an example mnist job using the RayJobClient\n",
"submission_id = client.submit_job(\n",
" entrypoint=\"python mnist.py\",\n",
" runtime_env={\"working_dir\": \"./\",\"pip\": \"requirements.txt\"},\n",
")\n",
"print(submission_id)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Stop the job \n",
"client.stop_job(submission_id)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Delete the job\n",
"client.delete_job(submission_id)"
]
},
{
"cell_type": "code",
"execution_count": null,
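This notebook replaces the hand-built `RayJobClient(address=ray_dashboard, headers=header, verify=True)` with `cluster.job_client`, which the new cell says gathers the dashboard address and handles authentication. The removed cells show what that involved: the dashboard URI as the address, plus an `Authorization: Bearer <token>` header for an authorized cluster. A sketch of that assembly, assuming the token and dashboard URI are already known; `job_client_kwargs` is a hypothetical helper, the URL is a made-up example, and the sketch only builds keyword arguments rather than constructing a real client.

```python
from typing import Any, Optional


def job_client_kwargs(dashboard_uri: str, token: Optional[str],
                      verify: bool = True) -> dict[str, Any]:
    """Build keyword arguments for a Ray job submission client.

    Mirrors the removed notebook cells: the dashboard URI becomes the address,
    and a bearer token, if present, is sent in the Authorization header.
    """
    kwargs: dict[str, Any] = {"address": dashboard_uri, "verify": verify}
    if token:
        kwargs["headers"] = {"Authorization": f"Bearer {token}"}
    return kwargs


# Hypothetical dashboard URL, for illustration only.
print(job_client_kwargs("https://ray-dashboard-jobtest-default.apps.example.com", "XXXXX"))
```

Called with no token and `verify=False`, the sketch produces only `address` and `verify`, matching the removed "Unauthorized Ray Cluster" cell.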
32 changes: 24 additions & 8 deletions demo-notebooks/guided-demos/3_basic_interactive.ipynb
@@ -5,7 +5,7 @@
"id": "bbc21043",
"metadata": {},
"source": [
"In this fourth and final notebook, we will go over how to leverage the SDK to directly work interactively with a Ray cluster during development."
"In this fourth notebook, we will go over how to leverage the SDK to directly work interactively with a Ray Cluster during development."
]
},
{
@@ -45,7 +45,9 @@
"Once again, let's start by running through the same cluster setup as before:\n",
"\n",
"NOTE: We must specify the `image` which will be used in our RayCluster, we recommend you bring your own image which suits your purposes. \n",
"The example here is a community image."
"The example here is a community image.\n",
"\n",
"NOTE: By default the SDK uses Kueue as its scheduling solution. To use MCAD, set the `mcad=True` option in `ClusterConfiguration`."
]
},
{
@@ -55,20 +57,21 @@
"metadata": {},
"outputs": [],
"source": [
"# Create and configure our cluster object (and appwrapper)\n",
"# Create and configure our cluster object\n",
"# The SDK will try to find the name of your default local queue based on the annotation \"kueue.x-k8s.io/default-queue\": \"true\"\n",
"namespace = \"default\" # Update to your namespace\n",
"cluster_name = \"interactivetest\"\n",
"cluster = Cluster(ClusterConfiguration(\n",
" name='interactivetest',\n",
" namespace='default',\n",
" name=cluster_name,\n",
" namespace=namespace,\n",
" num_workers=2,\n",
" min_cpus=2,\n",
" max_cpus=2,\n",
" min_memory=8,\n",
" max_memory=8,\n",
" num_gpus=1,\n",
" image=\"quay.io/project-codeflare/ray:latest-py39-cu118\",\n",
" instascale=True, #<---instascale enabled\n",
" machine_types=[\"m5.xlarge\", \"g4dn.xlarge\"]\n",
" \n",
" # local_queue=\"local-queue-name\" # Specify the local queue manually\n",
"))"
]
},
@@ -125,6 +128,19 @@
"Now we can connect directly to our Ray cluster via the Ray python client:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c9436436",
"metadata": {},
"outputs": [],
"source": [
"from codeflare_sdk import generate_cert\n",
"# Export the environment variables to enable TLS\n",
"generate_cert.generate_tls_cert(cluster_name, namespace)\n",
"generate_cert.export_env(cluster_name, namespace)"
]
},
{
"cell_type": "code",
"execution_count": null,