minor edits to the post
swgriffith committed Apr 16, 2024
1 parent 286acc2 commit 0cd7356
Showing 1 changed file with 4 additions and 5 deletions: docs/_posts/2024-04-16-aks-kaito.md

The Kubernetes AI Toolchain Operator, also known as Project KAITO, is an open-source solution that simplifies the deployment of inference models in a Kubernetes cluster. In particular, the focus is on simplifying the operation of the most popular models available (e.g. Falcon, Mistral, and Llama 2).

KAITO provides operators that validate the requested model against the requested nodepool hardware, deploy the nodepool, and deploy the model itself along with a REST endpoint to reach the model.

In this walkthrough we'll deploy an AKS cluster with the KAITO managed add-on. Next we'll deploy and test an inference model, which we'll pull from our own private container registry. We'll be following the setup guide from the AKS product docs [here](https://learn.microsoft.com/en-us/azure/aks/ai-toolchain-operator) with some of my own customizations and extensions to simplify tasks.

## Cluster Creation

```bash
RG=KaitoLab
LOC=westus3
ACR_NAME=kaitolab
CLUSTER_NAME=kaitocluster

# Create the resource group
az group create -n $RG -l $LOC
```
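The remaining setup commands are collapsed in this diff. A minimal sketch of the registry and cluster creation, assuming the `--enable-ai-toolchain-operator` flag from the AKS product docs (which also requires the OIDC issuer to be enabled), might look like this:

```shell
# Hypothetical sketch based on the AKS KAITO add-on docs;
# variable names come from the environment setup above.

# Create an Azure Container Registry for the model images
az acr create -g $RG -n $ACR_NAME --sku Premium

# Create the AKS cluster with the KAITO managed add-on enabled
az aks create -g $RG -n $CLUSTER_NAME \
  --enable-oidc-issuer \
  --enable-ai-toolchain-operator \
  --generate-ssh-keys
```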
Attach the container registry to the cluster so it can pull the model images:

```bash
az aks update -g $RG -n $CLUSTER_NAME --attach-acr $ACR_NAME
```

## Deploy a model!

Now that our cluster and registry are all set, we're ready to deploy our first model. We'll generate our 'Workspace' manifest ourselves, but you can also pull from the [examples](https://github.com/Azure/kaito/blob/main/presets/README.md) in the KAITO repo and update as needed. The model below is actually taken directly from the examples; however, I added the 'presetOptions' section to set the source of the model image.

>**NOTE:** Make sure you validate that you have quota in the target subscription for the machine type you select below.
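The workspace manifest itself is collapsed in this diff. A sketch of what it might look like, based on the falcon-7b preset in the KAITO examples; the `presetOptions` image path below is illustrative, not the post's actual value:

```shell
# Hypothetical sketch based on the KAITO falcon-7b preset example.
# Point the presetOptions image at your own private registry.
cat <<EOF | kubectl apply -f -
apiVersion: kaito.sh/v1alpha1
kind: Workspace
metadata:
  name: workspace-falcon-7b
resource:
  instanceType: "Standard_NC12s_v3"
  labelSelector:
    matchLabels:
      apps: falcon-7b
inference:
  preset:
    name: "falcon-7b"
    presetOptions:
      image: "kaitolab.azurecr.io/kaito/falcon-7b:0.0.1"
EOF
```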
```bash
watch kubectl get workspace,nodes,svc,pods
```
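KAITO surfaces progress through conditions on the Workspace resource. Assuming a workspace named `workspace-falcon-7b`, you can inspect them with something like:

```shell
# List the Workspace readiness conditions (workspace name is assumed)
kubectl get workspace workspace-falcon-7b \
  -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'
```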

Now that our model is running, we can send it a request. By default, the model is only accessible via a ClusterIP inside the Kubernetes cluster, so you'll need to access the endpoint from a test pod. We'll use a public 'curl' image, but you can use whatever you prefer.

You do have the option to expose the model via a Kubernetes Service of type 'LoadBalancer' via the workspace configuration, but that generally isn't recommended. Typically you'd call the model from another service inside the cluster, or place the endpoint behind an ingress controller.

```bash
# Get the model cluster IP
```
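The commands collapsed above presumably retrieve the service's ClusterIP and call the endpoint from a throwaway pod. A hedged sketch, assuming the workspace is named `workspace-falcon-7b` and the preset exposes a `/chat` route as in the KAITO examples:

```shell
# Get the model's ClusterIP (service name assumed to match the workspace name)
CLUSTERIP=$(kubectl get svc workspace-falcon-7b -o jsonpath='{.spec.clusterIP}')

# Call the inference endpoint from a temporary curl pod
kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl -- \
  curl -X POST "http://$CLUSTERIP/chat" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "What is Kubernetes?"}'
```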
