Commit

Merge pull request #17 from appdevgbb/sg-local-dev
spelling and grammar fixes
swgriffith authored May 2, 2024
2 parents 87dc0eb + d1ad5ae commit 10ff072
Showing 1 changed file with 14 additions and 14 deletions.
28 changes: 14 additions & 14 deletions docs/_posts/2024-04-16-aks-kaito.md
@@ -7,15 +7,15 @@ authors:

# Project KAITO and the AKS Managed Add-on

The Kubernetes AI Toolchain Operator, also known as Project KAITO, is a open-source solution to simplify the deployment of inference models in a Kubernetes cluster. In particular, the focus is on simplifying the operation of the most popular models available (ex. Falcon, Mistral and Llama2).
The Kubernetes AI Toolchain Operator, also known as Project KAITO, is an open-source solution to simplify the deployment of inference models in a Kubernetes cluster. In particular, the focus is on simplifying the operation of the most popular models available (ex. Falcon, Mistral and Llama2).

KAITO provides operators to manage validation of the requested model against the requested nodepool hardware, deployment of the nodepool and the deployment of the model itself along with a REST endpoint to reach the model.

In this walkthrough we'll deploy an AKS cluster with the KAITO managed add-on. Next we'll deploy and test an infrenece model, which we'll pull from our own private container registry. We'll be following the setup guide from the AKS product docs [here](https://learn.microsoft.com/en-us/azure/aks/ai-toolchain-operator) with some of my own customizations and extensions to simplify tasks.
In this walkthrough we'll deploy an AKS cluster with the KAITO managed add-on. Next, we'll deploy and test an inference model, which we'll pull from our own private container registry. We'll be following the setup guide from the AKS product docs [here](https://learn.microsoft.com/en-us/azure/aks/ai-toolchain-operator) with some of my own customizations and extensions to simplify tasks.

## Cluster Creation

In this setup we'll be creating a very basic AKS cluster via the [Azure CLI](https://learn.microsoft.com/en-us/cli/azure/), just to keep things simple, but this managed add-on will work in any AKS cluster, assuming you meet the [pre-reqs](https://learn.microsoft.com/en-us/azure/aks/ai-toolchain-operator#prerequisites).
In this setup we'll be creating a very basic AKS cluster via the [Azure CLI](https://learn.microsoft.com/en-us/cli/azure/), but this managed add-on will work in any AKS cluster, assuming you meet the [pre-reqs](https://learn.microsoft.com/en-us/azure/aks/ai-toolchain-operator#prerequisites).

We'll also be creating an Azure Container Registry to demonstrate replicating a KAITO model to your own private registry and using it in the model deployment, which would be a security best practice.

@@ -24,7 +24,7 @@ We'll also be creating an Azure Container Registry to demonstrate replicating a
RG=KaitoLab
LOC=westus3
ACR_NAME=kaitolab
CLUSTER_NAME=kaitocluster
CLUSTER_NAME=kaito

# Create the resource group
az group create -n $RG -l $LOC
@@ -45,7 +45,7 @@ az aks get-credentials -g $RG -n $CLUSTER_NAME

## Setup the KAITO Identity

KAITO uses the node autoprovisioner to add nodepools to the AKS cluster. To do this it needs rights on the cluster resource group. At this time the rights are pretty broad, but as KAITO reaches general availabiliy we should see those roles refined.
KAITO uses the node auto-provisioner to add nodepools to the AKS cluster. To do this it needs rights on the cluster resource group. At this time the rights are broad, but as KAITO reaches general availability we should see those roles refined.

```bash
# Get the Cluster Resource Group
@@ -69,7 +69,7 @@ export AKS_OIDC_ISSUER=$(az aks show --resource-group "${RG}" --name "${CLUSTER_
# Create the federation between the KAITO service account and the KAITO Azure Managed Identity
az identity federated-credential create --name "kaito-federated-identity" --identity-name "${KAITO_IDENTITY_NAME}" -g "${MC_RESOURCE_GROUP}" --issuer "${AKS_OIDC_ISSUER}" --subject system:serviceaccount:"kube-system:kaito-gpu-provisioner" --audience api://AzureADTokenExchange

# If you check the gpu provisioner pod, you'll see its in CrashLoopBackOff
# If you check the kaito-gpu-provisioner pod, you'll see it's in CrashLoopBackOff
# due to the identity not yet having been configured with proper rights.
kubectl get pods -l app=ai-toolchain-operator -n kube-system

@@ -82,11 +82,11 @@ kubectl get pods -l app=ai-toolchain-operator -n kube-system

## Set up the Azure Container Registry

The KAITO team builds and hosts the most popular inference models for you. These models are available in the Microsoft Container Registry(MCR), and if you run a KAITO workspace for one of those models it will pull that image for you automatically. However, as noted above, security best practice is to only pull images from your own trusted repository. Fortunately, KAITO gives you this option.
The KAITO team builds and hosts the most popular inference models for you. These models are available in the Microsoft Container Registry (MCR) and if you run a KAITO workspace for one of those models it will pull that image for you automatically. However, as noted above, security best practice is to only pull images from your own trusted repository. Fortunately, KAITO gives you this option.

Lets pull the image from the MCR into our Azure Container Registry, and link that registry to our AKS cluster. The image for the model in the MCR follows a standard format, as seen below. We just need the model name and version and we can import it into our private registry. We'll use Mistral-7B.
Let's pull the image from the MCR into our Azure Container Registry, and link that registry to our AKS cluster. The image for the model in the MCR follows a standard format, as seen below. We just need the model name and version and we can import it into our private registry. We'll use Mistral-7B.

>**NOTE:** If you aren't already aware, Large Language Models are....LARGE. This import will take some time. Assume 10-20 minutes for most models.
>**NOTE:** If you aren't already aware, Large Language Models are LARGE. This import will take some time. Assume 10-20 minutes for many models.
```bash
MODELNAME=mistral-7b-instruct
@@ -98,7 +98,7 @@ az acr import -g $RG --name $ACR_NAME --source mcr.microsoft.com/aks/kaito/kait

While the import is running, we can go ahead and start another terminal window to attach the Azure Container Registry to our AKS cluster.

We don't actually need to attach the ACR, if we prefer to use admin credentials and an image pull secret, but using the attach feature is more secure as it authenticates to ACR with the kubelet managed identity.
We don't need to attach the ACR, if we prefer to use admin credentials and an image pull secret, but using the attach feature is more secure as it authenticates to ACR with the kubelet managed identity.
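
Because the attach step may run in a second terminal, the variables defined earlier (`RG`, `ACR_NAME`, `CLUSTER_NAME`) will not exist in that session. A minimal sketch of a guard that checks for them before any `az` command runs is shown here. `require_vars` is a hypothetical helper introduced for illustration; it is not part of the Azure CLI or of KAITO.

```bash
# Hypothetical helper: succeed only if every named variable is set and non-empty.
# Plain POSIX sh, so the _rv_* names are globals rather than locals.
require_vars() {
  _rv_missing=0
  for _rv_name in "$@"; do
    # Look up the value of the variable whose name is stored in _rv_name.
    eval "_rv_value=\${$_rv_name:-}"
    if [ -z "$_rv_value" ]; then
      echo "missing environment variable: $_rv_name" >&2
      _rv_missing=1
    fi
  done
  return "$_rv_missing"
}
```

One way to use it is to chain the attach command behind the check, for example `require_vars RG ACR_NAME CLUSTER_NAME && az aks update -g $RG -n $CLUSTER_NAME --attach-acr $ACR_NAME`, so an empty variable stops the command before it runs.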

```bash
# If we're in a new terminal window we'll need to set our environment variables
@@ -112,7 +112,7 @@ az aks update -g $RG -n $CLUSTER_NAME --attach-acr $ACR_NAME

## Deploy a model!

Now that our cluster and registry are all set, we're ready to deploy our first model. We'll generate our 'Workspace' manifest ourselves, but you can also pull from the [examples](https://github.com/Azure/kaito/blob/main/presets/README.md) in the KAITO repo and update as needed. The model below is actually directly from the examples, however I added the 'presetOptions' section to set the source of the model image.
Now that our cluster and registry are all set, we're ready to deploy our first model. We'll generate our 'Workspace' manifest ourselves, but you can also pull from the [examples](https://github.com/Azure/kaito/blob/main/presets/README.md) in the KAITO repo and update as needed. The model below is actually directly from the examples; however I added the 'presetOptions' section to set the source of the model image.
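
As a rough sketch of what a 'Workspace' manifest with a 'presetOptions' image override could look like, the hypothetical helper below renders one from a model name, a GPU instance type, and a full image reference. The field layout follows the KAITO `v1alpha1` preset examples as best I understand them, so treat it as an assumption and check it against the examples linked above before applying anything. `render_workspace` and its three parameters are names made up for this illustration.

```bash
# Hypothetical helper: print a KAITO Workspace manifest to stdout.
#   $1 = preset model name (for example, the MODELNAME used for the import)
#   $2 = GPU VM size for the nodepool KAITO provisions
#   $3 = full image reference in your private registry
# Assumption: field names match the KAITO v1alpha1 Workspace examples.
render_workspace() {
  model=$1
  instance_type=$2
  image=$3
  cat <<EOF
apiVersion: kaito.sh/v1alpha1
kind: Workspace
metadata:
  name: workspace-${model}
resource:
  instanceType: "${instance_type}"
  labelSelector:
    matchLabels:
      apps: ${model}
inference:
  preset:
    name: "${model}"
    presetOptions:
      image: ${image}
EOF
}
```

A call such as `render_workspace "$MODELNAME" <gpu-vm-size> <registry>.azurecr.io/<repository>:<tag> > workspace.yaml` followed by `kubectl apply -f workspace.yaml` would be one way to use it; the VM size and image reference are placeholders you would fill in from your own import and quota.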

>**NOTE:** Make sure you validate you have quota on the target subscription for the machine type you select below.
@@ -151,9 +151,9 @@ watch kubectl get workspace,nodes,svc,pods

## Test your inference endpoint

Now that our model is running, we can send it a request. By default the model is only accessible via a ClusterIP inside the Kubernetes cluster, so you'll need to access the endpoint from a test pod. We'll use a public 'curl' image, but you can use whatever you prefer.
Now that our model is running, we can send it a request. By default, the model is only accessible via a ClusterIP inside the Kubernetes cluster, so you'll need to access the endpoint from a test pod. We'll use a public 'curl' image, but you can use whatever you prefer.

You do have the option to expose the model via a Kubernetes Service of type 'LoadBalancer' via the workspace configuration, but that generally isnt recommended. Typically you'd be calling the model from another service inside the cluster, or placing the endpoint behind an ingress controller.
You do have the option to expose the model via a Kubernetes Service of type 'LoadBalancer' via the workspace configuration, but that generally isn't recommended. Typically, you'd be calling the model from another service inside the cluster, or placing the endpoint behind an ingress controller.

```bash
# Get the model cluster IP
@@ -169,5 +169,5 @@ curl -X POST http://$CLUSTERIP/chat \

## Conclusion

Congratuations! You should now have a working AKS cluster with the Kubernetes AI Toolchain Operator up and running. As you explore KAITO please feel free to reach out to the KAITO team via the [open source project](https://github.com/Azure/kaito/issues) for any questions or feature requests.
Congratulations! You should now have a working AKS cluster with the Kubernetes AI Toolchain Operator up and running. As you explore KAITO please feel free to reach out to the KAITO team via the [open-source project](https://github.com/Azure/kaito/issues) for any questions or feature requests.
