From 0aad5b28a0a7f540b3bda6a9f2aaf60a13b05011 Mon Sep 17 00:00:00 2001
From: guofei
Date: Tue, 30 Jan 2024 18:12:32 -0800
Subject: [PATCH 1/2] docs: Add instructions about how to add new model in kaito

---
 README.md                                 |  5 ++--
 docs/How-to-add-new-models.md             | 28 +++++++++++++++++++++++
 docs/proposals/YYYYMMDD-model-template.md |  2 +-
 3 files changed, 31 insertions(+), 4 deletions(-)
 create mode 100644 docs/How-to-add-new-models.md

diff --git a/README.md b/README.md
index 96f53ee07..dffc18f37 100644
--- a/README.md
+++ b/README.md
@@ -224,6 +224,8 @@ $ kubectl run -it --rm --restart=Never curl --image=curlimages/curl -- curl -X P
 
 The detailed usage for Kaito supported models can be found in [**HERE**](presets/README.md). In case users want to deploy their own containerized models, they can provide the pod template in the `inference` field of the workspace custom resource (please see [API definitions](api/v1alpha1/workspace_types.go) for details). The controller will create a deployment workload using all provisioned GPU nodes. Note that currently the controller does **NOT** handle automatic model upgrade. It only creates inference workloads based on the preset configurations if the workloads do not exist.
 
+The number of the supported models in Kaito is growing! Please check [this](./docs/How-to-add-new-models.md) document to see how to add a new supported model.
+
 ## Contributing
 
 [Read more](docs/contributing/readme.md)
@@ -258,6 +260,3 @@ This project has adopted the [Microsoft Open Source Code of Conduct](https://ope
 ## Contact
 
 "Kaito devs"
-
-
-
diff --git a/docs/How-to-add-new-models.md b/docs/How-to-add-new-models.md
new file mode 100644
index 000000000..f2b13c090
--- /dev/null
+++ b/docs/How-to-add-new-models.md
@@ -0,0 +1,28 @@
+# New OSS model onboarding
+
+This document describes how to add a new supported OSS model in Kaito. The process is designed to allow community users to initiate the request. Kaito maintainers will follow up and deal with managing the model images and guiding the code changes to set up the model preset configurations.
+
+## Step 1: Make a proposal
+
+This step is done by the requestor. The requestor should make a PR to describe the target OSS model following this [template](./proposals/YYYYMMDD-model-template.md). The proposal status should be `provisional` in the beginning. Kaito maintainers will review the PR and decide to accept or reject the PR. The PR could be rejected if the target OSS model has low usage, or it has strict license limitations, or it is a relatively small model with limited capabilities.
+
+
+## Step 2: Validate and test the model
+
+This step is done by Kaito maintainers. Based on the information provided in the proposal, Kaito maintainers will download the model and test it using the specified runtime. The entire process is automated via GitHub actions when Kaito maintainers file a PR to add the model to the [supported\_models.yaml](../presets/models/supported_models.yaml).
+
+
+## Step 3: Push model image to MCR
+
+This step is done by Kaito maintainers. If the model license allows, Kaito maintainers will push the model image to MCR, making the image publicly available. This step is skipped if only private access is allowed for the model image. Once this step is done, Kaito maintainers will update the status of the proposal submitted in Step 1 to `ready to integrate`.
+
+## Step 4: Add preset configurations
+
+This step is done by the requestor. The requestor will work on a PR to register the model with preset configurations. The PR will contain code changes to implement a simple inference interface. [Here](../presets/models/falcon/model.go) is an existing example. In the same PR, or a sperate PR, the status of the proposal status should be updated to `integrated`.
+
+## Step 5: Add an E2E test
+
+This step is done by the requestor. A new e2e test should be added to [here](../test/e2e/preset_test.go) which ensures the inference service is up and running with preset configurations.
+
+
+After all the above are done, a new model becomes available in Kaito.
diff --git a/docs/proposals/YYYYMMDD-model-template.md b/docs/proposals/YYYYMMDD-model-template.md
index 709e46ac9..b09b03bf7 100644
--- a/docs/proposals/YYYYMMDD-model-template.md
+++ b/docs/proposals/YYYYMMDD-model-template.md
@@ -6,7 +6,7 @@ reviewers:
 - "Kaito contributor"
 creation-date: yyyy-mm-dd
 last-updated: yyyy-mm-dd
-status: provisional|implemented|deferred|rejected|withdrawn
+status: provisional|ready to integrate|integrated
 ---
 
 # Title

From 513eabe525a238238879cd57b38c4d3c64aa0884 Mon Sep 17 00:00:00 2001
From: guofei
Date: Wed, 31 Jan 2024 19:58:57 -0800
Subject: [PATCH 2/2] fix a typo

---
 docs/How-to-add-new-models.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/How-to-add-new-models.md b/docs/How-to-add-new-models.md
index f2b13c090..8874fa3c8 100644
--- a/docs/How-to-add-new-models.md
+++ b/docs/How-to-add-new-models.md
@@ -18,7 +18,7 @@ This step is done by Kaito maintainers. If the model license allows, Kaito maint
 
 ## Step 4: Add preset configurations
 
-This step is done by the requestor. The requestor will work on a PR to register the model with preset configurations. The PR will contain code changes to implement a simple inference interface. [Here](../presets/models/falcon/model.go) is an existing example. In the same PR, or a sperate PR, the status of the proposal status should be updated to `integrated`.
+This step is done by the requestor. The requestor will work on a PR to register the model with preset configurations. The PR will contain code changes to implement a simple inference interface. [Here](../presets/models/falcon/model.go) is an existing example. In the same PR, or a separate PR, the status of the proposal status should be updated to `integrated`.
 
 ## Step 5: Add an E2E test
 
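
Note on Step 4: "register the model with preset configurations" can be pictured as adding an entry to a name-keyed registry whose values implement a small interface. The Go sketch below is a minimal, self-contained illustration of that pattern only. `Model`, `PresetParams`, `Registry`, and `exampleModel` are hypothetical names invented for this note, and the image path is a placeholder; the actual interface and registration call are the ones in the `presets/models/falcon/model.go` example that the new document links to.

```go
package main

import "fmt"

// PresetParams is a hypothetical set of preset fields (names are assumptions,
// not Kaito's real configuration type).
type PresetParams struct {
	GPUCount int    // number of GPUs requested for inference
	Image    string // container image reference (placeholder value below)
}

// Model is a hypothetical stand-in for the "simple inference interface"
// mentioned in Step 4.
type Model interface {
	InferenceParameters() PresetParams
}

// Registry maps preset names to model implementations.
type Registry struct {
	models map[string]Model
}

// NewRegistry returns an empty registry.
func NewRegistry() *Registry {
	return &Registry{models: make(map[string]Model)}
}

// Register adds a model under name and rejects duplicate names.
func (r *Registry) Register(name string, m Model) error {
	if _, exists := r.models[name]; exists {
		return fmt.Errorf("model %q is already registered", name)
	}
	r.models[name] = m
	return nil
}

// Lookup returns the model registered under name, if any.
func (r *Registry) Lookup(name string) (Model, bool) {
	m, ok := r.models[name]
	return m, ok
}

// exampleModel is a made-up model used only to exercise the registry.
type exampleModel struct{}

// InferenceParameters returns fixed, illustrative preset values.
func (exampleModel) InferenceParameters() PresetParams {
	return PresetParams{GPUCount: 1, Image: "registry.example/example-model:0.0.1"}
}

func main() {
	reg := NewRegistry()
	if err := reg.Register("example-model", exampleModel{}); err != nil {
		panic(err)
	}
	if m, ok := reg.Lookup("example-model"); ok {
		fmt.Printf("example-model preset: %+v\n", m.InferenceParameters())
	}
}
```

Rejecting duplicate names at registration time is a design choice of this sketch, not something the patch specifies; it keeps two presets from silently shadowing each other.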