Package model weights and lora adapters as images to deploy in GKE (#855)

* Tutorial on packaging huggingface models as images

* README

---------

Co-authored-by: Kunjan Patel <[email protected]>
coolkp and coolkp authored Oct 23, 2024
1 parent a3401f2 commit 87d11d5
Showing 3 changed files with 175 additions and 0 deletions.
12 changes: 12 additions & 0 deletions tutorials-and-examples/models-as-oci/Dockerfile
@@ -0,0 +1,12 @@
#syntax=docker/dockerfile:1.7-labs

# Use Alpine as the base image
FROM alpine:latest

# Copy all .safetensors files to the /model directory
COPY model/*.safetensors /model/

# Copy all other files to the /model directory, excluding .safetensors
COPY --exclude='*.safetensors' model/ /model/
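# Note: the --exclude flag requires the dockerfile:1.7-labs syntax declared above.
# Copying the large .safetensors files in their own layer lets registries cache
# and stream the weights separately from the small metadata files.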

# No CMD or ENTRYPOINT: this image only carries the model files
108 changes: 108 additions & 0 deletions tutorials-and-examples/models-as-oci/README
@@ -0,0 +1,108 @@
# Package and Deploy from Hugging Face to Artifact Registry and GKE

This repository contains a Google Cloud Build configuration for building and pushing Docker images of Hugging Face models to Google Artifact Registry.

## Overview

This project allows you to download a Hugging Face model and package it as a Docker image, which can then be pushed to Google Artifact Registry for deployment or distribution. Build time can be significant for large models, so models above roughly 10 billion parameters are not recommended. For reference, an 8B model takes about 35 minutes to build and push with this Cloud Build config.

## Prerequisites

- A Google Cloud project with billing enabled.
- Google Cloud SDK installed and authenticated.
- Access to Google Cloud Build and Artifact Registry.
- A Hugging Face account with an access token.
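
If the required APIs are not yet enabled, or the Artifact Registry repository does not exist, create them first. A minimal sketch, assuming the default repository name and `us` multi-region used by `cloudbuild.yaml`:

```bash
# Enable the services used in this tutorial
gcloud services enable cloudbuild.googleapis.com artifactregistry.googleapis.com secretmanager.googleapis.com

# Create the Docker repository that the build pushes to
gcloud artifacts repositories create cloud-blog-oci-models \
  --repository-format=docker \
  --location=us
```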

## Setup

1. **Clone the Repository**

   ```bash
   git clone https://github.com/your-username/your-repo-name.git
   cd your-repo-name
   ```

2. **Create a Secret for Hugging Face Token**

   ```bash
   echo "your_hugging_face_token" | gcloud secrets create huggingface-token --data-file=-
   ```

## Configuration

### Substitutions

The following substitutions are defined in the `cloudbuild.yaml` file. They can be changed by passing `--substitutions SUBSTITUTION_NAME=SUBSTITUTION_VALUE` to `gcloud builds submit`:

- **`_MODEL_NAME`**: The name of the Hugging Face model to download (default: `huggingfaceh4/zephyr-7b-beta`).
- **`_REGISTRY`**: The URL for the Docker registry (default: `us-docker.pkg.dev`).
- **`_REPO`**: The name of the Artifact Registry repository (default: `cloud-blog-oci-models`).
- **`_IMAGE_NAME`**: The name of the Docker image to be created (default: `zephyr-7b-beta`).
- **`_CLOUD_SECRET_NAME`**: The name of the secret storing the Hugging Face token (default: `huggingface-token`).

### Options

The following options are configured in the `cloudbuild.yaml` file:

- **`diskSizeGb`**: The size of the disk for the build, in gigabytes (default: `100`). It can be overridden by passing `--disk-size=DISK_SIZE` to `gcloud builds submit`.
- **`machineType`**: The machine type for the build (default: `E2_HIGHCPU_32`). It can be overridden by passing `--machine-type=MACHINE_TYPE` to `gcloud builds submit`; see the example below.
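
For example, to request a larger disk and a specific machine type (values are illustrative):

```bash
gcloud builds submit --config cloudbuild.yaml \
  --disk-size=200 \
  --machine-type=e2-highcpu-32
```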

## Usage

To trigger the Cloud Build and create the Docker image, run the following command:

```bash
gcloud builds submit --config cloudbuild.yaml --substitutions _MODEL_NAME="your_model_name",_IMAGE_NAME="your_image_name"
```
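
The final image path is composed as `_REGISTRY/PROJECT_ID/_REPO/_IMAGE_NAME` (see `cloudbuild.yaml`). Once the build completes, you can verify the image in Artifact Registry; a sketch assuming the default registry and repository values, with `PROJECT_ID` set in your shell:

```bash
gcloud artifacts docker images list us-docker.pkg.dev/$PROJECT_ID/cloud-blog-oci-models
```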

## Consuming the Model Image

### Inside an Inference Deployment Dockerfile

#### Example

```Dockerfile
# Stage 1: reference the model image built above
# (substitute the fully qualified image path for 'model-as-image')
FROM model-as-image AS model

# Stage 2: the inference serving image, based on PyTorch with CUDA and cuDNN
FROM pytorch/pytorch:2.1.2-cuda12.1-cudnn8-devel

# Set the working directory
WORKDIR /srv

# Install vLLM (version 0.3.3)
RUN pip install vllm==0.3.3 --no-cache-dir

# Copy the model files from the model image into the inference container
ARG MODEL_DIR=model
COPY --from=model /model/ /srv/models/${MODEL_DIR}/
ENV MODEL_PATH=/srv/models/${MODEL_DIR}

# Run the vLLM OpenAI-compatible API server
# (shell form so ${MODEL_PATH} is expanded at runtime)
ENTRYPOINT python -m vllm.entrypoints.openai.api_server \
    --host 0.0.0.0 --port 80 \
    --model ${MODEL_PATH} \
    --dtype=half
```
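
To build the inference image, pass the model directory as a build argument; a sketch assuming `model-as-image` in the Dockerfile is replaced with the fully qualified path of the model image pushed earlier, with an illustrative tag:

```bash
docker build \
  --build-arg MODEL_DIR=zephyr-7b-beta \
  -t inference-server .
```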
### Mount the image as a volume in your inference deployment
You can mount the image to a shared volume in your inference deployment via a sidecar container.

#### Example

```yaml
initContainers:
- name: model
  image: model-as-image
  # restartPolicy: Always makes this a native sidecar (Kubernetes 1.28+)
  restartPolicy: Always
  args:
  - "sh"
  - "-c"
  # Copy the model files into the shared volume, then keep the sidecar alive
  - "cp -r /model/. /mnt/model/ && sleep infinity"
  volumeMounts:
  - mountPath: /mnt/model
    name: model-image-mount
volumes:
- name: dshm
  emptyDir:
    medium: Memory
- name: model-image-mount
  emptyDir: {}
```
Mount the same volume to your inference container and consume it there.
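
A minimal sketch of the consuming side (container name, image, and mount path are illustrative):

```yaml
containers:
- name: inference-server
  image: inference-server
  volumeMounts:
  - mountPath: /srv/models/model
    name: model-image-mount
    readOnly: true
  - mountPath: /dev/shm
    name: dshm
```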
**Pulling images can be optimized in Google Kubernetes Engine with [image streaming](https://cloud.google.com/kubernetes-engine/docs/how-to/image-streaming) and a [secondary boot disk](https://cloud.google.com/kubernetes-engine/docs/how-to/data-container-image-preloading).** These methods can be used for packaging and mass-distributing small and medium-sized models and low-rank adapters (LoRA) of foundation models.
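
For example, image streaming can be enabled on an existing GKE cluster (cluster name is illustrative):

```bash
gcloud container clusters update my-cluster --enable-image-streaming
```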
55 changes: 55 additions & 0 deletions tutorials-and-examples/models-as-oci/cloudbuild.yaml
@@ -0,0 +1,55 @@
steps:
- name: 'python:3.10-slim'
entrypoint: 'bash'
secretEnv: # Specify the secret environment variables here
- HUGGINGFACE_TOKEN # This will be populated from the secret manager
args:
- '-c'
- |
# Install huggingface_hub with the hf_transfer extra for faster downloads
pip install 'huggingface_hub[hf_transfer]'
mkdir -p /workspace/model
# Enable hf_transfer for faster downloads
export HF_HUB_ENABLE_HF_TRANSFER=1
# Log in with the token from Secret Manager ($$ escapes $ for Cloud Build substitutions)
huggingface-cli login --token $$HUGGINGFACE_TOKEN
# Download the model using huggingface_hub
huggingface-cli download $_MODEL_NAME --local-dir /workspace/model
- name: 'gcr.io/cloud-builders/docker'
env:
- "DOCKER_BUILDKIT=1"
args:
- build
- '-t'
- '$_REGISTRY/$PROJECT_ID/$_REPO/$_IMAGE_NAME'
- '/workspace'

- name: 'gcr.io/cloud-builders/docker'
args: ['push', '$_REGISTRY/$PROJECT_ID/$_REPO/$_IMAGE_NAME']

availableSecrets:
secretManager:
- versionName: 'projects/$PROJECT_ID/secrets/$_CLOUD_SECRET_NAME/versions/latest'
env: 'HUGGINGFACE_TOKEN' # Environment variable name

timeout: '3000s'

# Configure disk size for the build
options:
diskSizeGb: '100' # Default disk size in GB; override with --disk-size
machineType: 'E2_HIGHCPU_32' # Specify the machine type
dynamicSubstitutions: true

substitutions:
_MODEL_NAME: 'huggingfaceh4/zephyr-7b-beta' # Default value for model name
_REGISTRY: 'us-docker.pkg.dev'
_REPO: 'cloud-blog-oci-models'
_IMAGE_NAME: 'zephyr-7b-beta' # Default value for image name
_CLOUD_SECRET_NAME: 'huggingface-token' # Default value for cloud secret name
