Package model weights and lora adapters as images to deploy in GKE (#855)

* Tutorial on packaging huggingface models as images

* README

---------

Co-authored-by: Kunjan Patel <[email protected]>
coolkp and coolkp authored Oct 23, 2024
1 parent a3401f2 commit 87d11d5
Showing 3 changed files with 175 additions and 0 deletions.
12 changes: 12 additions & 0 deletions tutorials-and-examples/models-as-oci/Dockerfile
@@ -0,0 +1,12 @@
#syntax=docker/dockerfile:1.7-labs

# Use Alpine as the base image
FROM alpine:latest

# Copy all .safetensors files to the /model directory
COPY model/*.safetensors /model/

# Copy all other files to the /model directory, excluding .safetensors
COPY --exclude='*.safetensors' model/ /model/
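# Note: the --exclude flag requires the dockerfile:1.7-labs syntax declared above.
# Copying the large .safetensors files in their own layer lets registries cache
# and stream the weights separately from the small metadata files.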

# No CMD or ENTRYPOINT: this image only carries the model files
108 changes: 108 additions & 0 deletions tutorials-and-examples/models-as-oci/README
@@ -0,0 +1,108 @@
# Package and Deploy from Hugging Face to Artifact Registry and GKE

This repository contains a Google Cloud Build configuration for building and pushing Docker images of Hugging Face models to Google Artifact Registry.

## Overview

This project allows you to download a Hugging Face model and package it as a Docker image, which can then be pushed to Google Artifact Registry for deployment or distribution. Build time can be significant for large models, so models above roughly 10 billion parameters are not recommended. For reference, an 8B model takes about 35 minutes to build and push with this Cloud Build config.

## Prerequisites

- A Google Cloud project with billing enabled.
- Google Cloud SDK installed and authenticated.
- Access to Google Cloud Build and Artifact Registry.
- A Hugging Face account with an access token.
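
If the required APIs are not yet enabled, or the Artifact Registry repository does not exist, create them first. A minimal sketch, assuming the default repository name and `us` multi-region used by `cloudbuild.yaml`:

```bash
# Enable the services used in this tutorial
gcloud services enable cloudbuild.googleapis.com artifactregistry.googleapis.com secretmanager.googleapis.com

# Create the Docker repository that the build pushes to
gcloud artifacts repositories create cloud-blog-oci-models \
  --repository-format=docker \
  --location=us
```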

## Setup

1. **Clone the Repository**

   ```bash
   git clone https://github.com/your-username/your-repo-name.git
   cd your-repo-name
   ```

2. **Create a Secret for Hugging Face Token**

   ```bash
   echo "your_hugging_face_token" | gcloud secrets create huggingface-token --data-file=-
   ```

## Configuration

### Substitutions

The following substitutions are defined in the `cloudbuild.yaml` file. They can be changed by passing `--substitutions SUBSTITUTION_NAME=SUBSTITUTION_VALUE` to `gcloud builds submit`:

- **`_MODEL_NAME`**: The name of the Hugging Face model to download (default: `huggingfaceh4/zephyr-7b-beta`).
- **`_REGISTRY`**: The URL for the Docker registry (default: `us-docker.pkg.dev`).
- **`_REPO`**: The name of the Artifact Registry repository (default: `cloud-blog-oci-models`).
- **`_IMAGE_NAME`**: The name of the Docker image to be created (default: `zephyr-7b-beta`).
- **`_CLOUD_SECRET_NAME`**: The name of the secret storing the Hugging Face token (default: `huggingface-token`).

### Options

The following options are configured in the `cloudbuild.yaml` file:

- **`diskSizeGb`**: The size of the disk for the build, in gigabytes (default: `100`). It can be overridden by passing `--disk-size=DISK_SIZE` to `gcloud builds submit`.
- **`machineType`**: The machine type for the build (default: `E2_HIGHCPU_32`). It can be overridden by passing `--machine-type=MACHINE_TYPE` to `gcloud builds submit`; see the example below.
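
For example, to request a larger disk and a specific machine type (values are illustrative):

```bash
gcloud builds submit --config cloudbuild.yaml \
  --disk-size=200 \
  --machine-type=e2-highcpu-32
```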

## Usage

To trigger the Cloud Build and create the Docker image, run the following command:

```bash
gcloud builds submit --config cloudbuild.yaml --substitutions _MODEL_NAME="your_model_name",_IMAGE_NAME="your_image_name"
```
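
The final image path is composed as `_REGISTRY/PROJECT_ID/_REPO/_IMAGE_NAME` (see `cloudbuild.yaml`). Once the build completes, you can verify the image in Artifact Registry; a sketch assuming the default registry and repository values, with `PROJECT_ID` set in your shell:

```bash
gcloud artifacts docker images list us-docker.pkg.dev/$PROJECT_ID/cloud-blog-oci-models
```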

## Consuming the Model Image

### Inside an Inference Deployment Dockerfile

#### Example

```Dockerfile
# Stage 1: reference the model image built above
# (substitute the fully qualified image path for 'model-as-image')
FROM model-as-image AS model

# Stage 2: the inference serving image, based on PyTorch with CUDA and cuDNN
FROM pytorch/pytorch:2.1.2-cuda12.1-cudnn8-devel

# Set the working directory
WORKDIR /srv

# Install vLLM (version 0.3.3)
RUN pip install vllm==0.3.3 --no-cache-dir

# Copy the model files from the model image into the inference container
ARG MODEL_DIR=model
COPY --from=model /model/ /srv/models/${MODEL_DIR}/
ENV MODEL_PATH=/srv/models/${MODEL_DIR}

# Run the vLLM OpenAI-compatible API server
# (shell form so ${MODEL_PATH} is expanded at runtime)
ENTRYPOINT python -m vllm.entrypoints.openai.api_server \
    --host 0.0.0.0 --port 80 \
    --model ${MODEL_PATH} \
    --dtype=half
```
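
To build the inference image, pass the model directory as a build argument; a sketch assuming `model-as-image` in the Dockerfile is replaced with the fully qualified path of the model image pushed earlier, with an illustrative tag:

```bash
docker build \
  --build-arg MODEL_DIR=zephyr-7b-beta \
  -t inference-server .
```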
### Mount the image as a volume in your inference deployment
You can mount the image to a shared volume in your inference deployment via a sidecar container.

#### Example

```yaml
initContainers:
- name: model
  image: model-as-image
  # restartPolicy: Always makes this a native sidecar (Kubernetes 1.28+)
  restartPolicy: Always
  args:
  - "sh"
  - "-c"
  # Copy the model files into the shared volume, then keep the sidecar alive
  - "cp -r /model/. /mnt/model/ && sleep infinity"
  volumeMounts:
  - mountPath: /mnt/model
    name: model-image-mount
volumes:
- name: dshm
  emptyDir:
    medium: Memory
- name: model-image-mount
  emptyDir: {}
```
Mount the same volume to your inference container and consume it there.
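
A minimal sketch of the consuming side (container name, image, and mount path are illustrative):

```yaml
containers:
- name: inference-server
  image: inference-server
  volumeMounts:
  - mountPath: /srv/models/model
    name: model-image-mount
    readOnly: true
  - mountPath: /dev/shm
    name: dshm
```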
**Pulling images can be optimized in Google Kubernetes Engine with [image streaming](https://cloud.google.com/kubernetes-engine/docs/how-to/image-streaming) and a [secondary boot disk](https://cloud.google.com/kubernetes-engine/docs/how-to/data-container-image-preloading).** These methods can be used for packaging and mass-distributing small and medium-sized models and low-rank adapters (LoRA) of foundation models.
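
For example, image streaming can be enabled on an existing GKE cluster (cluster name is illustrative):

```bash
gcloud container clusters update my-cluster --enable-image-streaming
```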
55 changes: 55 additions & 0 deletions tutorials-and-examples/models-as-oci/cloudbuild.yaml
@@ -0,0 +1,55 @@
steps:
- name: 'python:3.10-slim'
entrypoint: 'bash'
secretEnv: # Specify the secret environment variables here
- HUGGINGFACE_TOKEN # This will be populated from the secret manager
args:
- '-c'
- |
# Install huggingface_hub with the hf_transfer extra for faster downloads
pip install 'huggingface_hub[hf_transfer]'
mkdir -p /workspace/model
# Enable hf_transfer for faster downloads
export HF_HUB_ENABLE_HF_TRANSFER=1
# Log in with the token from Secret Manager ($$ escapes $ for Cloud Build substitutions)
huggingface-cli login --token $$HUGGINGFACE_TOKEN
# Download the model using huggingface_hub
huggingface-cli download $_MODEL_NAME --local-dir /workspace/model
- name: 'gcr.io/cloud-builders/docker'
env:
- "DOCKER_BUILDKIT=1"
args:
- build
- '-t'
- '$_REGISTRY/$PROJECT_ID/$_REPO/$_IMAGE_NAME'
- '/workspace'

- name: 'gcr.io/cloud-builders/docker'
args: ['push', '$_REGISTRY/$PROJECT_ID/$_REPO/$_IMAGE_NAME']

availableSecrets:
secretManager:
- versionName: 'projects/$PROJECT_ID/secrets/$_CLOUD_SECRET_NAME/versions/latest'
env: 'HUGGINGFACE_TOKEN' # Environment variable name

timeout: '3000s'

# Configure disk size for the build
options:
diskSizeGb: '100' # Default disk size in GB; override with --disk-size
machineType: 'E2_HIGHCPU_32' # Specify the machine type
dynamicSubstitutions: true

substitutions:
_MODEL_NAME: 'huggingfaceh4/zephyr-7b-beta' # Default value for model name
_REGISTRY: 'us-docker.pkg.dev'
_REPO: 'cloud-blog-oci-models'
_IMAGE_NAME: 'zephyr-7b-beta' # Default value for image name
_CLOUD_SECRET_NAME: 'huggingface-token' # Default value for cloud secret name
