Refactor update train-deploy project #158

Open
wants to merge 19 commits into
base: main
19 commits
1e5ece3
Refactor pipeline functions and update project naming conventions to …
safoinme Nov 27, 2024
3fe656c
Update GitHub Actions workflow to use new ZenML store environment var…
safoinme Nov 27, 2024
1adf5a1
Update train_config.yaml with enhanced model search space and parameters
safoinme Nov 27, 2024
698d145
Remove redundant model parameters from train_config.yaml
safoinme Nov 27, 2024
943cfce
Handle KeyError when retrieving current model version number
safoinme Nov 27, 2024
74baad4
Fix KeyError handling when retrieving current model version number
safoinme Nov 27, 2024
692798e
Enhance error handling for current model version retrieval
safoinme Nov 27, 2024
96c11b1
Update current model version retrieval to use ModelStages for staging
safoinme Nov 27, 2024
4b61d49
Refactor error handling for current version retrieval in performance …
safoinme Nov 27, 2024
5c29e59
Refactor exception handling for current version retrieval in performa…
safoinme Nov 27, 2024
0b09eb6
Refactor exception handling in performance metrics computation to cat…
safoinme Nov 27, 2024
e7fe052
Refactor current model version retrieval to use ModelStages and impro…
safoinme Nov 27, 2024
5fa513b
Add target environment parameter to deployment and inference pipelines
safoinme Nov 28, 2024
cde7595
Add production deployment pipeline and related steps for model deploy…
safoinme Nov 28, 2024
17c0c83
Rename model references from gitguarden to secret_detection across co…
safoinme Nov 28, 2024
39fb332
Update ZenML store configuration for staging workflow
safoinme Nov 28, 2024
bb6b04b
add socat
safoinme Nov 28, 2024
49063ad
Add RUN.md for training and deployment instructions; update train_con…
safoinme Nov 28, 2024
c51a6ab
Add sanitization for model names in Kubernetes deployment
safoinme Nov 29, 2024
55 changes: 55 additions & 0 deletions .github/workflows/run_train_deploy_pipeline.yml
@@ -0,0 +1,55 @@
name: Staging Trigger Train and Deploy Pipeline
on:
  pull_request:
    types: [opened, synchronize]
    branches: [staging, main]
    paths:
      - 'train_and_deploy/**'

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  run-staging-workflow:
    runs-on: ubuntu-latest
    env:
      ZENML_STORE_URL: ${{ secrets.ZENML_BENTO_PROJECTS_HOST }}
      ZENML_STORE_API_KEY: ${{ secrets.ZENML_BENTO_PROJECTS_API_KEY }}
      ZENML_STAGING_STACK: 281f82f3-6bdb-4951-bbdd-b85b57b463cc # Set this to your staging stack ID
      ZENML_GITHUB_SHA: ${{ github.event.pull_request.head.sha }}
      ZENML_GITHUB_URL_PR: ${{ github.event.pull_request._links.html.href }}
      ZENML_DEBUG: true
      ZENML_ANALYTICS_OPT_IN: false
      ZENML_LOGGING_VERBOSITY: INFO
      ZENML_DISABLE_CLIENT_SERVER_MISMATCH_WARNING: True

    steps:
      - name: Check out repository code
        uses: actions/checkout@v3

      - uses: actions/setup-python@v4
        with:
          python-version: '3.11'

      - name: Install requirements
        working-directory: ./train_and_deploy
        run: |
          sudo apt-get install -y socat
          pip3 install -r requirements.txt
          zenml integration install bentoml skypilot_kubernetes s3 aws evidently --uv -y

      - name: Connect to ZenML server
        working-directory: ./train_and_deploy
        run: |
          zenml init

      - name: Set stack (Staging)
        working-directory: ./train_and_deploy
        run: |
          zenml stack set ${{ env.ZENML_STAGING_STACK }}

      - name: Run pipeline (Staging)
        working-directory: ./train_and_deploy
        run: |
          python run.py --training
2 changes: 1 addition & 1 deletion train_and_deploy/README.md
@@ -7,7 +7,7 @@ classification datasets provided by the scikit-learn library. The project was
generated from the [E2E Batch ZenML project template](https://github.com/zenml-io/template-e2e-batch)
with the following properties:
- Project name: ZenML E2E project
- Technical Name: e2e_use_case
- Technical Name: secret_detection
- Version: `0.0.1`
- Licensed with apache to ZenML GmbH<>
- Deployment environment: `staging`
97 changes: 97 additions & 0 deletions train_and_deploy/RUN.md
@@ -0,0 +1,97 @@
# Train and Deploy ML Project

This guide provides step-by-step instructions for running the training and deployment pipelines with ZenML.

## Prerequisites

- Git installed
- Python environment set up (the CI workflow uses Python 3.11)
- ZenML installed
- Access to the ZenML project repository

## Project Setup

1. Clone the repository and check out the feature branch:
```bash
git clone git@github.com:zenml-io/zenml-projects.git
cd zenml-projects
git checkout feature/update-train-deploy
```

2. Navigate to the project directory:
```bash
cd train_and_deploy
```

3. Initialize ZenML in the project:
```bash
zenml init
```

## Running the Pipeline

### Training

You have two options for running the training pipeline:

#### Option 1: Automatic via CI
Push a change under `train_and_deploy/` to a pull request that targets `staging` or `main`. This triggers the GitHub Actions workflow, which launches training with SkyPilot on the staging stack.

#### Option 2: Manual Execution
1. First, set up your stack. You can choose between:
- Local stack (uses local orchestrator):
```bash
zenml stack set LocalGitGuardian
```
- Remote stack (uses SkyPilot orchestrator):
```bash
zenml stack set RemoteGitGuardian
```

2. Run the training pipeline:
```bash
python run.py --training
```

### Model Deployment

1. After training completes, deploy the model:
```bash
python run.py --deployment
```

Note: This step deploys the model version currently marked as "staging" (selected by `target_env` in the config) and serves it locally with BentoML.

2. Test the deployed model:
```bash
python run.py --inference
```

### Production Deployment

If the staging model performs well and you want to proceed with production deployment:

1. Deploy to Kubernetes:
```bash
python run.py --production
```
This pipeline will:
- Build a Docker image from the BentoML service
- Deploy it to Kubernetes
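
Kubernetes object names generally have to be valid DNS labels (lowercase alphanumerics and `-`, at most 63 characters), so a model name such as `secret_detection` cannot be used as-is. A minimal sketch of that kind of sanitization, assuming a hypothetical helper `sanitize_k8s_name` (it is not the function added in this PR):

```python
import re


def sanitize_k8s_name(name: str, max_length: int = 63) -> str:
    """Hypothetical helper: coerce a name into an RFC 1123 DNS label."""
    # Lowercase, then replace every run of disallowed characters with "-".
    cleaned = re.sub(r"[^a-z0-9-]+", "-", name.lower())
    # Labels must start and end with an alphanumeric character,
    # both before and after truncating to the maximum length.
    cleaned = cleaned.strip("-")[:max_length].strip("-")
    if not cleaned:
        raise ValueError(f"cannot derive a Kubernetes name from {name!r}")
    return cleaned


print(sanitize_k8s_name("secret_detection"))  # secret-detection
```

Raising on an empty result, rather than returning a default name, keeps two unrelated models from silently colliding on the same deployment name.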

## Additional Resources

- [ZenML Projects Tenant Dashboard](https://cloud.zenml.io/organizations/fc992c14-d960-4db7-812e-8f070c99c6f0/tenants/12ec0fd2-ed02-4479-8ff9-ecbfbaae3285)
- [Example GitHub Actions Pipeline](https://github.com/zenml-io/zenml-projects/actions/runs/12075854945/job/33676323427)

## Pipeline Flow Overview

1. Training → Creates and trains the model
2. Deployment → Deploys model to staging environment (local BentoML)
3. Inference → Tests the deployed model
4. Production → Deploys to production Kubernetes environment
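
The four stages above map onto the `run.py` flags used in this guide. A small sketch that builds the commands in order; `build_stage_commands` is a hypothetical helper, and it only constructs the command lines without executing anything:

```python
import sys
from typing import List

# Flags as documented in this guide, in pipeline order.
STAGE_FLAGS = ["--training", "--deployment", "--inference", "--production"]


def build_stage_commands(last_stage: str = "--inference") -> List[List[str]]:
    """Hypothetical helper: commands for every stage up to and including last_stage."""
    if last_stage not in STAGE_FLAGS:
        raise ValueError(f"unknown stage flag: {last_stage}")
    stop = STAGE_FLAGS.index(last_stage) + 1
    # Use the current interpreter so the active virtual environment is reused.
    return [[sys.executable, "run.py", flag] for flag in STAGE_FLAGS[:stop]]


for command in build_stage_commands():
    print(" ".join(command))
```

Each command can then be passed to `subprocess.run(command, check=True)` from the `train_and_deploy` directory, which stops at the first failing stage. Production is left out of the default on purpose, since the guide treats it as a manual decision.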

## Notes

- Deployment behavior is controlled by the `target_env` parameter in the pipeline configs under `configs/`.
- Make sure you have the necessary permissions and access rights before running the pipelines.
- When using the automatic CI option, monitor the workflow run in GitHub Actions.
45 changes: 45 additions & 0 deletions train_and_deploy/configs/deploy_production.yaml
@@ -0,0 +1,45 @@
# Apache Software License 2.0
#
# Copyright (c) ZenML GmbH 2024. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# environment configuration
settings:
  docker:
    python_package_installer: uv
    required_integrations:
      - aws
      - sklearn
      - bentoml


# configuration of steps
steps:
  notify_on_success:
    parameters:
      notify_on_success: False

# configuration of the Model Control Plane
model:
  name: secret_detection
  version: staging

# pipeline level extra configurations
extra:
  notify_on_failure: True


parameters:
  target_env: staging
12 changes: 7 additions & 5 deletions train_and_deploy/configs/deployer_config.yaml
@@ -18,14 +18,13 @@
# environment configuration
settings:
docker:
python_package_installer: uv
required_integrations:
- aws
- evidently
- mlflow
- sklearn
- slack
- bentoml


# configuration of steps
steps:
notify_on_success:
@@ -34,10 +33,13 @@ steps:

# configuration of the Model Control Plane
model:
name: e2e_use_case
version: production
name: secret_detection
version: staging

# pipeline level extra configurations
extra:
notify_on_failure: True


parameters:
target_env: staging
8 changes: 4 additions & 4 deletions train_and_deploy/configs/inference_config.yaml
@@ -18,10 +18,8 @@
# environment configuration
settings:
docker:
python_package_installer: uv
required_integrations:
- gcp
- evidently
- mlflow
- sklearn
- slack
- bentoml
@@ -34,10 +32,12 @@ steps:

# configuration of the Model Control Plane
model:
name: e2e_use_case
name: secret_detection
version: staging

# pipeline level extra configurations
extra:
notify_on_failure: True

parameters:
target_env: staging
54 changes: 41 additions & 13 deletions train_and_deploy/configs/train_config.yaml
@@ -18,31 +18,32 @@
# environment configuration
settings:
docker:
python_package_installer: uv
required_integrations:
- gcp
- evidently
- mlflow
- sklearn
- slack
- bentoml
orchestrator.vm_kubernetes:
down: True
idle_minutes_to_autostop: 2

# configuration of steps
steps:
model_trainer:
parameters:
name: e2e_use_case
name: secret_detection
promote_with_metric_compare:
parameters:
mlflow_model_name: e2e_use_case
mlflow_model_name: secret_detection
notify_on_success:
parameters:
notify_on_success: False

# configuration of the Model Control Plane
model:
name: e2e_use_case
name: secret_detection
license: apache
description: e2e_use_case E2E Batch Use Case
description: secret_detection E2E Batch Use Case
audience: All ZenML users
use_cases: |
The ZenML E2E project demonstrates how the most important steps of
@@ -61,10 +62,10 @@ model:
extra:
notify_on_failure: True
# pipeline level parameters
# Updated train_config.yaml

parameters:
target_env: staging
# This set contains all the model configurations that you want
# to evaluate during hyperparameter tuning stage.
model_search_space:
random_forest:
model_package: sklearn.ensemble
@@ -80,15 +81,20 @@ parameters:
- 8
- 10
- 12
- null # Allow unlimited depth (a bare None would be loaded by YAML as the string "None")
min_samples_leaf:
range:
start: 1
end: 10
end: 15
n_estimators:
range:
start: 50
end: 500
step: 25
end: 1000
step: 50
max_features:
- auto
- sqrt
- log2
decision_tree:
model_package: sklearn.tree
model_class: DecisionTreeClassifier
@@ -103,7 +109,29 @@ parameters:
- 8
- 10
- 12
- null
min_samples_leaf:
range:
start: 1
end: 10
end: 15
gradient_boosting:
model_package: sklearn.ensemble
model_class: GradientBoostingClassifier
search_grid:
learning_rate:
- 0.01
- 0.1
- 0.2
n_estimators:
range:
start: 50
end: 500
step: 50
max_depth:
- 3
- 5
- 7
subsample:
- 0.6
- 0.8
- 1.0
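
How the `range` entries in `search_grid` are interpreted depends on the hyperparameter tuning step, which is not part of this diff. A minimal sketch, assuming Python `range` semantics (exclusive `end`, `step` defaulting to 1) and a hypothetical helper `expand_search_grid`:

```python
from typing import Any, Dict, List


def expand_search_grid(search_grid: Dict[str, Any]) -> Dict[str, List[Any]]:
    """Hypothetical helper: turn {"range": {...}} entries into explicit value lists."""
    expanded: Dict[str, List[Any]] = {}
    for key, spec in search_grid.items():
        if isinstance(spec, dict) and "range" in spec:
            bounds = spec["range"]
            # Assumption: same semantics as Python's range(), so "end" is exclusive.
            expanded[key] = list(
                range(bounds["start"], bounds["end"], bounds.get("step", 1))
            )
        else:
            # Plain lists of candidate values are kept as they are.
            expanded[key] = list(spec)
    return expanded


# The gradient_boosting grid from train_config.yaml, written as a Python dict.
gradient_boosting_grid = {
    "learning_rate": [0.01, 0.1, 0.2],
    "n_estimators": {"range": {"start": 50, "end": 500, "step": 50}},
    "max_depth": [3, 5, 7],
    "subsample": [0.6, 0.8, 1.0],
}

expanded = expand_search_grid(gradient_boosting_grid)
print(expanded["n_estimators"])
```

Under that assumption `n_estimators` covers 50 through 450, so `end: 500` itself is never tried; if the upper bound should be included, `end` would need to be one step higher.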