Commit 76f9e55: initial commit
dmytro-spodarets committed Jul 16, 2024
1 parent 0684454

Showing 13 changed files with 529 additions and 1 deletion.
25 changes: 25 additions & 0 deletions .github/workflows/ray-deploy-yolo8.yml
@@ -0,0 +1,25 @@
name: Deploy YOLOv8

on: workflow_dispatch

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - uses: actions/setup-python@v4
        with:
          python-version: '3.10.9'

      - name: Install Python dependencies
        uses: py-actions/py-dependency-install@v4
        with:
          path: "Ray/requirements.txt"

      - name: Deploy YOLOv8 to the Ray cluster
        id: deploy-yolov8
        run: |
          cd Ray/deploy
          RAY_ADDRESS='http://18.217.15.194:8265' ray job submit --no-wait --working-dir . -- sh deploy_script.sh
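
Because the workflow's only trigger is `workflow_dispatch`, it runs only when started manually, e.g. from the Actions tab or with the GitHub CLI (a hedged example, assuming the default branch):
```
gh workflow run ray-deploy-yolo8.yml
gh run watch
```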
41 changes: 41 additions & 0 deletions .gitignore
@@ -0,0 +1,41 @@
.idea
.DS_Store

# Local .terraform directories
**/.terraform/*

# .tfstate files
*.tfstate
*.tfstate.*

# Crash log files
crash.log
crash.*.log

# Exclude all .tfvars files, which are likely to contain sensitive data such as
# passwords, private keys, and other secrets. These should not be part of version
# control, as they are potentially sensitive and subject to change depending on
# the environment.
*.tfvars
*.tfvars.json

# Ignore override files as they are usually used to override resources locally and so
# are not checked in
override.tf
override.tf.json
*_override.tf
*_override.tf.json

# Include override files you do wish to add to version control using negated pattern
# !example_override.tf

# Include tfplan files to ignore the plan output of command: terraform plan -out=tfplan
# example: *tfplan*

# Ignore CLI configuration files
.terraformrc
terraform.rc

__pycache__

.venv
36 changes: 35 additions & 1 deletion README.md
@@ -1 +1,35 @@
# MLOps-Workshops

Setup

Using a Python virtual environment
```
python -m venv .venv
source .venv/bin/activate
```

AWS setup (the `.pkg` installer below is macOS-specific)
```
curl "https://awscli.amazonaws.com/AWSCLIV2.pkg" -o "AWSCLIV2.pkg"
sudo installer -pkg AWSCLIV2.pkg -target /
aws configure
pip install boto3
```


Creating a local Kubernetes cluster
```
# Install Docker Desktop first: https://www.docker.com/products/docker-desktop/
brew install helm
brew install kubectl
brew install minikube
minikube start --driver=docker
```
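
Before installing anything on the cluster, a quick sanity check that it is actually up:
```
minikube status
kubectl get nodes
```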

https://k8slens.dev/

Deleting the local Kubernetes cluster
```
minikube delete --all
```
89 changes: 89 additions & 0 deletions Ray/Lecture.md
@@ -0,0 +1,89 @@
# MLOps Workshop: Using Ray

## Deploying and configuring a Ray cluster

Install Ray
```
pip install -U "ray[all]"
```
### AWS
The cluster is configured via the `cluster-config.yaml` file, which is then used to launch it:
```
ray up -y cluster-config-aws.yaml
```
An example [cluster-config-aws.yaml](cluster-config-aws.yaml) for deploying to AWS

https://docs.ray.io/en/latest/cluster/vms/references/ray-cluster-configuration.html
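
Once the cluster is up, the same config file drives the rest of the Ray cluster launcher CLI; for example:
```
ray attach cluster-config-aws.yaml      # SSH into the head node
ray dashboard cluster-config-aws.yaml   # forward the dashboard to localhost:8265
ray down -y cluster-config-aws.yaml     # tear the cluster down
```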

### Kubernetes
Deploy the KubeRay operator from the Helm chart repository
```
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update
helm install kuberay-operator kuberay/kuberay-operator --version 1.1.1
kubectl get pods
```

Deploy the RayCluster resources
```
helm install raycluster kuberay/ray-cluster --version 1.1.1 --set 'image.tag=2.9.0-aarch64'
kubectl get rayclusters
kubectl get pods --selector=ray.io/cluster=raycluster-kuberay
```

Cluster customization - https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/config.html

Get information about the head node
```
kubectl get service raycluster-kuberay-head-svc
```

Port forwarding for access to the Ray Dashboard
```
kubectl port-forward service/raycluster-kuberay-head-svc 8265:8265
```

Open the Ray Dashboard
```
http://localhost:8265
```

Run a test job
```
ray job submit --address http://localhost:8265 -- python -c "import ray; ray.init(); print(ray.cluster_resources())"
```

Delete the RayCluster
```
helm uninstall raycluster
kubectl get pods
helm uninstall kuberay-operator
kubectl get pods
```

## Model training
```
ray job submit --address http://localhost:8265 --working-dir . -- python run_training.py
```

## Hyperparameter tuning
```
ray job submit --address http://localhost:8265 --working-dir . -- python tune_hyperparameters.py
```

## Model serving
https://docs.ray.io/en/latest/serve/configure-serve-deployment.html
```
serve run object_detection:entrypoint
RAY_ADDRESS='http://localhost:8265' ray job submit --no-wait --working-dir . -- sh deploy_script.sh
```
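
With the deployment live, the endpoint can be exercised directly; a minimal check (the image URL here is a placeholder):
```
curl "http://localhost:8000/detect?image_url=https://example.com/bike.jpg"
```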

## Miscellaneous job commands:
```
ray job logs JOB_ID
ray job status JOB_ID
ray job stop JOB_ID
ray job delete JOB_ID
ray job list
```
https://docs.ray.io/en/latest/cluster/running-applications/job-submission/cli.html
42 changes: 42 additions & 0 deletions Ray/cluster-config-aws.yaml
@@ -0,0 +1,42 @@
cluster_name: AWS-Ray-Cluster  # Cluster name

max_workers: 5  # Maximum number of worker nodes in the cluster

provider:
  type: aws  # Provider type - AWS
  region: us-east-2  # AWS region where the cluster will be deployed

available_node_types:  # Node types available to the cluster
  ray.head.default:  # Head node configuration
    resources: {"CPU": 2}  # CPU resources for the head node
    node_config:  # EC2 instance configuration for the head node
      InstanceType: m5.large  # AWS EC2 instance type for the head node
      ImageId: ami-0aa5328ffcf5d34ac  # AMI image ID for the head node

  ray.worker.default:  # Worker node configuration
    min_workers: 0  # Minimum number of worker nodes
    max_workers: 5  # Maximum number of worker nodes
    resources: {}  # Resources available on worker nodes (may be left empty)
    node_config:  # EC2 instance configuration for worker nodes
      InstanceType: m5.2xlarge  # AWS EC2 instance type for worker nodes
      ImageId: ami-0aa5328ffcf5d34ac  # AMI image ID for worker nodes

head_node_type: ray.head.default  # Which node type to use for the head node

setup_commands:  # Commands that install the required software on all nodes
  - pip install -U "ray[all]"  # Install Ray with all optional components
  - pip install torch torchvision  # Install PyTorch
  - pip install fastapi  # Install FastAPI
  - pip install uvicorn  # Install Uvicorn
  - sudo apt-get update && sudo apt-get install -y wget tar jq  # Install wget, tar, jq
  - pip install boto3  # Install Boto3
  - pip install ultralytics  # Install ultralytics

head_setup_commands:  # Additional commands for the head node
  - echo "Head setup complete"

worker_setup_commands:  # Additional commands for worker nodes
  - echo "Worker setup complete"

initialization_commands:  # Commands run on each node before the setup commands
  - echo "Initialization complete"
3 changes: 3 additions & 0 deletions Ray/deploy/deploy_script.sh
@@ -0,0 +1,3 @@
#!/bin/bash

serve run object_detection:entrypoint
53 changes: 53 additions & 0 deletions Ray/deploy/object_detection.py
@@ -0,0 +1,53 @@
import torch
from fastapi.responses import JSONResponse
from fastapi import FastAPI
from ultralytics import YOLO

import ray
from ray import serve
from ray.serve.handle import DeploymentHandle

ray.init()
serve.start(http_options={"host": "0.0.0.0", "port": 8000})

app = FastAPI()

@serve.deployment(num_replicas=1)
@serve.ingress(app)
class APIIngress:
    """HTTP front end that forwards detection requests to the model deployment."""

    def __init__(self, object_detection_handle) -> None:
        self.handle: DeploymentHandle = object_detection_handle.options(
            use_new_handle_api=True,
        )

    @app.get("/detect")
    async def detect(self, image_url: str):
        result = await self.handle.detect.remote(image_url)
        return JSONResponse(content=result)


@serve.deployment(
    autoscaling_config={"min_replicas": 1, "max_replicas": 2},
)
class ObjectDetection:
    def __init__(self):
        # Load the pretrained YOLOv8 nano weights
        self.model = YOLO('yolov8n.pt')

    async def detect(self, image_url: str):
        # ultralytics accepts a URL directly and downloads the image itself
        results = self.model(image_url)

        detected_objects = []
        for result in results:
            for box in result.boxes:
                class_id = int(box.cls[0])
                object_name = result.names[class_id]
                coords = box.xyxy[0].tolist()
                detected_objects.append({"class": object_name, "coordinates": coords})

        if len(detected_objects) > 0:
            return {"status": "found", "objects": detected_objects}
        else:
            return {"status": "not found"}


entrypoint = APIIngress.bind(ObjectDetection.bind())
24 changes: 24 additions & 0 deletions Ray/deploy/test.py
@@ -0,0 +1,24 @@
import cv2
import numpy as np
import requests
import json

image_url = "https://previews.123rf.com/images/imagesource/imagesource2205/imagesource220506074/185772883-young-male-mountain-biker-on-rural-road-mount-diablo-bay-area-california-usa.jpg"
server_url = "http://18.217.15.194:8000/detect"

resp = requests.get(image_url)
image_nparray = np.asarray(bytearray(resp.content), dtype=np.uint8)
image = cv2.imdecode(image_nparray, cv2.IMREAD_COLOR)

resp = requests.get(f"{server_url}?image_url={image_url}")
detections = resp.json().get("objects", [])  # empty list when the service reports "not found"

for item in detections:
    class_name = item["class"]
    coords = item["coordinates"]

    # Draw the bounding box and the class label just above it
    cv2.rectangle(image, (int(coords[0]), int(coords[1])), (int(coords[2]), int(coords[3])), (0, 0, 0), 2)
    cv2.putText(image, class_name, (int(coords[0]), int(coords[1] - 5)), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 2)

cv2.imwrite("output.jpeg", image)
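
A hedged usage note: the script needs OpenCV, NumPy, and requests installed, and writes the annotated image next to itself:
```
pip install opencv-python numpy requests
python test.py   # writes output.jpeg with boxes and labels drawn
```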
67 changes: 67 additions & 0 deletions Ray/hyperparameters/train_model.py
@@ -0,0 +1,67 @@
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader, random_split
from ray import air, tune
from ray.air import session
import os

class ConvNet(nn.Module):
    """Small CNN for MNIST: one conv layer followed by a linear classifier."""

    def __init__(self):
        super(ConvNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 3, kernel_size=3)
        self.fc = nn.Linear(192, 10)

    def forward(self, x):
        x = torch.relu(torch.max_pool2d(self.conv1(x), 3))
        x = x.view(-1, 192)
        x = self.fc(x)
        return torch.log_softmax(x, dim=1)

def train_model(config):
    net = ConvNet()
    device = "cuda" if torch.cuda.is_available() else "cpu"
    net.to(device)

    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=config["lr"], momentum=0.9)

    transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
    dataset = datasets.MNIST(root="./data", train=True, download=True, transform=transform)
    train_subset, val_subset = random_split(dataset, [50000, 10000])

    trainloader = DataLoader(train_subset, batch_size=int(config["batch_size"]), shuffle=True)
    valloader = DataLoader(val_subset, batch_size=int(config["batch_size"]), shuffle=True)

    for epoch in range(10):
        running_loss = 0.0
        for i, data in enumerate(trainloader, 0):
            inputs, labels = data
            inputs, labels = inputs.to(device), labels.to(device)

            optimizer.zero_grad()
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()

        # Evaluate on the validation split after every epoch
        val_loss = 0.0
        total = 0
        correct = 0
        with torch.no_grad():
            for data in valloader:
                inputs, labels = data
                inputs, labels = inputs.to(device), labels.to(device)
                outputs = net(inputs)
                val_loss += criterion(outputs, labels).item()
                _, predicted = torch.max(outputs.data, 1)
                total += labels.size(0)
                correct += (predicted == labels).sum().item()

        accuracy = correct / total

        # Report per-epoch metrics back to Ray Tune
        session.report({"loss": val_loss, "accuracy": accuracy})

    print("Training complete")