Commit 76f9e55: initial commit
dmytro-spodarets committed Jul 16, 2024
1 parent 0684454

Showing 13 changed files with 529 additions and 1 deletion.
25 changes: 25 additions & 0 deletions .github/workflows/ray-deploy-yolo8.yml
@@ -0,0 +1,25 @@
name: Deploy YOLOv8

on: workflow_dispatch

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - uses: actions/setup-python@v4
        with:
          python-version: '3.10.9'

      - name: Install Python dependencies
        uses: py-actions/py-dependency-install@v4
        with:
          path: "Ray/requirements.txt"

      - name: Deploy YOLOv8 to the Ray cluster
        id: deploy-yolov8
        run: |
          cd Ray/deploy
          RAY_ADDRESS='http://18.217.15.194:8265' ray job submit --no-wait --working-dir . -- sh deploy_script.sh
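
Because the workflow's only trigger is `workflow_dispatch`, it runs only when started manually, e.g. from the Actions tab or with the GitHub CLI (a hedged example, assuming the default branch):
```
gh workflow run ray-deploy-yolo8.yml
gh run watch
```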
41 changes: 41 additions & 0 deletions .gitignore
@@ -0,0 +1,41 @@
.idea
.DS_Store

# Local .terraform directories
**/.terraform/*

# .tfstate files
*.tfstate
*.tfstate.*

# Crash log files
crash.log
crash.*.log

# Exclude all .tfvars files, which are likely to contain sensitive data such as
# passwords, private keys, and other secrets. These should not be part of version
# control, as they are potentially sensitive and subject to change depending on
# the environment.
*.tfvars
*.tfvars.json

# Ignore override files as they are usually used to override resources locally and so
# are not checked in
override.tf
override.tf.json
*_override.tf
*_override.tf.json

# Include override files you do wish to add to version control using negated pattern
# !example_override.tf

# Include tfplan files to ignore the plan output of command: terraform plan -out=tfplan
# example: *tfplan*

# Ignore CLI configuration files
.terraformrc
terraform.rc

__pycache__

.venv
36 changes: 35 additions & 1 deletion README.md
@@ -1 +1,35 @@
# MLOps-Workshops

Setup

Using a Python virtual environment
```
python -m venv .venv
source .venv/bin/activate
```

AWS setup (the `.pkg` installer below is macOS-specific)
```
curl "https://awscli.amazonaws.com/AWSCLIV2.pkg" -o "AWSCLIV2.pkg"
sudo installer -pkg AWSCLIV2.pkg -target /
aws configure
pip install boto3
```


Creating a local Kubernetes cluster
```
# Install Docker Desktop first: https://www.docker.com/products/docker-desktop/
brew install helm
brew install kubectl
brew install minikube
minikube start --driver=docker
```
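
Before installing anything on the cluster, a quick sanity check that it is actually up:
```
minikube status
kubectl get nodes
```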

https://k8slens.dev/

Deleting the local Kubernetes cluster
```
minikube delete --all
```
89 changes: 89 additions & 0 deletions Ray/Lecture.md
@@ -0,0 +1,89 @@
# MLOps Workshop: Using Ray

## Deploying and configuring a Ray cluster

Install Ray
```
pip install -U "ray[all]"
```
### AWS
The cluster is configured via the `cluster-config.yaml` file, which is then used to launch it:
```
ray up -y cluster-config-aws.yaml
```
An example [cluster-config-aws.yaml](cluster-config-aws.yaml) for deploying to AWS

https://docs.ray.io/en/latest/cluster/vms/references/ray-cluster-configuration.html
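
Once the cluster is up, the same config file drives the rest of the Ray cluster launcher CLI; for example:
```
ray attach cluster-config-aws.yaml      # SSH into the head node
ray dashboard cluster-config-aws.yaml   # forward the dashboard to localhost:8265
ray down -y cluster-config-aws.yaml     # tear the cluster down
```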

### Kubernetes
Deploy the KubeRay operator from the Helm chart repository
```
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update
helm install kuberay-operator kuberay/kuberay-operator --version 1.1.1
kubectl get pods
```

Deploy the RayCluster resources
```
helm install raycluster kuberay/ray-cluster --version 1.1.1 --set 'image.tag=2.9.0-aarch64'
kubectl get rayclusters
kubectl get pods --selector=ray.io/cluster=raycluster-kuberay
```

Cluster customization - https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/config.html

Get information about the head node
```
kubectl get service raycluster-kuberay-head-svc
```

Port forwarding for access to the Ray Dashboard
```
kubectl port-forward service/raycluster-kuberay-head-svc 8265:8265
```

Open the Ray Dashboard
```
http://localhost:8265
```

Run a test job
```
ray job submit --address http://localhost:8265 -- python -c "import ray; ray.init(); print(ray.cluster_resources())"
```

Delete the RayCluster
```
helm uninstall raycluster
kubectl get pods
helm uninstall kuberay-operator
kubectl get pods
```

## Model training
```
ray job submit --address http://localhost:8265 --working-dir . -- python run_training.py
```

## Hyperparameter tuning
```
ray job submit --address http://localhost:8265 --working-dir . -- python tune_hyperparameters.py
```

## Model serving
https://docs.ray.io/en/latest/serve/configure-serve-deployment.html
```
serve run object_detection:entrypoint
RAY_ADDRESS='http://localhost:8265' ray job submit --no-wait --working-dir . -- sh deploy_script.sh
```
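
With the deployment live, the endpoint can be exercised directly; a minimal check (the image URL here is a placeholder):
```
curl "http://localhost:8000/detect?image_url=https://example.com/bike.jpg"
```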

## Miscellaneous job commands:
```
ray job logs JOB_ID
ray job status JOB_ID
ray job stop JOB_ID
ray job delete JOB_ID
ray job list
```
https://docs.ray.io/en/latest/cluster/running-applications/job-submission/cli.html
42 changes: 42 additions & 0 deletions Ray/cluster-config-aws.yaml
@@ -0,0 +1,42 @@
cluster_name: AWS-Ray-Cluster  # Cluster name

max_workers: 5  # Maximum number of worker nodes in the cluster

provider:
  type: aws  # Provider type - AWS
  region: us-east-2  # AWS region where the cluster will be deployed

available_node_types:  # Node types available to the cluster
  ray.head.default:  # Head node configuration
    resources: {"CPU": 2}  # CPU resources for the head node
    node_config:  # EC2 instance configuration for the head node
      InstanceType: m5.large  # AWS EC2 instance type for the head node
      ImageId: ami-0aa5328ffcf5d34ac  # AMI image ID for the head node

  ray.worker.default:  # Worker node configuration
    min_workers: 0  # Minimum number of worker nodes
    max_workers: 5  # Maximum number of worker nodes
    resources: {}  # Resources available on worker nodes (may be left empty)
    node_config:  # EC2 instance configuration for worker nodes
      InstanceType: m5.2xlarge  # AWS EC2 instance type for worker nodes
      ImageId: ami-0aa5328ffcf5d34ac  # AMI image ID for worker nodes

head_node_type: ray.head.default  # Which node type to use for the head node

setup_commands:  # Commands that install the required software on all nodes
  - pip install -U "ray[all]"  # Install Ray with all optional components
  - pip install torch torchvision  # Install PyTorch
  - pip install fastapi  # Install FastAPI
  - pip install uvicorn  # Install Uvicorn
  - sudo apt-get update && sudo apt-get install -y wget tar jq  # Install wget, tar, jq
  - pip install boto3  # Install Boto3
  - pip install ultralytics  # Install ultralytics

head_setup_commands:  # Additional commands for the head node
  - echo "Head setup complete"

worker_setup_commands:  # Additional commands for worker nodes
  - echo "Worker setup complete"

initialization_commands:  # Commands run on each node before the setup commands
  - echo "Initialization complete"
3 changes: 3 additions & 0 deletions Ray/deploy/deploy_script.sh
@@ -0,0 +1,3 @@
#!/bin/bash

serve run object_detection:entrypoint
53 changes: 53 additions & 0 deletions Ray/deploy/object_detection.py
@@ -0,0 +1,53 @@
import torch
from fastapi.responses import JSONResponse
from fastapi import FastAPI
from ultralytics import YOLO

import ray
from ray import serve
from ray.serve.handle import DeploymentHandle

ray.init()
serve.start(http_options={"host": "0.0.0.0", "port": 8000})

app = FastAPI()

@serve.deployment(num_replicas=1)
@serve.ingress(app)
class APIIngress:
    """HTTP front end that forwards detection requests to the model deployment."""

    def __init__(self, object_detection_handle) -> None:
        self.handle: DeploymentHandle = object_detection_handle.options(
            use_new_handle_api=True,
        )

    @app.get("/detect")
    async def detect(self, image_url: str):
        result = await self.handle.detect.remote(image_url)
        return JSONResponse(content=result)


@serve.deployment(
    autoscaling_config={"min_replicas": 1, "max_replicas": 2},
)
class ObjectDetection:
    def __init__(self):
        # Load the pretrained YOLOv8 nano weights
        self.model = YOLO('yolov8n.pt')

    async def detect(self, image_url: str):
        # ultralytics accepts a URL directly and downloads the image itself
        results = self.model(image_url)

        detected_objects = []
        for result in results:
            for box in result.boxes:
                class_id = int(box.cls[0])
                object_name = result.names[class_id]
                coords = box.xyxy[0].tolist()
                detected_objects.append({"class": object_name, "coordinates": coords})

        if len(detected_objects) > 0:
            return {"status": "found", "objects": detected_objects}
        else:
            return {"status": "not found"}


entrypoint = APIIngress.bind(ObjectDetection.bind())
24 changes: 24 additions & 0 deletions Ray/deploy/test.py
@@ -0,0 +1,24 @@
import cv2
import numpy as np
import requests
import json

image_url = "https://previews.123rf.com/images/imagesource/imagesource2205/imagesource220506074/185772883-young-male-mountain-biker-on-rural-road-mount-diablo-bay-area-california-usa.jpg"
server_url = "http://18.217.15.194:8000/detect"

resp = requests.get(image_url)
image_nparray = np.asarray(bytearray(resp.content), dtype=np.uint8)
image = cv2.imdecode(image_nparray, cv2.IMREAD_COLOR)

resp = requests.get(f"{server_url}?image_url={image_url}")
detections = resp.json().get("objects", [])  # empty list when the service reports "not found"

for item in detections:
    class_name = item["class"]
    coords = item["coordinates"]

    # Draw the bounding box and the class label just above it
    cv2.rectangle(image, (int(coords[0]), int(coords[1])), (int(coords[2]), int(coords[3])), (0, 0, 0), 2)
    cv2.putText(image, class_name, (int(coords[0]), int(coords[1] - 5)), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 2)

cv2.imwrite("output.jpeg", image)
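
A hedged usage note: the script needs OpenCV, NumPy, and requests installed, and writes the annotated image next to itself:
```
pip install opencv-python numpy requests
python test.py   # writes output.jpeg with boxes and labels drawn
```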
67 changes: 67 additions & 0 deletions Ray/hyperparameters/train_model.py
@@ -0,0 +1,67 @@
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader, random_split
from ray import air, tune
from ray.air import session
import os

class ConvNet(nn.Module):
    """Small CNN for MNIST: one conv layer followed by a linear classifier."""

    def __init__(self):
        super(ConvNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 3, kernel_size=3)
        self.fc = nn.Linear(192, 10)

    def forward(self, x):
        x = torch.relu(torch.max_pool2d(self.conv1(x), 3))
        x = x.view(-1, 192)
        x = self.fc(x)
        return torch.log_softmax(x, dim=1)

def train_model(config):
    net = ConvNet()
    device = "cuda" if torch.cuda.is_available() else "cpu"
    net.to(device)

    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=config["lr"], momentum=0.9)

    transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
    dataset = datasets.MNIST(root="./data", train=True, download=True, transform=transform)
    train_subset, val_subset = random_split(dataset, [50000, 10000])

    trainloader = DataLoader(train_subset, batch_size=int(config["batch_size"]), shuffle=True)
    valloader = DataLoader(val_subset, batch_size=int(config["batch_size"]), shuffle=True)

    for epoch in range(10):
        running_loss = 0.0
        for i, data in enumerate(trainloader, 0):
            inputs, labels = data
            inputs, labels = inputs.to(device), labels.to(device)

            optimizer.zero_grad()
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()

        # Evaluate on the validation split after every epoch
        val_loss = 0.0
        total = 0
        correct = 0
        with torch.no_grad():
            for data in valloader:
                inputs, labels = data
                inputs, labels = inputs.to(device), labels.to(device)
                outputs = net(inputs)
                val_loss += criterion(outputs, labels).item()
                _, predicted = torch.max(outputs.data, 1)
                total += labels.size(0)
                correct += (predicted == labels).sum().item()

        accuracy = correct / total

        # Report per-epoch metrics back to Ray Tune
        session.report({"loss": val_loss, "accuracy": accuracy})

    print("Training complete")