
🎬🎞📶 Movie Recommendation System

This project was created as part of the MLOps bootcamp (Sep24) 🛠👷🏻‍♂️. The project demonstrates a comprehensive MLOps implementation for deploying and maintaining a movie recommendation system.

Project Repository: DagsHub

💻 Developer Team:

Business Objectives

The Movie Recommendation application addresses the challenge of providing personalized movie recommendations to users on a streaming platform. By leveraging collaborative filtering techniques, it enhances the user experience by suggesting movies that align with individual tastes. Sponsored by a streaming service, the project aims to:

  • Increase user engagement through personalized content recommendations
  • Improve user retention by suggesting relevant movies
  • Enhance content discovery across the platform's catalog
  • Drive higher user satisfaction through accurate recommendations
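As an illustration of the collaborative filtering idea mentioned above, here is a minimal user-based sketch in plain Python. It is not the project's model: the toy ratings, the movie titles and the `cosine_similarity`/`recommend` helpers are hypothetical. It scores each movie the target user has not rated by the similarity-weighted sum of other users' ratings.

```python
from math import sqrt

Ratings = dict[str, dict[str, float]]  # user -> {movie title: rating}

def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse rating vectors (unrated movies count as 0)."""
    dot = sum(a[movie] * b[movie] for movie in set(a) & set(b))
    norm_a = sqrt(sum(r * r for r in a.values()))
    norm_b = sqrt(sum(r * r for r in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def recommend(target: str, ratings: Ratings, top_n: int = 3) -> list[str]:
    """Rank movies `target` has not rated by similarity-weighted ratings of other users.

    Hypothetical helper for illustration; not the project's actual model.
    """
    seen = ratings[target]
    scores: dict[str, float] = {}
    for other, their_ratings in ratings.items():
        if other == target:
            continue
        sim = cosine_similarity(seen, their_ratings)
        for movie, rating in their_ratings.items():
            if movie not in seen:
                scores[movie] = scores.get(movie, 0.0) + sim * rating
    return sorted(scores, key=scores.__getitem__, reverse=True)[:top_n]

# Toy data, invented for this example.
ratings: Ratings = {
    "alice": {"Toy Story": 5, "Up": 4},
    "bob": {"Toy Story": 5, "Up": 4, "Coco": 5},
    "carol": {"Toy Story": 1, "Alien": 5},
}
```

Here Bob's ratings overlap Alice's far more than Carol's do, so `recommend("alice", ratings)` ranks "Coco" above "Alien".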

🔄 MLOps Workflow Overview

MLOps Workflow Diagram

View the high-resolution SVG version for better detail.

🎯 Detailed Pipeline Steps

Our MLOps pipeline consists of five major components, each handling specific aspects of the machine learning lifecycle:

1️⃣ CI/CD Pipeline (GitHub Actions)

  • ⏰ Scheduled Trigger: Daily at midnight
  • 📈 Data Version Update: Increments data version
  • 🚀 Pipeline Trigger: Initiates DVC pipeline
  • 📤 Main Branch Update: Pushes changes
  • 🔄 API Deployment: Triggers new deployment (in progress 🚧)
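The "Increments data version" step could be as small as the following sketch. The JSON state file, its `data_version` key and the `v<N>` label format are all assumptions for illustration; the repository may store its data version differently.

```python
import json
import re
from pathlib import Path

def bump_data_version(current: str) -> str:
    """Return the next label for a hypothetical "v<N>" data version scheme."""
    match = re.fullmatch(r"v(\d+)", current)
    if match is None:
        raise ValueError(f"unexpected data version label: {current!r}")
    return f"v{int(match.group(1)) + 1}"

def bump_version_file(path: Path) -> str:
    """Read a hypothetical JSON state file, increment its version, and write it back."""
    state = json.loads(path.read_text(encoding="utf-8"))
    state["data_version"] = bump_data_version(state["data_version"])
    path.write_text(json.dumps(state, indent=2), encoding="utf-8")
    return state["data_version"]
```

In a scheduled workflow, the modified file would then be committed and pushed, which corresponds to the "Main Branch Update" step.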

2️⃣ DVC Pipeline (MLflow/DVC)

  • 📥 Data Ingestion: Appends new data
  • ✅ Validation: Ensures data quality
  • 🔄 Transformation: Prepares features
  • 🧠 Model Training: Updates model
  • 📊 Evaluation: Assesses performance
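DVC normally derives stage order from the pipeline definition and runs it with `dvc repro`. The plain-Python sketch below only mimics the behaviour the list describes: stages run in order, each consumes the previous stage's output, and a failing stage (such as validation) stops everything after it. The `run_pipeline` helper and the toy stages are hypothetical.

```python
from typing import Any, Callable

Stage = Callable[[Any], Any]

def run_pipeline(stages: list[tuple[str, Stage]], data: Any) -> tuple[Any, list[str]]:
    """Run stages in order; return the final output and the names of completed stages.

    An exception raised by any stage propagates immediately, so later stages never run.
    """
    completed: list[str] = []
    for name, stage in stages:
        data = stage(data)
        completed.append(name)
    return data, completed

# Hypothetical toy stages operating on a list of ratings.
def ingest(rows: list[float]) -> list[float]:
    return rows + [4.0]  # stands in for appending the new daily batch

def validate(rows: list[float]) -> list[float]:
    if any(not 0 <= r <= 5 for r in rows):
        raise ValueError("rating outside 0-5")
    return rows

def train(rows: list[float]) -> float:
    return sum(rows) / len(rows)  # placeholder "model": the mean rating

STAGES = [("ingest", ingest), ("validate", validate), ("train", train)]
```

Because validation raises instead of returning a flag, a bad batch can never silently reach training.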

3️⃣ Experiment Monitoring

  • 📚 MLflow Registry: Tracks experiments
  • 💾 DVC Version Control: Manages artifacts

4️⃣ Deployed Application

  • 🆕 API Updates: New versions (🚧)
  • 👥 User Interaction: Real-time recommendations

5️⃣ Monitoring Stack

  • 📈 Metrics Collection: Prometheus
  • 📊 Dashboard: Grafana
  • ⚠️ Alerts: AlertManager (🚧)
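Prometheus collects metrics by scraping a plain-text HTTP endpoint. In a Python service this text is usually produced by a client library such as the official `prometheus_client`; the stdlib sketch below only illustrates the text exposition format for a counter. The metric name and labels are hypothetical, not taken from the project.

```python
def escape_label_value(value: str) -> str:
    # The text format escapes backslash, double quote and newline inside label values.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

def render_counter(name: str, help_text: str,
                   samples: dict[tuple[tuple[str, str], ...], float]) -> str:
    """Render one counter family (HELP, TYPE, then one line per labelled sample)."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples.items():
        label_str = ",".join(f'{key}="{escape_label_value(val)}"' for key, val in labels)
        series = f"{name}{{{label_str}}}" if label_str else name
        lines.append(f"{series} {value}")
    return "\n".join(lines) + "\n"
```

For example, a hypothetical `http_requests_total` counter with labels `method="POST"` and `path="/users/recommendations"` and value 42 renders as `http_requests_total{method="POST",path="/users/recommendations"} 42` below its HELP and TYPE lines.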

🔄 Pipeline Interactions

  1. Data Update Cycle
graph LR
    A[⏰ Cron Trigger] --> B[📈 Update Version]
    B --> C[🚀 Trigger Pipeline]
    C --> D[📥 Process Data]
    D --> E[🧠 Train Model]
    E --> F[📊 Evaluation]

  2. Deployment Cycle
graph LR
    A[📊 Evaluation] --> B[📤 Push Changes]
    B --> C[🔄 Deploy API]
    C --> D[👥 Users]
    D --> E[📈 Monitoring]

🛠 Component Details

1. Data Pipeline & Version Control

# Data versioning workflow
├── 📥 Data Ingestion
├── ✅ Validation
├── 🔄 Transformation
├── 🧠 Training
└── 📊 Evaluation

2. Model Training & Deployment

# Model lifecycle
├── 🧪 Experiment Tracking (MLflow)
├── 📊 Performance Metrics
├── 📦 Containerization (Docker)
└── 🚀 API Deployment (FastAPI)

3. Monitoring & Alerts

# Monitoring stack
├── 📈 Metrics (Prometheus)
├── 📊 Visualization (Grafana)
└── ⚠️ Alerting (AlertManager)

🔍 Workflow Deep Dive

1. Data Update Process

  1. ⏰ Trigger: Daily at midnight
  2. 📈 Version Update: Increment data version
  3. 🚀 Pipeline Start: Trigger DVC pipeline
  4. 📥 Data Processing: Execute pipeline stages
  5. 📊 Validation: Ensure quality metrics
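For reference, "daily at midnight" in GitHub Actions is normally written as the cron expression `0 0 * * *`, which GitHub evaluates in UTC (scheduled runs can also start late when load is high). That the workflow uses exactly this expression is an assumption. The hypothetical helper below computes when the next such run is due:

```python
from datetime import datetime, timedelta, timezone

def next_daily_midnight_utc(now: datetime) -> datetime:
    """Next firing time of a daily "0 0 * * *" schedule evaluated in UTC.

    `now` must be timezone-aware; it is converted to UTC before the calculation.
    """
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    today_utc = now.astimezone(timezone.utc).date()
    tomorrow = today_utc + timedelta(days=1)
    return datetime(tomorrow.year, tomorrow.month, tomorrow.day, tzinfo=timezone.utc)
```

The time zone matters: 23:30 at UTC-2 is already 01:30 UTC on the following day, so the next run falls one calendar day later than a local-time reading would suggest.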

2. Model Training Cycle

  1. 🧠 Training: Update model with new data
  2. 📊 Evaluation: Calculate performance metrics
  3. 📚 Registry: Record in MLflow
  4. 💾 Versioning: Save with DVC
  5. 📤 Push: Update main branch
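The README does not name the evaluation metric. For rating prediction, root-mean-square error (RMSE) is a common choice; the sketch below computes it, purely as an assumed example of what "Calculate performance metrics" might involve.

```python
from math import sqrt
from typing import Sequence

def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error between observed and predicted ratings."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute RMSE of an empty sample")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sqrt(sum(squared) / len(squared))
```

Lower is better; a value of 0 means every held-out rating was predicted exactly.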

3. Deployment Process

  1. 🔄 Trigger: New model version available
  2. 📦 Container: Build a new Docker image and push it to Docker Hub
  3. 🚀 Deploy: Update API service (🚧)
  4. 👥 Users: Serve new predictions
  5. 📈 Monitor: Track performance
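The trigger "New model version available" leaves open when a new version counts as good enough to ship. One hypothetical policy, sketched below, deploys a candidate only if its error (for example RMSE) beats the production model by a configurable margin; the function name and the margin are illustrative, not part of the project.

```python
from typing import Optional

def should_deploy(candidate_error: float,
                  production_error: Optional[float],
                  min_improvement: float = 0.0) -> bool:
    """Decide whether a candidate model should replace the deployed one (hypothetical gate).

    Lower error is better. With no production model yet, any candidate is deployed.
    """
    if production_error is None:
        return True
    return candidate_error <= production_error - min_improvement
```

Using `<=` means an equally good candidate still ships when the margin is zero; a positive `min_improvement` avoids redeploying for noise-level differences.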

📊 Monitoring & Feedback

Real-time Metrics

  • 🔍 API Performance

    • Response times
    • Request volumes
    • Error rates
  • 📈 Model Metrics

    • Prediction accuracy
    • Processing time
    • Resource usage
  • ⚠️ Alerts

    • Performance degradation
    • Error thresholds
    • Resource constraints
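The alert categories above boil down to comparing recent measurements against thresholds. In the real stack that logic would live in Prometheus alerting rules routed by AlertManager; the sketch below shows the same idea in plain Python. The 5% error-rate and 500 ms p95 thresholds, and the nearest-rank percentile, are assumptions chosen for illustration.

```python
import math
from typing import Sequence

def percentile(values: Sequence[float], q: float) -> float:
    """Nearest-rank percentile (q between 0 and 100) of a non-empty sample."""
    if not values:
        raise ValueError("percentile of an empty sample")
    ordered = sorted(values)
    rank = math.ceil(q * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]

def check_alerts(latencies_ms: Sequence[float], status_codes: Sequence[int],
                 max_error_rate: float = 0.05, max_p95_ms: float = 500.0) -> list[str]:
    """Return alert messages for a window of requests; an empty list means healthy."""
    alerts: list[str] = []
    if status_codes:
        # Only server errors (5xx) count toward the error rate here.
        error_rate = sum(1 for code in status_codes if code >= 500) / len(status_codes)
        if error_rate > max_error_rate:
            alerts.append(f"error rate {error_rate:.1%} exceeds {max_error_rate:.1%}")
    if latencies_ms:
        p95 = percentile(latencies_ms, 95)
        if p95 > max_p95_ms:
            alerts.append(f"p95 latency {p95:.0f} ms exceeds {max_p95_ms:.0f} ms")
    return alerts
```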

Technical Architecture

Data Flow

  1. Raw data ingestion (data/raw/)
  2. Data preprocessing (data/interim/)
  3. Feature engineering (data/processed/)
  4. Model training (models/)
  5. API deployment
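The directory stages above map naturally onto path constants that pipeline steps can share. A small sketch, assuming the repository root as the base directory; the `DATA_DIRS` mapping and `ensure_layout` helper are hypothetical, and API deployment has no directory of its own here.

```python
from pathlib import Path

# Stage name -> directory, following the data flow listed above (mapping is illustrative).
DATA_DIRS = {
    "raw": Path("data/raw"),
    "interim": Path("data/interim"),
    "processed": Path("data/processed"),
    "models": Path("models"),
}

def ensure_layout(root: Path) -> dict[str, Path]:
    """Create any missing stage directories under `root` and return their absolute paths."""
    paths: dict[str, Path] = {}
    for stage, relative in DATA_DIRS.items():
        path = (root / relative).resolve()
        path.mkdir(parents=True, exist_ok=True)
        paths[stage] = path
    return paths
```

Keeping the paths in one module avoids hard-coding them separately in every pipeline step.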

Component Structure

src/
├── api/                    # FastAPI implementation
├── data_module_def/        # Data processing modules
├── models_module_def/      # Model definition and training
├── pipeline_steps/         # DVC pipeline stages
└── utils/                  # Helper functions

Getting Started

1. Clone the project

git clone https://github.com/DataScientest-Studio/sep24_bmlops_int_reco_films.git
cd sep24_bmlops_int_reco_films

2. Setup virtual environment & install dependencies

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

3. Configure DVC and download data

# Configure access to DVC
dvc remote modify origin --local access_key_id YOUR_DVC_ACCESS_KEY
dvc remote modify origin --local secret_access_key YOUR_DVC_ACCESS_KEY

# Pull the data
dvc pull

4. Launch the application

docker-compose up

5. API Usage

Health Check

curl -X GET http://localhost:8000/status

Get Recommendations

# Abbreviated example: other genre preferences omitted (JSON does not allow inline comments)
curl -X 'POST' \
  'http://localhost:8000/users/recommendations' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "animation": 5,
  "children": 3,
  "comedy": 2
}'
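The same two calls can be made from Python with only the standard library. This is a sketch: the helper names are hypothetical, the payload repeats only the three genres shown in the curl example (the others are omitted there too), and since the response schema is not documented here, `send` returns the raw body text.

```python
import json
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8000"  # host and port as in the curl examples

def status_request(base_url: str = BASE_URL) -> Request:
    """Build the health-check request (GET /status)."""
    return Request(f"{base_url}/status", method="GET")

def recommendation_request(preferences: dict[str, int], base_url: str = BASE_URL) -> Request:
    """Build POST /users/recommendations with genre preferences as a JSON body."""
    body = json.dumps(preferences).encode("utf-8")
    return Request(
        f"{base_url}/users/recommendations",
        data=body,
        headers={"Accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )

def send(request: Request, timeout: float = 10.0) -> str:
    """Send a request and return the decoded body; urlopen raises HTTPError on 4xx/5xx."""
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the stack running, `send(recommendation_request({"animation": 5, "children": 3, "comedy": 2}))` returns the API's response as text, which can be passed to `json.loads` if it is JSON.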

6. Monitoring Dashboard

Access the Grafana dashboard at: http://localhost:3000/d/_eX4mpl3/fastapi-dashboard

🚀 Future Improvements

The next steps we want to implement in the project:

  • Enhance the CI/CD pipeline and automate deployment 🔄
  • Improve the machine learning model 🧠
  • Implement a user feedback system
  • Use Airflow for pipeline orchestration 🚀
  • Implement Kubernetes deployment 🛠
  • Implement AlertManager 📊
  • Add testing 🔍
  • Enhance API security by adding authentication 🔐

More Information

For more information, take a look at our Wiki.

License

This project is licensed under the MIT License - see the LICENSE file for details.