🎬🎞📶 Movie Recommendation System

This project was created as part of the MLOps bootcamp (Sep24) 🛠👷🏻‍♂️. The project demonstrates a comprehensive MLOps implementation for deploying and maintaining a movie recommendation system.

Project Repository: DagsHub

💻 Developer Team:

Asma Heena Khalil (@asma484)
Ringo Schwabe (@roongi)
Carolin Stolpe (@castolpe)

Business Objectives

The Movie Recommendation application addresses the challenge of providing personalized movie recommendations to users on a streaming platform. By leveraging collaborative filtering techniques, it enhances the user experience by suggesting movies that align with individual tastes. Sponsored by a streaming service, the project aims to:

Increase user engagement through personalized content recommendations
Improve user retention by suggesting relevant movies
Enhance content discovery across the platform's catalog
Drive higher user satisfaction through accurate recommendations

🔄 MLOps Workflow Overview

📝 View the high-resolution SVG version for better detail.

🎯 Detailed Pipeline Steps

Our MLOps pipeline consists of five major components, each handling specific aspects of the machine learning lifecycle:

1️⃣ CI/CD Pipeline (GitHub Actions)

⏰ Scheduled Trigger: Daily at midnight
📈 Data Version Update: Increments data version
🚀 Pipeline Trigger: Initiates DVC pipeline
📤 Main Branch Update: Pushes changes
🔄 API Deployment: Triggers new deployment (in progress 🚧)

2️⃣ DVC Pipeline (MLFlow/DVC)

📥 Data Ingestion: Appends new data
✅ Validation: Ensures data quality
🔄 Transformation: Prepares features
🧠 Model Training: Updates model
📊 Evaluation: Assesses performance
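
In the project, DVC runs the stages declared in dvc.yaml (typically via `dvc repro`). The following is only a conceptual sketch of what "stages executed in order, stopping when validation fails" means. The stage functions, the `ratings` record layout, and the 0.5 to 5.0 rating range are assumptions made for illustration and are not taken from the repository.

```python
from typing import Callable

State = dict
Stage = Callable[[State], State]


def ingest(state: State) -> State:
    """Append the new batch to the existing ratings (hypothetical layout)."""
    merged = state["ratings"] + state["new_ratings"]
    return {**state, "ratings": merged, "new_ratings": []}


def validate(state: State) -> State:
    """Reject ratings outside an assumed 0.5-5.0 range."""
    for row in state["ratings"]:
        if not 0.5 <= row["rating"] <= 5.0:
            raise ValueError(f"rating out of range: {row}")
    return state


# Transformation, training and evaluation would follow the same pattern.
STAGES: list[tuple[str, Stage]] = [("ingest", ingest), ("validate", validate)]


def run_pipeline(stages: list[tuple[str, Stage]], state: State) -> State:
    """Run the stages in order; an exception in any stage stops the run."""
    completed = []
    for name, stage in stages:
        state = stage(state)
        completed.append(name)
    return {**state, "completed": completed}


if __name__ == "__main__":
    start = {
        "ratings": [{"user": 1, "movie": 10, "rating": 4.0}],
        "new_ratings": [{"user": 2, "movie": 10, "rating": 3.5}],
    }
    result = run_pipeline(STAGES, start)
    print(len(result["ratings"]), result["completed"])
```

Keeping every stage a plain function from state to state makes a failed validation stop the run before any later stage is reached.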

3️⃣ Experiment Monitoring

📚 MLFlow Registry: Tracks experiments
💾 DVC Version Control: Manages artifacts

4️⃣ Deployed Application

🆕 API Updates: New versions (🚧)
👥 User Interaction: Real-time recommendations

5️⃣ Monitoring Stack

📈 Metrics Collection: Prometheus
📊 Dashboard: Grafana
⚠️ Alerts: AlertManager (🚧)

🔄 Pipeline Interactions

Data Update Cycle

graph LR
    A[⏰ Cron Trigger] --> B[📈 Update Version]
    B --> C[🚀 Trigger Pipeline]
    C --> D[📥 Process Data]
    D --> E[🧠 Train Model]
    E --> F[📊 Evaluation]

Deployment Cycle

graph LR
    A[📊 Evaluation] --> B[📤 Push Changes]
    B --> C[🔄 Deploy API]
    C --> D[👥 Users]
    D --> E[📈 Monitoring]

🛠 Component Details

1. Data Pipeline & Version Control

# Data versioning workflow
├── 📥 Data Ingestion
├── ✅ Validation
├── 🔄 Transformation
├── 🧠 Training
└── 📊 Evaluation

2. Model Training & Deployment

# Model lifecycle
├── 🧪 Experiment Tracking (MLFlow)
├── 📊 Performance Metrics
├── 📦 Containerization (Docker)
└── 🚀 API Deployment (FastAPI)

3. Monitoring & Alerts

# Monitoring stack
├── 📈 Metrics (Prometheus)
├── 📊 Visualization (Grafana)
└── ⚠️ Alerting (AlertManager)

🔍 Workflow Deep Dive

1. Data Update Process

⏰ Trigger: Daily at midnight
📈 Version Update: Increment data version
🚀 Pipeline Start: Trigger DVC pipeline
📥 Data Processing: Execute pipeline stages
📊 Validation: Ensure quality metrics
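
The version increment in the second step can be sketched as follows. This is a minimal illustration that assumes data_version.txt holds a single integer; the actual file format used by the workflow is not shown here, and the function name is hypothetical.

```python
import tempfile
from pathlib import Path


def bump_data_version(version_file: Path) -> int:
    """Read an integer version, add one, and write it back.

    Assumes the file contains one integer; a missing file counts as version 0.
    """
    current = int(version_file.read_text().strip()) if version_file.exists() else 0
    new_version = current + 1
    version_file.write_text(f"{new_version}\n")
    return new_version


if __name__ == "__main__":
    # Demo against a temporary copy so the real data_version.txt is untouched.
    with tempfile.TemporaryDirectory() as tmp:
        demo_file = Path(tmp) / "data_version.txt"
        demo_file.write_text("3\n")
        print(bump_data_version(demo_file))
```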

2. Model Training Cycle

🧠 Training: Update model with new data
📊 Evaluation: Calculate performance metrics
📚 Registry: Record in MLFlow
💾 Versioning: Save with DVC
📤 Push: Update main branch
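
As an illustration of the evaluation step, the sketch below computes a root mean squared error and writes it to a JSON file. Both the choice of RMSE and the output file name are assumptions; the metrics the project actually records are not listed in this README.

```python
import json
import math
import tempfile
from pathlib import Path


def rmse(actual: list[float], predicted: list[float]) -> float:
    """Root mean squared error between two equally long rating lists."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def write_metrics(path: Path, metrics: dict) -> None:
    """Store metrics as JSON so they can be versioned alongside the model."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(metrics, indent=2))


if __name__ == "__main__":
    score = rmse([3.0, 4.0], [3.0, 2.0])
    with tempfile.TemporaryDirectory() as tmp:
        # "metrics/scores.json" is a hypothetical file name.
        target = Path(tmp) / "metrics" / "scores.json"
        write_metrics(target, {"rmse": score})
        print(target.read_text())
```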

3. Deployment Process

🔄 Trigger: New model version available
📦 Container: Build new Docker image and push to Docker Hub
🚀 Deploy: Update API service (🚧)
👥 Users: Serve new predictions
📈 Monitor: Track performance

📊 Monitoring & Feedback

Real-time Metrics

🔍 API Performance
- Response times
- Request volumes
- Error rates
📈 Model Metrics
- Prediction accuracy
- Processing time
- Resource usage
⚠️ Alerts
- Performance degradation
- Error thresholds
- Resource constraints
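
In the deployed stack these figures are collected by Prometheus and shown in Grafana. To make the API quantities concrete, the sketch below derives request volume, error rate, and mean response time from a list of request records. The record structure and the rule that status codes of 500 and above count as errors are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class RequestRecord:
    path: str
    status_code: int
    duration_s: float


def summarize(records: list[RequestRecord]) -> dict:
    """Aggregate request volume, error rate and mean response time."""
    total = len(records)
    if total == 0:
        return {"request_count": 0, "error_rate": 0.0, "mean_duration_s": 0.0}
    # Assumption: only server-side failures (5xx) count as errors.
    errors = sum(1 for r in records if r.status_code >= 500)
    return {
        "request_count": total,
        "error_rate": errors / total,
        "mean_duration_s": sum(r.duration_s for r in records) / total,
    }


if __name__ == "__main__":
    sample = [
        RequestRecord("/status", 200, 0.1),
        RequestRecord("/users/recommendations", 200, 0.3),
        RequestRecord("/users/recommendations", 500, 0.2),
    ]
    print(summarize(sample))
```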

Technical Architecture

Data Flow

Raw data ingestion (data/raw/)
Data preprocessing (data/interim/)
Feature engineering (data/processed/)
Model training (models/)
API deployment
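
A small check like the one below can confirm that the directories named in the data flow exist, for example after pulling data. The helper name is hypothetical; the directory paths are the ones listed above.

```python
import tempfile
from pathlib import Path

# Stage name and the directory it writes to, as listed in the data flow.
DATA_FLOW = [
    ("raw data ingestion", Path("data/raw")),
    ("data preprocessing", Path("data/interim")),
    ("feature engineering", Path("data/processed")),
    ("model training", Path("models")),
]


def missing_directories(project_root: Path) -> list[str]:
    """Return the data-flow directories that do not exist under project_root."""
    return [
        rel.as_posix()
        for _, rel in DATA_FLOW
        if not (project_root / rel).is_dir()
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "data" / "raw").mkdir(parents=True)
        print(missing_directories(root))
```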

Component Structure

src/
├── api/                    # FastAPI implementation
├── data_module_def/        # Data processing modules
├── models_module_def/      # Model definition and training
├── pipeline_steps/         # DVC pipeline stages
└── utils/                  # Helper functions

Getting Started

1. Clone the project

git clone https://github.com/DataScientest-Studio/sep24_bmlops_int_reco_films.git
cd sep24_bmlops_int_reco_films

2. Set up virtual environment & install dependencies

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

3. Configure DVC and download data

# Configure access to DVC
dvc remote modify origin --local access_key_id YOUR_DVC_ACCESS_KEY
dvc remote modify origin --local secret_access_key YOUR_DVC_ACCESS_KEY

# Pull the data
dvc pull

4. Launch the application

docker-compose up

5. API Usage

Health Check

curl -X GET http://localhost:8000/status
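
The same check can be scripted with the Python standard library. This is a sketch that assumes the API is reachable on localhost port 8000 and that a healthy service answers /status with HTTP 200; the function names are illustrative, and the response body is not inspected because its format is not documented here.

```python
import urllib.error
import urllib.request


def status_url(base_url: str = "http://localhost:8000") -> str:
    """Build the health-check URL from the API base URL."""
    return base_url.rstrip("/") + "/status"


def is_healthy(base_url: str = "http://localhost:8000", timeout: float = 5.0) -> bool:
    """Return True if GET /status answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(status_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


if __name__ == "__main__":
    print("API healthy:", is_healthy())
```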

Get Recommendations

curl -X 'POST' \
  'http://localhost:8000/users/recommendations' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "animation": 5,
  "children": 3,
  "comedy": 2,
  // ... other genre preferences
}'
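
For scripted use, the request can also be built in Python. The sketch below only uses the three genre fields shown above; the remaining genre preferences are elided in this README and are not filled in, so the API may reject this partial payload. The function names are hypothetical, and the response is returned as parsed JSON without assuming its structure.

```python
import json
import urllib.request


def build_recommendation_request(
    preferences: dict, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Create the POST request for /users/recommendations."""
    body = json.dumps(preferences).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/users/recommendations",
        data=body,
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


def fetch_recommendations(request: urllib.request.Request, timeout: float = 10.0):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Only the fields shown in the curl example; other genres are omitted.
    prefs = {"animation": 5, "children": 3, "comedy": 2}
    req = build_recommendation_request(prefs)
    print(req.get_method(), req.full_url)
```

Unlike the shell example, the payload here is produced by `json.dumps`, so it is guaranteed to be valid JSON.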

6. Monitoring Dashboard

Access the Grafana dashboard at: http://localhost:3000/d/_eX4mpl3/fastapi-dashboard

🚀 Future Improvements

The next steps we want to implement in the project:

Enhance CI/CD pipeline and automate deployment 🔄
Improve machine learning model 🧠
Implement user feedback system 🌐
Use Airflow for pipeline orchestration 🚀
Implement Kubernetes deployment 🛠
Implement AlertManager alerting 📊
Add testing 🔍
Enhance API security by adding authentication 🔐

More Information

For more information, take a look at our Wiki.

License

This project is licensed under the MIT License - see the LICENSE file for details.
