This project was created as part of the MLOps bootcamp (Sep24). The project demonstrates a comprehensive MLOps implementation for deploying and maintaining a movie recommendation system.
Project Repository: DagsHub
The Movie Recommendation application addresses the challenge of providing personalized movie recommendations to users on a streaming platform. By leveraging collaborative filtering techniques, it enhances the user experience by suggesting movies that align with individual tastes. Sponsored by a streaming service, the project aims to:
- Increase user engagement through personalized content recommendations
- Improve user retention by suggesting relevant movies
- Enhance content discovery across the platform's catalog
- Drive higher user satisfaction through accurate recommendations
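To make the collaborative filtering idea concrete, here is a minimal sketch of a user-based variant: it finds the known user whose genre ratings are closest (by cosine similarity) to a new user's preferences and returns that neighbor's favorite movies. The toy data, the function names, and the nearest-neighbor strategy are illustrative assumptions, not the project's actual model.

```python
import math


def cosine_similarity(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse genre-rating vectors."""
    shared = set(a) & set(b)
    dot = sum(a[g] * b[g] for g in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero profile is similar to nothing
    return dot / (norm_a * norm_b)


def recommend(new_user: dict, profiles: dict, favorites: dict, top_n: int = 3) -> list:
    """Return up to top_n favorites of the most similar known user."""
    nearest = max(profiles, key=lambda uid: cosine_similarity(new_user, profiles[uid]))
    return favorites[nearest][:top_n]


# Hypothetical toy data: genre ratings per user and their favorite titles.
USER_PROFILES = {
    "u1": {"animation": 5, "children": 4, "comedy": 2},
    "u2": {"animation": 1, "children": 1, "comedy": 5},
}
USER_FAVORITES = {
    "u1": ["Movie A", "Movie B"],
    "u2": ["Movie C", "Movie D"],
}
```

A new user who rates animation and children's films highly lands closest to `u1` in this toy data, so `recommend` would return that user's favorites.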
View the high-resolution SVG version for better detail.
Our MLOps pipeline consists of five major components, each handling specific aspects of the machine learning lifecycle:
- Scheduled Trigger: Daily at midnight
- Data Version Update: Increments data version
- Pipeline Trigger: Initiates DVC pipeline
- Main Branch Update: Pushes changes
- API Deployment: Triggers new deployment (in progress)
- Data Ingestion: Appends new data
- Validation: Ensures data quality
- Transformation: Prepares features
- Model Training: Updates model
- Evaluation: Assesses performance
- MLflow Registry: Tracks experiments
- DVC Version Control: Manages artifacts
- API Updates: New versions (in progress)
- User Interaction: Real-time recommendations
- Metrics Collection: Prometheus
- Dashboard: Grafana
- Alerts: AlertManager (in progress)
- Data Update Cycle
graph LR
A[Cron Trigger] --> B[Update Version]
B --> C[Trigger Pipeline]
C --> D[Process Data]
D --> E[Train Model]
E --> F[Evaluation]
- Deployment Cycle
graph LR
A[Evaluation] --> B[Push Changes]
B --> C[Deploy API]
C --> D[Users]
D --> E[Monitoring]
# Data versioning workflow
├── Data Ingestion
├── Validation
├── Transformation
├── Training
└── Evaluation
# Model lifecycle
├── Experiment Tracking (MLflow)
├── Performance Metrics
├── Containerization (Docker)
└── API Deployment (FastAPI)
# Monitoring stack
├── Metrics (Prometheus)
├── Visualization (Grafana)
└── Alerting (AlertManager)
- Trigger: Daily at midnight
- Version Update: Increment data version
- Pipeline Start: Trigger DVC pipeline
- Data Processing: Execute pipeline stages
- Validation: Ensure quality metrics
- Training: Update model with new data
- Evaluation: Calculate performance metrics
- Registry: Record in MLflow
- Versioning: Save with DVC
- Push: Update main branch
- Trigger: New model version available
- Container: Build new Docker image and push to Docker Hub
- Deploy: Update API service (in progress)
- Users: Serve new predictions
- Monitor: Track performance
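The "increment data version" step can be sketched as follows. This is a minimal illustration that assumes the version is an integer stored in a small JSON file; the file name `data_version.json`, the `data_version` key, and the function name are hypothetical, and the project may track the version differently (for example in a DVC params file).

```python
import json
from pathlib import Path


def bump_data_version(version_file: Path) -> int:
    """Increment the integer data version stored in a JSON file and return it."""
    if version_file.exists():
        state = json.loads(version_file.read_text())
    else:
        state = {"data_version": 0}  # first run: start counting from zero
    state["data_version"] = int(state.get("data_version", 0)) + 1
    version_file.write_text(json.dumps(state, indent=2))
    return state["data_version"]
```

A scheduled job would call `bump_data_version(Path("data_version.json"))` before triggering the pipeline, so that each daily run processes the next batch of data.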
- API Performance
  - Response times
  - Request volumes
  - Error rates
- Model Metrics
  - Prediction accuracy
  - Processing time
  - Resource usage
- Alerts
  - Performance degradation
  - Error thresholds
  - Resource constraints
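To illustrate what these metrics and alert conditions mean, the sketch below summarizes a batch of request records and checks them against thresholds. It uses only the standard library rather than Prometheus or AlertManager, and the record fields, threshold values, and function names are assumptions chosen for the example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RequestRecord:
    latency_s: float   # response time in seconds
    status_code: int   # HTTP status returned by the API


def summarize(records: list) -> dict:
    """Compute request volume, error rate, and mean latency for a batch."""
    total = len(records)
    if total == 0:
        return {"requests": 0, "error_rate": 0.0, "mean_latency_s": 0.0}
    errors = sum(1 for r in records if r.status_code >= 500)
    return {
        "requests": total,
        "error_rate": errors / total,
        "mean_latency_s": mean(r.latency_s for r in records),
    }


def breached(summary: dict, max_error_rate: float = 0.05,
             max_mean_latency_s: float = 0.5) -> list:
    """Return the names of the (hypothetical) thresholds the summary exceeds."""
    alerts = []
    if summary["error_rate"] > max_error_rate:
        alerts.append("error_rate")
    if summary["mean_latency_s"] > max_mean_latency_s:
        alerts.append("latency")
    return alerts
```

In the deployed stack, Prometheus would collect the equivalent counters and histograms from the API, and the threshold checks would live in alerting rules rather than application code.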
- Raw data ingestion (`data/raw/`)
- Data preprocessing (`data/interim/`)
- Feature engineering (`data/processed/`)
- Model training (`models/`)
- API deployment
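The mapping between pipeline stages and their output directories can be written down explicitly. The sketch below is a hypothetical helper, not part of the repository: the directory paths come from the list above, while the stage keys and the function name are assumptions.

```python
from pathlib import Path

# Stage name -> output directory, as listed above.
# API deployment is omitted because it has no data directory of its own.
STAGE_DIRS = {
    "raw_data_ingestion": Path("data/raw"),
    "data_preprocessing": Path("data/interim"),
    "feature_engineering": Path("data/processed"),
    "model_training": Path("models"),
}


def ensure_stage_dirs(project_root: Path) -> list:
    """Create every stage directory under project_root and return the paths."""
    created = []
    for relative_dir in STAGE_DIRS.values():
        target = project_root / relative_dir
        target.mkdir(parents=True, exist_ok=True)  # safe to call repeatedly
        created.append(target)
    return created
```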
src/
├── api/                 # FastAPI implementation
├── data_module_def/     # Data processing modules
├── models_module_def/   # Model definition and training
├── pipeline_steps/      # DVC pipeline stages
└── utils/               # Helper functions
git clone https://github.com/DataScientest-Studio/sep24_bmlops_int_reco_films.git
cd sep24_bmlops_int_reco_films
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Configure access to DVC
dvc remote modify origin --local access_key_id YOUR_DVC_ACCESS_KEY
dvc remote modify origin --local secret_access_key YOUR_DVC_ACCESS_KEY
# Pull the data
dvc pull
docker-compose up
curl -X GET http://localhost:8000/status
curl -X 'POST' \
'http://localhost:8000/users/recommendations' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"animation": 5,
"children": 3,
"comedy": 2,
// ... other genre preferences
}'
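The same request can be sent from Python. The sketch below uses only the standard library and mirrors the URL and headers of the curl call above; the function names are hypothetical, the response is assumed to be JSON, and only the three genre keys shown above are used because the remaining ones are elided in this document.

```python
import json
import urllib.request


def build_recommendation_request(preferences: dict,
                                 base_url: str = "http://localhost:8000"):
    """Build (but do not send) the POST request for /users/recommendations."""
    body = json.dumps(preferences).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/users/recommendations",
        data=body,
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


def fetch_recommendations(preferences: dict,
                          base_url: str = "http://localhost:8000",
                          timeout: float = 10.0):
    """Send the request and return the decoded JSON response (format assumed)."""
    request = build_recommendation_request(preferences, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the API running via `docker-compose up`, calling `fetch_recommendations({"animation": 5, "children": 3, "comedy": 2})` would issue the request; the real endpoint may require the full set of genre preferences.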
Access the Grafana dashboard at: http://localhost:3000/d/_eX4mpl3/fastapi-dashboard
Planned next steps for the project:
- Enhance the CI/CD pipeline and automate deployment
- Improve the machine learning model
- Implement a user feedback system
- Use Airflow for pipeline orchestration
- Implement Kubernetes deployment
- Implement AlertManager alerting
- Add testing
- Enhance API security by adding authentication
For more information, take a look at our Wiki.
This project is licensed under the MIT License - see the LICENSE file for details.