This project is a starter pack for MLOps projects focused on movie recommendation. It provides a fully integrated local development environment in which all tools and applications are containerized and managed through Docker Compose, configured to work together out of the box. This lets you develop, train, and deploy machine learning models for recommending movies with a complete MLOps toolchain running entirely on your machine.
If you want to deploy the system to a staging or production environment, you can use the GitHub Actions workflow `build-and-push-images.yml` to build and push the app images (API and Streamlit) to your Docker Hub registry. This workflow is triggered manually, or automatically when tags are pushed to the repository or when a release is published.
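For example, assuming the workflow keeps the manual and tag triggers described above, a build can be started by pushing a version tag or via the GitHub CLI; the tag name `v1.0.0` is only illustrative.

```bash
# Kick off a build by pushing a tag (tag name is illustrative)
git tag v1.0.0
git push origin v1.0.0

# Or start the workflow manually with the GitHub CLI (requires gh auth login)
gh workflow run build-and-push-images.yml
```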
The project is organized as follows:
```
├── .github
│   └── workflows
│       ├── test-api.yml                  <- GitHub Actions workflow for testing the API.
│       └── build-and-push-images.yml     <- GitHub Actions workflow for building and pushing the images.
│
├── airflow
│   ├── config
│   ├── dags
│   │   ├── scraping_new_movies.py        <- DAG for scraping new movies.
│   │   └── train_model_dag.py            <- DAG for training the model.
│   │
│   ├── logs
│   ├── plugins
│   ├── Dockerfile
│   ├── docker-compose.override.yaml
│   ├── docker-compose.yaml
│   └── requirements.txt
│
├── app
│   ├── api
│   │   └── predict
│   │       ├── Dockerfile
│   │       ├── main.py                   <- Main file for the API.
│   │       ├── metrics.py                <- Metrics for the API.
│   │       └── requirements.txt          <- Requirements for the API.
│   │
│   ├── streamlit
│   │   ├── pages
│   │   │   ├── 1_Recommandations.py      <- Page for recommendations.
│   │   │   └── 2_Profil.py               <- Page for the profile.
│   │   ├── Dockerfile
│   │   ├── Home.py                       <- Main page.
│   │   ├── requirements.txt              <- Requirements for the Streamlit app.
│   │   ├── style.css                     <- CSS for the pages.
│   │   ├── supabase_auth.py              <- Supabase authentication.
│   │   └── utils.py                      <- Utility functions.
│   │
│   └── docker-compose.yml                <- Docker compose file for the Streamlit app and the API.
│
├── ml
│   ├── models
│   │   └── model.pkl                     <- Initial trained model.
│   └── src
│       ├── data
│       │   ├── check_structure.py        <- Script for checking the structure of the data.
│       │   ├── import_raw_data.py        <- Script for importing raw data.
│       │   └── load_data_in_db.py        <- Script for loading data into the database.
│       ├── features
│       │   └── build_features.py         <- Script for building features.
│       ├── models
│       │   ├── predict_model.py          <- Script for making predictions.
│       │   └── train_model.py            <- Script for training the model.
│       └── requirements.txt              <- Requirements for the project.
│
├── mlflow
│   ├── docker-compose.yml
│   ├── Dockerfile
│   └── requirements.txt
│
├── monitoring
│   ├── grafana
│   │   └── provisioning
│   │       ├── dashboards
│   │       │   ├── api_dashboard.json    <- Dashboard for the API.
│   │       │   └── dashboards.yml        <- Dashboards configuration.
│   │       └── datasources
│   │           └── datasource.yml        <- Datasource configuration.
│   ├── prometheus
│   │   └── prometheus.yml                <- Prometheus configuration.
│   │
│   └── docker-compose.yml
│
├── supabase
│   ├── README.md
│   ├── docker-compose.override.yml
│   ├── docker-compose.s3.yml
│   ├── docker-compose.yml
│   └── volumes
│       ├── api
│       │   └── kong.yml
│       ├── db
│       │   ├── _supabase.sql
│       │   ├── init
│       │   │   ├── 01-project-tables.sql    <- SQL script for creating project tables.
│       │   │   ├── 02-auth-trigger.sql      <- SQL script for creating the auth trigger.
│       │   │   ├── 03-security-policies.sql <- SQL script for creating security policies.
│       │   │   └── data.sql
│       │   ├── jwt.sql
│       │   ├── logs.sql
│       │   ├── pooler.sql
│       │   ├── realtime.sql
│       │   ├── roles.sql
│       │   └── webhooks.sql
│       ├── functions
│       │   ├── hello
│       │   │   └── index.ts
│       │   └── main
│       │       └── index.ts
│       ├── logs
│       │   └── vector.yml
│       └── pooler
│           └── pooler.exs
│
├── tests
│   ├── requirements.txt                  <- Requirements for the tests.
│   ├── test_api_predict.py               <- Test for the API.
│   └── test_rls.py                       <- Test for the RLS.
│
├── .env.example                          <- Example of the .env file.
├── .gitignore                            <- Git ignore file.
├── LICENSE
├── Makefile                              <- Makefile for the project.
├── README.md                             <- This README file.
├── requirements-dev.txt
└── requirements-ref.txt
```
The project covers the following features:
- A very basic model to recommend movies.
- Scraping new movies from The Movie Database (TMDB) API.
- Training a machine learning model to recommend movies.
- Deploying the model as a REST API.
- Building an interactive web app to display recommendations.
- Authentication.
- Orchestrating the workflow.
- Monitoring the system.
The main tools and technologies used in this project are:
- Python: The main programming language, used for data processing, model training, and prediction.
- Docker & Docker Compose: Used for containerizing the application and setting up a local development environment.
- Supabase: A backend service for managing the database and authentication.
- MLflow: For tracking experiments and managing machine learning models.
- Apache Airflow: For orchestrating data workflows and model training pipelines.
- Streamlit: For building interactive web applications to display recommendations.
- FastAPI: For building the REST API for the movie recommendation service.
- Prometheus: For monitoring and alerting.
- Grafana: For visualizing metrics.
- scikit-learn: For building and training machine learning models.
- GitHub Actions: For continuous integration and deployment workflows.
Make sure you have the following tools installed: Docker, Docker Compose, Python 3.10+, pip, git, and make. You can check that they are available with:
```bash
docker --version
docker compose version
python --version
pip --version
git --version
make --version
```
To set up the project for local development, follow the steps below from the root of the repository:

- Run the `make setup1` command.
- Set the environment variable TMDB_API_TOKEN in the `.env` file. This is necessary to be able to execute the `scraping_new_movies.py` DAG in Airflow. You can get a token from the TMDB website.
- Run the `make setup2` command.
- Run the `make start` command.
- Set up access and the bucket for MLflow:
  - Access the MinIO console at `localhost:9001` and sign in with the root credentials from `.env`.
  - Create an access key and save the generated keys.
  - Create a bucket named `mlflow`.
  - Update `.env` with your access/secret keys and restart the containers:

    ```
    cd mlflow
    docker compose down
    docker compose --env-file ../.env up -d --build
    ```

- Set the secrets for the GitHub Actions workflow (see the sketch after this list):
  - In order to push the images to the Docker Hub registry, you need to create a personal access token with the necessary permissions.
  - Add the token (read and write) for the Docker Hub registry as a secret named `DOCKERHUB_TOKEN`.
  - Add the username for the Docker Hub registry as a secret named `DOCKERHUB_USERNAME`.
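As a sketch, the two repository secrets can also be added from the command line with the GitHub CLI instead of the web UI; the placeholder value below is an assumption, so use your own Docker Hub username and token.

```bash
# Add the Docker Hub credentials as GitHub Actions secrets (placeholder values)
gh secret set DOCKERHUB_USERNAME --body "your-dockerhub-username"
gh secret set DOCKERHUB_TOKEN    # paste your Docker Hub access token when prompted
```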
Local access to the services:
- Supabase: http://localhost:8000
- Airflow: http://localhost:8080
- Streamlit: http://localhost:8501
- API: http://localhost:8002/docs
- MLflow: http://localhost:5001
- MinIO: http://localhost:9001
- Prometheus: http://localhost:9090
- Grafana: http://localhost:3000
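As a quick smoke test, you can check from the command line that the services answer on the ports listed above; the exact status codes are an assumption, since some services redirect to a login page first.

```bash
# Print the HTTP status code returned by a few of the local services
curl -s -o /dev/null -w "API:        %{http_code}\n" http://localhost:8002/docs
curl -s -o /dev/null -w "MLflow:     %{http_code}\n" http://localhost:5001
curl -s -o /dev/null -w "Prometheus: %{http_code}\n" http://localhost:9090
curl -s -o /dev/null -w "Grafana:    %{http_code}\n" http://localhost:3000
```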
You can open the Streamlit app, create an account, and get your first recommendations!
For the app to work, you need to sign up with an email that has a number just before the @ symbol (e.g. [email protected], [email protected], etc.). This links your account to an existing profile.
You can run the DAGs in Airflow to scrape new movies and train the model.
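If you prefer the command line to the Airflow UI, a DAG can also be triggered from inside the Airflow containers; the `airflow-webserver` service name and the DAG ID below are assumptions about this setup, so check them with `airflow dags list` first.

```bash
# List the available DAGs, then trigger one (service name and DAG ID are assumptions)
cd airflow
docker compose exec airflow-webserver airflow dags list
docker compose exec airflow-webserver airflow dags trigger scraping_new_movies
```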
You can explore the model's artifacts, metrics, and logs in MLflow if you ran the training DAG.
You can explore the metrics in Grafana's dashboard.
You can explore the database in Supabase's dashboard.
Sarah Hemmel
Mikhael Benilouz
Antoine Pelamourgues
This project is licensed under the MIT License.