This project provides a service that recommends movies to users. It was developed as the final project of the MLOps training from DataScientest.
This project uses Docker Compose to provide the services related to the project. Here is the list of our services:
- `api`: provides the project API
- `dev`: to develop the project using a Docker container
First, build the Docker images:

```shell
docker-compose build
```
NB: You can use the API with the credentials: username `alice`, password `x`.
- Run the API:
  ```shell
  docker compose up api
  ```
When the API is running, the API documentation is provided here: http://localhost:8080/docs.
- Test the API:
  ```shell
  curl -X GET -i http://localhost:8080/
  ```
- Infer using the model through the API:
  ```shell
  curl -X 'GET' \
    'http://localhost:8000/recommend/1' \
    -H 'accept: application/json'
  ```
- Kill the API:
  ```shell
  docker container stop reco_api
  ```
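The recommendation call above can also be scripted from Python. Below is a minimal sketch using only the standard library; it assumes the API accepts HTTP Basic auth with the credentials mentioned above and that the endpoint is reachable on port 8000 — adjust both if your setup differs:

```python
import base64
import json
import urllib.request

API_URL = "http://localhost:8000"  # adjust to match your port mapping

def recommend_url(user_id: int) -> str:
    """Build the URL for the /recommend endpoint."""
    return f"{API_URL}/recommend/{user_id}"

def fetch_recommendations(user_id: int, username: str = "alice", password: str = "x") -> dict:
    """Call the API; the HTTP Basic auth scheme is an assumption, not confirmed."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    req = urllib.request.Request(
        recommend_url(user_id),
        headers={"accept": "application/json", "Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    print(fetch_recommendations(1))
```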
This project includes monitoring using CSV Exporter, Prometheus, and Grafana.
- Start the monitoring services:
  ```shell
  docker compose up monitoring
  ```
- Access the Grafana dashboard at http://localhost:3000/ (default credentials: `admin`/`admin`).
- Prometheus is available at http://localhost:9090/.
- CSV Exporter is configured to export metrics to Prometheus.
Run the dev container:

```shell
docker compose run --rm dev
```

You are now inside the container.
- Construct the dataset:
  ```shell
  poetry run python app.py dataset
  ```
- Predict:
  ```shell
  poetry run python app.py predict
  ```
- Run the tests:
  ```shell
  poetry run pytest -s
  ```
- Run the API:
  ```shell
  uvicorn api:api --host 0.0.0.0 --port 8000 --reload
  ```
- See the documentation at http://localhost:8000/docs.
This project uses several tools to ensure quality and continuous integration:
MLflow is used for tracking experiments, packaging code into reproducible runs, and sharing and deploying models. It helps in managing the machine learning lifecycle, including experimentation, reproducibility, and deployment.
Pytest is used for writing and running tests. It is a mature testing framework that supports simple unit tests as well as complex functional testing. To run the tests, use the following command:
```shell
poetry run pytest -s
```
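As an illustration of the pytest style used in the project, here is a small self-contained test file; the `top_k` helper is hypothetical and not part of the project code:

```python
# test_example.py — illustrative only; the project's real tests cover the API and model.

def top_k(scores: dict, k: int) -> list:
    """Return the k movie ids with the highest scores (hypothetical helper)."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

def test_top_k_orders_by_score():
    scores = {"m1": 0.2, "m2": 0.9, "m3": 0.5}
    assert top_k(scores, 2) == ["m2", "m3"]
```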
GitHub Actions is used for automating workflows, including running tests and deploying the application. It helps in setting up a CI/CD pipeline to ensure that the code is always in a deployable state.
- For development, we recommend using VSCode with the Ruff extension.
- We chose Poetry instead of a `requirements.txt` file because it makes Python packaging and dependency management easy.
The architecture of the Reco Film project is designed to efficiently handle movie recommendations through a microservices approach. Below is a detailed architecture diagram:
```mermaid
graph TD;
    %% User Authentication Flow
    A[User] -->|Credentials| X[Authentication Service];
    X -->|Authentication Token| B[API Gateway];
    %% API Requests
    B --> C[Reco Film API];
    %% API Interactions
    C --> D[(Database)];
    C --> E[Model Storage];
    C --> G[MLflow Tracking];
    C --> N[Prometheus CSV Exporter];
    C -->|Exposes Metrics| J[Prometheus];
    %% MLflow Components
    G --> H[MLflow Model Registry];
    E --> H;
    S[Model Training Pipeline] --> G;
    G -->|Registers Models| H;
    H -->|Provides Models| C;
    %% Monitoring Components
    N -->|Exports Metrics| J;
    J --> K[Grafana];
    %% CI/CD Pipeline
    L[GitHub Actions CI/CD] -->|Deploys| C;
    %% Data Processing
    I[Raw Data Storage] --> F[Data Processing Scripts];
    F --> D;
    %% Groupings
    subgraph Monitoring
        J
        K
        N
    end
    subgraph MLflow
        G
        H
    end
    subgraph CI/CD
        L
    end
    subgraph Data Processing
        I
        F
        D
    end
```
- API Gateway: Manages incoming requests and routes them to the appropriate services.
- Reco Film API: The core service that handles movie recommendation logic.
- Database: Stores user data, movie data, and other necessary information.
- Model Storage: Contains machine learning models used for generating recommendations.
- MLflow Tracking: Manages the lifecycle of machine learning experiments.
The project uses Docker and Docker Compose to manage and run services in isolated environments. The `docker-compose.yml` file defines the following services:

- `api`: runs the FastAPI application that serves the movie recommendation API. It is built from the `Dockerfile` and exposes port 8000 for API access.
- `test_api`: a service used for testing the API, ensuring it is functioning correctly. It depends on the `api` service and uses the same network.
- `dev`: a development environment that includes all necessary dependencies and tools for building and testing the application. It mounts local directories for live development and uses the host network mode.
- `mlflow-server`: runs the MLflow server for tracking experiments and managing the machine learning lifecycle. It exposes port 5000 for accessing the MLflow UI.

The services are connected through a custom Docker network named `reco_network`, which allows them to communicate with each other using predefined IP addresses.
To build and run the services, use the following commands:

- Build the Docker images:
  ```shell
  docker-compose build
  ```
- Run the API service:
  ```shell
  docker-compose up api
  ```
- Run the development environment:
  ```shell
  docker-compose run --rm dev
  ```
- Run the MLflow server:
  ```shell
  docker-compose up mlflow-server
  ```
These services ensure that the application is scalable, maintainable, and easy to deploy.
The API is implemented using FastAPI. It contains a `dict` object that records usernames and passwords. Passwords are hashed using the MD5 algorithm.