Reco Film

This project aims to provide a service that recommends movies to users. It was developed as the final project of the DataScientest MLOps training program.

Run the project

This project uses Docker Compose to provide the services of the project. Here is the list of services:

  • api: provides the project API
  • dev: a development environment running in a Docker container

First, build the Docker images:

docker-compose build

Run API

NB: You can use the API with the credentials username: 'alice', password: 'x'.

  • Run API:
docker compose up api

When the API is running, its documentation is available at http://localhost:8080/docs.

  • Test API:
curl -X GET -i http://localhost:8080/
  • Request a recommendation from the model through the API:
curl -X 'GET' \
  'http://localhost:8000/recommend/1' \
  -H 'accept: application/json'
  • Stop the API:
docker container stop reco_api
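
The same calls can be made from Python. Below is a minimal client sketch using the requests library; it assumes the API accepts HTTP Basic authentication with the demo credentials above, and the base URL may need to be adjusted to match the port your deployment exposes:

```python
# Minimal client sketch using the requests library.
# Assumes HTTP Basic auth with the demo credentials from this README;
# adjust the base URL to match your deployment.
import requests

BASE_URL = "http://localhost:8000"
AUTH = ("alice", "x")

# Health check (equivalent to the first curl example above)
resp = requests.get(f"{BASE_URL}/", auth=AUTH, timeout=5)
print(resp.status_code, resp.text)

# Recommendations for user 1 (equivalent to the second curl example above)
resp = requests.get(
    f"{BASE_URL}/recommend/1",
    auth=AUTH,
    headers={"accept": "application/json"},
    timeout=5,
)
print(resp.json())
```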

Monitoring

This project includes monitoring using CSV Exporter, Prometheus, and Grafana.

  • Start monitoring services:
docker compose up monitoring
  • Access Grafana Dashboard:

    • http://localhost:3000/
    • Default credentials: admin / admin
  • Prometheus is available at:

    • http://localhost:9090/
  • The CSV Exporter is configured to expose metrics for Prometheus to scrape.
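
Once the monitoring stack is running, Prometheus can also be queried programmatically through its standard HTTP API. Here is a small sketch in Python, assuming Prometheus is reachable at the address above (the `up` metric is a Prometheus built-in, not something specific to this project):

```python
# Sketch: query Prometheus' HTTP API with requests.
import requests

PROMETHEUS_URL = "http://localhost:9090"

resp = requests.get(
    f"{PROMETHEUS_URL}/api/v1/query",
    params={"query": "up"},  # "up" reports which scrape targets are reachable
    timeout=5,
)
resp.raise_for_status()
for result in resp.json()["data"]["result"]:
    print(result["metric"].get("job"), "=>", result["value"][1])
```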

Dev container

  • Run the dev container:

docker compose run --rm dev

You are now inside the container.

  • Construct the dataset:
poetry run python app.py dataset
  • Run a prediction:
poetry run python app.py predict
  • Run the tests:
poetry run pytest -s
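
For orientation, the app.py commands above follow a simple subcommand pattern. The sketch below shows one way such a dispatcher could be structured; the helper names (build_dataset, run_prediction) are illustrative and not taken from the actual implementation:

```python
# Hypothetical sketch of a subcommand dispatcher similar to app.py.
# build_dataset and run_prediction are placeholder names, not real project code.
import argparse


def build_dataset() -> None:
    print("constructing the dataset ...")


def run_prediction() -> None:
    print("running a prediction ...")


def main() -> None:
    parser = argparse.ArgumentParser(description="Reco Film command line")
    parser.add_argument("command", choices=["dataset", "predict"])
    args = parser.parse_args()

    if args.command == "dataset":
        build_dataset()
    else:
        run_prediction()


if __name__ == "__main__":
    main()
```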

Dev API

  • Run API:
uvicorn api:api --host 0.0.0.0 --port 8000 --reload
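
The api:api target implies a module api.py that exposes a FastAPI instance named api. A minimal sketch of such a module is shown below; the route bodies are placeholders, not the project's actual logic:

```python
# api.py -- minimal sketch matching the `uvicorn api:api` command above.
# The FastAPI instance must be named `api` for that command to find it.
from fastapi import FastAPI

api = FastAPI(title="Reco Film API")


@api.get("/")
def health() -> dict:
    # Simple liveness check, matching the `curl http://localhost:8000/` example.
    return {"status": "ok"}


@api.get("/recommend/{user_id}")
def recommend(user_id: int) -> dict:
    # Placeholder; the real endpoint returns model-based recommendations.
    return {"user_id": user_id, "recommendations": []}
```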

CI/CD and Testing

This project uses several tools to ensure quality and continuous integration:

MLflow

MLflow is used for tracking experiments, packaging code into reproducible runs, and sharing and deploying models. It helps in managing the machine learning lifecycle, including experimentation, reproducibility, and deployment.
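
As a minimal illustration, a training run could be logged against the mlflow-server service described later in this README (assuming it is reachable on port 5000); the experiment, parameter, and metric names below are illustrative only:

```python
# Sketch: log a run to the MLflow tracking server.
import mlflow

mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("reco_film")  # illustrative experiment name

with mlflow.start_run():
    mlflow.log_param("model_type", "collaborative_filtering")
    mlflow.log_metric("rmse", 0.93)
```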

Pytest

Pytest is used for writing and running tests. It is a mature testing framework that supports simple unit tests as well as complex functional testing. To run the tests, use the following command:

poetry run pytest -s
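
A minimal example of such a test, assuming the FastAPI instance is importable as api from api.py (adapt the import and credentials to the project's actual layout):

```python
# test_api.py -- sketch of a pytest test for the root endpoint.
from fastapi.testclient import TestClient

from api import api  # assumes api.py exposes a FastAPI instance named `api`

client = TestClient(api)


def test_root_returns_200():
    # Demo credentials from the "Run API" section.
    response = client.get("/", auth=("alice", "x"))
    assert response.status_code == 200
```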

GitHub Actions

GitHub Actions is used for automating workflows, including running tests and deploying the application. It helps in setting up a CI/CD pipeline to ensure that the code is always in a deployable state.

0. Dev choices

  • For development, we recommend using VSCode with the Ruff extension.

  • We chose Poetry over a requirements.txt file because it makes Python packaging and dependency management easy.

1. Architecture

The architecture of the Reco Film project is designed to efficiently handle movie recommendations through a microservices approach. Below is a detailed architecture diagram:

graph TD;
    %% User Authentication Flow
    A[User] -->|Credentials| X[Authentication Service];
    X -->|Authentication Token| B[API Gateway];
    
    %% API Requests
    B --> C[Reco Film API];

    %% API Interactions
    C --> D[(Database)];
    C --> E[Model Storage];
    C --> G[MLflow Tracking];
    C --> N[Prometheus CSV Exporter];
    C -->|Exposes Metrics| J[Prometheus];

    %% MLflow Components
    G --> H[MLflow Model Registry];
    E --> H;
    S[Model Training Pipeline] --> G;
    G -->|Registers Models| H;
    H -->|Provides Models| C;

    %% Monitoring Components
    N -->|Exports Metrics| J;
    J --> K[Grafana];

    %% CI/CD Pipeline
    L[GitHub Actions CI/CD] -->|Deploys| C;

    %% Data Processing
    I[Raw Data Storage] --> F[Data Processing Scripts];
    F --> D;

    %% Groupings
    subgraph Monitoring
        J
        K
        N
    end

    subgraph MLflow
        G
        H
    end

    subgraph CI/CD
        L
    end

    subgraph Data Processing
        I
        F
        D
    end
  • API Gateway: Manages incoming requests and routes them to the appropriate services.
  • Reco Film API: The core service that handles movie recommendation logic.
  • Database: Stores user data, movie data, and other necessary information.
  • Model Storage: Contains machine learning models used for generating recommendations.
  • MLflow Tracking: Manages the lifecycle of machine learning experiments.

2. Docker and Docker Compose

The project uses Docker and Docker Compose to manage and run services in isolated environments. The docker-compose.yml file defines the following services:

  • api: This service runs the FastAPI application, which serves the movie recommendation API. It is built from the Dockerfile and exposes port 8000 for API access.
  • test_api: A service used for testing the API, ensuring it is functioning correctly. It depends on the api service and uses the same network.
  • dev: A development environment that includes all necessary dependencies and tools for building and testing the application. It mounts local directories for live development and uses the host network mode.
  • mlflow-server: This service runs the MLflow server for tracking experiments and managing the machine learning lifecycle. It exposes port 5000 for accessing the MLflow UI.

The services are connected through a custom Docker network named reco_network, which allows them to communicate with each other using predefined IP addresses.

To build and run the services, use the following commands:

  • Build the Docker images:

    docker-compose build
    
  • Run the API service:

    docker-compose up api
    
  • Run the development environment:

    docker-compose run --rm dev
    
  • Run the MLflow server:

    docker-compose up mlflow-server
    

These services ensure that the application is scalable, maintainable, and easy to deploy.

3. API

The API is implemented with FastAPI. It contains a dict object that records usernames and passwords; passwords are hashed with the MD5 algorithm.
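
As an illustration of that approach, here is a sketch of a credential check built on FastAPI's HTTP Basic helper, with MD5-hashed passwords stored in a dict. Only the alice/x credentials come from this README; the names, routes, and exact auth scheme are assumptions:

```python
# Illustrative sketch only -- the real api.py may differ.
import hashlib
import secrets

from fastapi import Depends, FastAPI, HTTPException, status
from fastapi.security import HTTPBasic, HTTPBasicCredentials

api = FastAPI()
security = HTTPBasic()

# username -> md5(password); only alice/x is taken from this README
USERS = {"alice": hashlib.md5(b"x").hexdigest()}


def check_user(credentials: HTTPBasicCredentials = Depends(security)) -> str:
    expected = USERS.get(credentials.username)
    given = hashlib.md5(credentials.password.encode()).hexdigest()
    if expected is None or not secrets.compare_digest(given, expected):
        raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED)
    return credentials.username


@api.get("/")
def root(user: str = Depends(check_user)) -> dict:
    return {"message": f"hello {user}"}
```

Note that MD5 is not a secure password hash; a dedicated scheme such as bcrypt would normally be preferred outside of a demo setup.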

Useful
