MLflow Examples

MLflow examples - basic and advanced.

This repo consists of two sets of code artifacts:

- Regular Python scripts using open source MLflow
- Databricks notebooks using Databricks MLflow

Last updated: 2024-07-12

Examples

Python examples

- sklearn - Scikit-learn model - train and score.
  - Canonical example that shows multiple ways to train and score.
  - Options to log an ONNX model, enable autologging and save the model signature.
  - Train locally or against a Databricks cluster.
  - Score in real time against a local web server or a Docker container.
  - Score in batch with mlflow.load_model or a Spark UDF.
- sparkml - Spark ML model - train and score. ONNX too.
- Keras/TensorFlow - train and score. ONNX works too.
  - Keras with TensorFlow 2.x
    - keras_tf_wine - Wine quality dataset
    - keras_tf_mnist - MNIST dataset
  - keras_tf1 - Keras with TensorFlow 1.x - legacy
- xgboost - XGBoost (sklearn wrapper) model - train and score.
- catboost - CatBoost (using sklearn) model - train and score. ONNX works too.
- pytorch - PyTorch - train and score. ONNX too.
- onnx_sklearn - ONNX - Scikit-learn to ONNX - train and score.
- h2o - H2O model - train and score - with AutoML. ONNX too.
- model_registry - Jupyter notebook sampling the Model Registry API.
- e2e-ml-pipeline - End-to-end ML pipeline - training to real-time scoring.
- reproduce - Reproduce an existing run.
- nested_runs - Create nested runs with a specified number of levels.
- scoring_server_benchmarks - Scoring server performance benchmarks.

The sklearn and Spark ML examples also demonstrate:

- Different ways to run a project with the mlflow CLI
- Real-time server scoring with Docker containers
- Running a project against a Databricks cluster

Scala examples - use the MLflow Java client

- hello_world - Hello World - no training or scoring.
- sparkml - Scala train and score - Spark ML and XGBoost4J
- mleap - Score an MLeap model with the MLeap runtime (no Spark dependencies).
- onnx - Score an ONNX model (created with Scikit-learn) in Java.

Databricks

- Databricks notebooks - current.
- Notebook CICD - Lightweight CI/CD example with a Databricks notebook. Legacy.

Docker

docker/docker-server - MLflow tracking server and MySQL database containers.

Setup

Use Python 3.8.

For the Python environment, use either:
- Miniconda with conda.yaml.
- A virtual environment with PyPI.
Install Spark 3.4.0.
For the ONNX installation, see python/sklearn/conda.yaml.

Miniconda

Install miniconda3: https://conda.io/miniconda.html
Create the environment: conda env create --file conda.yaml
Activate the environment: conda activate mlflow-examples (older conda versions use source activate mlflow-examples)

Virtual Environment

Create a virtual environment.

python -m venv mlflow-examples
source mlflow-examples/bin/activate

Then pip install the libraries listed in conda.yaml.

MLflow Server

You can run the MLflow tracking server either directly on your laptop or with Docker.

Docker

See docker/docker-server/README.

Laptop Tracking Server

You can use either the local file store or a database-backed store. See the MLflow storage documentation.

Note that in MLflow 1.4.0, the then-new Model Registry functionality appeared to work only with a database-backed store.

First activate the virtual environment.

cd $HOME/mlflow-server
source $HOME/virtualenvs/mlflow-examples/bin/activate

File Store

Start the MLflow tracking server.

mlflow server --host 0.0.0.0 --port 5000 --backend-store-uri $PWD/mlruns --default-artifact-root $PWD/mlruns
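
If you prefer to launch the file-store server from a Python script, the same command can be built as an argument list. This is a minimal sketch: file_store_server_command is a hypothetical helper, not part of the repo, and it only reuses the flags from the command above. It assumes a POSIX path layout, and starting the server from the list further assumes that the mlflow executable is on PATH.

```python
import shlex
from pathlib import Path


def file_store_server_command(root, host="0.0.0.0", port=5000):
    """Build the `mlflow server` argument list for a file store under root/mlruns.

    Mirrors the shell command: the same mlruns directory serves as both the
    backend store (run metadata) and the default artifact root.
    """
    mlruns = str(Path(root) / "mlruns")
    return [
        "mlflow", "server",
        "--host", host,
        "--port", str(port),
        "--backend-store-uri", mlruns,
        "--default-artifact-root", mlruns,
    ]


# Print the equivalent shell command for the current directory ($PWD).
print(shlex.join(file_store_server_command(Path.cwd())))
```

To actually start the server, pass the list to subprocess.run(cmd, check=True). A list avoids shell quoting problems when $PWD contains spaces.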

Database-backed store - MySQL

- Install MySQL.
- Create an mlflow user with a password.
- Create a database named mlflow.

Start the MLflow Tracking Server

mlflow server --host 0.0.0.0 --port 5000 \
  --backend-store-uri mysql://MLFLOW_USER:MLFLOW_PASSWORD@localhost:3306/mlflow \
  --default-artifact-root $PWD/mlruns
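
If the mlflow user's password contains characters such as @, : or /, the MySQL URI above will not parse as intended. The sketch below assumes SQLAlchemy's URL rules, which MLflow's database-backed stores rely on and which require special characters in credentials to be percent-encoded. mysql_backend_uri is a hypothetical helper, and MLFLOW_USER/MLFLOW_PASSWORD remain placeholders for your own values.

```python
from urllib.parse import quote


def mysql_backend_uri(user, password, database="mlflow", host="localhost", port=3306):
    """Build a MySQL backend-store URI, percent-encoding the credentials.

    quote(..., safe="") escapes '@', ':' and '/' so that they cannot be
    mistaken for URI delimiters.
    """
    return "mysql://{}:{}@{}:{}/{}".format(
        quote(user, safe=""), quote(password, safe=""), host, port, database
    )


print(mysql_backend_uri("mlflow", "p@ss:word/1"))
```

Pass the result to --backend-store-uri, quoting it in the shell so that % and other characters reach mlflow unchanged.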

Database-backed store - SQLite

mlflow server --host 0.0.0.0 --port 5000 \
  --backend-store-uri sqlite:///mlflow.db \
  --default-artifact-root $PWD/mlruns
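
The sqlite:///mlflow.db URI above is relative, so the database file is created in whichever directory the server is started from. A small sketch of pinning it to an absolute path follows. It assumes the SQLAlchemy convention of three slashes plus the path (hence four slashes for an absolute POSIX path); both function names are hypothetical.

```python
import os


def sqlite_backend_uri(db_path):
    """Return an SQLAlchemy-style SQLite URI for db_path.

    A relative db_path is resolved against the server's working directory;
    an absolute POSIX path yields four slashes after 'sqlite:'.
    """
    return "sqlite:///" + db_path


def sqlite_backend_uri_absolute(db_path):
    """Same as sqlite_backend_uri, but pinned to an absolute path."""
    return sqlite_backend_uri(os.path.abspath(db_path))


print(sqlite_backend_uri_absolute("mlflow.db"))
```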

Examples

Most of the examples use a DecisionTreeRegressor model with the wine quality data set.

As such, python/sparkml and scala/sparkml are isomorphic: they are simply language variants of the same Spark ML algorithm.

Setup

Before running an experiment, set the tracking URI:

export MLFLOW_TRACKING_URI=http://localhost:5000
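
Before running an experiment, it can help to confirm that something is listening at the tracking URI. This is a minimal sketch with the hypothetical helper tracking_server_reachable. It reads MLFLOW_TRACKING_URI, falls back to http://localhost:5000, and only tests a TCP connection; it does not call the MLflow REST API.

```python
import os
import socket
from urllib.parse import urlparse


def tracking_server_reachable(uri=None, timeout=2.0):
    """Return True if a TCP connection to the tracking server's host:port succeeds.

    Defaults to MLFLOW_TRACKING_URI, falling back to http://localhost:5000.
    A True result only means a process is listening on that port.
    """
    uri = uri or os.environ.get("MLFLOW_TRACKING_URI", "http://localhost:5000")
    parsed = urlparse(uri)
    host = parsed.hostname or "localhost"
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timeout, unknown host, ...
        return False
```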

Data

The data is in the data folder.

wine-quality-white.csv contains the training data.

Real-time scoring prediction data

The prediction files contain the first three records of wine-quality-white.csv.
The format is the standard MLflow JSON-serialized pandas DataFrame in split orientation, as described in the MLflow documentation.
The data in predict-wine-quality.json is derived directly from wine-quality-white.csv.
- The values are a mix of integers and doubles.
Apparently, if you score predict-wine-quality.json against an MLeap SageMaker container, you will get errors because the server cannot handle integers (a bug).
Hence predict-wine-quality-float.json, whose values are all doubles.
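
The way these prediction files are derived can be sketched with the standard library. to_split_payload is a hypothetical helper, not the repo's code. The CSV delimiter and whether the quality label column is dropped are assumptions, so both are parameters. as_float=True mimics predict-wine-quality-float.json by writing every value as a double, and wrap=True nests the payload under a dataframe_split key, which is the form MLflow 2.x scoring servers expect.

```python
import csv
import io
import json
from itertools import islice


def _parse_cell(text, as_float):
    """Keep integer-looking cells as int unless as_float is set; else use float."""
    if as_float:
        return float(text)
    try:
        return int(text)
    except ValueError:
        return float(text)


def to_split_payload(csv_text, n=3, delimiter=",", drop=(), as_float=False, wrap=False):
    """Build a pandas split-orientation payload ({"columns": ..., "data": ...})
    from the first n records of CSV text, skipping the columns named in drop."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    header = next(reader)
    keep = [i for i, name in enumerate(header) if name not in drop]
    data = [[_parse_cell(row[i], as_float) for i in keep] for row in islice(reader, n)]
    payload = {"columns": [header[i] for i in keep], "data": data}
    # MLflow 2.x scoring servers expect the split object under "dataframe_split".
    return {"dataframe_split": payload} if wrap else payload


toy_csv = "x,y,label\n1,0.5,3\n2,1.5,4\n3,2.5,5\n4,3.5,6\n"  # toy data, not the wine dataset
print(json.dumps(to_split_payload(toy_csv, drop=("label",), as_float=True)))
```

The difference between the two files shows up in the serialized JSON: an integer cell is written as 1 by default and as 1.0 with as_float=True.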

amesar/mlflow-examples

Folders and files

Latest commit

History

Repository files navigation

MLflow Examples

Examples

Setup

Miniconda

Virtual Environment

MLflow Server

Docker

Laptop Tracking Server

File Store

Database-backed store - MySQL

Database-backed store - SQLite

Examples

Setup

Data

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages