Out of the box, mlserver supports the deployment and serving of xgboost models.
By default, it will assume that these models have been serialised using the bst.save_model() method.
In this example, we will cover how we can train and serialise a simple model, and then serve it using mlserver.
The first step will be to train a simple xgboost model. For that, we will use the mushrooms example from the xgboost Getting Started guide.
# Original code and extra details can be found in:
# https://xgboost.readthedocs.io/en/latest/get_started.html#python
import os

import requests
import xgboost as xgb
from urllib.parse import urlparse
from sklearn.datasets import load_svmlight_file

TRAIN_DATASET_URL = 'https://raw.githubusercontent.com/dmlc/xgboost/master/demo/data/agaricus.txt.train'
TEST_DATASET_URL = 'https://raw.githubusercontent.com/dmlc/xgboost/master/demo/data/agaricus.txt.test'


def _download_file(url: str) -> str:
    # Download a remote file into the current working directory
    parsed = urlparse(url)
    file_name = os.path.basename(parsed.path)
    file_path = os.path.join(os.getcwd(), file_name)
    res = requests.get(url)
    with open(file_path, 'wb') as file:
        file.write(res.content)
    return file_path


train_dataset_path = _download_file(TRAIN_DATASET_URL)
test_dataset_path = _download_file(TEST_DATASET_URL)

# NOTE: Workaround to load SVMLight files from the XGBoost example
X_train, y_train = load_svmlight_file(train_dataset_path)
X_test, y_test = load_svmlight_file(test_dataset_path)
X_train = X_train.toarray()
X_test = X_test.toarray()

# read in data
dtrain = xgb.DMatrix(data=X_train, label=y_train)

# specify parameters via map
param = {'max_depth': 2, 'eta': 1, 'objective': 'binary:logistic'}
num_round = 2
bst = xgb.train(param, dtrain, num_round)

bst
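Before serving the model, we can optionally sanity-check it on the held-out test split. This is a small extra step on top of the original example, reusing the variables defined above; for a binary:logistic objective, predictions come back as probabilities, so we threshold them at 0.5.

import numpy as np

# Evaluate the trained model on the test split
dtest = xgb.DMatrix(data=X_test, label=y_test)
preds = bst.predict(dtest)
accuracy = np.mean((preds > 0.5) == y_test)
print(f"Test accuracy: {accuracy:.4f}")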
To save our trained model, we will serialise it using bst.save_model() and the JSON format. This is the approach recommended by the XGBoost project. Our model will be persisted as a file named mushroom-xgboost.json.
model_file_name = 'mushroom-xgboost.json'
bst.save_model(model_file_name)
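As a quick check that serialisation worked (not part of the original example), we can load the model back with XGBoost's standard Booster.load_model() API and confirm it produces the same predictions as the in-memory model:

import numpy as np

# Reload the serialised model and compare its predictions against the original
bst_loaded = xgb.Booster()
bst_loaded.load_model(model_file_name)

dtest = xgb.DMatrix(data=X_test)
assert np.allclose(bst.predict(dtest), bst_loaded.predict(dtest))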
Now that we have trained and saved our model, the next step will be to serve it using mlserver. For that, we will need to create 2 configuration files:

- settings.json: holds the configuration of our server (e.g. ports, log level, etc.).
- model-settings.json: holds the configuration of our model (e.g. input type, runtime to use, etc.).
%%writefile settings.json
{
    "debug": "true"
}
%%writefile model-settings.json
{
    "name": "mushroom-xgboost",
    "implementation": "mlserver_xgboost.XGBoostModel",
    "parameters": {
        "uri": "./mushroom-xgboost.json",
        "version": "v0.1.0"
    }
}
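Before starting the server, a quick standard-library check (a small addition, not required by mlserver) can catch typos in the files we just wrote, such as a model URI that doesn't point at the serialised model:

import json
import os

# Parse model-settings.json and check that the model URI exists on disk
with open("model-settings.json") as settings_file:
    model_settings = json.load(settings_file)

assert os.path.exists(model_settings["parameters"]["uri"])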
Now that we have our config in place, we can start the server by running mlserver start . (the trailing dot points it at the current directory). This command needs to either be run from the same directory where our config files are, or point to the folder where they are.
mlserver start .
Since this command will start the server and block the terminal, waiting for requests, it will need to be run in the background or on a separate terminal.
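From a notebook, one way to do this (a sketch; a separate terminal works just as well) is to launch the server as a background process with Python's standard subprocess module:

import subprocess

# Start MLServer in the background, sending its output to a log file.
# Assumes the two config files live in the current directory.
log_file = open("mlserver.log", "w")
server = subprocess.Popen(
    ["mlserver", "start", "."],
    stdout=log_file,
    stderr=subprocess.STDOUT,
)

# To stop the server later: server.terminate()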
We now have our model being served by mlserver. To make sure that everything is working as expected, let's send a request from our test set. For that, we can use the Python types that mlserver provides out of the box (a sketch of this appears at the end of this example), or we can build our request manually, as we do below.
import requests

x_0 = X_test[0:1]
inference_request = {
    "inputs": [
        {
            "name": "predict",
            "shape": x_0.shape,
            "datatype": "FP32",
            "data": x_0.tolist()
        }
    ]
}

endpoint = "http://localhost:8080/v2/models/mushroom-xgboost/versions/v0.1.0/infer"
response = requests.post(endpoint, json=inference_request)

response.json()
As we can see above, the model's prediction for this input is close to 0, which matches the label in the test set.

y_test[0]
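As mentioned earlier, instead of building the request dictionary by hand, we could lean on the Python types that mlserver provides. A minimal sketch, assuming the mlserver package is installed locally; depending on the pydantic version your mlserver build uses, serialising the request is done with .dict() (pydantic v1) or .model_dump() (pydantic v2):

from mlserver.types import InferenceRequest, RequestInput

# Build the same request with mlserver's typed payload models
typed_request = InferenceRequest(
    inputs=[
        RequestInput(
            name="predict",
            shape=list(x_0.shape),
            datatype="FP32",
            data=x_0.tolist(),
        )
    ]
)

# Use .model_dump() instead if your mlserver build uses pydantic v2
response = requests.post(endpoint, json=typed_request.dict())
response.json()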