Triton is a machine learning inference server for easy and highly optimized deployment of models trained in almost any major framework. This backend specifically facilitates use of tree models in Triton (including models trained with XGBoost, LightGBM, Scikit-Learn, and cuML).
If you want to deploy a tree-based model for optimized real-time or batched inference in production, the FIL (Forest Inference Library) backend for Triton will allow you to do just that.
- Installation
- Introductory end-to-end example
- FAQ notebook with code snippets for many common scenarios
- Model Configuration
- Explainability and Shapley value support
- Scikit-Learn and cuML model support
- Model support and limitations
If you aren't sure where to start with this documentation, consider one of the following paths:
I currently use XGBoost/LightGBM or other tree models and am trying to assess whether Triton is the right solution for production deployment of my models.
- Check out the FIL backend's blog post announcement
- Make sure your model is supported by looking at the model support section
- Look over the introductory example
- Try deploying your own model locally by consulting the FAQ notebook.
- Check out the main Triton documentation for additional features and helpful tips on deployment (including example Helm charts).
I am familiar with Triton, but I am using it to deploy an XGBoost/LightGBM model for the first time.
- Look over the introductory example
- Try deploying your own model locally by consulting the FAQ notebook. Note that it includes specific example code for serialization of XGBoost and LightGBM models.
- Review the FAQ notebook's tips for optimizing model performance.
I am familiar with Triton and the FIL backend, but I am using it to deploy a Scikit-Learn or cuML tree model for the first time.
- Look at the section on preparing Scikit-Learn/cuML models for Triton.
- Try deploying your model by consulting the FAQ notebook, especially the sections on Scikit-Learn and cuML.
I am a data scientist familiar with tree model training, and I am trying to understand how Triton might be used with my models.
- Take a glance at the Triton product page to get a sense of what Triton is used for.
- Download and run the introductory example for yourself. If you do not have access to a GPU locally, you can just look over this notebook and then jump to the FAQ notebook, which has specific information on CPU-only training and deployment.
I have never worked with tree models before.
- Take a look at XGBoost's documentation.
- Download and run the introductory example for yourself.
- Try deploying your own model locally by consulting the FAQ notebook.
I don't like reading docs.
- Look at the Quickstart below.
- Open the FAQ notebook in a browser.
- Try deploying your model. If you get stuck, use Ctrl-F to search the FAQ page for keywords.
Quickstart
- Copy your model into the following directory structure. This example uses an XGBoost JSON file, but XGBoost binary files, LightGBM text files, and Treelite checkpoint files are also supported.
```
model_repository/
├─ example/
│  ├─ 1/
│  │  ├─ model.json
│  ├─ config.pbtxt
```
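If you prefer to script this step, the following stdlib-only Python sketch copies a serialized model into the layout above. The function `build_model_repository` and its defaults are hypothetical illustrations, not part of Triton or the FIL backend, and it does not write `config.pbtxt`.

```python
import shutil
from pathlib import Path


def build_model_repository(
    model_file: Path,
    repo_root: Path,
    model_name: str = "example",
    version: int = 1,
    target_name: str = "model.json",
) -> Path:
    """Copy model_file to <repo_root>/<model_name>/<version>/<target_name>.

    Hypothetical helper: returns <repo_root>/<model_name>, the directory
    where config.pbtxt should be placed.
    """
    model_dir = Path(repo_root) / model_name
    # Triton expects numeric version subdirectories such as "1".
    version_dir = model_dir / str(version)
    version_dir.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(model_file, version_dir / target_name)
    return model_dir
```

For example, `build_model_repository(Path("my_xgboost_model.json"), Path("model_repository"))` (the input file name is hypothetical) produces `model_repository/example/1/model.json` and returns `model_repository/example`.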
- Fill out `config.pbtxt` as follows, replacing `$NUM_FEATURES` with the number of input features, `$MODEL_TYPE` with `xgboost`, `xgboost_json`, `lightgbm`, or `treelite_checkpoint` (use `xgboost_json` for the `model.json` file in this example), and `$IS_A_CLASSIFIER` with `true` or `false` depending on whether the model is a classifier or a regressor.
```
backend: "fil"
max_batch_size: 32768
input [
  {
    name: "input__0"
    data_type: TYPE_FP32
    dims: [ $NUM_FEATURES ]
  }
]
output [
  {
    name: "output__0"
    data_type: TYPE_FP32
    dims: [ 1 ]
  }
]
instance_group [{ kind: KIND_AUTO }]
parameters [
  {
    key: "model_type"
    value: { string_value: "$MODEL_TYPE" }
  },
  {
    key: "output_class"
    value: { string_value: "$IS_A_CLASSIFIER" }
  }
]
dynamic_batching {}
```
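Because the placeholders in this config use `$NAME` syntax, Python's `string.Template` can fill them in. The sketch below assumes you saved the config text above to a template file of your choosing; `render_config` and its validation rules are hypothetical, based only on the values listed in this step.

```python
from string import Template

# The model_type values listed in the quickstart step above.
SUPPORTED_MODEL_TYPES = ("xgboost", "xgboost_json", "lightgbm", "treelite_checkpoint")


def render_config(template_text: str, num_features: int, model_type: str,
                  is_classifier: bool) -> str:
    """Fill $NUM_FEATURES, $MODEL_TYPE and $IS_A_CLASSIFIER in a config.pbtxt template."""
    if num_features <= 0:
        raise ValueError("num_features must be a positive integer")
    if model_type not in SUPPORTED_MODEL_TYPES:
        raise ValueError(
            f"model_type must be one of {SUPPORTED_MODEL_TYPES}, got {model_type!r}"
        )
    # substitute() raises KeyError if the template contains a placeholder
    # that is not supplied here, so typos in the template fail loudly.
    return Template(template_text).substitute(
        NUM_FEATURES=num_features,
        MODEL_TYPE=model_type,
        # The quickstart uses the strings "true"/"false" for output_class.
        IS_A_CLASSIFIER="true" if is_classifier else "false",
    )
```

For example, `render_config(text, 32, "xgboost_json", True)` turns `dims: [ $NUM_FEATURES ]` into `dims: [ 32 ]`; write the result to `model_repository/example/config.pbtxt`.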
- Start the server:
```shell
docker run -p 8000:8000 -p 8001:8001 --gpus all \
  -v ${PWD}/model_repository:/models \
  nvcr.io/nvidia/tritonserver:23.09-py3 \
  tritonserver --model-repository=/models
```
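Loading the model may take a moment after the container starts. A minimal stdlib sketch that polls the server's HTTP readiness endpoint (`/v2/health/ready`, part of the KServe v2 protocol that Triton implements) until it answers with HTTP 200 is shown below; the function name, timeout, and polling interval are illustrative assumptions.

```python
import http.client
import time
import urllib.request


def wait_for_ready(base_url: str = "http://localhost:8000",
                   timeout_s: float = 60.0, poll_s: float = 1.0) -> bool:
    """Return True once GET <base_url>/v2/health/ready returns 200, False on timeout."""
    url = f"{base_url}/v2/health/ready"
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=poll_s) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # URLError/HTTPError, refused connections and socket timeouts all
            # subclass OSError; treat any of them as "not ready yet".
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

A script might call `wait_for_ready()` once after `docker run` and abort if it returns `False`.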
The Triton server will now serve your model over both HTTP (port 8000) and gRPC (port 8001), using NVIDIA GPUs if they are available or the CPU if they are not. For information on how to submit inference requests, deploy other tree model types, or configure advanced options, check out the FAQ notebook.
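As a starting point, here is a stdlib-only sketch of an HTTP inference request using the KServe v2 JSON format. It assumes the quickstart's model name (`example`), tensor names (`input__0`/`output__0`), FP32 inputs, and port 8000; `build_infer_request` and `infer` are hypothetical helpers. The official `tritonclient` Python package provides a higher-level client if you prefer not to build requests by hand.

```python
import json
import urllib.request
from typing import Sequence


def build_infer_request(rows: Sequence[Sequence[float]]) -> dict:
    """Build a KServe v2 JSON body for the quickstart's input__0/output__0 tensors."""
    if not rows:
        raise ValueError("need at least one row")
    num_features = len(rows[0])
    if any(len(row) != num_features for row in rows):
        raise ValueError("all rows must have the same number of features")
    return {
        "inputs": [{
            "name": "input__0",
            "shape": [len(rows), num_features],
            "datatype": "FP32",
            # Tensor data flattened in row-major order.
            "data": [float(x) for row in rows for x in row],
        }],
        "outputs": [{"name": "output__0"}],
    }


def infer(rows: Sequence[Sequence[float]], model_name: str = "example",
          base_url: str = "http://localhost:8000") -> dict:
    """POST the request and return the first output tensor (name, shape, data)."""
    body = json.dumps(build_infer_request(rows)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/v2/models/{model_name}/infer",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    return result["outputs"][0]
```

Each row must contain exactly `$NUM_FEATURES` values; use the returned `shape` to reshape `data` if needed.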