fastai serving

A Docker image for serving fastai models, mimicking the API of TensorFlow Serving. It is designed for running batch inference at scale. It is not optimized for performance (but it's not that slow).

Build

First, export a fastai Learner with `.export`. Assuming the exported file is in `model_dir`, you can build the serving image like so:

docker build -f Dockerfile.[cpu/gpu] --build-arg MODEL_DIR=./model_dir -t <org>/<image>:<tag> .

If you require additional utils files for loading the model with load_learner, you can mount an additional directory at build time with:

docker build -f Dockerfile.[cpu/gpu] --build-arg MODEL_DIR=./model_dir --build-arg UTILS_DIR=./utils -t org/image:tag .
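
If you script your builds, the two variants above can be assembled programmatically. The following is a minimal sketch, not part of this repo: the function name `docker_build_command` and its parameters are hypothetical, and it only uses the flags shown in the commands above (`-f`, `--build-arg`, `-t`).

```python
from typing import List, Optional


def docker_build_command(
    variant: str,
    model_dir: str,
    image: str,
    utils_dir: Optional[str] = None,
) -> List[str]:
    """Return the `docker build` argument list for the cpu or gpu Dockerfile.

    Hypothetical helper: it mirrors the README commands and nothing more.
    """
    if variant not in ("cpu", "gpu"):
        raise ValueError("variant must be 'cpu' or 'gpu'")

    cmd = [
        "docker", "build",
        "-f", f"Dockerfile.{variant}",
        "--build-arg", f"MODEL_DIR={model_dir}",
    ]
    # UTILS_DIR is only passed when extra utils files are needed by load_learner.
    if utils_dir is not None:
        cmd += ["--build-arg", f"UTILS_DIR={utils_dir}"]
    # The trailing "." is the build context (the repo root).
    cmd += ["-t", image, "."]
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it; pass the list to
    # subprocess.run(...) if you want to execute the build.
    print(" ".join(docker_build_command("cpu", "./model_dir", "org/image:tag", "./utils")))
```

Returning a list rather than a single string avoids shell-quoting problems if a path contains spaces.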

Run

docker run --rm -p 8501:8501 -t org/image:tag

Use

The API currently has two endpoints:

`POST /analyze:predict`

Accepts a JSON request in the form:

{
  "instances": [
    {
      "image_bytes": {
        "b64": "[b64_string]"
      }
    }
  ],
  ...
}

where each `b64_string` is a base64-encoded string representing the model input.
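
As an illustration, a client could build and send this request as in the sketch below. It assumes the container is reachable at `http://localhost:8501` (the port from the Run section) and that the server replies with JSON; the response shape is not documented here, so it is returned as parsed. The function names `build_predict_payload` and `post_predict` are hypothetical.

```python
import base64
import json
import urllib.request
from typing import Any, Dict, Iterable


def build_predict_payload(images: Iterable[bytes]) -> Dict[str, Any]:
    """Wrap raw image bytes in the {"instances": [...]} structure shown above."""
    return {
        "instances": [
            {"image_bytes": {"b64": base64.b64encode(img).decode("ascii")}}
            for img in images
        ]
    }


def post_predict(
    payload: Dict[str, Any],
    base_url: str = "http://localhost:8501",  # assumption: local container
    timeout: float = 30.0,
) -> Any:
    """POST the payload to /analyze:predict and return the parsed JSON reply."""
    request = urllib.request.Request(
        base_url + "/analyze:predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Assumption: the body is JSON; its exact keys are not specified here.
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Placeholder bytes; in practice, read real image files in binary mode.
    payload = build_predict_payload([b"first image bytes", b"second image bytes"])
    print(json.dumps(payload, indent=2))
```

Sending several instances in one request is what lets the server run them as a batch.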

`GET /analyze`

Returns an HTTP Status of 200 as long as the API is running (health check).
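
A pipeline can poll this endpoint before sending work. The sketch below is one way to do that with the Python standard library; the function name `is_healthy` and the default URL `http://localhost:8501` are assumptions for illustration.

```python
import urllib.error
import urllib.request


def is_healthy(base_url: str = "http://localhost:8501", timeout: float = 5.0) -> bool:
    """Return True if GET /analyze answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(base_url + "/analyze", timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and non-2xx replies
        # (urllib raises HTTPError, a URLError subclass, for those).
        return False


if __name__ == "__main__":
    print("server is up" if is_healthy() else "server is not reachable")
```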

Limitations, Motivation, and Future Directions

This was written so fastai models could be used with chip-n-scale, an orchestration pipeline for running machine learning inference at scale. It has only been tested in that context.
It has only been tested with a few CNN models.
It only uses the first transform from the validation data loader to transform input data.
Comparison to TensorFlow Serving: This repo currently only implements a single replica of a TensorFlow Serving endpoint and doesn't have any of the additional features that it supports (multiple models, gRPC support, batching scheduler, etc.). We're happy to accept pull requests which increase the functionality in this regard.
PyTorch JIT: A popular guide to deploying PyTorch models (of which fastai models are a subset) shows how to create a traced_script_module for faster inference. Future iterations of this repo may explore these methods to improve inference times.

Acknowledgments

The code for server.py is taken almost entirely from the fastai example for Render. The primary addition is the batch inference code which can provide significant speed-ups compared to single image prediction.
This work was undertaken in partnership with our friends at Sinergise and funded by the European Space Agency, specifically Phi Lab.
