The Inference API listens on port 8080 and is accessible only from localhost by default. To change these defaults, see MMS Configuration.

There are three types of APIs:

- API description - Describes MMS inference APIs with the OpenAPI 3.0 specification
- Health check API - Checks the MMS health status
- Predictions API - Makes prediction calls against MMS
To view a full list of the inference APIs, use the following command:

```bash
curl -X OPTIONS http://localhost:8080
```

The output is in OpenAPI 3.0.1 JSON format. You can use it to generate client code; see swagger codegen for details.
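As a sketch of that workflow, you can save the spec to a file and point the swagger-codegen CLI at it (assuming swagger-codegen is installed; the file and output directory names here are arbitrary):

```bash
# Save the OpenAPI spec that MMS returns for an OPTIONS request
curl -s -X OPTIONS http://localhost:8080 > mms-api.json

# Generate a Python client from the saved spec
swagger-codegen generate -i mms-api.json -l python -o mms-client
```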
MMS supports a `ping` API that you can use to check the health status of a running server:

```bash
curl http://localhost:8080/ping
```

If the server is running, the response should be:

```json
{
  "health": "healthy!"
}
```
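For example, a startup script could poll `/ping` until the server reports healthy; a minimal sketch (the retry count and interval here are arbitrary):

```bash
# Poll /ping once a second, giving up after 30 attempts
for i in $(seq 1 30); do
  if curl -s http://localhost:8080/ping | grep -q healthy; then
    echo "MMS is up"
    exit 0
  fi
  sleep 1
done
echo "MMS did not become healthy in time" >&2
exit 1
```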
MMS 1.0 supports 0.4-style API calls, but those APIs are deprecated and will be removed in a future release. See Deprecated APIs for details.

For each loaded model, you can make a REST call to the URI `/predictions/{model_name}`:
- POST /predictions/{model_name}
curl Example:

```bash
curl -O https://s3.amazonaws.com/model-server/inputs/kitten.jpg

curl -X POST http://localhost:8080/predictions/resnet-18 -T kitten.jpg
```

or:

```bash
curl -X POST http://localhost:8080/predictions/resnet-18 -F "data=@kitten.jpg"
```
The result is JSON telling you that the image most likely contains a tabby cat. The highest-ranked prediction is:

```json
{
  "class": "n02123045 tabby, tabby cat",
  "probability": 0.42514491081237793,
  ...
}
```
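The same call also works in a loop for scoring many inputs; a minimal sketch, assuming resnet-18 is loaded and the images are local `.jpg` files:

```bash
# Score every .jpg in the current directory and save each
# prediction next to its image as <name>.json
for img in *.jpg; do
  curl -s -X POST http://localhost:8080/predictions/resnet-18 -T "$img" \
    -o "${img%.jpg}.json"
done
```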
The MMS 0.4-style predict API is kept for backward compatibility and will be removed in a future release.
- POST /{model_name}/predict
curl Example:

```bash
curl -O https://s3.amazonaws.com/model-server/inputs/kitten.jpg

curl -X POST http://localhost:8080/resnet-18/predict -F "data=@kitten.jpg"
```