This section discusses the following main topics:
- Example Flow
- TensorFlow Serving Basic Example
- Using a Streaming Inference Instance
- Streaming Inference with TensorFlow Serving REST API
- Accessing the REST API with curl
- Using Port Forwarding
- Example of Accessing REST API Using curl
- Streaming Inference with TensorFlow Serving gRPC API
- Useful External References
- Save a trained TensorFlow Serving compatible model.
- Send the data for inference in JSON format, or in binary format using the gRPC API.
- Run the nctl predict launch command.
- Send the inference data using the nctl predict stream command, the TensorFlow Serving REST API, or the TensorFlow Serving gRPC API.
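The first step assumes you already have a model exported in the TensorFlow SavedModel format. As a hypothetical illustration (not part of the Nauta example), a model similar to the half_plus_two model used below could be exported like this:
# Hypothetical sketch of step 1: exporting a TensorFlow Serving compatible SavedModel.
# The toy function mirrors the half_plus_two model used later in this section.
import tensorflow as tf

class HalfPlusTwo(tf.Module):
    @tf.function(input_signature=[tf.TensorSpec(shape=[None], dtype=tf.float32, name="x")])
    def serve(self, x):
        # TensorFlow Serving will expose this signature; y = x/2 + 2
        return {"y": x / 2.0 + 2.0}

model = HalfPlusTwo()
# TensorFlow Serving expects a numeric version subdirectory, e.g. my_half_plus_two/1
tf.saved_model.save(model, "my_half_plus_two/1",
                    signatures={"serving_default": model.serve})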
Basic models for testing TensorFlow Serving are included in the TensorFlow Serving GitHub repository. This example uses the saved_model_half_plus_two_cpu model to demonstrate streaming prediction capabilities.
Perform the following steps to use this model for streaming inference:
1. Clone the TensorFlow Serving repository by executing the following command:
git clone https://github.com/tensorflow/serving
2. Perform step 3 or step 4 below, based on preference.
3. Run the following command:
nctl predict launch --local-model-location <directory where you have cloned Tensorflow Serving>/serving/tensorflow_serving/servables/tensorflow/testdata/saved_model_half_plus_two_cpu
4. As an alternative to step 3, you may want to save a trained model to an input shared folder, so it can be reused by other experiments/prediction instances. To do this, run these commands:
a. Use the mount command to mount the Nauta input folder on a local machine:
nctl mount
b. Run the resulting command printed by nctl mount (in this example, assuming that you will mount the /mnt/input share described in the nctl command output). After executing the command printed by nctl mount, you will be able to access the input share on your local file system.
c. Copy the saved_model_half_plus_two_cpu model to the input share folder:
cp -r <directory where you have cloned Tensorflow Serving>/serving/tensorflow_serving/servables/tensorflow/testdata/saved_model_half_plus_two_cpu <directory where you have mounted /mnt/input share>
d. Run the following command:
nctl predict launch --model-location /mnt/input/saved_model_half_plus_two_cpu
Note: The --model-name parameter can optionally be passed to the nctl predict launch command. If it is not provided, the model name is assumed to be the last directory in the model location, for example:
/mnt/input/home/trained_mnist_model -> trained_mnist_model
After running the predict launch command, nctl creates a streaming inference instance that can be used in multiple ways, as described below.
The nctl predict stream command allows for performing inference on input data stored in JSON format. This method is convenient for manually testing a trained model and provides a simple way to get inference results. For saved_model_half_plus_two_cpu, write the following input data and save it in the inference-data.json file:
{"instances": [1.0, 2.0, 5.0]}
The saved_model_half_plus_two_cpu model is quite simple: for a given input value x, it predicts the result of x/2 + 2. Having passed the inputs 1.0, 2.0, and 5.0 to the model, the expected predictions are 2.5, 3.0, and 4.5.
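For reference, a short Python sketch (standard library only) that writes the inference-data.json file shown above and prints the results expected from the x/2 + 2 formula:
# Write the inference-data.json input file and compute the expected predictions.
import json

inputs = [1.0, 2.0, 5.0]

with open("inference-data.json", "w") as f:
    json.dump({"instances": inputs}, f)

print([x / 2 + 2 for x in inputs])  # [2.5, 3.0, 4.5]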
To use that data for prediction, check the name of the running prediction instance with the saved_model_half_plus_two_cpu model (the name is displayed after the nctl predict launch command executes; you can also use the nctl predict list command to list running prediction instances). Then, run the following command:
nctl predict stream --name <prediction instance name> --data inference-data.json
The following results will be produced:
{ "predictions": [2.5, 3.0, 4.5] }
TensorFlow Serving exposes three different method verbs for getting inference results. Selecting the proper method verb depends on the model used and the expected results. Refer to the TensorFlow Serving RESTful API documentation for more detailed information. These method verbs are:
- classify
- regress
- predict
By default, nctl predict stream uses the predict method verb. You can change it by passing the --method-verb parameter to the nctl predict stream command, for example:
nctl predict stream --name <prediction instance name> --data inference-data.json --method-verb classify
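Note that the expected request body differs per method verb. Based on the TensorFlow Serving RESTful API format, the payloads look roughly as follows (shown as Python dicts; the feature name x is illustrative and depends on the model's signature):
# predict takes a plain list of input rows under "instances"
predict_body = {"instances": [1.0, 2.0, 5.0]}

# classify and regress expect named features under "examples"
# (the feature name "x" is an assumption; check your model's signature)
classify_body = {"examples": [{"x": 1.0}, {"x": 2.0}, {"x": 5.0}]}
regress_body = {"examples": [{"x": 1.0}, {"x": 2.0}, {"x": 5.0}]}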
Another way to interact with a running prediction instance is to use the TensorFlow Serving REST API. This approach could be useful for more sophisticated use cases, like integrating data-collection scripts/applications with prediction instances.
The URL and authorization header for accessing the TensorFlow Serving REST API are shown after the prediction instance is submitted, as in the following example:
Prediction instance URL (append method verb manually, e.g. :predict):
https://192.168.0.1:8443/api/v1/namespaces/jdoe/services/saved-mode-621-18-11-07-15-00-34:rest-port/proxy/v1/models/saved_model_half_plus_two_cpu
Authorize with following header:
Authorization: Bearer
1234567890abcdefghijklmnopqrstuvxyz
The following example shows accessing the REST API using curl:
curl -k -X POST -d @inference-data.json -H 'Authorization: Bearer <authorization token data>' <prediction instance URL>:predict
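For scripted access, the same request can be sent from Python, for example with the requests package (a sketch, not part of the Nauta tooling; the URL and token placeholders are the values printed by nctl predict launch):
# Send a predict request to the prediction instance's REST API.
import requests

INSTANCE_URL = "https://<prediction instance URL>"  # printed by nctl predict launch
TOKEN = "<authorization token data>"                # printed by nctl predict launch

response = requests.post(
    INSTANCE_URL + ":predict",                      # append the method verb manually
    json={"instances": [1.0, 2.0, 5.0]},
    headers={"Authorization": "Bearer " + TOKEN},
    verify=False,                                   # like curl -k: skip TLS verification for a self-signed certificate
)
print(response.json())                              # expected: {"predictions": [2.5, 3.0, 4.5]}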
Alternatively, the Kubernetes port forwarding mechanism may be used. Create a port forwarding tunnel to the prediction instance with the following command:
kubectl port-forward service/<prediction instance name> :8501
Or, if you want to start a port forwarding tunnel in the background, do the following:
kubectl port-forward service/<prediction instance name> <some local port number>:8501 &
Note: Take note of the local port number of the tunnel entered above; if you do not specify it explicitly, it will be chosen and printed by kubectl port-forward.
You should be able to access the REST API at the following URL:
localhost:<local tunnel port number>/v1/models/<model_name, e.g. saved_model_half_plus_two_cpu>:<method verb>
Example of accessing the REST API using curl through the tunnel (here assuming the local tunnel port is 8501):
curl -X POST -d @inference-data.json localhost:8501/v1/models/<model_name, e.g. saved_model_half_plus_two_cpu>:predict
Another way to interact with a running prediction instance is to use the TensorFlow Serving gRPC API. This approach could be useful for more sophisticated use cases, such as integrating data-collection scripts/applications with prediction instances. Furthermore, it should provide better performance than the REST API.
To access the TensorFlow Serving gRPC API of a running prediction instance, the Kubernetes port forwarding mechanism must be used. Create a port forwarding tunnel to the prediction instance with the following command:
kubectl port-forward service/<prediction instance name> :8500
Or, if you want to start a port forwarding tunnel in the background:
kubectl port-forward service/<prediction instance name> <some local port number>:8500 &
Note: Take note of the local port number of the tunnel entered above; if you do not specify it explicitly, it will be chosen and printed by kubectl port-forward.
You can access the gRPC API using a dedicated gRPC client (such as the following GitHub Python script: mnist_client.py). Alternatively, use a gRPC CLI client of your choice (such as Polyglot or the gRPC CLI) and connect to:
localhost:<local tunnel port number>
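For example, a minimal Python gRPC client sketch using the tensorflow-serving-api package could look as follows (the input/output tensor names x and y match the half_plus_two test model's default signature; check your own model with saved_model_cli):
# Minimal gRPC client sketch; assumes the grpcio, tensorflow and
# tensorflow-serving-api packages, and a port forwarding tunnel on LOCAL_PORT.
import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

LOCAL_PORT = 8500  # the local tunnel port printed by kubectl port-forward

channel = grpc.insecure_channel("localhost:%d" % LOCAL_PORT)
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

request = predict_pb2.PredictRequest()
request.model_spec.name = "saved_model_half_plus_two_cpu"
# "x" is the input tensor name of the half_plus_two test model's default signature.
request.inputs["x"].CopyFrom(tf.make_tensor_proto([1.0, 2.0, 5.0], dtype=tf.float32))

response = stub.Predict(request, timeout=10.0)
print(response.outputs["y"].float_val)  # expected: [2.5, 3.0, 4.5]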