Earlier this year I filed #10248 floating the possibility of developing a TensorFlow Lite ("TFLite") execution provider. In particular, my interest is in taking advantage of Google's Coral Edge TPU devices, which are only supported through a TFLite "delegate" (similar to an execution provider) provided by libedgetpu.
I am looking for input from the ONNX Runtime team on whether this idea sounds reasonable, and some advice on how it might be accomplished.
Here's my understanding:
Converting ONNX models to TFLite requires two Python steps: first onnx-tensorflow to convert the .onnx model to a TensorFlow .pb frozen-graph protobuf, then tf.lite.TFLiteConverter.from_frozen_graph() to convert that to a .tflite FlatBuffers file.
The execution provider would therefore have to be able to invoke the Python interpreter. I do not know of any existing execution providers that execute subprocesses; how can this be done in C++? A rough sketch is below.
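As a starting point, here is a minimal sketch of shelling out to that conversion pipeline from C++ using std::system(). The convert_onnx_to_tflite.py wrapper script and its flags are hypothetical placeholders for whatever actually drives onnx-tensorflow and the TFLiteConverter; this only shows the subprocess mechanics, not a real tool.

```cpp
// Minimal sketch: invoking the Python-based conversion pipeline from C++.
// Assumes a hypothetical wrapper script, convert_onnx_to_tflite.py, that runs
// onnx-tensorflow followed by tf.lite.TFLiteConverter; the script name and its
// arguments are placeholders, not an existing tool.
#include <cstdlib>
#include <sstream>
#include <stdexcept>
#include <string>

// Convert an .onnx file on disk to a .tflite file by invoking the Python
// interpreter as a subprocess. Returns the path to the generated .tflite file.
std::string ConvertOnnxToTflite(const std::string& onnx_path,
                                const std::string& tflite_path) {
  std::ostringstream cmd;
  cmd << "python3 convert_onnx_to_tflite.py"
      << " --input " << onnx_path
      << " --output " << tflite_path;
  // std::system blocks until the child process exits; a production EP would
  // want something more robust (captured stderr, timeouts, error reporting).
  const int rc = std::system(cmd.str().c_str());
  if (rc != 0) {
    throw std::runtime_error("ONNX->TFLite conversion failed with code " +
                             std::to_string(rc));
  }
  return tflite_path;
}
```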
It looks like the CoreML provider does something similar: it writes converted models out to disk and then loads them back in.
GetCapability() would examine each node in the graph to determine whether TFLite can execute it.
I don't know how best to answer this question; the definitive answer lies in what the two conversion tools above are able to translate. There is a documented list of operations supported by TFLite, but I don't know how to go about checking an ORT node against that list. TFLite also has an OpResolver class.
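For illustration, the check inside GetCapability() might look something like the sketch below, assuming a hand-maintained allow-list of ONNX op types that the conversion pipeline is known to handle. The IsNodeSupported() helper and the list contents are hypothetical; the real answer would have to come from what onnx-tensorflow and the TFLiteConverter can actually translate.

```cpp
// Rough sketch of the kind of check GetCapability() might perform, assuming a
// hand-maintained table of ONNX op types the conversion pipeline can handle.
// The table contents here are illustrative placeholders only.
#include <string>
#include <unordered_set>

bool IsNodeSupported(const std::string& onnx_op_type) {
  // Hypothetical allow-list; in practice this would have to mirror what
  // onnx-tensorflow + TFLiteConverter can actually translate.
  static const std::unordered_set<std::string> kSupportedOps = {
      "Conv", "Relu", "MaxPool", "Add", "Reshape", "Softmax",
  };
  return kSupportedOps.count(onnx_op_type) > 0;
}
```

A more robust version would also check attributes, input types, and shapes, the way the op-support checkers in existing EPs do.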
Also, TFLite does its own graph partitioning to delegate subgraph execution to hardware devices like the Coral, using the GraphPartitionHelper. Possibly ORT would only want to give TFLite the subgraphs that the (hardware-accelerated) TFLite delegate can execute, so that ORT can assign other subgraphs to other EPs. However, this would require some way of tracing TFLite subgraphs backward through the conversion process to the original ONNX nodes.
I asked over at the TensorFlow Forum whether there are APIs that could help.
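For reference, this is roughly how the Edge TPU delegate gets attached to an interpreter today, assuming libedgetpu's C interface (edgetpu_c.h) and a model that has already been compiled with edgetpu_compiler. It doesn't answer the partition-tracing question above, but it shows where TFLite's own delegation kicks in: once the delegate is applied, TFLite decides internally which subgraphs it takes.

```cpp
// Sketch of attaching the Coral Edge TPU delegate to a TFLite interpreter,
// assuming libedgetpu's C interface (edgetpu_c.h). TFLite's own partitioning
// then decides which subgraphs actually run on the delegate.
#include <memory>

#include "edgetpu_c.h"
#include "tensorflow/lite/interpreter.h"

// Returns true if the delegate was created and applied successfully.
bool AttachEdgeTpuDelegate(tflite::Interpreter* interpreter) {
  size_t num_devices = 0;
  std::unique_ptr<edgetpu_device, decltype(&edgetpu_free_devices)> devices(
      edgetpu_list_devices(&num_devices), &edgetpu_free_devices);
  if (num_devices == 0) return false;  // no Edge TPU attached

  TfLiteDelegate* delegate = edgetpu_create_delegate(
      devices.get()[0].type, devices.get()[0].path, /*options=*/nullptr,
      /*num_options=*/0);
  if (delegate == nullptr) return false;

  // TFLite partitions the graph and hands the delegate the subgraphs it
  // reports it can execute; everything else stays on the CPU kernels.
  // For brevity the delegate is not freed here; a real EP would keep it alive
  // as long as the interpreter and call edgetpu_free_delegate afterwards.
  return interpreter->ModifyGraphWithDelegate(delegate) == kTfLiteOk;
}
```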
Compile() would take each FusedNodeAndGraph, convert it, and load the result into TFLite (tflite::FlatBufferModel::BuildFromFile()), creating a tflite::Interpreter from the model. Then it would generate a NodeComputeInfo for each one that populates the interpreter's input buffers, calls interpreter->Invoke(), and then copies from the interpreter's output buffers into ORT's output buffers.
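Under those assumptions, the per-node compute function would wrap something like the following load-and-run path, using the standard TFLite C++ API. The single float input/output and the lack of shape handling are simplifications; a real NodeComputeInfo would map all of ORT's input and output buffers rather than one tensor on each side.

```cpp
// Minimal sketch of the load-and-run path a Compile()-produced kernel might
// wrap, using the standard TFLite C++ API. Tensor shapes, types, and error
// handling are simplified.
#include <cstring>
#include <memory>
#include <vector>

#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"

std::vector<float> RunTflite(const char* tflite_path,
                             const std::vector<float>& input,
                             size_t output_size) {
  // Load the converted model from disk (the file produced by the conversion
  // step above).
  auto model = tflite::FlatBufferModel::BuildFromFile(tflite_path);
  if (!model) return {};

  // Build an interpreter with the built-in op resolver.
  tflite::ops::builtin::BuiltinOpResolver resolver;
  std::unique_ptr<tflite::Interpreter> interpreter;
  tflite::InterpreterBuilder(*model, resolver)(&interpreter);
  if (!interpreter || interpreter->AllocateTensors() != kTfLiteOk) return {};

  // Populate the interpreter's input buffer from ORT's input buffer...
  std::memcpy(interpreter->typed_input_tensor<float>(0), input.data(),
              input.size() * sizeof(float));

  if (interpreter->Invoke() != kTfLiteOk) return {};

  // ...and copy the interpreter's output back out for ORT.
  std::vector<float> output(output_size);
  std::memcpy(output.data(), interpreter->typed_output_tensor<float>(0),
              output_size * sizeof(float));
  return output;
}
```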
I am assuming that a TFLite EP could also follow the CoreML EP in how it sets up its allocators and does not set up a kernel registry.