
Add support for inference on ONNX and TensorFlow models #370

Open
3 of 6 tasks
Michael-F-Bryan opened this issue Nov 3, 2021 · 1 comment
Labels
  • area - model inference (ML Model support)
  • area - runtime (The Rust Rune runtime)
  • category - enhancement (New feature or request)
  • priority - on-demand (This won't be touched until there is an external need for it, i.e. required by a customer)

Comments

@Michael-F-Bryan
Contributor

I think almost everyone on the HOTG team has expressed a desire to use more ML frameworks at some point, in particular ONNX and TensorFlow. However, I was reluctant to use bindings that go through their official C++ implementations after seeing how much trouble we had integrating TensorFlow Lite.

When I was playing around with hotg-ai/wasi-nn-experiment I came across tract, a pure Rust implementation of TensorFlow and ONNX inference. It was able to cross-compile to aarch64-linux-android and wasm32-unknown-unknown without any extra work.
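For context, here is a minimal sketch of what ONNX inference through tract looks like. The calls follow tract-onnx's published examples; the model path and the 1x3x224x224 f32 input are placeholders for illustration, not anything from our codebase:

```rust
use tract_onnx::prelude::*;

fn main() -> TractResult<()> {
    // Load an ONNX model, declare its input shape, optimise it, and make it runnable.
    // "model.onnx" and the 1x3x224x224 shape are placeholders.
    let model = tract_onnx::onnx()
        .model_for_path("model.onnx")?
        .with_input_fact(0, InferenceFact::dt_shape(f32::datum_type(), tvec!(1, 3, 224, 224)))?
        .into_optimized()?
        .into_runnable()?;

    // Run inference on an all-zeros tensor just to show the call shape.
    let input: Tensor = tract_ndarray::Array4::<f32>::zeros((1, 3, 224, 224)).into();
    let outputs = model.run(tvec!(input.into()))?;
    println!("{:?}", outputs[0]);

    Ok(())
}
```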

By using tract instead of the reference implementations we'll be giving up some performance, reliability, and features (e.g. missing model ops) in exchange for long term maintainability and reduced build complexity. @f0rodo may want to comment on this trade-off, but from an engineering perspective I think it's worth it.

The things we'll need to do to support new model types:

  • Add an args field to models inside the Runefile (done)
  • Let the user provide a format argument which is either "tensorflow-lite", "tensorflow", or "onnx" to specify what type of model this is (default is "tensorflow-lite" if not provided) (example)
  • Convert the format into a mimetype that gets embedded in the Rune and passed to the runtime when loading a model (conversion, injecting into the generated Rune); a rough sketch of this mapping follows the list
  • Create new ModelFactory implementations for handling TensorFlow and ONNX models
  • Register the new TensorFlow and ONNX model factories as part of our BaseImage::with_defaults() (maybe hide them behind a feature flag like we did with "tensorflow-lite" so users can cut down on dependencies, it's up to you)
  • Add integration tests that try to compile and run Runes with TensorFlow and ONNX models
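To make the format and mimetype steps concrete, here is a rough sketch of how the mapping could look. The ModelFormat enum, its method names, and the mimetype strings are placeholders I made up for illustration; they are not Rune's existing constants:

```rust
/// Model formats a Runefile can declare. Sketch only; names and mimetype
/// strings are placeholders, not Rune's actual API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ModelFormat {
    TensorFlowLite,
    TensorFlow,
    Onnx,
}

impl ModelFormat {
    /// Parse the "format" argument from the Runefile, defaulting to
    /// TensorFlow Lite when the argument is omitted.
    pub fn parse(arg: Option<&str>) -> Result<Self, String> {
        match arg {
            None | Some("tensorflow-lite") => Ok(ModelFormat::TensorFlowLite),
            Some("tensorflow") => Ok(ModelFormat::TensorFlow),
            Some("onnx") => Ok(ModelFormat::Onnx),
            Some(other) => Err(format!("unknown model format: {}", other)),
        }
    }

    /// The mimetype embedded in the Rune and handed to the runtime when the
    /// model is loaded. Placeholder values for illustration only.
    pub fn mimetype(self) -> &'static str {
        match self {
            ModelFormat::TensorFlowLite => "application/tflite-model",
            ModelFormat::TensorFlow => "application/tensorflow-model",
            ModelFormat::Onnx => "application/onnx-model",
        }
    }
}
```

Having ModelFormat::parse(None) return TensorFlowLite mirrors the "default is tensorflow-lite if not provided" behaviour described in the second task.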
@Michael-F-Bryan added the category - enhancement (New feature or request), priority - on-demand (This won't be touched until there is an external need for it, i.e. required by a customer) and area - runtime (The Rust Rune runtime) labels Nov 3, 2021
@saidinesh5
Contributor

@Michael-F-Bryan long-term maintainability will be more problematic, though. tract does NOT implement all the operators that TensorFlow Lite / ONNX provide. Even its ONNX support is not 100%, and it's a moving target. So whenever a user's model doesn't work, we get the bug reports (and the maintenance burden) instead of upstream TensorFlow/ONNX. tract's statement on TensorFlow 2 support is basically:

Additionally, the complexity of TensorFlow 2 makes it very unlikely that direct support will ever exist in tract. Many TensorFlow 2 nets can be converted to ONNX and loaded in tract.

So we'd be going backwards on the actual user-facing features we support this way (TensorFlow 1.x only, an incomplete ONNX operator set, etc.).

That being said, tract could make a good starting point for us to try out wasi-nn, especially if we want to target the microcontroller world (librunecoral is a no-go for that). Eventually I'd like even librunecoral to support wasi-nn, but let's see how much time / resources we can allocate for that. We still have to kill the old C++-based RuneVM.

Personally, as long as we get zero-copy pipelines and can use appropriate hardware acceleration for the use cases we want to support (e.g. if we want to use Rune for some kind of video processing we need TPU/GPU acceleration there, but for text/audio models we can get away without it), any framework will do.

@f0rodo added the area - model inference (ML Model support) label Nov 3, 2021