Adding support for a custom inference service #905
-
Hi @andrewsi-z, currently there isn't a document on how to add a new custom inference backend, and we would really love to add a more generic API for users to integrate BentoML with their custom inference backends. For inference backends that provide a Python API, this is relatively easy to do via the BentoArtifact abstraction. But for inference backends that run as a separate service, the biggest blocker right now is that BentoML needs a service startup callback hook that lets the user launch their model service backend processes.

However, if your custom inference backend is deployed separately (running in a separate container or on another machine), you can always invoke an RPC/HTTP call from your BentoService API callback function to access it, and your workload will still benefit from the micro-batching framework that the BentoML API server provides (see the sketch below).

If you are unsure about your approach, I'd be happy to take a look at your design and discuss it here or over a call. This is a super interesting use case that I think many others in the community would be interested in, and any contributions around it are definitely welcome!
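For example, a minimal sketch of that HTTP-proxy pattern with the pre-1.0 BentoService API might look like the following (the backend URL, payload shape, and response format are assumptions for illustration, and decorator/adapter names vary slightly across 0.x releases):

```python
# Minimal sketch (not an official BentoML example): a BentoService API that
# forwards micro-batched requests to a separately deployed inference backend
# over HTTP. The backend URL, payload shape, and response format are
# assumptions specific to this sketch.
import bentoml
import requests
from bentoml.adapters import JsonInput

# Assumed address of the custom inference backend running in another
# container or on another machine.
INFERENCE_BACKEND_URL = "http://custom-inference-backend:8080/predict"


@bentoml.env(pip_packages=["requests"])
class RemoteBackendService(bentoml.BentoService):

    @bentoml.api(input=JsonInput(), batch=True)
    def predict(self, json_inputs):
        # With batch=True, BentoML's API server collects concurrent requests
        # into json_inputs (a list of parsed JSON payloads), so one HTTP
        # round trip to the backend can serve many client requests.
        response = requests.post(INFERENCE_BACKEND_URL, json=json_inputs)
        response.raise_for_status()
        # Assumes the backend returns one prediction per input, in order.
        return response.json()
```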
-
@parano I am revisiting this as we are interested in getting BentoML working with an ONNX-MLIR based backend. ONNX-MLIR is a compiler built on LLVM that takes an ONNX model as input and generates a .so library, exposed to Python through a pybind-based front end. I have an early implementation that I am starting to test here. I am curious whether the project would be willing to accept a custom artifact that supports an ONNX-MLIR .so, and what thoughts you might have around this. This is obviously a somewhat different approach than a full runtime, but it has some very interesting capabilities and the output it produces is quite streamlined.
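To give a sense of what I have in mind, a rough sketch of such an artifact under the pre-1.0 artifact abstraction might look like the code below. The import path follows BentoML 0.13-era conventions, and the `PyRuntime.ExecutionSession` usage is an assumption whose constructor signature varies between ONNX-MLIR releases:

```python
# Rough sketch (assumptions, not a reviewed implementation): a custom artifact
# that packs an ONNX-MLIR compiled .so and loads it through the PyRuntime
# bindings shipped with ONNX-MLIR.
import os
import shutil

from bentoml.service.artifacts import BentoServiceArtifact


class OnnxMlirModelArtifact(BentoServiceArtifact):
    """Stores the compiled model .so alongside the BentoService bundle."""

    def __init__(self, name):
        super().__init__(name)
        self._model_so_path = None
        self._session = None

    def _file_path(self, base_path):
        return os.path.join(base_path, self.name + ".so")

    def pack(self, model_so_path, metadata=None):
        # model_so_path: path to the .so produced by onnx-mlir for the model.
        self._model_so_path = model_so_path
        return self

    def save(self, dst):
        # Copy the compiled library into the saved bundle directory.
        shutil.copyfile(self._model_so_path, self._file_path(dst))

    def load(self, path):
        return self.pack(self._file_path(path))

    def get(self):
        if self._session is None:
            # PyRuntime is the pybind front end generated by ONNX-MLIR; older
            # releases also required the entry point name, e.g.
            # ExecutionSession(path, "run_main_graph").
            from PyRuntime import ExecutionSession

            self._session = ExecutionSession(self._model_so_path)
        return self._session
```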
-
Hello!
We are in the very early stages of exploring porting BentoML for use on our platform.
One of our requirements is support for a custom inference backend (based on ONNX-MLIR). Is there any guidance that outlines what is involved in adding support for a new inference service? (I did look, but didn't quite find anything.) If so, it would help speed up our exploration efforts. If no such reference exists and we go forward, we will be happy to write something up and contribute.
Thank you