
ONNX Runtime TensorRT #32999

Open
BohdanBilonoh opened this issue Dec 5, 2024 · 1 comment

BohdanBilonoh commented Dec 5, 2024

Describe the solution you'd like
Vespa currently supports only the CPU and CUDA execution providers, but NVIDIA GPUs with Tensor Cores could theoretically deliver roughly 2x inference performance with the TensorRT execution provider.
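For reference, a minimal sketch of what selecting the TensorRT execution provider looks like through the ONNX Runtime Python API (the model path, cache path, and option values here are illustrative, not Vespa's configuration):

```python
import onnxruntime as ort

# Providers are tried in order; ONNX Runtime falls back to the next
# entry for any node the TensorRT EP cannot handle.
providers = [
    ("TensorrtExecutionProvider", {
        "trt_fp16_enable": True,          # use Tensor Cores via FP16 kernels
        "trt_engine_cache_enable": True,  # reuse built engines across restarts
        "trt_engine_cache_path": "/var/cache/trt",  # illustrative path
    }),
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
]

session = ort.InferenceSession("model.onnx", providers=providers)
```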

Describe alternatives you've considered
Instead of checking for a GPU and automatically assigning CUDA when one is present, implement a configurable execution provider with provider-specific options. This approach could potentially be extended to support other backends; see the sketch below.
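A hedged sketch of what such a configurable provider selection could look like, expressed against the ONNX Runtime Python API (the function name, config keys, and provider chains are hypothetical, not an existing Vespa interface):

```python
import onnxruntime as ort

def make_session(model_path: str, provider: str, options: dict) -> ort.InferenceSession:
    # Hypothetical mapping from a configured provider name to an ONNX
    # Runtime providers list; provider-specific options come from config.
    provider_chains = {
        "cpu": ["CPUExecutionProvider"],
        "cuda": ["CUDAExecutionProvider", "CPUExecutionProvider"],
        "tensorrt": [("TensorrtExecutionProvider", options),
                     "CUDAExecutionProvider",
                     "CPUExecutionProvider"],
    }
    return ort.InferenceSession(model_path, providers=provider_chains[provider])
```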

Additional context
TensorRT has time-consuming initialization and has to be warmed up carefully to avoid latency issues.
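A common way to hide that cost (a sketch, assuming a session created as above and float32 inputs) is to persist the engine cache and run a few dummy inferences at startup, before serving traffic:

```python
import numpy as np

def warm_up(session, runs: int = 3):
    # Build dummy inputs matching the model's declared shapes; the first
    # run triggers the TensorRT engine build/load, later runs stabilize latency.
    feeds = {}
    for inp in session.get_inputs():
        # Pick 1 for dynamic/symbolic dimensions.
        shape = [d if isinstance(d, int) else 1 for d in inp.shape]
        feeds[inp.name] = np.zeros(shape, dtype=np.float32)
    for _ in range(runs):
        session.run(None, feeds)
```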

BohdanBilonoh (Author) commented Dec 5, 2024

I am at the finish line with the TensorRT inference performance tests. I will attach them here as soon as they are ready.
