
ONNX Runtime TensorRT #32999

Open
BohdanBilonoh opened this issue Dec 5, 2024 · 1 comment

BohdanBilonoh commented Dec 5, 2024

Describe the solution you'd like
Vespa currently supports only the CPU and CUDA execution providers, but NVIDIA GPUs with Tensor Cores could theoretically deliver roughly 2x inference performance with the TensorRT execution provider.
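For reference, a minimal sketch of what selecting the TensorRT execution provider looks like through the ONNX Runtime Python API (the model path, cache path, and option values here are illustrative, not Vespa's configuration):

```python
import onnxruntime as ort

# Providers are tried in order; ONNX Runtime falls back to the next
# entry for any node the TensorRT EP cannot handle.
providers = [
    ("TensorrtExecutionProvider", {
        "trt_fp16_enable": True,          # use Tensor Cores via FP16 kernels
        "trt_engine_cache_enable": True,  # reuse built engines across restarts
        "trt_engine_cache_path": "/var/cache/trt",  # illustrative path
    }),
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
]

session = ort.InferenceSession("model.onnx", providers=providers)
```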

Describe alternatives you've considered
Instead of checking for a GPU and automatically assigning CUDA when one is present, implement a configurable execution provider with provider-specific options. This approach could potentially be extended to support other backends; see the sketch below.
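A hedged sketch of what such a configurable provider selection could look like, expressed against the ONNX Runtime Python API (the function name, config keys, and provider chains are hypothetical, not an existing Vespa interface):

```python
import onnxruntime as ort

def make_session(model_path: str, provider: str, options: dict) -> ort.InferenceSession:
    # Hypothetical mapping from a configured provider name to an ONNX
    # Runtime providers list; provider-specific options come from config.
    provider_chains = {
        "cpu": ["CPUExecutionProvider"],
        "cuda": ["CUDAExecutionProvider", "CPUExecutionProvider"],
        "tensorrt": [("TensorrtExecutionProvider", options),
                     "CUDAExecutionProvider",
                     "CPUExecutionProvider"],
    }
    return ort.InferenceSession(model_path, providers=provider_chains[provider])
```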

Additional context
TensorRT has time-consuming initialization and has to be warmed up carefully to avoid latency issues.
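A common way to hide that cost (a sketch, assuming a session created as above and float32 inputs) is to persist the engine cache and run a few dummy inferences at startup, before serving traffic:

```python
import numpy as np

def warm_up(session, runs: int = 3):
    # Build dummy inputs matching the model's declared shapes; the first
    # run triggers the TensorRT engine build/load, later runs stabilize latency.
    feeds = {}
    for inp in session.get_inputs():
        # Pick 1 for dynamic/symbolic dimensions.
        shape = [d if isinstance(d, int) else 1 for d in inp.shape]
        feeds[inp.name] = np.zeros(shape, dtype=np.float32)
    for _ in range(runs):
        session.run(None, feeds)
```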

BohdanBilonoh (Author) commented Dec 5, 2024

I am at the finish line with the TensorRT inference performance tests. I will attach them here as soon as they are ready.
