Describe the solution you'd like
Vespa currently supports only the CPU and CUDA execution providers, but Nvidia GPUs with tensor cores could theoretically deliver roughly 2x inference performance with the TensorRT execution provider.
Describe alternatives you've considered
Instead of checking for a GPU and automatically assigning CUDA when one is present, implement a configurable execution provider with provider-specific options. This approach could later be extended to support other backends.
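A minimal sketch of what such a configurable provider could look like, using ONNX Runtime's provider naming. The `select_providers` function, its config keys, and the chosen TensorRT option are assumptions for illustration, not Vespa's actual API:

```python
# Hypothetical sketch: map a configured backend name to an ordered
# ONNX Runtime execution-provider list, instead of auto-detecting a GPU.
# Provider names ("CPUExecutionProvider", etc.) are ONNX Runtime's;
# the function and config keys are made up for this example.

def select_providers(config: str) -> list:
    """Return the execution-provider list for a configured backend."""
    providers = {
        "cpu": ["CPUExecutionProvider"],
        "cuda": ["CUDAExecutionProvider", "CPUExecutionProvider"],
        # TensorRT with engine caching enabled; later entries act as
        # fallbacks for operators TensorRT does not support.
        "tensorrt": [
            ("TensorrtExecutionProvider", {"trt_engine_cache_enable": True}),
            "CUDAExecutionProvider",
            "CPUExecutionProvider",
        ],
    }
    if config not in providers:
        raise ValueError(f"unknown execution provider: {config}")
    return providers[config]
```

The resulting list would be passed as the `providers` argument when creating an `onnxruntime.InferenceSession`.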
Additional context
TensorRT has time-consuming initialisation and must be warmed up carefully to avoid latency issues: the engine is built (or loaded from cache) on the first inference, so that first call can be orders of magnitude slower than steady-state calls.
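One common mitigation is to run a few dummy inferences at startup, before the model serves traffic, so engine compilation happens outside the request path. A generic sketch (the `warm_up` helper and its parameters are assumptions for illustration):

```python
import time

def warm_up(run_inference, sample_input, runs: int = 3) -> float:
    """Run a few dummy inferences before serving traffic.

    The first call triggers TensorRT engine compilation (or cache
    loading); subsequent calls confirm steady-state behaviour.
    Returns the first-call latency in seconds for logging.
    """
    start = time.perf_counter()
    run_inference(sample_input)  # pays the one-time engine-build cost
    first_latency = time.perf_counter() - start
    for _ in range(runs - 1):
        run_inference(sample_input)
    return first_latency
```

Combined with TensorRT's engine cache, this keeps the expensive build off the serving path entirely after the first deployment of a given model and shape.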