Description
We recently updated the examples in the Morpheus project from Triton Server 23.06 to 24.09. These examples use automatic ORT-TRT optimization, but we now get errors when running on multiple GPUs. Everything works as expected on a single GPU. We can also get it to work on multi-GPU if we remove the ORT-TRT optimization from `config.pbtxt`. This is the Morpheus issue for that.
Errors can also be reproduced using the Triton `densenet_onnx` example model by updating its `config.pbtxt` to use ORT-TRT optimization and running on multi-GPU.
This appears to be an issue only with the automatic ORT-TRT optimization within Triton. Errors are not seen after deploying a TRT engine (`model.plan`) that I manually converted from ONNX.
Triton Information
What version of Triton are you using?
24.09, but we also get the error with 24.11
Are you using the Triton container or did you build it yourself?
Triton container
To Reproduce
Follow the steps in the Quickstart. Before running Triton, update `config.pbtxt` for `densenet_onnx` to use ORT-TRT optimization by adding the following to the end of the file:
Also, use `--gpus=all` to run Triton:
On my machine with two GPUs (Quadro RTX 8000), I see an error in the Triton logs with every other inference request. After four inference requests:
Expected behavior
No errors with ORT-TRT optimization on multi-GPU.
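For reference, the reproduction steps above could be scripted roughly as follows. This is a minimal sketch, not the exact snippets used in the report: the accelerator block follows the pattern shown in Triton's model optimization documentation, and the `precision_mode` and `max_workspace_size_bytes` values are assumptions. The `MODEL_REPO` default and both function names are hypothetical, and the `docker run` line is adapted from the Quickstart with `--gpus=all` substituted.

```shell
#!/usr/bin/env bash
# Hypothetical location of the Quickstart model repository; adjust as needed.
MODEL_REPO="${MODEL_REPO:-$PWD/model_repository}"

# Append an ORT-TRT (TensorRT execution accelerator) block to a config.pbtxt.
# The parameter values below are assumptions, not taken from the report.
append_trt_accelerator() {
  cat >> "$1" <<'EOF'
optimization { execution_accelerators {
  gpu_execution_accelerator : [ {
    name : "tensorrt"
    parameters { key: "precision_mode" value: "FP16" }
    parameters { key: "max_workspace_size_bytes" value: "1073741824" }
  } ]
} }
EOF
}

# Start the Triton container with all GPUs visible (the failing configuration).
run_triton() {
  docker run --gpus=all --rm -p8000:8000 -p8001:8001 -p8002:8002 \
    -v "$MODEL_REPO:/models" \
    nvcr.io/nvidia/tritonserver:24.09-py3 \
    tritonserver --model-repository=/models
}
```

To use it, call `append_trt_accelerator "$MODEL_REPO/densenet_onnx/config.pbtxt"` once and then `run_triton`. Dropping the appended block from `config.pbtxt` gives the multi-GPU configuration that works.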