I'm new to OnnxRuntime and am trying to increase performance by running 2 GPU sessions in parallel. I assumed it would be as simple as:
Create 2 sessions and 2 CUDA streams.
Assign the 1st stream to the 1st session via OrtCUDAProviderOptions, and likewise the 2nd stream to the 2nd session.
Create 2 threads, one for each of the sessions above.
When I run it, I do see the 2 GPU streams running in parallel much of the time. However, profiling the GPU performance shows a lot of pthread_mutex_lock calls that add significant latency, so running the 2 GPU streams in parallel is currently as slow as running them sequentially.
Below is a screenshot showing the pthread_mutex_lock calls that cause the extra dependency (a false dependency?).
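To illustrate how a single shared lock could produce exactly this symptom, here is a minimal sketch that uses only the C++ standard library. It is not OnnxRuntime code: `fake_inference` is a hypothetical stand-in for one session run and only sleeps, and `time_two_threads_ms` is a hypothetical helper name. Whether OnnxRuntime actually holds such a lock is the open question of this post.

```cpp
#include <chrono>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical stand-in for one inference call; it only sleeps for `work`.
// If `shared_lock` is non-null, the sleep happens while holding that mutex,
// which mimics a lock shared between otherwise independent workers.
inline void fake_inference(std::mutex* shared_lock, std::chrono::milliseconds work) {
    if (shared_lock != nullptr) {
        std::lock_guard<std::mutex> guard(*shared_lock);
        std::this_thread::sleep_for(work);
    } else {
        std::this_thread::sleep_for(work);
    }
}

// Runs `task` on two threads at the same time and returns the elapsed
// wall-clock time in milliseconds.
inline double time_two_threads_ms(const std::function<void()>& task) {
    const auto start = std::chrono::steady_clock::now();
    std::thread a(task);
    std::thread b(task);
    a.join();
    b.join();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Timing `fake_inference(nullptr, work)` on two threads should take roughly one `work` interval, while timing `fake_inference(&m, work)` with a common `std::mutex m` should take roughly two, which is the same as running the calls back to back. Replacing the sleep with the real per-session call would give a quick check of whether the two sessions overlap at all.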
1) Are those locks inserted by OnnxRuntime?
2) Is there any way to get rid of them if they are false dependencies?
3) Is there any other way to run multiple CUDA sessions in parallel?