Why is model_run spending a lot of time after all operations have finished? #8035
-
I've been profiling a network that I'm trying to run with the highest throughput I can get for inference. It's based on the AlphaZero network, which for inference means I'm only calling `Run` in a loop. The profile I get with ONNX Runtime's profiler enabled mostly makes sense: there's some session initialization, then a sequence of `model_run` events. But when I zoom in on the second `model_run`, all the operators finish quickly and the rest of the `model_run` span is empty, even though it keeps going for a long time after the last operation. It can't be the memcopy back to CPU memory, since the output is 5x smaller than the input, and the input doesn't seem to be taking a lot of time. Is this just a bug in the profiler, or is the empty space time that is actually spent doing nothing? All the relevant version info I can think of:
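For context, here is a minimal sketch of the kind of setup being described; it is an assumption rather than the original code: the ONNX Runtime C++ API with the CUDA execution provider, the built-in profiler enabled, and an inference loop that only calls `Run()`. The model path, input shape, and input/output names are placeholders.

```cpp
#include <onnxruntime_cxx_api.h>
#include <vector>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "alphazero");

  Ort::SessionOptions opts;
  opts.EnableProfiling(ORT_TSTR("ort_profile"));   // emits a chrome://tracing JSON file
  OrtCUDAProviderOptions cuda_opts{};              // device 0, default settings
  opts.AppendExecutionProvider_CUDA(cuda_opts);

  Ort::Session session(env, ORT_TSTR("model.onnx"), opts);  // placeholder model path

  // Placeholder input: an AlphaZero-style board-planes tensor.
  std::vector<int64_t> shape{1, 18, 8, 8};
  std::vector<float> input(1 * 18 * 8 * 8, 0.0f);
  Ort::MemoryInfo cpu_mem = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
  Ort::Value input_tensor = Ort::Value::CreateTensor<float>(
      cpu_mem, input.data(), input.size(), shape.data(), shape.size());

  const char* input_names[]  = {"input"};            // placeholder I/O names
  const char* output_names[] = {"policy", "value"};

  // Inference-only loop: each call shows up as one model_run event in the profile.
  for (int i = 0; i < 100; ++i) {
    auto outputs = session.Run(Ort::RunOptions{nullptr}, input_names, &input_tensor, 1,
                               output_names, 2);
  }
  return 0;
}
```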
Replies: 2 comments 3 replies
-
The profiler currently only works reliably for CPU profiling. The non-execute region in model_run is actually filled with CUDA kernel runs.
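One way to cross-check that the "empty" region is real GPU work, as a sketch building on the hypothetical setup above: time `Run()` on the host with `std::chrono`. A plain `Run()` copies the outputs back to CPU before returning, so the wall-clock time per call includes the CUDA kernel time that the ORT profiler doesn't attribute to any named event.

```cpp
#include <onnxruntime_cxx_api.h>
#include <chrono>
#include <cstdio>

// Assumes the session, input tensor, and name arrays from the sketch above (placeholders).
void time_one_run(Ort::Session& session, const char* const* input_names,
                  Ort::Value& input_tensor, const char* const* output_names) {
  auto t0 = std::chrono::steady_clock::now();
  auto outputs = session.Run(Ort::RunOptions{nullptr}, input_names, &input_tensor, 1,
                             output_names, 2);
  auto t1 = std::chrono::steady_clock::now();
  // Wall-clock time includes the CUDA kernels and the device-to-host copy of the outputs,
  // even though the profiler JSON only shows the CPU-side portions as named events.
  std::printf("Run() took %.3f ms\n",
              std::chrono::duration<double, std::milli>(t1 - t0).count());
}
```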
-
I found out that it happens due to a device-to-host (d2h) copy that occurs if you call model.Run without io_bindings. What you can do is disable that synchronization by applying kOrtRunOptionsConfigDisableSynchronizeExecutionProviders. That can be handy in some situations, for example if you have a lock around "model_run" and want to maximize GPU utilization.
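A sketch of both options under the same assumptions as above (C++ API, placeholder input/output names): either bind the inputs and outputs with `Ort::IoBinding` so ORT doesn't copy the outputs back to host on every call, or keep the plain `Run()` and set the run-options key mentioned in the reply.

```cpp
#include <onnxruntime_cxx_api.h>
#include <onnxruntime_run_options_config_keys.h>

// Option 1: IO binding, so outputs stay on the GPU instead of being copied back each Run().
void run_with_io_binding(Ort::Session& session, Ort::Value& input_tensor) {
  Ort::MemoryInfo cuda_mem("Cuda", OrtArenaAllocator, /*device_id=*/0, OrtMemTypeDefault);
  Ort::IoBinding binding(session);
  binding.BindInput("input", input_tensor);   // placeholder names
  binding.BindOutput("policy", cuda_mem);     // let ORT allocate the outputs on the device
  binding.BindOutput("value", cuda_mem);
  session.Run(Ort::RunOptions{nullptr}, binding);
  // Call binding.SynchronizeOutputs() / GetOutputValues() once the results are actually needed.
}

// Option 2: plain Run(), but skip the execution-provider synchronization at the end of the run.
void run_without_ep_sync(Ort::Session& session, const char* const* input_names,
                         Ort::Value& input_tensor, const char* const* output_names) {
  Ort::RunOptions run_options;
  run_options.AddConfigEntry(kOrtRunOptionsConfigDisableSynchronizeExecutionProviders, "1");
  // The caller then becomes responsible for synchronizing before reading the outputs.
  session.Run(run_options, input_names, &input_tensor, 1, output_names, 2);
}
```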