Question: Why multi-threading in inference code? #3

Closed
KeondoPark opened this issue Oct 6, 2023 · 7 comments

Comments

@KeondoPark

Hi, congratulations on taking 1st place in the LPCV challenge.
I reviewed your solution and noticed that you use multi-threading when post-processing the prediction results in the inference code.
I would guess this has no effect on the inference time, since only the model inference is timed, but maybe I am wrong.
If there is a specific reason for using multi-threading, could you please explain it?

Thanks
Keondo

@modricwang
Collaborator

Thank you for your question; it's a very good one.

While waiting for file I/O operations, the GPU can go idle, which causes its clock frequency to drop (throttling). Performing the file I/O asynchronously keeps the GPU busy at all times, preventing that drop and maintaining higher GPU inference speed.
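
For illustration, here is a minimal sketch of that pattern (hypothetical names, not the actual competition code): post-processing and file writes are handed to a background worker thread so the inference loop can immediately feed the GPU the next batch.

```python
# Minimal sketch (hypothetical, not the competition code): the main loop keeps
# the GPU busy with the next batch while worker threads handle the slow,
# I/O-bound post-processing of the previous batch's results.
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import torch


def save_result(path, pred):
    # CPU/disk-bound post-processing: file I/O releases the GIL, so this
    # overlaps with the GPU work happening on the main thread.
    np.save(path, pred)


@torch.no_grad()
def run_inference(model, loader, device="cuda"):
    model.eval()
    with ThreadPoolExecutor(max_workers=2) as pool:
        for i, batch in enumerate(loader):
            out = model(batch.to(device, non_blocking=True))
            pred = out.argmax(dim=1).cpu().numpy()
            # Submit and move on: the GPU never sits idle waiting for the write.
            pool.submit(save_result, f"pred_{i}.npy", pred)
        # Leaving the context manager waits for all pending writes to finish.
```

How much this helps depends on how long the writes take relative to inference; the point above is simply that the GPU never has to wait on the file system.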

@KeondoPark
Author

@modricwang Thank you for the response!
I have a couple of follow-up questions:

  1. To my knowledge, a GPU is throttled when the workload is heavy, but is it also throttled when it is idle?
  2. When I tested your inference code, another major speed-up came from using batches. But when I tried using batches in my own code, it did not necessarily lead to a speed-up. Did you use any special tricks to make batch inference really increase throughput?

Thank you very much.
Keondo

@modricwang
Collaborator

Hi @KeondoPark ,

Thanks for your follow-up questions!

For Question 1:
Throttling is indeed typically associated with high computational load. In our case, however, non-GPU work such as pre-processing and post-processing takes up too much time, causing the GPU to go idle. When that happens, the clock frequency automatically drops and the GPU enters a power-saving state such as P8. You can observe this with nvidia-smi or jtop.

For Question 2:
Similar to the situation described above, try watching the GPU's clock frequency. If it cannot stay high, non-GPU work is taking up a considerable amount of time, and the frequency may not have a chance to ramp up before inference finishes. This is more apparent for small models like those in LPCV 2023; for heavier workloads such as LLM inference, the effect is much less noticeable.

I hope this provides some clarity. If you have any more questions, please feel free to ask.
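
As a rough sketch of the kind of observation described above (an assumed setup, not the authors' tooling): on a desktop GPU you can poll the SM clock and performance state with nvidia-smi while inference runs; on a Jetson board, which has no nvidia-smi, the same idea applies with jtop or tegrastats.

```python
# Rough sketch (assumed, not from the authors): poll the GPU clock and P-state
# in a background thread while inference runs. A clock stuck low (e.g. P8)
# suggests the GPU is idling between batches.
import subprocess
import threading
import time


def watch_gpu_clock(stop_event, interval=0.5):
    while not stop_event.is_set():
        result = subprocess.run(
            ["nvidia-smi",
             "--query-gpu=clocks.sm,utilization.gpu,pstate",
             "--format=csv,noheader"],
            capture_output=True, text=True,
        )
        print(result.stdout.strip())  # e.g. "1530 MHz, 98 %, P0" vs "210 MHz, 0 %, P8"
        time.sleep(interval)


stop_event = threading.Event()
watcher = threading.Thread(target=watch_gpu_clock, args=(stop_event,), daemon=True)
watcher.start()
# ... run inference here ...
stop_event.set()
watcher.join()
```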

@KeondoPark
Author

Thank you for your valuable response.
We used jtop to monitor GPU utilization on the NVIDIA Jetson Nano, but the display stopped updating during inference, so we could not track the GPU status properly. Did you use any other tools to monitor GPU frequency? We also tried profiling with nvprof, but it was not very helpful.

@modricwang
Collaborator

This problem is not common; jtop normally updates fine for us. I suspect it might be due to insufficient RAM on the Nano; we are using the 4GB version of the board. Perhaps you could try increasing the swap space?

@KeondoPark
Author

Ah, we tested on the Jetson Nano 2GB, which is the platform recommended by the organizer. I agree that this is likely related to insufficient RAM. Maybe I need to test on the Jetson Nano 4GB as well.

Thank you for your kind responses. I learned a lot from this discussion.
Congratulations again on winning the challenge!

@modricwang
Collaborator

Happy to help. If you'd like to discuss further, feel free to reopen this issue~
