Question: Why multi-threading in inference code? #3

Closed
KeondoPark opened this issue Oct 6, 2023 · 7 comments

Comments

@KeondoPark

Hi, congratulations on taking 1st place in the LPCV challenge.
I reviewed your solution and noticed that you use multi-threading when post-processing the prediction results in the inference code.
I would guess this has no effect on the inference time, since only the model inference is timed, but maybe I am wrong.
If there is a specific reason for using multi-threading, could you please explain it?

Thanks
Keondo

@modricwang
Collaborator

Thank you for your question; it's a very good one.

While waiting for file I/O operations, the GPU can go idle, which causes its clock frequency to drop (throttling). Performing the file I/O asynchronously keeps the GPU busy at all times, preventing that drop and maintaining higher GPU inference speed.
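
For illustration, here is a minimal sketch of that pattern (hypothetical names, not the actual competition code): post-processing and file writes are handed to a background worker thread so the inference loop can immediately feed the GPU the next batch.

```python
# Minimal sketch (hypothetical, not the competition code): the main loop keeps
# the GPU busy with the next batch while worker threads handle the slow,
# I/O-bound post-processing of the previous batch's results.
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import torch


def save_result(path, pred):
    # CPU/disk-bound post-processing: file I/O releases the GIL, so this
    # overlaps with the GPU work happening on the main thread.
    np.save(path, pred)


@torch.no_grad()
def run_inference(model, loader, device="cuda"):
    model.eval()
    with ThreadPoolExecutor(max_workers=2) as pool:
        for i, batch in enumerate(loader):
            out = model(batch.to(device, non_blocking=True))
            pred = out.argmax(dim=1).cpu().numpy()
            # Submit and move on: the GPU never sits idle waiting for the write.
            pool.submit(save_result, f"pred_{i}.npy", pred)
        # Leaving the context manager waits for all pending writes to finish.
```

How much this helps depends on how long the writes take relative to inference; the point above is simply that the GPU never has to wait on the file system.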

@KeondoPark
Author

@modricwang Thank you for the response!
I have a couple of follow-up questions:

  1. To my knowledge, a GPU is throttled when the workload is heavy, but is it also throttled when it is idle?
  2. When I tested your inference code, another major speed-up came from using batches. But when I tried using batches in my own code, it did not necessarily lead to a speed-up. Did you use any special tricks to make batch inference really increase throughput?

Thank you very much.
Keondo

@modricwang
Collaborator

Hi @KeondoPark ,

Thanks for your follow-up questions!

For Question 1:
Throttling is indeed typically associated with high computational load. In our case, however, non-GPU work such as pre-processing and post-processing takes up too much time, causing the GPU to go idle. When that happens, the clock frequency automatically drops and the GPU enters a power-saving state such as P8. You can observe this with nvidia-smi or jtop.

For Question 2:
Similar to the situation described above, try watching the GPU's clock frequency. If it cannot stay high, non-GPU work is taking up a considerable amount of time, and the frequency may not have a chance to ramp up before inference finishes. This is more apparent for small models like those in LPCV 2023; for heavier workloads such as LLM inference, the effect is much less noticeable.

I hope this provides some clarity. If you have any more questions, please feel free to ask.
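
As a rough sketch of the kind of observation described above (an assumed setup, not the authors' tooling): on a desktop GPU you can poll the SM clock and performance state with nvidia-smi while inference runs; on a Jetson board, which has no nvidia-smi, the same idea applies with jtop or tegrastats.

```python
# Rough sketch (assumed, not from the authors): poll the GPU clock and P-state
# in a background thread while inference runs. A clock stuck low (e.g. P8)
# suggests the GPU is idling between batches.
import subprocess
import threading
import time


def watch_gpu_clock(stop_event, interval=0.5):
    while not stop_event.is_set():
        result = subprocess.run(
            ["nvidia-smi",
             "--query-gpu=clocks.sm,utilization.gpu,pstate",
             "--format=csv,noheader"],
            capture_output=True, text=True,
        )
        print(result.stdout.strip())  # e.g. "1530 MHz, 98 %, P0" vs "210 MHz, 0 %, P8"
        time.sleep(interval)


stop_event = threading.Event()
watcher = threading.Thread(target=watch_gpu_clock, args=(stop_event,), daemon=True)
watcher.start()
# ... run inference here ...
stop_event.set()
watcher.join()
```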

@KeondoPark
Author

Thank you for your valuable response.
We used jtop to monitor GPU utilization on the NVIDIA Jetson Nano, but the display stopped updating during inference, so we could not track the GPU status properly. Did you use any other tools to monitor GPU frequency? We also tried profiling with nvprof, but it was not very helpful.

@modricwang
Collaborator

This problem is not common; jtop normally updates fine for us. I suspect it might be due to insufficient RAM on the Nano; we are using the 4GB version of the board. Perhaps you could try increasing the swap space?

@KeondoPark
Author

Ah, we tested on the Jetson Nano 2GB, which is the platform recommended by the organizer. I agree that this is likely related to insufficient RAM. Maybe I need to test on the Jetson Nano 4GB as well.

Thank you for your kind responses. I learned a lot from this discussion.
Congratulations again on winning the challenge!

@modricwang
Collaborator

Happy to help. If you'd like to discuss further, feel free to reopen this issue~
