the speed of TRT #22
Hi, thanks for the great work. I have tried to apply vanilla-9 to object detection; however, when converting the model to TensorRT, it seems much slower than ResNet-34. Is there any guidance? Thanks in advance.

Comments
In object detection, the input image size is much larger than the 224×224 used on ImageNet. You can try FP16, lower the input resolution, or choose a platform better suited to our vanilla architecture (e.g., an A100) to narrow the gap between vanilla-9 and ResNet-34.
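For reference, below is a minimal sketch of how one might export a backbone to ONNX and build an FP16 TensorRT engine, assuming TensorRT's 8.x Python API. The file names are placeholders, and the torchvision model is a runnable stand-in rather than anything from the VanillaNet repo.

```python
# Minimal sketch: export a backbone to ONNX, then build an FP16 TensorRT
# engine. File names and the stand-in model are placeholders.
import torch
import tensorrt as trt
from torchvision.models import mobilenet_v3_large

# 1) Export a traced ONNX graph at the resolution you intend to benchmark.
model = mobilenet_v3_large().eval()  # swap in the vanilla-9 backbone here
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy, "backbone.onnx", opset_version=13)

# 2) Parse the ONNX file and build a serialized engine with FP16 enabled.
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("backbone.onnx", "rb") as f:
    assert parser.parse(f.read()), parser.get_error(0)

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # allow FP16 kernel selection
with open("backbone_fp16.engine", "wb") as f:
    f.write(builder.build_serialized_network(network, config))
```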
Hello, I've been testing the speed of VanillaNet recently. I tried VanillaNet6 with an image size of 128×128, which is smaller than 224×224. Using TensorRT for inference, MobileNetV3-Large takes 0.028 ms while VanillaNet6 takes 0.096 ms; the GPU is an A40. This is inconsistent with the results obtained with plain PyTorch in the paper. Does this mean VanillaNet is not well suited to fast inference in TRT?
Thank you for sharing your observations on VanillaNet's speed under TensorRT. Note that changing the input resolution can change the relative speed results. We recommend testing with an input size of 224×224, which matches the model's original design and the conditions benchmarked in our paper. If you use a 128×128 input, the model might require a redesign to be optimized for that resolution. The discrepancy you observed between VanillaNet6 and MobileNetV3-Large under TensorRT is likely attributable to this change in input size, which significantly affects how the model processes data and, consequently, its inference speed.
Thank you for sharing your further testing results. VanillaNet is designed with fewer layers, but each layer involves heavier computation, so it is best suited to scenarios with ample computational resources. In such cases the primary latency bottleneck tends to be the number of layers rather than FLOPs, which is a key point we aimed to highlight in our work. We generally set the batch size to 1 for our tests, because at larger batch sizes VanillaNet may not exhibit the advantages seen in the paper. I suggest setting the batch size to 1 when running your TensorRT tests as well; this should better reflect the performance characteristics described in the paper.
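As an illustration of the batch-size-1 setting, below is a minimal sketch of how one might measure GPU latency in PyTorch with CUDA events. The torchvision stand-in model and the iteration counts are assumptions, not the authors' benchmark setup.

```python
# Minimal latency sketch at batch size 1, assuming a CUDA-capable PyTorch
# install; the stand-in model is illustrative only.
import torch
from torchvision.models import mobilenet_v3_large

model = mobilenet_v3_large().eval().cuda()  # swap in VanillaNet6 here
x = torch.randn(1, 3, 224, 224, device="cuda")  # batch size 1, 224x224

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

with torch.no_grad():
    for _ in range(50):          # warm-up so kernels and caches are primed
        model(x)
    torch.cuda.synchronize()
    start.record()
    for _ in range(200):
        model(x)
    end.record()
    torch.cuda.synchronize()     # wait for all queued kernels to finish

print(f"mean latency: {start.elapsed_time(end) / 200:.3f} ms")
```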
Thank you, I see. VanillaNet is very interesting work. May I ask whether VanillaNet will continue to be developed in the future, for example a VanillaNetV2 or VanillaNetplus that keeps its advantage even at batch size > 1? That would be exciting, because batch size > 1 is common in practical application scenarios, where it yields higher throughput.