Long training time #8
Why did the training time for 10 batches increase severalfold when I replaced the backbone network from CrossFormer-S with SMT-T, despite SMT-T having far fewer parameters and FLOPs than CrossFormer-S?

Comments
I am using the RefineDet detection model, training it on an RTX 3080 GPU with a batch size of 12; the only difference between the two runs is the choice of backbone network.
Hello, our SMT-T model should have higher throughput than CrossFormer-S (833 vs. 672 images/s), so in theory the training times should not differ significantly. One possible reason is that depth-wise convolution is not very GPU-friendly during training, and our MHMC uses multiple depth-wise convolutions with relatively large kernels plus channel shuffling, which may hurt the actual training speed (see the sketch below). Can you provide the specific training-time difference?
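To illustrate why such a design can be slow in wall-clock terms despite low FLOPs, here is a minimal PyTorch sketch of a multi-head mixed convolution in this spirit. The class name, head split, and kernel sizes are illustrative assumptions, not the actual SMT implementation:

```python
import torch
import torch.nn as nn

class MultiHeadMixedConv(nn.Module):
    """Hypothetical sketch: split channels into heads, apply a depth-wise
    convolution with a different (large) kernel to each head, then shuffle
    channels across heads. Kernel sizes and the head split are assumptions,
    not the real SMT code."""
    def __init__(self, dim, kernel_sizes=(3, 5, 7, 9)):
        super().__init__()
        self.heads = len(kernel_sizes)
        assert dim % self.heads == 0
        head_dim = dim // self.heads
        # One depth-wise conv per head. Large-kernel depth-wise convs are
        # cheap in FLOPs/params but have low arithmetic intensity and launch
        # several separate kernels, which is why GPU wall-clock time can lag
        # far behind what the FLOP count suggests.
        self.dwconvs = nn.ModuleList(
            nn.Conv2d(head_dim, head_dim, k, padding=k // 2, groups=head_dim)
            for k in kernel_sizes
        )

    def forward(self, x):
        chunks = torch.chunk(x, self.heads, dim=1)
        out = torch.cat(
            [conv(c) for conv, c in zip(self.dwconvs, chunks)], dim=1
        )
        # Channel shuffle: interleave channels across heads.
        b, c, h, w = out.shape
        out = out.view(b, self.heads, c // self.heads, h, w)
        out = out.transpose(1, 2).reshape(b, c, h, w)
        return out
```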
[Screenshot: training-time logs comparing the CrossFormer and SMT backbones]
The impact on training speed is indeed a drawback of this model. Thank you for bringing it to our attention. If longer training times are not acceptable for you, we suggest either reducing the number of heads in MHMC or decreasing the number of blocks in the first two stages to achieve higher efficiency; a benchmark sketch for weighing that trade-off follows below.
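As a rough way to quantify the head-count trade-off before committing to a full training run, one could time forward+backward steps directly. This micro-benchmark is a sketch under stated assumptions: it requires a CUDA device and reuses the hypothetical `MultiHeadMixedConv` class from above, not the real SMT configuration API:

```python
import time
import torch

def bench(module, x, iters=50):
    """Average wall-clock seconds per forward+backward step."""
    opt = torch.optim.SGD(module.parameters(), lr=0.01)
    for _ in range(5):  # warm-up iterations (cuDNN autotuning, allocator)
        module(x).sum().backward()
        opt.step()
        opt.zero_grad()
    torch.cuda.synchronize()
    t0 = time.time()
    for _ in range(iters):
        module(x).sum().backward()
        opt.step()
        opt.zero_grad()
    torch.cuda.synchronize()
    return (time.time() - t0) / iters

# Illustrative comparison: 4 heads vs. 2 heads at the same channel width.
x = torch.randn(12, 64, 56, 56, device="cuda")
for ks in [(3, 5, 7, 9), (3, 5)]:
    m = MultiHeadMixedConv(64, kernel_sizes=ks).cuda()
    print(f"{len(ks)} heads: {bench(m, x) * 1e3:.2f} ms/iter")
```

Fewer heads means fewer separate depth-wise kernels per block, which usually narrows the gap between theoretical FLOPs and measured step time, at some cost in accuracy.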
Thank you for your suggestion.