Long training time #8

Open
mc-lgt opened this issue Jul 20, 2023 · 5 comments

mc-lgt commented Jul 20, 2023

Why does the training time for 10 batches increase severalfold when the backbone network is switched from CrossFormer-S to SMT-T, even though SMT-T has far fewer parameters and FLOPs than CrossFormer-S?

mc-lgt (Author) commented Jul 20, 2023

I am using the RefineDet detection model, training it on an RTX 3080 GPU with a batch size of 12. The only difference is the choice of the backbone network.
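
One way to confirm that the gap comes from the backbone itself (rather than the RefineDet heads or the data pipeline) is to time a forward+backward pass through each backbone in isolation. The snippet below is only a rough sketch: `build_backbone` is a placeholder for however the two backbones are constructed in the code base, and the 320x320 input size is an assumption.

```python
import time
import torch

def time_backbone(model, iters=50, batch=12, size=320):
    """Average seconds per forward+backward pass through `model` alone."""
    model = model.cuda().train()
    x = torch.randn(batch, 3, size, size, device="cuda")

    def step():
        model.zero_grad(set_to_none=True)
        out = model(x)
        # Backbones may return a single tensor or a list of feature maps.
        out = out[-1] if isinstance(out, (list, tuple)) else out
        out.float().sum().backward()

    for _ in range(5):   # warm-up (cuDNN autotuning, kernel caching)
        step()
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        step()
    torch.cuda.synchronize()
    return (time.time() - start) / iters

# Usage (build_backbone is a placeholder for your own constructors):
# print(time_backbone(build_backbone("crossformer_s")))
# print(time_backbone(build_backbone("smt_t")))
```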

AFeng-x (Owner) commented Jul 20, 2023

Hello, our SMT-T model should have higher throughput than CrossFormer-S (833 vs. 672 images/s), so in theory the training times should not differ much. One possible reason is that depth-wise convolution is not very GPU-friendly during training, and our MHMC uses multiple depth-wise convolutions with relatively large kernels plus channel shuffling, which can hurt the actual training speed. Could you share the specific training-time numbers?
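
To illustrate the pattern (this is only a rough sketch, not the actual MHMC code; the kernel sizes and shuffle layout here are placeholder assumptions): several depth-wise convolutions with different, fairly large kernels are applied to channel groups and the result is channel-shuffled. Such a module is cheap in FLOPs and parameters, but each head launches its own small grouped-convolution kernel and the shuffle forces a full reindexing of the tensor, so per-image latency can be much worse than the FLOP count suggests.

```python
import torch
import torch.nn as nn


class MixedDepthwiseConv(nn.Module):
    """Toy multi-head mixed depth-wise convolution with channel shuffle."""

    def __init__(self, dim: int, kernel_sizes=(3, 5, 7, 9)):
        super().__init__()
        assert dim % len(kernel_sizes) == 0
        self.num_heads = len(kernel_sizes)
        self.head_dim = dim // self.num_heads
        # One depth-wise conv per head (groups == channels), each head
        # using a progressively larger kernel.
        self.dwconvs = nn.ModuleList(
            nn.Conv2d(self.head_dim, self.head_dim, k, padding=k // 2,
                      groups=self.head_dim)
            for k in kernel_sizes
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Split channels into heads and run each head's depth-wise conv.
        chunks = torch.split(x, self.head_dim, dim=1)
        out = torch.cat([conv(c) for conv, c in zip(self.dwconvs, chunks)], dim=1)
        # Channel shuffle: interleave channels across heads so the groups mix.
        b, c, h, w = out.shape
        out = out.view(b, self.num_heads, c // self.num_heads, h, w)
        out = out.transpose(1, 2).reshape(b, c, h, w)
        return out


if __name__ == "__main__":
    x = torch.randn(2, 64, 56, 56)
    print(MixedDepthwiseConv(64)(x).shape)  # torch.Size([2, 64, 56, 56])
```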

mc-lgt (Author) commented Jul 20, 2023

CrossFormer-S backbone:
iter 0 || ARM Loss Loc: 19.0710 || ARM Loss Conf: 70.5542 || ODM Loss Loc: 3.2157 || ODM Loss Conf: 15.5737 || Loss: 108.4146 || lr: 0.000000
timer: 0.2559 sec, data loading timer: 0.0020 sec
iter 10 || ARM Loss Loc: 10.8647 || ARM Loss Conf: 24.8945 || ODM Loss Loc: 4.8043 || ODM Loss Conf: 12.8692 || Loss: 53.4327 || lr: 0.000003
timer: 0.2504 sec, data loading timer: 0.0015 sec
iter 20 || ARM Loss Loc: 7.2353 || ARM Loss Conf: 18.3482 || ODM Loss Loc: 4.7534 || ODM Loss Conf: 10.8765 || Loss: 41.2135 || lr: 0.000006
timer: 0.2539 sec, data loading timer: 0.0020 sec
iter 30 || ARM Loss Loc: 6.4559 || ARM Loss Conf: 16.8169 || ODM Loss Loc: 3.5650 || ODM Loss Conf: 9.4054 || Loss: 36.2432 || lr: 0.000010
timer: 0.2539 sec, data loading timer: 0.0021 sec
iter 40 || ARM Loss Loc: 6.4502 || ARM Loss Conf: 14.6387 || ODM Loss Loc: 2.9532 || ODM Loss Conf: 8.3290 || Loss: 32.3710 || lr: 0.000013
timer: 0.2466 sec, data loading timer: 0.0014 sec
iter 50 || ARM Loss Loc: 6.1432 || ARM Loss Conf: 13.7867 || ODM Loss Loc: 3.3690 || ODM Loss Conf: 7.2396 || Loss: 30.5384 || lr: 0.000016

SMT-T backbone:
iter 0 || ARM Loss Loc: 3.9868 || ARM Loss Conf: 12.8144 || ODM Loss Loc: 3.6057 || ODM Loss Conf: 17.3007 || Loss: 37.7077 || lr: 0.000000
timer: 1.2552 sec, data loading timer: 0.0020 sec
iter 10 || ARM Loss Loc: 4.2079 || ARM Loss Conf: 11.4341 || ODM Loss Loc: 3.2977 || ODM Loss Conf: 13.8518 || Loss: 32.7916 || lr: 0.000003
timer: 1.2874 sec, data loading timer: 0.0015 sec
iter 20 || ARM Loss Loc: 3.6346 || ARM Loss Conf: 10.7321 || ODM Loss Loc: 3.5200 || ODM Loss Conf: 12.5814 || Loss: 30.4682 || lr: 0.000006
timer: 1.2495 sec, data loading timer: 0.0014 sec
iter 30 || ARM Loss Loc: 3.3966 || ARM Loss Conf: 8.5784 || ODM Loss Loc: 3.2811 || ODM Loss Conf: 11.1124 || Loss: 26.3686 || lr: 0.000010
timer: 1.2506 sec, data loading timer: 0.0019 sec

AFeng-x (Owner) commented Jul 20, 2023

The impact on training speed is indeed a drawback of this model. Thank you for bringing it to our attention. If longer training times are not feasible for you, we suggest either reducing the number of heads in MHMC or decreasing the number of blocks in the first two stages to achieve higher efficiency.
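
For instance, a lighter variant might look something like the sketch below. The key names (`embed_dims`, `depths`, `mhmc_num_heads`) and the values are placeholders and may not match this repository's actual configuration fields; the point is simply to shrink the first two stages, where the depth-wise-convolution-heavy MHMC blocks take most of the training time.

```python
# Hypothetical config sketch only -- these key names and values are
# placeholders, not the repository's actual configuration fields.
smt_t_fast = dict(
    embed_dims=[64, 128, 256, 512],
    depths=[1, 2, 8, 1],          # reduce the block count in the first two stages
    mhmc_num_heads=[2, 2, 4, 8],  # fewer mixed-conv heads in the early stages
)
```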

mc-lgt (Author) commented Jul 20, 2023

Thank you for your suggestion.
