
Problem of distributed training #36

Open
eagles1812 opened this issue Sep 22, 2022 · 2 comments

Comments

@eagles1812

Thanks for the great paper, dataset and code!

I tried to train the model with the ready data on a single GPU, and it took roughly half a day. So I added a distributed training component; the training time decreased, but so did the AP/AR/IoU values. Have you tested distributed training? How should the parameters be set to keep training time short while preserving the AP/AR/IoU values?

Thank you!

@jrebut
Contributor

jrebut commented Sep 29, 2022

Hi, can you please explain what you mean by a distributed training component?

Julien

@eagles1812
Author

Thanks for your reply. In your code suite, you use a single GPU for training; for a large dataset such as yours, that takes a very long time. I have multiple GPUs and wanted to decrease the training time, so I modified your training code to add a distributed training component, following articles such as https://towardsdatascience.com/how-to-scale-training-on-multiple-gpus-dae1041f49d2, and then ran into the problem I described in the previous post. Thanks!
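
For reference, a minimal sketch of the kind of modification described above, assuming a standard PyTorch training loop launched with `torchrun`; `build_model`, `build_dataset`, `compute_loss`, and the hyperparameters below are illustrative placeholders, not names from this repository:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler


def main():
    # One process per GPU, launched with `torchrun --nproc_per_node=<num_gpus> train.py`;
    # torchrun sets RANK, WORLD_SIZE and LOCAL_RANK in the environment.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = build_model().cuda(local_rank)        # build_model(): placeholder for the repo's model constructor
    model = DDP(model, device_ids=[local_rank])

    dataset = build_dataset()                     # build_dataset(): placeholder for the repo's dataset
    sampler = DistributedSampler(dataset, shuffle=True)
    loader = DataLoader(dataset, batch_size=4, sampler=sampler,
                        num_workers=4, pin_memory=True)

    # With N GPUs the effective batch size becomes N x the per-GPU batch size.
    # Keeping the single-GPU learning rate is a common cause of degraded metrics;
    # the linear scaling rule (base lr * world size), often with warmup, is the usual fix.
    base_lr = 1e-4                                # illustrative value
    optimizer = torch.optim.Adam(model.parameters(), lr=base_lr * dist.get_world_size())

    num_epochs = 100                              # illustrative value
    for epoch in range(num_epochs):
        sampler.set_epoch(epoch)                  # reshuffle the data shards differently each epoch
        for batch in loader:
            loss = compute_loss(model, batch)     # compute_loss(): placeholder for the repo's loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

One frequent cause of an AP/AR/IoU drop in this setup is that the effective batch size grows with the number of GPUs while the single-GPU learning rate is kept unchanged, hence the `lr * world_size` scaling shown in the sketch.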
