Wrapping the loss with DistributedDataParallel #710

elisim · 2024-08-13T14:58:50Z

Hi,

When using DDP, should the loss_fn be wrapped with DistributedDataParallel? I’m specifically working with CosFace and ArcFace, both of which have a W parameter in the loss function. To ensure the gradients of W are synchronized across all processes, is it necessary to wrap loss_fn with DistributedDataParallel?

I saw it here:

#218

KevinMusgrave · 2024-08-13T19:16:15Z

Yeah, I think it's necessary.
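
For illustration, here is a minimal sketch of what wrapping both the model and a parameterized loss could look like. It rests on several assumptions: `ToyCosineLoss` is a hypothetical stand-in for a loss with a trainable `W` (it is not the library's CosFace or ArcFace implementation), `run_single_step` is an illustrative helper, and the process group is a single CPU process on the `gloo` backend so that the snippet runs without GPUs. In a real multi-process job, the rank and world size would come from the launcher instead.

```python
import os
import tempfile

import torch
import torch.distributed as dist
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.parallel import DistributedDataParallel as DDP


class ToyCosineLoss(nn.Module):
    """Hypothetical loss with a trainable class-weight matrix W."""

    def __init__(self, num_classes, embedding_size, scale=16.0):
        super().__init__()
        self.W = nn.Parameter(torch.randn(embedding_size, num_classes))
        self.scale = scale

    def forward(self, embeddings, labels):
        # Cosine similarity between each embedding and each class column of W.
        logits = F.normalize(embeddings, dim=1) @ F.normalize(self.W, dim=0)
        return F.cross_entropy(self.scale * logits, labels)


def run_single_step():
    # Assumption: one CPU process, file-based rendezvous in a temp directory.
    with tempfile.TemporaryDirectory() as tmp:
        init_file = os.path.join(tmp, "ddp_init")
        dist.init_process_group(
            "gloo", init_method=f"file://{init_file}", rank=0, world_size=1
        )
        try:
            model = DDP(nn.Linear(8, 4))
            # Wrapping the loss lets DDP all-reduce the gradient of W as well.
            loss_fn = DDP(ToyCosineLoss(num_classes=3, embedding_size=4))

            # The optimizer has to see the loss parameters too.
            params = list(model.parameters()) + list(loss_fn.parameters())
            optimizer = torch.optim.SGD(params, lr=0.1)

            x = torch.randn(6, 8)
            y = torch.tensor([0, 1, 2, 0, 1, 2])

            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()

            # The wrapped module is reachable through `.module`.
            return loss.item(), loss_fn.module.W.grad is not None
        finally:
            dist.destroy_process_group()
```

Without the wrapper, each process would update its own copy of `W` from local gradients only, so the copies could drift apart between ranks. A loss with no trainable parameters cannot be wrapped this way, since DDP rejects modules that have no parameters requiring gradients.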
