I don't understand the implementation of FSDP. As far as I know, to use FSDP the model needs to be wrapped in FullyShardedDataParallel (or FSDP) imported from torch.distributed.fsdp, but I cannot find either of these keywords anywhere in this repo.
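For reference, this is the kind of native PyTorch FSDP wrapping I was grepping for (just a minimal sketch with a toy model, not code from this repo):

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Launched e.g. via `torchrun --nproc_per_node=2 this_script.py`
dist.init_process_group(backend="nccl")
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(1024, 1024).to(local_rank)  # toy model standing in for the real one
model = FSDP(model)  # each rank now holds only a shard of the parameters, not a full replica
```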
So I don't think FSDP is implemented natively here; is it instead perhaps some combination of ZeRO stage 2/3 from DeepSpeed? Am I missing something?
In fact, some other issues seem related to this question: the model is not sharded, but each GPU holds one full copy of it. See, for example, issues 5140 and 4912.
Thank you very much in advance for any hints.