Can't improve the scaler of batch size with ZeRO technique #1884
Unanswered
Lyn-Lucy asked this question in Community | Q&A
Replies: 1 comment
-
Hi, ZeRO has its own AMP. Do NOT use autocast or a grad scaler.
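To illustrate, here is a minimal sketch of a training step that leaves mixed precision to ZeRO. It assumes `engine` behaves like the object returned by `colossalai.initialize(...)` and exposes `zero_grad()`, `criterion()`, `backward()` and `step()`. `train_step` and `RecordingEngine` are hypothetical names used only for illustration; the stand-in engine just records calls so the snippet runs without a GPU or ColossalAI installed.

```python
# Sketch only: a training step with no torch.cuda.amp.autocast / GradScaler.
# Assumption: `engine` mimics the engine returned by colossalai.initialize(...).

def train_step(engine, data, label):
    """Run one optimization step and return the loss."""
    engine.zero_grad()
    output = engine(data)                   # no `with autocast():` wrapper
    loss = engine.criterion(output, label)
    engine.backward(loss)                   # not scaler.scale(loss).backward()
    engine.step()                           # not scaler.step(optimizer) / scaler.update()
    return loss


class RecordingEngine:
    """Hypothetical stand-in that only records which methods were called."""

    def __init__(self):
        self.calls = []

    def zero_grad(self):
        self.calls.append("zero_grad")

    def __call__(self, data):
        self.calls.append("forward")
        return data

    def criterion(self, output, label):
        self.calls.append("criterion")
        return abs(output - label)

    def backward(self, loss):
        self.calls.append("backward")

    def step(self):
        self.calls.append("step")


engine = RecordingEngine()
loss = train_step(engine, 3.0, 1.0)
print(engine.calls, loss)
```

In a real run, the only change would be to pass the actual engine and batches from the dataloader; the point is that no `autocast` context and no `GradScaler` calls appear anywhere in the loop.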
-
Here is my training code, which follows the ZeRO example in [ColossalAI-Examples/train_v2.py at main · hpcaitech/ColossalAI-Examples (github.com)](https://github.com/hpcaitech/ColossalAI-Examples/blob/main/features/zero/train_v2.py).
However, with ZeRO I can only fit a batch size of 1 per GPU, whereas without ZeRO I can fit a batch size of 4 per GPU.
Here is the result:
At the same time, I would also like to ask why the parameters don't change after the step.