@AshStuff dupe of #761. I have not used the grad accum myself; it would be worth someone with the resources doing a small-scale experiment, running with and without scaling, to see if there is any difference in behaviour... though I would have thought that would have been tested against normal training before being added in the first place. Perhaps not.
In this line of open_clip/src/open_clip_train/train.py (line 162 at fc5a37b), we accumulate the gradients and perform the optimizer step only after accumulating gradients for `accum_freq` steps. I am wondering whether we need to divide `total_loss` by `accum_freq` to scale the loss properly.