We occasionally get large spikes in the training loss. For example:
Training does eventually recover, but it can take quite a long time. Since it recovers on its own this may not be worth worrying about, but we should understand the cause.
One hypothesis is that we occasionally get "bad batches" where the data is garbage: such batches have very high perplexity and produce bad gradients.
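One cheap way to test the bad-batch hypothesis is to compare each batch's loss against a running average and log (or skip) batches that spike far above it. A minimal sketch, with illustrative names and thresholds not taken from the codebase:

```python
def should_skip_batch(loss, history, window=100, threshold=3.0):
    """Heuristic spike detector for the "bad batch" hypothesis.

    `history` is a mutable list of recent per-batch losses. Once we have
    `window` observations, a batch whose loss exceeds `threshold` times
    the recent running mean is flagged as a candidate bad batch, so it
    can be logged for inspection or its update skipped.
    All parameter values here are illustrative, not tuned.
    """
    if len(history) < window:
        # Not enough history yet to judge; record and proceed normally.
        history.append(loss)
        return False
    recent_mean = sum(history[-window:]) / window
    history.append(loss)
    return loss > threshold * recent_mean
```

Even if skipping is too aggressive for production training, just logging which batches trigger the detector would show whether the spikes correlate with specific (garbage) data.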
Another is instability related to fp16.
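If it is fp16 instability, the usual symptom is gradient overflow (inf/NaN) followed by a period of poor updates. Dynamic loss scaling is the standard mitigation: scale the loss up so small gradients survive fp16's limited range, and back the scale off whenever gradients overflow. A minimal sketch of the scaler's state machine (constants are illustrative, not this project's actual values):

```python
class DynamicLossScaler:
    """Minimal sketch of dynamic loss scaling for fp16 training.

    The loss is multiplied by `scale` before backprop so that small
    gradients do not underflow in fp16. Whenever any gradient comes back
    inf/NaN, the scale is reduced and the optimizer step is skipped; after
    `scale_window` consecutive clean steps, the scale is cautiously grown.
    """

    def __init__(self, init_scale=2.0 ** 15, scale_factor=2.0, scale_window=2000):
        self.scale = init_scale
        self.scale_factor = scale_factor
        self.scale_window = scale_window
        self._steps_since_overflow = 0

    def update(self, overflow):
        """Call once per step with whether any gradient was inf/NaN.

        Returns True if the optimizer step should be skipped.
        """
        if overflow:
            self.scale /= self.scale_factor  # back off on overflow
            self._steps_since_overflow = 0
            return True  # skip this update entirely
        self._steps_since_overflow += 1
        if self._steps_since_overflow >= self.scale_window:
            self.scale *= self.scale_factor  # cautiously grow the scale
            self._steps_since_overflow = 0
        return False
```

If the spikes coincide with overflow-driven skipped or corrupted steps, that would point at fp16 rather than the data.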