Recalculating the activations in the backwards pass to conserve memory #478
To start off, I will first implement the layernorm forward inside the backwards pass and use the ln1 and ln2 values produced directly by that forward call, to get an initial working version of recalculating the values in the backwards pass.
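A minimal sketch of what this first step could look like, in the spirit of llm.c's CUDA kernels. All names here (`layernorm_forward_kernel`, `recompute_ln1_for_backward`, the scratch buffers) are hypothetical and not taken from the actual PR: the backwards pass re-runs a deliberately naive layernorm forward into a scratch buffer just before the kernels that need `ln1`, instead of reading a stored activation.

```cuda
// Hypothetical sketch (not the actual llm.c/PR code): recompute the layernorm
// forward inside the backwards pass so ln1/ln2 do not have to be kept around.
#include <cuda_runtime.h>

// Deliberately naive: one thread per (b, t) row, no shared-memory reduction.
__global__ void layernorm_forward_kernel(float* out, float* mean, float* rstd,
                                         const float* inp, const float* weight,
                                         const float* bias, int N, int C) {
    int row = blockIdx.x;            // which (b, t) position we normalize
    if (row >= N) return;
    const float* x = inp + (long)row * C;
    float* o = out + (long)row * C;

    float m = 0.0f;
    for (int i = 0; i < C; i++) m += x[i];
    m /= C;
    float v = 0.0f;
    for (int i = 0; i < C; i++) { float d = x[i] - m; v += d * d; }
    v /= C;
    float s = rsqrtf(v + 1e-5f);

    mean[row] = m;
    rstd[row] = s;
    for (int i = 0; i < C; i++) {
        o[i] = (x[i] - m) * s * weight[i] + bias[i];
    }
}

// Called from the backwards pass: rebuild ln1 for one layer into scratch
// buffers right before the matmul/attention backward kernels consume it.
void recompute_ln1_for_backward(float* scratch_ln1, float* scratch_mean,
                                float* scratch_rstd, const float* residual,
                                const float* ln1w, const float* ln1b,
                                int B, int T, int C) {
    int N = B * T;
    layernorm_forward_kernel<<<N, 1>>>(scratch_ln1, scratch_mean, scratch_rstd,
                                       residual, ln1w, ln1b, N, C);
}
```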
In the above PR I was able to reduce the memory usage, as shown in the before/after screenshots (images omitted here).
The PR was merged, but it still needs the second step: a simplified kernel that doesn't recompute everything and instead reuses the values already calculated in the forwards pass.
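A possible shape for that simplified kernel, as a hedged sketch (the kernel name, launcher, and which forward-pass values are assumed to still be available are my assumptions, not the merged code): the per-row `mean` and `rstd` from the forward pass are tiny compared to the full `ln1`/`ln2` tensors, so they can be kept cheaply, and the recompute then only has to reapply the normalization to the residual input, with no reductions over the channel dimension.

```cuda
// Hypothetical sketch (not the merged kernel): reuse the mean/rstd saved by the
// forward pass so the backwards-pass recompute skips the expensive reductions
// and just reapplies the normalization elementwise.
#include <cuda_runtime.h>

__global__ void layernorm_recompute_kernel(float* out, const float* inp,
                                           const float* mean, const float* rstd,
                                           const float* weight, const float* bias,
                                           int N, int C) {
    // one thread per output element, grid-stride loop over all N*C elements
    long total = (long)N * C;
    for (long i = blockIdx.x * (long)blockDim.x + threadIdx.x; i < total;
         i += (long)gridDim.x * blockDim.x) {
        int row = (int)(i / C);      // which (b, t) position
        int c   = (int)(i % C);      // which channel
        float norm = (inp[i] - mean[row]) * rstd[row];
        out[i] = norm * weight[c] + bias[c];
    }
}

void layernorm_recompute(float* out, const float* inp, const float* mean,
                         const float* rstd, const float* weight, const float* bias,
                         int B, int T, int C) {
    long total = (long)B * T * C;
    int block = 256;
    int grid = (int)((total + block - 1) / block);
    layernorm_recompute_kernel<<<grid, block>>>(out, inp, mean, rstd,
                                                weight, bias, B * T, C);
}
```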
@ngc92 did an analysis of the areas that take up the most memory and their impact on how large a batch size can be used, and found that one of the largest contributors was the memory used to store the layernorm activations.
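As a rough back-of-the-envelope check of why this matters (my own numbers, not from the linked analysis, and assuming fp32 with ln1/ln2 stored per layer as (L, B, T, C)): for GPT-2 124M with L = 12 layers, C = 768 channels and T = 1024 tokens, each of ln1 and ln2 costs 12 * 1024 * 768 * 4 bytes ≈ 36 MiB per unit of batch size, so roughly 72 MiB per batch element for the pair, while the per-row mean and rstd are only 12 * 1024 * 4 bytes = 48 KiB per batch element each.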
This issue will track adding the ability to recalculate the layernorm forward activations in the backwards pass, similar to how the GELU is recalculated, so that we can reduce memory usage.