Dear Author, I am truly grateful for your outstanding work. Please allow me to raise a small question regarding gradient memory:
As I understand it, LOMO only ensures that gradients are processed layer by layer; the gradient of each weight matrix is not itself compressed, and its shape stays the same as the original weight's.
I'm not sure if I'm misusing it.
That's correct. LOMO does not directly compress the gradient. GaLore should be able to compress the gradient to reduce its memory footprint (lower memory requirement if we disable LOMO and enable gradient accumulation). We will include it in our next version.
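To put rough numbers on the difference, here is a minimal back-of-the-envelope sketch. It assumes a GaLore-style left projection, where an m x n gradient G is replaced by a compact r x n matrix R = PᵀG with an m x r projector P. The function names, the 4096 x 4096 layer, the rank of 128, and the 2-byte (bf16) element size are all illustrative assumptions, not values from this repository. It only counts what would need to persist (for example, across gradient accumulation steps); the full gradient may still be materialized briefly during the backward pass.

```python
def full_gradient_bytes(m: int, n: int, bytes_per_elem: int = 2) -> int:
    """Memory for an uncompressed gradient: same shape as the m x n weight."""
    return m * n * bytes_per_elem


def projected_gradient_bytes(m: int, n: int, rank: int, bytes_per_elem: int = 2) -> int:
    """Memory for a low-rank representation (assumed left projection):
    projector P is m x rank, compact gradient R = P^T G is rank x n."""
    projector = m * rank
    compact_gradient = rank * n
    return (projector + compact_gradient) * bytes_per_elem


if __name__ == "__main__":
    m, n, rank = 4096, 4096, 128  # illustrative sizes only
    full = full_gradient_bytes(m, n)
    low_rank = projected_gradient_bytes(m, n, rank)
    print(f"full gradient:      {full / 2**20:.1f} MiB")
    print(f"projected gradient: {low_rank / 2**20:.1f} MiB")
```

Under these assumptions the full gradient takes 32 MiB per layer regardless of whether LOMO is on, while the projected form takes about 2 MiB, which is why accumulating in the projected space could be cheaper than accumulating full-size gradients.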