Does GaLore save gradient memory? #53

jinqixiao · 2024-06-17T20:30:41Z

Dear author, thank you for your outstanding work. I have a small question about gradient memory.
As I understand it, the LOMO method only ensures that gradients are applied layer by layer; the gradient for each weight matrix is not compressed, and its shape stays the same as that of the original weight.
I'm not sure whether I'm misusing it.

jiaweizzhao · 2024-06-30T22:51:31Z

That's correct. LOMO does not directly compress gradients. GaLore should be able to compress the gradient to reduce its memory footprint (which would lower the memory requirement when LOMO is disabled and gradient accumulation is enabled). We will include this in our next version.
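
For a rough sense of the sizes involved, here is a back-of-the-envelope sketch, not GaLore's actual code. It assumes the projected gradient for an `m x n` weight has shape `rank x max(m, n)` and that gradients are stored in 2-byte precision; the function name `gradient_bytes` and its parameters are hypothetical and chosen only for illustration.

```python
def gradient_bytes(m, n, rank=None, bytes_per_elem=2):
    """Estimate the bytes needed to hold one gradient buffer for an m x n weight.

    rank=None models the uncompressed case, where the gradient has the
    same shape as the weight. With a rank, the buffer is assumed to be
    rank x max(m, n), i.e. projected along the smaller dimension.
    """
    if rank is None:
        return m * n * bytes_per_elem
    return rank * max(m, n) * bytes_per_elem


# Illustrative sizes only: one 4096 x 4096 weight, rank 128, 2-byte elements.
full = gradient_bytes(4096, 4096)
low_rank = gradient_bytes(4096, 4096, rank=128)
print(f"full-shape gradient: {full / 2**20:.0f} MiB")
print(f"projected gradient:  {low_rank / 2**20:.0f} MiB")
```

Under these assumptions, a gradient kept at the weight's full shape takes 32 MiB for this single matrix, while a rank-128 projected buffer takes 1 MiB. The difference only matters when gradients are held across steps, as with gradient accumulation.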
