I was trying to use my second-order optimizer ESGD-M with BatchL2Grad in order to collect information on within-batch gradient variance to estimate stochastic noise (think OpenAI's gradient noise scale paper), and I kept OOMing after maybe six epochs of MNIST training. ESGD-M does a Hessian-vector product internally (not using BackPACK machinery, just plain autograd), so it needs the user to specify create_graph=True. I assume that when I use it with BackPACK, something is leaking references to past computational graphs; normally these graphs are garbage collected without issue.
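For context, the Hessian-vector product pattern described above can be sketched in plain PyTorch. This is a minimal illustration of the autograd mechanism involved (a toy linear model and random vector, not the actual ESGD-M or BackPACK code): backward through the loss with create_graph=True, then differentiate the resulting gradients a second time.

```python
import torch

# Toy setup: a small model and a scalar loss (stand-ins for the real training code).
model = torch.nn.Linear(4, 1)
x = torch.randn(8, 4)
loss = model(x).pow(2).mean()

params = list(model.parameters())

# First backward pass; create_graph=True retains the graph so the
# gradients themselves can be differentiated.
grads = torch.autograd.grad(loss, params, create_graph=True)

# Hessian-vector product against a random vector v, via a second
# differentiation of the gradients.
v = [torch.randn_like(g) for g in grads]
hvp = torch.autograd.grad(grads, params, grad_outputs=v)

# Once `grads` and `hvp` go out of scope, the retained graph should be
# freed by normal garbage collection; the leak reported here suggests
# something else is keeping a reference to it alive.
```

If the graph created by create_graph=True were being held elsewhere (e.g. by an extension hooked into the backward pass), each training step would retain another full graph, which matches the steadily growing memory usage until OOM.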
Thank you,
Katherine Crowson
Apparently, if I tell ESGD-M to do a Hessian-vector product every step, instead of every ten steps for compute efficiency, I don't OOM anymore. Normally the graphs made with create_graph=True are freed on their own on the steps where ESGD-M doesn't do an HVP, but BackPACK seems to be hanging onto them somewhere.