OOM eventually when using create_graph=True with BatchL2Grad #238

Open
crowsonkb opened this issue Jan 24, 2022 · 2 comments

Comments

@crowsonkb

I was trying to use my second-order optimizer ESGD-M with BatchL2Grad in order to collect information on within-batch gradient variance to estimate stochastic noise (think OpenAI's gradient noise scale paper), and I kept OOMing after maybe six epochs of MNIST training. ESGD-M does a Hessian-vector product internally (not using BackPACK machinery, just autograd), so it needs the user to specify create_graph=True. I assume that when I use it with BackPACK, something leaks references to past computational graphs; normally these graphs are garbage collected without issue.
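
To illustrate the combination (a toy sketch, not my actual training code):

import torch
from backpack import backpack, extend
from backpack.extensions import BatchL2Grad

# toy stand-in for the real model and data
model = extend(torch.nn.Linear(784, 10))
lossfunc = extend(torch.nn.CrossEntropyLoss())
X, y = torch.randn(32, 784), torch.randint(0, 10, (32,))

loss = lossfunc(model(X), y)
with backpack(BatchL2Grad()):
    # create_graph=True so the optimizer can differentiate the gradient later
    loss.backward(create_graph=True)

for p in model.parameters():
    # per-sample squared l2 norms of the individual gradients, shape [32]
    print(p.batch_l2)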

Thank you,
Katherine Crowson

@crowsonkb
Author

Apparently, if I tell ESGD-M to do a Hessian-vector product every step instead of every ten (the every-ten setting is for compute efficiency), I don't OOM anymore. Normally the graphs made with create_graph=True are freed on their own when ESGD-M doesn't do an HVP that step, so is BackPACK hanging onto them somewhere?
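
For reference, the pattern is roughly this (a simplified sketch with a toy model, not ESGD-M's actual code):

import torch

model = torch.nn.Linear(10, 2)
lossfunc = torch.nn.CrossEntropyLoss()
k = 10  # do an HVP only every k-th step

for step in range(100):
    X, y = torch.randn(8, 10), torch.randint(0, 2, (8,))
    for p in model.parameters():
        p.grad = None  # drop last step's gradient (and its graph)
    loss = lossfunc(model(X), y)
    # create_graph=True so the gradient itself can be differentiated
    loss.backward(create_graph=True)
    if step % k == 0:
        params = list(model.parameters())
        vs = [torch.randn_like(p) for p in params]
        # Hessian-vector product by differentiating the gradient again
        hv = torch.autograd.grad(
            sum((p.grad * v).sum() for p, v in zip(params, vs)), params
        )
    # on the other k - 1 steps nothing should reference the graph,
    # so it ought to be garbage collected unless something keeps a reference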

@f-dangel
Owner

Hi,

thanks for your report. From your description I think BackPACK's memory cleanup should be triggered during the backward pass.

Maybe you can try to explicitly disable BackPACK's hooks during the HVP using

from backpack import disable

with disable():
    ...  # compute the HVP here, with BackPACK's hooks disabled

Otherwise, it would be great to reproduce this issue with a minimal working example (MWE) so we can track down the memory leak.
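
A skeleton along these lines could be a starting point (a sketch; the toy model stands in for your setup, and leaking graphs should show up as allocated memory growing step after step):

import torch
from backpack import backpack, extend
from backpack.extensions import BatchL2Grad

device = "cuda" if torch.cuda.is_available() else "cpu"
model = extend(torch.nn.Linear(10, 2)).to(device)
lossfunc = extend(torch.nn.CrossEntropyLoss())

for step in range(100):
    X = torch.randn(8, 10, device=device)
    y = torch.randint(0, 2, (8,), device=device)
    for p in model.parameters():
        p.grad = None  # make sure old gradients don't pin old graphs
    loss = lossfunc(model(X), y)
    with backpack(BatchL2Grad()):
        loss.backward(create_graph=True)
    if device == "cuda":
        print(step, torch.cuda.memory_allocated())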

Best,
Felix
