
p2ch12: memory explosion when I train the balanced model with the same config as in p2ch11 #94

Open
icegomic opened this issue Jun 19, 2022 · 2 comments

Comments

icegomic commented Jun 19, 2022

I use the same config: epochs=1, num_workers=8, batch_size=32.
It works fine in p2ch11, where RAM stays stable at about 6 GB,
but when I run 'python -m p2ch12.training --balanced'
RAM usage climbs very quickly and soon exceeds the available memory.
After that my computer stopped responding and I had to restart it.
What happened?
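In case it helps with reproducing this: one way to see where the memory goes is to watch the resident set size of the training process and its DataLoader workers from a second terminal while training runs. A minimal sketch, assuming psutil is installed; pass the PID of the main training process on the command line.

```python
import sys
import time

import psutil  # third-party: pip install psutil


def watch_rss(pid, interval_s=5):
    """Print resident memory of a process and its children (the DataLoader workers)."""
    parent = psutil.Process(pid)
    while True:
        for p in [parent] + parent.children(recursive=True):
            print(f"pid={p.pid:>7}  rss={p.memory_info().rss / 2**30:6.2f} GiB")
        print("-" * 30)
        time.sleep(interval_s)


if __name__ == "__main__":
    watch_rss(int(sys.argv[1]))  # e.g. python watch_rss.py <training PID>
```

If the workers grow while the main process stays flat, the problem is in the data loading side rather than the model.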

Va6lue commented Mar 18, 2023

I also ran into the memory explosion in p2ch11, during validation.
My machine has 32 GB of RAM.
I have no idea what happened.
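One thing that could narrow it down: run the validation loader with num_workers=0 and compare RAM. A minimal sketch with a stand-in dataset (in the book's code this would be the validation LunaDataset); if memory stays flat in the main process but grows with num_workers=8, the growth is coming from the worker processes.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for the real validation dataset.
val_ds = TensorDataset(
    torch.zeros(1000, 1, 32, 48, 48),
    torch.zeros(1000, dtype=torch.long),
)

# num_workers=0 loads batches in the main process, so no per-worker copies
# of the dataset's Python objects can pile up.
val_dl = DataLoader(val_ds, batch_size=32, num_workers=0, pin_memory=True)

for batch, labels in val_dl:
    pass  # replace with the usual validation step
```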

donnoc commented Feb 4, 2024

I have the same issue with the p2ch11 code: while validating, memory usage explodes.
I am also very interested in what causes this, and in why the DataLoader sometimes uses GPU memory efficiently and sometimes floods the computer's memory (RAM?!) beforehand.

Side note: after I wrote my own version of the architecture for practice, memory exploded during both training and validation, and I had to tweak the number of workers and the batch size to get a good run.

Edit, maybe this will help: https://ppwwyyxx.com/blog/2022/Demystify-RAM-Usage-in-Multiprocess-DataLoader/
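The gist of that post, as far as I understand it: with the default fork start method, DataLoader workers share the parent's memory via copy-on-write, but if the dataset's metadata lives in millions of small Python objects, merely reading them updates refcounts and each worker gradually copies the parent's pages. Below is a minimal sketch of that contrast (hypothetical class names, not the book's code); whether this is exactly what happens in p2ch11/p2ch12 I can't say, but it matches the symptom of RAM growing with the number of workers.

```python
import numpy as np
import torch
from torch.utils.data import Dataset


class ListBackedDataset(Dataset):
    """Metadata kept as millions of small Python objects.  Reading an element
    updates its refcount, so each forked worker gradually duplicates the
    parent's copy-on-write pages."""
    def __init__(self, n=10_000_000):
        self.samples = list(range(n))

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        return torch.tensor(self.samples[idx])


class ArrayBackedDataset(Dataset):
    """The same metadata packed into one numpy array.  Reading an element does
    not touch any per-element Python object, so the pages stay shared between
    the parent and the DataLoader workers."""
    def __init__(self, n=10_000_000):
        self.samples = np.arange(n, dtype=np.int64)

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        return torch.tensor(self.samples[idx])
```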
