Training with HF datasets has a severe memory leak #30

Open
Vectorrent opened this issue Sep 8, 2024 · 0 comments
Labels
bug Something isn't working help wanted Extra attention is needed

Comments

@Vectorrent
Contributor

I have been training with AIGen in Kaggle notebooks, and CPU memory usage slowly increases over the course of several hours. Eventually the notebook runs out of memory (OOM) and training crashes.

I'm not sure where the leak is happening. I do know that it's not in VTX (it's in AIGen), and it's not leaking VRAM (it leaks system RAM). I suspect it has something to do with the streaming dataloaders (because they are the only ones I'm using here), but I haven't had the bandwidth to troubleshoot yet.
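One way to test the dataloader theory in isolation is to iterate the stream without any training step and watch whether Python-heap usage keeps climbing. The snippet below is a minimal sketch of that idea using only the standard library's `tracemalloc`. The names `profile_iteration`, `leaky_stream` and `clean_stream` are hypothetical helpers written for illustration; they are not part of AIGen or HF datasets, and the two generators are synthetic stand-ins for a real streaming loader. Note that `tracemalloc` only sees allocations made through Python's allocators, so a leak inside native code could go unnoticed here and would need process-level RSS monitoring instead.

```python
import tracemalloc
from typing import Iterable, Iterator, List, Tuple


def profile_iteration(
    batches: Iterable, steps: int, every: int = 100
) -> List[Tuple[int, int]]:
    """Consume up to `steps` items and sample traced Python memory.

    Returns a list of (step, bytes_currently_allocated) pairs, one per
    `every` items. A steadily rising second value suggests the iterable
    (or something it holds) is retaining objects.
    """
    samples: List[Tuple[int, int]] = []
    tracemalloc.start()
    try:
        for step, _batch in enumerate(batches, start=1):
            if step % every == 0:
                current, _peak = tracemalloc.get_traced_memory()
                samples.append((step, current))
            if step >= steps:
                break
    finally:
        tracemalloc.stop()
    return samples


def leaky_stream() -> Iterator[bytes]:
    """Hypothetical stand-in for a leaking loader: keeps every item alive."""
    retained = []
    while True:
        item = bytes(10_000)
        retained.append(item)  # simulated leak: items are never released
        yield item


def clean_stream() -> Iterator[bytes]:
    """Hypothetical stand-in for a healthy loader: items can be freed."""
    while True:
        yield bytes(10_000)


if __name__ == "__main__":
    for name, stream in (("leaky", leaky_stream()), ("clean", clean_stream())):
        samples = profile_iteration(stream, steps=500, every=100)
        growth = samples[-1][1] - samples[0][1]
        print(f"{name}: growth between first and last sample = {growth} bytes")
```

To apply this to the real setup, the synthetic generator would be replaced with the actual streaming dataset iterator. If memory stays flat there, the leak is more likely in the training loop or in native buffers than in Python-level dataloader state.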

@Vectorrent Vectorrent added bug Something isn't working help wanted Extra attention is needed labels Sep 8, 2024