Replies: 7 comments
-
What's the use case for releasing the model memory? Trying to delete the optimizer object might help with releasing the optimizer memory (so something like `del optimizer`).
-
I have a use case that calls the training function twice, but I found some memory fragments that cannot be released by `del` and `torch.cuda.empty_cache()` after the first dummy training. @deepakn94
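For reference, a minimal, self-contained sketch of the pattern being described here (a toy model, not Megatron-LM): run training once, try to release memory with `del` and `torch.cuda.empty_cache()`, then run again.

```python
# Sketch only (toy model, not Megatron-LM): run a dummy training pass, try to
# free GPU memory, then rebuild and train again.
import gc
import torch

def build():
    model = torch.nn.Linear(4096, 4096).cuda()
    optimizer = torch.optim.Adam(model.parameters())
    return model, optimizer

def train_once(model, optimizer, steps=10):
    for _ in range(steps):
        loss = model(torch.randn(64, 4096, device="cuda")).square().mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

model, optimizer = build()
train_once(model, optimizer)        # first ("dummy") run

del model, optimizer                # drop the Python references
gc.collect()                        # make sure the objects are actually collected
torch.cuda.empty_cache()            # return cached blocks to CUDA
print(torch.cuda.memory_allocated(), torch.cuda.memory_reserved())

model, optimizer = build()          # rebuild (possibly with a new structure)
train_once(model, optimizer)        # second, real run
```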
-
I see, that makes sense. You probably still don't want to release the model memory, though. Presumably you are not doing a backward pass the first time you run the training function? You could try activation recomputation.
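For what it's worth, here is a generic PyTorch illustration of activation recomputation (checkpointing); Megatron-LM has its own built-in recomputation options, so this only shows the idea, not the Megatron API.

```python
# Illustration only: activation recomputation via torch.utils.checkpoint.
# Activations inside `block` are not stored during the forward pass; they are
# recomputed during backward, trading extra compute for activation memory.
import torch
from torch.utils.checkpoint import checkpoint

block = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 1024),
).cuda()

x = torch.randn(32, 1024, device="cuda", requires_grad=True)
y = checkpoint(block, x, use_reentrant=False)  # recompute activations in backward
y.sum().backward()
```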
-
Actually, the use case requires changing the original neural network structure, which is why I want to release the model and optimizer memory from the original one. I already tried recomputation to release activations in the dummy training. But the model and optimizer should occupy more memory, I think? I have cleared the model parameters and optimizer state by setting `tensor.storage().resize_(0)`. It did drop some of the memory fragments, but not that much. @deepakn94 Thanks in advance for your quick answer!
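A minimal sketch of the storage-shrinking approach mentioned here, assuming a plain PyTorch `model` and `optimizer`; Megatron-LM's distributed optimizer may keep additional flattened buffers that this does not touch.

```python
# Sketch only: free the CUDA storage behind parameters and optimizer state by
# resizing the underlying storages to zero elements, as described above.
import gc
import torch

def release_storages(model, optimizer):
    for p in model.parameters():
        p.grad = None                         # drop any gradient tensor
        p.data.storage().resize_(0)           # free the parameter's CUDA storage
    for state in optimizer.state.values():    # e.g. Adam's exp_avg / exp_avg_sq
        for v in state.values():
            if torch.is_tensor(v):
                v.storage().resize_(0)
    optimizer.state.clear()
    gc.collect()
    torch.cuda.empty_cache()                  # hand cached blocks back to CUDA
```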
-
Hmm, not sure.
-
@deepakn94 One more question: I found that when I train the model a second time after the dummy first training, its loss curve is different from the loss curve when training from scratch. Do you know why? Should I destroy the process group and reinitialize it?
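In case it helps, a hedged sketch of what tearing down and reinitializing the process group (plus re-seeding) could look like with plain `torch.distributed`; whether this actually explains the different loss curve is an assumption, not something confirmed in this thread. One plausible suspect is simply that the dummy run advances the RNG and data-loader state before the real run starts.

```python
# Sketch only: tear down and re-create the default process group, then reset
# RNG state before the second run. Environment-variable init is assumed.
import random
import numpy as np
import torch
import torch.distributed as dist

def reset_between_runs(seed=1234):
    if dist.is_initialized():
        dist.barrier()                       # make sure all ranks are done
        dist.destroy_process_group()
    dist.init_process_group(backend="nccl")  # re-reads MASTER_ADDR/PORT, RANK, etc.

    # Restore identical RNG state so the second run matches a from-scratch run.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
```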
-
Marking as stale. No activity in 60 days. |
-
Your question
How to release the model and optimizer memory manually?
What I have tried
But the approaches above did not work. Please help, thanks!