Replies: 7 comments
-
What's the use case for releasing the model memory? Trying to delete the optimizer object might help with releasing the optimizer memory (so something like `del optimizer`).
-
I have a use case that calls the training function twice, but I found some memory fragments that cannot be released by `del` and `torch.cuda.empty_cache()` after the first dummy training. @deepakn94
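For reference, a minimal, self-contained sketch of the pattern being described here (a toy model, not Megatron-LM): run training once, try to release memory with `del` and `torch.cuda.empty_cache()`, then run again.

```python
# Sketch only (toy model, not Megatron-LM): run a dummy training pass, try to
# free GPU memory, then rebuild and train again.
import gc
import torch

def build():
    model = torch.nn.Linear(4096, 4096).cuda()
    optimizer = torch.optim.Adam(model.parameters())
    return model, optimizer

def train_once(model, optimizer, steps=10):
    for _ in range(steps):
        loss = model(torch.randn(64, 4096, device="cuda")).square().mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

model, optimizer = build()
train_once(model, optimizer)        # first ("dummy") run

del model, optimizer                # drop the Python references
gc.collect()                        # make sure the objects are actually collected
torch.cuda.empty_cache()            # return cached blocks to CUDA
print(torch.cuda.memory_allocated(), torch.cuda.memory_reserved())

model, optimizer = build()          # rebuild (possibly with a new structure)
train_once(model, optimizer)        # second, real run
```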
-
I see, that makes sense. You probably still don't want to release the model memory, though. Presumably you are not doing a backward pass the first time you run the training function? You could try activation recomputation.
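For what it's worth, here is a generic PyTorch illustration of activation recomputation (checkpointing); Megatron-LM has its own built-in recomputation options, so this only shows the idea, not the Megatron API.

```python
# Illustration only: activation recomputation via torch.utils.checkpoint.
# Activations inside `block` are not stored during the forward pass; they are
# recomputed during backward, trading extra compute for activation memory.
import torch
from torch.utils.checkpoint import checkpoint

block = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 1024),
).cuda()

x = torch.randn(32, 1024, device="cuda", requires_grad=True)
y = checkpoint(block, x, use_reentrant=False)  # recompute activations in backward
y.sum().backward()
```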
-
Actually, the use case requires changing the original neural network structure, which is why I want to release the model and optimizer memory from the original one. I already tried recomputation to release activations in the dummy training. But the model and optimizer should occupy more memory, I think? I have cleared the model parameters and optimizer state by setting `tensor.storage().resize_(0)`. It did drop some of the memory fragments, but not that much. @deepakn94 Thanks in advance for your quick answer!
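A minimal sketch of the storage-shrinking approach mentioned here, assuming a plain PyTorch `model` and `optimizer`; Megatron-LM's distributed optimizer may keep additional flattened buffers that this does not touch.

```python
# Sketch only: free the CUDA storage behind parameters and optimizer state by
# resizing the underlying storages to zero elements, as described above.
import gc
import torch

def release_storages(model, optimizer):
    for p in model.parameters():
        p.grad = None                         # drop any gradient tensor
        p.data.storage().resize_(0)           # free the parameter's CUDA storage
    for state in optimizer.state.values():    # e.g. Adam's exp_avg / exp_avg_sq
        for v in state.values():
            if torch.is_tensor(v):
                v.storage().resize_(0)
    optimizer.state.clear()
    gc.collect()
    torch.cuda.empty_cache()                  # hand cached blocks back to CUDA
```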
-
Hmm, not sure.
-
@deepakn94 One more question: I found that when I train the model a second time after the dummy first training, its loss curve is different from the loss curve when training from scratch. Do you know why? Should I destroy the process group and reinitialize it?
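In case it helps, a hedged sketch of what tearing down and reinitializing the process group (plus re-seeding) could look like with plain `torch.distributed`; whether this actually explains the different loss curve is an assumption, not something confirmed in this thread. One plausible suspect is simply that the dummy run advances the RNG and data-loader state before the real run starts.

```python
# Sketch only: tear down and re-create the default process group, then reset
# RNG state before the second run. Environment-variable init is assumed.
import random
import numpy as np
import torch
import torch.distributed as dist

def reset_between_runs(seed=1234):
    if dist.is_initialized():
        dist.barrier()                       # make sure all ranks are done
        dist.destroy_process_group()
    dist.init_process_group(backend="nccl")  # re-reads MASTER_ADDR/PORT, RANK, etc.

    # Restore identical RNG state so the second run matches a from-scratch run.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
```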
-
Marking as stale. No activity in 60 days. |
-
Your question
How to release the model and optimizer memory manually?
What I have tried
But the approaches above did not work. Please help, thanks!