When I apply the quantize_ function and run the model, then comment it out on the next run, the original (unquantized) model hits an out-of-memory error. PyTorch's automatic GPU memory reuse no longer seems to work, and memory keeps growing as the run continues until OOM. (My model contains a stack of several modules.)
import torch
from torchao.quantization import quantize_, int4_weight_only

mymodel = MyModel().eval().cuda().half()  # MyModel: my stack of modules
quantize_(mymodel, int4_weight_only())  # commented out on the next run
# generate a random input
input_ = torch.rand ...  # shape elided
# run the model forward
output_ = mymodel(input_)
print(output_)
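To check whether allocations really are accumulating across runs rather than being reused, the CUDA caching-allocator counters can be read before and after the forward pass. A minimal sketch, assuming only that torch is installed (it reports zero without a GPU); report_cuda_memory is a hypothetical helper, not part of torchao:

```python
import torch

def report_cuda_memory(tag: str) -> int:
    """Print and return the currently allocated CUDA bytes (0 if no GPU)."""
    if not torch.cuda.is_available():
        print(f"{tag}: no CUDA device")
        return 0
    allocated = torch.cuda.memory_allocated()
    print(f"{tag}: {allocated} bytes allocated")
    return allocated

before = report_cuda_memory("before forward")
# ... run the model forward here ...
after = report_cuda_memory("after forward")
```

If `after` keeps climbing on every repeated forward pass with identical inputs, something is holding references to intermediate tensors or old weights rather than the allocator failing to reuse memory.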
Is there any cache or config saved during the torchao quantize process that cannot be cleared?
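One common cause of this symptom, independent of torchao itself, is a stale Python reference (a module-level cache, a leftover notebook variable from the previous run, etc.) keeping the old model's parameters alive, so the CUDA caching allocator can never reclaim them. A pure-Python sketch of that pattern; FakeWeights and the cache dict are hypothetical stand-ins for a module holding GPU tensors:

```python
import gc
import weakref

class FakeWeights:
    """Hypothetical stand-in for a module whose parameters live on the GPU."""
    pass

model = FakeWeights()
tracker = weakref.ref(model)       # observe liveness without extending it

cache = {"last_model": model}      # e.g. a lingering global/notebook reference

del model
gc.collect()
alive_with_cache = tracker() is not None   # the cache still holds the object

cache.clear()                      # drop the stale reference
gc.collect()
freed_after_clear = tracker() is None      # now collectible
print(alive_with_cache, freed_after_clear)
```

In the real setting, dropping every reference to the previous model and then calling `gc.collect()` followed by `torch.cuda.empty_cache()` lets PyTorch return the cached blocks to the driver.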
bianxuxuxu changed the title from "Out Of Memory Error When next running" to "Out Of Memory Error" on Sep 12, 2024