When I apply the quantize_ function and run the model, then comment it out on the next run, the original (unquantized) model hits an out-of-memory error. PyTorch's automatic GPU memory reuse no longer seems to work, and memory keeps growing as the run continues until OOM. (My model contains a stack of several modules.)
import torch
from torchao.quantization import quantize_, int4_weight_only

mymodel = MyModel().eval().cuda().half()  # MyModel: my stack of modules
quantize_(mymodel, int4_weight_only())  # commented out on the next run
# generate a random input
input_ = torch.rand ...  # shape elided
# run the model forward
output_ = mymodel(input_)
print(output_)
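To check whether allocations really are accumulating across runs rather than being reused, the CUDA caching-allocator counters can be read before and after the forward pass. A minimal sketch, assuming only that torch is installed (it reports zero without a GPU); report_cuda_memory is a hypothetical helper, not part of torchao:

```python
import torch

def report_cuda_memory(tag: str) -> int:
    """Print and return the currently allocated CUDA bytes (0 if no GPU)."""
    if not torch.cuda.is_available():
        print(f"{tag}: no CUDA device")
        return 0
    allocated = torch.cuda.memory_allocated()
    print(f"{tag}: {allocated} bytes allocated")
    return allocated

before = report_cuda_memory("before forward")
# ... run the model forward here ...
after = report_cuda_memory("after forward")
```

If `after` keeps climbing on every repeated forward pass with identical inputs, something is holding references to intermediate tensors or old weights rather than the allocator failing to reuse memory.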
Is there any cache or config saved during the torchao quantize process that cannot be cleared?
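One common cause of this symptom, independent of torchao itself, is a stale Python reference (a module-level cache, a leftover notebook variable from the previous run, etc.) keeping the old model's parameters alive, so the CUDA caching allocator can never reclaim them. A pure-Python sketch of that pattern; FakeWeights and the cache dict are hypothetical stand-ins for a module holding GPU tensors:

```python
import gc
import weakref

class FakeWeights:
    """Hypothetical stand-in for a module whose parameters live on the GPU."""
    pass

model = FakeWeights()
tracker = weakref.ref(model)       # observe liveness without extending it

cache = {"last_model": model}      # e.g. a lingering global/notebook reference

del model
gc.collect()
alive_with_cache = tracker() is not None   # the cache still holds the object

cache.clear()                      # drop the stale reference
gc.collect()
freed_after_clear = tracker() is None      # now collectible
print(alive_with_cache, freed_after_clear)
```

In the real setting, dropping every reference to the previous model and then calling `gc.collect()` followed by `torch.cuda.empty_cache()` lets PyTorch return the cached blocks to the driver.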
bianxuxuxu changed the title from "Out Of Memory Error When next running" to "Out Of Memory Error" on Sep 12, 2024