GPU out of memory issues #1793

Open

chschroeder opened this issue Dec 22, 2022 · 1 comment

@chschroeder
Hi,

Thank you for this library, especially for the clear documentation. I have been enjoying using it, but the GPU memory management issue has somewhat dampened my experience in recent days.

This seems to be a long-standing issue (#487, #522, #1573, #1712; likely non-exhaustive) that is still unresolved (#1717). I can confirm that it persists with the latest release.

Use case:
The loop aggravates the existing problem in this scenario: I am encountering it in an active learning setup where a model is trained and predictions are generated repeatedly within a loop (sketched below). Of course, this can be worked around by simply using a larger GPU, which is what I previously did, but I want my examples to run on Colab as well.
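For concreteness, here is a hypothetical sketch of the loop pattern; the model name, data, and query-strategy details are placeholders, not taken from my actual setup:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("all-MiniLM-L6-v2")
pool = [f"unlabeled sentence {i}" for i in range(1000)]

for iteration in range(20):
    # Query step: embed the pool so a query strategy can pick new samples.
    embeddings = model.encode(pool, convert_to_tensor=True)
    # ... select samples and obtain labels (omitted) ...

    # Training step: fine-tune on the labeled data gathered so far.
    examples = [InputExample(texts=[s, s], label=1.0) for s in pool[:32]]
    loader = DataLoader(examples, shuffle=True, batch_size=16)
    loss = losses.CosineSimilarityLoss(model)
    model.fit(train_objectives=[(loader, loss)], epochs=1)
    # Memory allocated inside encode()/fit() is not always released here,
    # so GPU usage can grow from iteration to iteration until it OOMs.
```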

Description of the problem:
The problem (at least on the fit() and encode() paths, which were the relevant ones for me) leads to situations where GPU memory cannot be released or is released too late. This eventually results in OutOfMemoryError: CUDA out of memory. Tried to allocate [....]

What are your thoughts on this problem? Is someone working on it? If not: I encountered similar behaviour in my own library and managed to get it under control. See, for example, this line of code in my fit() method. Since then, this has rarely been a problem over many active learning runs. I could provide a PR applying this fix at the appropriate locations.
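For illustration, a minimal sketch of the cleanup pattern I mean, assuming the referenced line boils down to dropping stale references and then emptying the CUDA cache (the helper name here is made up):

```python
import gc

import torch


def release_gpu_memory():
    # Collect unreachable Python objects first, so that any CUDA tensors
    # they still hold become eligible for release ...
    gc.collect()
    # ... then hand the cached, now-unused CUDA blocks back to the driver.
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


# Inside the active learning loop, after each train/predict step:
#     del embeddings, loader, loss
#     release_gpu_memory()
```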

@chschroeder (Author)

Update:
While I did not succeed in building a minimal failing notebook in huggingface/setfit#242 (with limited time, at least), I keep running into this problem.

I am currently running an active learning setup which encompasses 16 single runs (8x vanilla BERT / 8x sentence transformer) with varying query strategies. The BERT model (336M parameters) is much larger than the SetFit model, yet not a single BERT run fails. The SetFit runs, however, regularly fail with RuntimeError: CUDA out of memory.

In #1795, @Dobiasd managed to provide a working sample of a leaking encode(); maybe we need to take a closer look at that part first.

Although I think all of my recent errors were raised by fit(), maybe that is just the symptom and encode() is what fills up the GPU memory.
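To help tell the two apart, a simple probe along these lines (hypothetical, not the sample from #1795) could log allocated GPU memory across repeated encode() calls:

```python
import torch
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2", device="cuda")
sentences = ["some text to embed"] * 512

for i in range(10):
    # The returned embeddings are discarded immediately, so allocated
    # memory should stay flat; steady growth would indicate a leak.
    model.encode(sentences, batch_size=64, convert_to_tensor=True)
    mib = torch.cuda.memory_allocated() / 2**20
    print(f"iteration {i}: {mib:.1f} MiB allocated")
```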
