Thank you for this library, especially for the clear documentation. I have been enjoying using it, but the issue with GPU memory management has somewhat dampened my experience in recent days.
This seems to be a long-standing issue (#487, #522, #1573, #1712; likely non-exhaustive) that is still unresolved (#1717). I can also confirm that it is still an issue with the latest release.
Use Case:
I am encountering this in an active learning setup where a model is trained and predictions are created repeatedly within a loop (sketched below), which aggravates the existing problem. Of course this could be worked around by using a larger GPU, which is how I previously coped with it, but I want my examples to run on Colab as well.
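For context, the loop looks roughly like this (a simplified sketch, not my actual code; the model name, the data handling, and `query_strategy` are placeholders):

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("sentence-transformers/paraphrase-mpnet-base-v2")

for iteration in range(num_iterations):
    # fine-tune on the currently labeled pool
    train_examples = [InputExample(texts=[a, b], label=l) for a, b, l in labeled_pairs]
    train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
    train_loss = losses.CosineSimilarityLoss(model)
    model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1)

    # embed the unlabeled pool to select the next queries
    embeddings = model.encode(unlabeled_texts, batch_size=32, convert_to_tensor=True)
    labeled_pairs, unlabeled_texts = query_strategy(embeddings, labeled_pairs, unlabeled_texts)

    # over the iterations, GPU memory usage creeps upward until fit()/encode()
    # eventually raises "CUDA out of memory"
```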
Description of the problem:
The problem (at least for the `fit()` and `encode()` paths, which were the relevant ones for me) leads to situations where GPU memory cannot be released or is released too late. This eventually results in `OutOfMemoryError: CUDA out of memory. Tried to allocate [....]`.
What are your thoughts on this problem? Is someone working on it? If not: I encountered similar behaviour in my own library and managed to get it under control. See for example this line of code in my `fit()` method; the idea is sketched below. This has rarely been a problem again over many active learning runs. I could provide a PR applying this fix at the appropriate locations.
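The general idea behind that fix (a sketch of the pattern, not the exact code from my library) is to explicitly drop the references that keep GPU tensors alive once a training/prediction step has finished, and then return the cached blocks to the driver:

```python
import gc

import torch


def release_cuda_memory() -> None:
    """Release GPU memory that is only held alive by unreferenced Python objects.

    Only effective once the references (model parts, optimizer state, cached
    tensors) have actually been dropped, e.g. via `del` or by leaving scope.
    """
    gc.collect()                  # collect Python objects that still hold CUDA tensors
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # hand cached, now-unused blocks back to the driver


# usage after each round of the active-learning loop sketched above:
#     del train_dataloader, train_loss, embeddings
#     release_cuda_memory()
```

Where exactly such a cleanup belongs inside sentence-transformers' `fit()`/`encode()` is what the PR would have to work out.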
Update:
While I did not succeed in building a minimal failing notebook in huggingface/setfit#242 (admittedly with limited time), I keep running into this problem.
I am currently running an active learning setup which encompasses 16 single runs (8x vanilla BERT / 8x sentence transformer) with varying query strategies. The BERT model is much larger (336M parameters) than the SetFit model, but despite this not a single BERT run fails. SetFit runs, however, regularly fail with `RuntimeError: CUDA out of memory`.
In #1795, @Dobiasd managed to provide a working sample of a leaking `encode()`; maybe we need to take a closer look at that part first.
Although I think all my recent errors were raised in `fit()`, maybe that is just the symptom and it is `encode()` that fills up the GPU memory.
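A quick way to check would be to watch the allocator stats across repeated `encode()` calls (a sketch; the model name and texts are placeholders):

```python
import torch
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/paraphrase-mpnet-base-v2", device="cuda")
texts = ["some example sentence"] * 1024  # placeholder data

for i in range(10):
    model.encode(texts, batch_size=32)
    torch.cuda.synchronize()
    print(f"round {i}: "
          f"allocated={torch.cuda.memory_allocated() / 2**20:.1f} MiB, "
          f"reserved={torch.cuda.memory_reserved() / 2**20:.1f} MiB")

# if `allocated` keeps growing across rounds, encode() itself leaks references;
# if only `reserved` grows, it is cached memory that torch.cuda.empty_cache()
# could hand back before the next fit()
```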