Fix CUDA out of memory issue in model.encode by allowing user to transfer to CPU #1717
In issue #487 and issue #522, users were running into OOM errors when the batch size is large, because the embeddings aren't offloaded to the CPU. The PR that addressed this only fixes the case where `convert_to_numpy=True`; if you have `convert_to_numpy=False`, the problem still exists.

In this PR, I added an extra flag that allows the embeddings to be offloaded to the CPU. This gives the user the flexibility to keep embeddings off the GPU (for example, when saving the SentenceTransformer embeddings to disk or holding them in RAM for kNN, which is often the case) instead of keeping all the embeddings on the GPU.
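For context, here is a minimal sketch of the workaround users currently need, which performs the same per-batch offload manually that the new flag would do inside `encode`. The model name and batch size are illustrative, and only the existing `convert_to_tensor` parameter is used:

```python
import torch
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
sentences = [f"sentence {i}" for i in range(100_000)]

# With convert_to_numpy=False, encode keeps every embedding on the GPU,
# so memory grows with corpus size. Moving each batch to CPU RAM keeps
# only the current batch on the GPU:
all_embeddings = []
batch_size = 256
for start in range(0, len(sentences), batch_size):
    batch = sentences[start:start + batch_size]
    emb = model.encode(batch, convert_to_tensor=True)
    all_embeddings.append(emb.cpu())  # offload this batch to CPU RAM

embeddings = torch.cat(all_embeddings)  # full corpus now lives in RAM
```

The flag in this PR folds that `.cpu()` step into `encode` itself, so users get CPU tensors back without writing their own batching loop.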