
CUDA Out Of Memory issue #348

Open
emil-peters opened this issue Oct 28, 2024 · 3 comments

Comments

@emil-peters
Contributor

During testing I kept getting CUDA OOM errors while running code under pyinstrument in which multiple models were run one after another. Even after making sure no reference to the tensors was kept in the Python code, the CUDA OOM errors persisted with pyinstrument enabled. Once it was disabled, the errors disappeared and my VRAM was released as expected after each reference was deleted.

Is there an option to ensure pyinstrument clears its references to ONNX and torch tensors, especially after calling del tensor? I'd like to keep using pyinstrument, but at the moment it isn't feasible.

  • Emil
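
For reference, here is a minimal sketch of the pattern I mean (the model is just a placeholder, not my actual code, and it assumes a CUDA-capable machine): each model is built, run, and fully dereferenced before the next one starts, yet the memory only comes back reliably when pyinstrument is not in the picture.

    import gc
    import torch
    from pyinstrument import Profiler

    def make_model():
        # placeholder for the real models; each one allocates a noticeable chunk of VRAM
        return torch.nn.Sequential(*[torch.nn.Linear(4096, 4096) for _ in range(8)]).cuda()

    profiler = Profiler()
    profiler.start()
    for _ in range(4):                    # several models run one after another
        model = make_model()
        with torch.no_grad():
            out = model(torch.randn(64, 4096, device="cuda"))
        del out, model                    # no Python reference to the tensors remains
        gc.collect()
        torch.cuda.empty_cache()          # without pyinstrument, VRAM drops back here
    profiler.stop()
    print(profiler.output_text())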
@Aedial

Aedial commented Nov 6, 2024

I have a similar problem where a relatively heavy object is not garbage collected when I leave the context, even with del (Python 3.12, interval = 0.1). The growth shows up rather starkly in tracemalloc, with the number of objects growing by exactly the number of instantiations (or a multiple of it). This results in an OOM of the whole process after a few minutes.
This only happens when using pyinstrument; RAM usage stays stable with any other profiler. I have been using pyinstrument for years and I don't recall such a problem before (perhaps it appeared with the move from 3.7 to 3.12?). Might be related to #296.
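
Roughly how I'm observing it (the Heavy class here is only a stand-in for my actual object, not the real code):

    import gc
    import tracemalloc
    from pyinstrument import Profiler

    class Heavy:
        # stand-in for the relatively heavy object that never gets collected
        def __init__(self):
            self.payload = bytearray(50 * 1024 * 1024)

    tracemalloc.start()
    for i in range(20):
        profiler = Profiler(interval=0.1)
        profiler.start()
        obj = Heavy()
        del obj                    # reference dropped before the profiler stops
        profiler.stop()
        gc.collect()
        current, peak = tracemalloc.get_traced_memory()
        print(f"iteration {i}: current={current / 1e6:.1f} MB, peak={peak / 1e6:.1f} MB")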

@davidemassarenti-optio3

I'm encountering a similar problem. I tracked it down to calls to output_html.

        profiler.stop()
        profiler.output_html()
        profiler.reset()

Using 4.6.2, memory usage (max RSS) climbs ~2MB over 100 profiling sessions.
Using 5.0.0, memory usage climbs ~40MB for the same number of sessions.

If I comment out the call to output_html, the memory stays steady.
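
Something along these lines reproduces the numbers (a sketch rather than my exact harness; it assumes ru_maxrss is reported in kilobytes, as on Linux, and uses a sleep as a stand-in for the real workload):

    import resource
    import time
    from pyinstrument import Profiler

    def max_rss_mb():
        # ru_maxrss is in kilobytes on Linux (bytes on macOS)
        return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024

    baseline = max_rss_mb()
    profiler = Profiler()
    for _ in range(100):
        profiler.start()
        time.sleep(0.01)                 # stand-in for the real workload
        profiler.stop()
        html = profiler.output_html()    # commenting this out keeps RSS steady
        profiler.reset()
    print(f"max RSS grew by {max_rss_mb() - baseline:.1f} MB over 100 sessions")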

@xiaobanni

xiaobanni commented Nov 18, 2024

> During testing I kept getting CUDA OOM errors while running code under pyinstrument in which multiple models were run one after another. Even after making sure no reference to the tensors was kept in the Python code, the CUDA OOM errors persisted with pyinstrument enabled. Once it was disabled, the errors disappeared and my VRAM was released as expected after each reference was deleted.
>
> Is there an option to ensure pyinstrument clears its references to ONNX and torch tensors, especially after calling del tensor? I'd like to keep using pyinstrument, but at the moment it isn't feasible.
>
> • Emil

I am facing the same problem. My code, which uses torch on the GPU, runs fine under plain python, but raises torch.OutOfMemoryError: CUDA out of memory when launched with pyinstrument.
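
A minimal script along the lines of what I'm running (illustrative only, not the actual project code; adjust the tensor size to your card):

    # repro.py -- illustrative reproduction, not the real workload
    import torch

    for step in range(100):
        x = torch.empty(1024, 1024, 1024, device="cuda")  # ~4 GB at float32; adjust to your GPU
        del x
        torch.cuda.empty_cache()

    # python repro.py        -> completes, VRAM is released each iteration
    # pyinstrument repro.py  -> hits torch.OutOfMemoryError: CUDA out of memory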
