[core] overhaul memory profiling and fix backward compatibility #10511

youkaichao · 2024-11-21T02:16:51Z

Fixes #10451, and clearly explains the memory classification and the profiling procedure.

I also added the initial PyTorch memory, to align with the PyTorch memory profiler.

The profiling procedure is extracted into vllm/utils, so that we can reuse it later in v1 too.
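For readers who want a feel for the shape of such a helper, here is a minimal sketch of a profiling context manager. It is not the code in this PR: the names `Snapshot`, `ProfileResult` and `profile_memory` are hypothetical, and the `measure` callable stands in for real GPU queries (for example `torch.cuda.mem_get_info()` and `torch.cuda.memory_reserved()`), so the sketch runs without a GPU.

```python
import contextlib
from dataclasses import dataclass, field
from typing import Callable, Iterator


@dataclass
class Snapshot:
    """One memory reading in bytes (hypothetical structure)."""
    torch_peak: int = 0      # peak bytes held by live PyTorch tensors
    torch_reserved: int = 0  # bytes PyTorch has reserved from the device
    device_used: int = 0     # bytes in use on the device, from any source


@dataclass
class ProfileResult:
    before: Snapshot = field(default_factory=Snapshot)
    after: Snapshot = field(default_factory=Snapshot)
    torch_peak_increase: int = 0
    non_torch_increase: int = 0


@contextlib.contextmanager
def profile_memory(measure: Callable[[], Snapshot]) -> Iterator[ProfileResult]:
    """Snapshot memory before and after the wrapped block, then classify the delta."""
    result = ProfileResult(before=measure())
    yield result
    result.after = measure()
    result.torch_peak_increase = result.after.torch_peak - result.before.torch_peak
    # Growth in device usage that PyTorch's own reservation does not explain.
    device_growth = result.after.device_used - result.before.device_used
    torch_growth = result.after.torch_reserved - result.before.torch_reserved
    result.non_torch_increase = device_growth - torch_growth


# Usage with a fake measurement source (illustrative numbers, no GPU needed).
readings = iter([
    Snapshot(torch_peak=100, torch_reserved=120, device_used=150),
    Snapshot(torch_peak=400, torch_reserved=420, device_used=500),
])
with profile_memory(lambda: next(readings)) as result:
    pass  # a real caller would run a dummy forward pass here

print(result.torch_peak_increase, result.non_torch_increase)
```

Passing the measurement function in as an argument is only a convenience of this sketch; it makes the classification logic testable on a machine without CUDA.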

Signed-off-by: youkaichao <[email protected]>

github-actions · 2024-11-21T02:17:02Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

Add ready label to the PR
Enable auto-merge.

🚀


DarkLight1337 · 2024-11-21T03:04:08Z

cc @joerunde


mergify · 2024-11-23T05:26:32Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @youkaichao.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

JaheimLee · 2024-12-03T02:56:20Z

any progress? I do need to run multi-instance with one GPU.

Pydataman · 2024-12-04T10:31:36Z

> any progress? I do need to run multi-instance with one GPU.

Is this function available?

youkaichao · 2024-12-04T19:27:34Z

Let me finish it this week.

joerunde · 2024-12-04T19:48:07Z

Thanks for taking this on @youkaichao!

I think the docs for the --gpu-memory-utilization flag should also be updated in this PR to reflect the changes, once this is working properly.


joerunde · 2024-12-16T15:07:27Z

vllm/worker/worker.py

+               "PyTorch activation peak memory\t"
+               f"{(result.torch_peak_increase_in_bytes / GiB_bytes):.2f}GiB\n"
+               "available_kv_cache_memory\t"
+               f"{(available_kv_cache_memory / GiB_bytes):.2f}GiB\n")


Some suggestions on making this a bit nicer:

The log would be much easier to parse if the numbers were aligned on the same column; right now they're all over the place.

The descriptions are a mix of plain words and variable names. For logs, maybe we should just use words: available_kv_cache_memory: -> KV Cache Size:, etc.

"Non torch memory" is the one item on this list that I think might not be easily understood by somebody reading the logs. Maybe calling it something a little more generic, like "Memory overhead", would be less distracting.

I think all of the newlines take too much space in the logs. IMO it would be simpler to keep the same single-line comma-separated result as before.

Addressed in #10511 (comment), PTAL.
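As an illustration of the column-alignment suggestion in this thread (not the format that was finally merged), a report could be built with padded f-string fields. The helper name `format_memory_report` and the byte values are made up for the example; the labels follow the wording proposed above.

```python
from typing import Dict

GiB_bytes = 1 << 30


def format_memory_report(items: Dict[str, int]) -> str:
    """Render label/size pairs so that every size lands in the same column."""
    width = max(len(label) for label in items)
    lines = [
        f"{label:<{width}}  {size / GiB_bytes:>7.2f} GiB"
        for label, size in items.items()
    ]
    return "\n".join(lines)


# Illustrative values only.
report = format_memory_report({
    "Model weights": 15 * GiB_bytes,
    "Memory overhead": GiB_bytes // 4,
    "KV Cache Size": 55 * GiB_bytes,
})
print(report)
```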

joerunde · 2024-12-16T15:11:17Z

@youkaichao Any chance you can add in a quick test for the profiling context manager itself? As an example the one I wrote up here was very simple to do: https://github.com/vllm-project/vllm/pull/11120/files#diff-33c13e0b177bacd2f02e29bcb8aea5b49e7ce34901fd8f41fefb65defba1bd33R277-R312

joerunde · 2024-12-16T15:28:01Z

@youkaichao 🤔🤔🤔 Loading facebook/opt-125m twice in the same process on an A100 measures a negative non_torch_memory value:

from vllm import LLM
m1 = LLM("facebook/opt-125m", gpu_memory_utilization=0.25)
m2 = LLM("facebook/opt-125m", gpu_memory_utilization=0.25)

...
INFO 12-16 15:20:32 worker.py:243] non_torch_memory	-0.02GiB

Might not be super important to fix; I think the main use case to unblock here is multi-process vllm serving. But it is interesting: I can't immediately see why that would happen.

mgoin

Excellent work Kaichao, I appreciate the walkthrough example in memory_profiling. This passed my local usage, and I didn't see the issue Joe saw, nor do I think it is a serious issue. My only nit is on adding all the newlines to the log; I think it was fine as a comma-separated list.

mgoin · 2024-12-16T17:50:38Z

vllm/worker/worker.py

+               "PyTorch activation peak memory\t"
+               f"{(result.torch_peak_increase_in_bytes / GiB_bytes):.2f}GiB\n"
+               "available_kv_cache_memory\t"
+               f"{(available_kv_cache_memory / GiB_bytes):.2f}GiB\n")


I think all of the newlines take too much space in the logs. IMO it would be simpler to keep the same single-line comma-separated result as before.


youkaichao · 2024-12-16T18:15:57Z

@mgoin changed the logging to be:

INFO 12-16 10:14:30 worker.py:241] Memory profiling takes 1.01 seconds
INFO 12-16 10:14:30 worker.py:241] the current vLLM instance can use total_gpu_memory (79.22GiB) x gpu_memory_utilization (0.90) = 71.29GiB
INFO 12-16 10:14:30 worker.py:241] model weights take 14.96GiB; non_torch_memory takes 0.18GiB; PyTorch activation peak memory takes 1.26GiB; the rest of the memory reserved for KV Cache is 54.90GiB.

Let me know if you have further ideas on how to improve the readability.
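The arithmetic behind these log lines can be rechecked with a few lines of Python. This is only a sketch using the rounded figures printed above, and `kv_cache_budget_gib` is a hypothetical helper name; because the log values are already rounded to two decimals, a recomputation can differ from the logged numbers in the last digit.

```python
def kv_cache_budget_gib(total_gpu_memory: float,
                        gpu_memory_utilization: float,
                        weights: float,
                        non_torch: float,
                        activation_peak: float) -> float:
    """Memory left for the KV cache after everything else is subtracted (GiB)."""
    budget = total_gpu_memory * gpu_memory_utilization
    return budget - weights - non_torch - activation_peak


# Rounded values copied from the log lines above.
budget = 79.22 * 0.90
kv = kv_cache_budget_gib(79.22, 0.90, weights=14.96, non_torch=0.18,
                         activation_peak=1.26)
print(f"instance budget ~ {budget:.2f} GiB, KV cache ~ {kv:.2f} GiB")
```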

youkaichao · 2024-12-16T18:25:48Z

> @youkaichao 🤔🤔🤔 Loading facebook/opt-125m twice in the same process on an A100 measures a negative non_torch_memory value:
>
> from vllm import LLM
> m1 = LLM("facebook/opt-125m", gpu_memory_utilization=0.25)
> m2 = LLM("facebook/opt-125m", gpu_memory_utilization=0.25)
> ...
> INFO 12-16 15:20:32 worker.py:243] non_torch_memory	-0.02GiB
>
> Might not be super important to fix; I think the main use case to unblock here is multi-process vllm serving. But it is interesting: I can't immediately see why that would happen.

@joerunde this is because of PyTorch's internal memory fragmentation. If PyTorch reserves 2 MiB from CUDA but only allocates 1 MiB of it, the remaining 1 MiB is accounted as non-torch memory. The next time you run it, you may allocate another 1 MiB out of that same block, so the internal fragmentation shrinks and the measured non-torch memory can come out negative.
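A toy model of that accounting, with made-up numbers, shows how the delta can go negative. It assumes non-torch memory is computed as device usage minus the bytes PyTorch has handed out to tensors; that formula is a simplification for illustration and not necessarily the exact one used in this PR.

```python
MiB = 1 << 20


def non_torch_bytes(device_used: int, torch_allocated: int) -> int:
    """Attribute everything not held by live tensors to 'non-torch' memory."""
    return device_used - torch_allocated


# First run: PyTorch takes a 2 MiB block from CUDA but hands out only 1 MiB.
first = non_torch_bytes(device_used=2 * MiB, torch_allocated=1 * MiB)

# Second run: another 1 MiB tensor is served from the same 2 MiB block,
# so device usage is unchanged while tensor memory grows.
second = non_torch_bytes(device_used=2 * MiB, torch_allocated=2 * MiB)

# The change between the two measurements is negative.
delta = second - first
print(first // MiB, second // MiB, delta // MiB)
```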


youkaichao · 2024-12-16T18:59:26Z

> @youkaichao Any chance you can add in a quick test for the profiling context manager itself? As an example, the one I wrote up here was very simple to do: #11120 (files)

@joerunde that's a great idea! I added it now, PTAL.


joerunde

Looks pretty good to me! Thanks for looking at this so thoroughly.

youkaichao · 2024-12-16T21:32:13Z

The errors are unrelated; merging.


fix

7ab6d00

Signed-off-by: youkaichao <[email protected]>

youkaichao requested review from zhuohan123, alexm-redhat, comaniac and njhill as code owners November 21, 2024 02:16

fix tests

d5f3aaf

Signed-off-by: youkaichao <[email protected]>

youkaichao requested review from DarkLight1337, robertgshaw2-redhat and simon-mo as code owners November 21, 2024 02:17

youkaichao added 2 commits November 20, 2024 18:23

fix

40f7b73

Signed-off-by: youkaichao <[email protected]>

fix

2250ade

Signed-off-by: youkaichao <[email protected]>

mgoin self-requested a review November 21, 2024 02:58

DarkLight1337 reviewed Nov 21, 2024

vllm/utils.py (outdated, resolved)

vllm/worker/worker.py (outdated, resolved)

DarkLight1337 mentioned this pull request Nov 21, 2024

[Bug]: vllm failed to run two instance with one gpu #10533

Closed

1 task

tjohnson31415 reviewed Nov 21, 2024

vllm/worker/worker.py (outdated, resolved)

tjohnson31415 reviewed Nov 21, 2024

vllm/utils.py (outdated, resolved)

mergify bot added the needs-rebase label Nov 23, 2024

zhekazuev mentioned this pull request Nov 27, 2024

[bitnami/vllm] feat: Add new container bitnami/containers#75274

Closed

DarkLight1337 mentioned this pull request Dec 5, 2024

[Bug]: vllm 0.6.4.post1 out of VRAM depending on startup order #10912

Closed

1 task

joerunde mentioned this pull request Dec 12, 2024

[Core] Fix memory profiling #11120

Closed

Merge branch 'main' into memory_profile

79d064f

mergify bot removed the needs-rebase label Dec 14, 2024

fix

fbc2eda

Signed-off-by: youkaichao <[email protected]>

update tests

352d3cf

Signed-off-by: youkaichao <[email protected]>

youkaichao commented Dec 14, 2024

tests/worker/test_profile.py (resolved)

simon-mo mentioned this pull request Dec 16, 2024

[Release]: v0.7.0 Release Tracker #11218

Open

2 tasks

joerunde reviewed Dec 16, 2024


mgoin approved these changes Dec 16, 2024


mgoin added the ready label (ONLY add when PR is ready to merge/full CI is needed) Dec 16, 2024

youkaichao added 5 commits December 16, 2024 10:01

Merge branch 'main' into memory_profile

8c7d512

remove newline

7dc2f39

Signed-off-by: youkaichao <[email protected]>

fix

3c17006

Signed-off-by: youkaichao <[email protected]>

fix

081498b

Signed-off-by: youkaichao <[email protected]>

fix

2d8243b

Signed-off-by: youkaichao <[email protected]>

youkaichao added 3 commits December 16, 2024 10:48

add tests

340dd60

Signed-off-by: youkaichao <[email protected]>

fix

7e5630e

Signed-off-by: youkaichao <[email protected]>

noqa

01b1898

Signed-off-by: youkaichao <[email protected]>

joerunde reviewed Dec 16, 2024

tests/test_utils.py (resolved)

joerunde reviewed Dec 16, 2024

tests/test_utils.py (resolved)

joerunde approved these changes Dec 16, 2024


youkaichao merged commit 551603f into vllm-project:main Dec 16, 2024
52 of 54 checks passed

youkaichao deleted the memory_profile branch December 16, 2024 21:32

This was referenced Dec 17, 2024

[Bug]: torch.OutOfMemoryError for 0.6.4.post1 but 0.6.3.post1 is working #11251

Open

[Bug]: Increased VRAM usage since v0.6.4.post1 (vs v0.6.3.post1) [OOM][KV cache] #11230

Open

joerunde mentioned this pull request Dec 19, 2024

[Bug]:The parameter gpu_memory_utilization does not take effect #10637

Open

1 task

BKitor pushed a commit to BKitor/vllm that referenced this pull request Dec 30, 2024

[core] overhaul memory profiling and fix backward compatibility (vllm…

1e2e57a

…-project#10511) Signed-off-by: youkaichao <[email protected]>

benchislett mentioned this pull request Jan 14, 2025

[Bug]: Memory profiler does not consider CUDA context memory #12059

Open

1 task
