[Core] Fix memory profiling #11120

joerunde · 2024-12-12T00:16:03Z

Because why not add another PR?

This PR takes @tjohnson31415's very clean profiling method from #10498 and the context manager from @youkaichao from #10511 and joins them together with a unit test to ensure that everything is working as expected.
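For readers unfamiliar with the pattern, here is a minimal sketch of a snapshot-and-subtract profiler wrapped in a context manager. It is not the code from this PR: `MemorySnapshot`, `ProfileResult`, `memory_profiling`, and the `read_used_bytes` callable are all hypothetical names, and a plain Python list stands in for GPU memory so the sketch runs anywhere.

```python
import contextlib
import time
from dataclasses import dataclass, field
from typing import Callable, Iterator


@dataclass
class MemorySnapshot:
    """Hypothetical snapshot: bytes in use and the time they were read."""
    used_bytes: int = 0
    timestamp: float = 0.0

    def __sub__(self, other: "MemorySnapshot") -> "MemorySnapshot":
        # Subtracting two snapshots yields the change between them.
        return MemorySnapshot(
            used_bytes=self.used_bytes - other.used_bytes,
            timestamp=self.timestamp - other.timestamp,
        )


@dataclass
class ProfileResult:
    """Hypothetical container filled in by the context manager."""
    before: MemorySnapshot = field(default_factory=MemorySnapshot)
    after: MemorySnapshot = field(default_factory=MemorySnapshot)
    delta: MemorySnapshot = field(default_factory=MemorySnapshot)


@contextlib.contextmanager
def memory_profiling(
        read_used_bytes: Callable[[], int]) -> Iterator[ProfileResult]:
    """Snapshot memory on entry and exit, then store the difference."""
    result = ProfileResult()
    result.before = MemorySnapshot(read_used_bytes(), time.time())
    try:
        yield result
    finally:
        result.after = MemorySnapshot(read_used_bytes(), time.time())
        result.delta = result.after - result.before


# Assumption: a list of buffers stands in for a real GPU memory query.
fake_heap = []
with memory_profiling(lambda: sum(len(b) for b in fake_heap)) as profile:
    fake_heap.append(bytearray(1024))

print(profile.delta.used_bytes)  # 1024
```

Taking the first snapshot before any work starts is what lets the delta include everything allocated inside the block, rather than only what happens after some later baseline.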

Example output from the memory profiling log:

INFO 12-11 23:12:57 worker.py:217] Memory profiling results:
INFO 12-11 23:12:57 worker.py:217] duration               0.44 seconds
INFO 12-11 23:12:57 worker.py:217] total_gpu_memory       79.33GiB
INFO 12-11 23:12:57 worker.py:217] gpu_memory_utilization 0.25
INFO 12-11 23:12:57 worker.py:217] target_allocation      19.83GiB
INFO 12-11 23:12:57 worker.py:217] baseline_memory        4.71GiB
INFO 12-11 23:12:57 worker.py:217] max_inference_spike    2.35GiB
INFO 12-11 23:12:57 worker.py:217] kv_cache_size          12.77GiB
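The logged values appear to be related by simple arithmetic. The sketch below assumes, based only on the numbers above and not on the PR's code, that `target_allocation` is `total_gpu_memory * gpu_memory_utilization` and that `kv_cache_size` is what remains after subtracting `baseline_memory` and `max_inference_spike`. The function name `kv_cache_budget_gib` is hypothetical.

```python
def kv_cache_budget_gib(total: float, utilization: float, baseline: float,
                        spike: float) -> tuple[float, float]:
    """Return (target_allocation, kv_cache_size) in GiB.

    Assumed relationship, inferred from the logged numbers:
      target = total * utilization
      kv     = target - baseline - spike
    """
    target = total * utilization
    kv_cache = target - baseline - spike
    return target, kv_cache


# Inputs copied from the log output, all in GiB.
target, kv_cache = kv_cache_budget_gib(
    total=79.33, utilization=0.25, baseline=4.71, spike=2.35)

print(f"target_allocation      {target:.2f}GiB")
print(f"kv_cache_size          {kv_cache:.2f}GiB")
```

Under that assumption the results agree with the logged 19.83GiB and 12.77GiB to two decimal places. The logged inputs are themselves rounded, so agreement beyond that precision should not be expected.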

FIX #10451

Signed-off-by: youkaichao <[email protected]>

Signed-off-by: Joe Runde <[email protected]>

Co-Authored-By: Travis Johnson <[email protected]>

github-actions · 2024-12-12T00:16:14Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

Add ready label to the PR
Enable auto-merge.

🚀

mergify · 2024-12-12T00:16:40Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @joerunde.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork


joerunde · 2024-12-12T23:38:20Z

Uh oh...
The profiling seems to give the same numbers as before, but tests are now failing because only ~83% of the GPU RAM is available, and the tests that want the default 90% aren't happy. If some test artifact isn't being cleaned up, this would be expected now that we properly measure memory usage starting from before the model is loaded.
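To illustrate the failure mode, here is a rough sketch of an up-front check that rejects a requested utilization when too little GPU memory is free. It is a guess at the general idea, not the check added in this PR: the function name `check_initial_memory`, its parameters, and the error message are all hypothetical.

```python
def check_initial_memory(total_gib: float, free_gib: float,
                         gpu_memory_utilization: float) -> float:
    """Return the requested budget in GiB, or raise if it cannot be met.

    Hypothetical helper: compares the memory that is free before the
    model is loaded against the share of the device being requested.
    """
    requested_gib = total_gib * gpu_memory_utilization
    if free_gib < requested_gib:
        raise ValueError(
            f"Free memory ({free_gib:.2f}GiB of {total_gib:.2f}GiB) is "
            f"less than the requested budget ({requested_gib:.2f}GiB, "
            f"gpu_memory_utilization={gpu_memory_utilization}).")
    return requested_gib


# Scenario from the comment: ~83% of the device is free, 90% is requested.
total = 79.33
free = 0.83 * total
try:
    check_initial_memory(total, free, gpu_memory_utilization=0.90)
except ValueError as err:
    print(f"startup rejected: {err}")
```

Failing early like this would surface leftover allocations from a previous test as a clear error instead of a confusing KV cache size later on.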

joerunde · 2024-12-17T15:19:09Z

Fixed in #10511

youkaichao and others added 7 commits November 20, 2024 18:06

fix

7ab6d00

Signed-off-by: youkaichao <[email protected]>

fix tests

d5f3aaf

Signed-off-by: youkaichao <[email protected]>

fix

40f7b73

Signed-off-by: youkaichao <[email protected]>

fix

2250ade

Signed-off-by: youkaichao <[email protected]>

🎨 fix visual alignment

c9fa8c9

Signed-off-by: Joe Runde <[email protected]>

🐛 use Travis' simple profiling method

271987b

Co-Authored-By: Travis Johnson <[email protected]> Signed-off-by: Joe Runde <[email protected]>

🧪 add test for memory profiler

68066b6

Signed-off-by: Joe Runde <[email protected]>

mergify bot added the needs-rebase label Dec 12, 2024

Merge remote-tracking branch 'upstream/main' into memory_profile

dcce884

mergify bot removed the needs-rebase label Dec 12, 2024

joerunde marked this pull request as ready for review December 12, 2024 16:49

joerunde requested review from DarkLight1337, robertgshaw2-redhat, simon-mo, zhuohan123, youkaichao, alexm-redhat, comaniac and njhill as code owners December 12, 2024 16:49

joerunde added 4 commits December 12, 2024 10:14

📝 update docstring

220845b

Signed-off-by: Joe Runde <[email protected]>

🎨 use cool subtraction overload

0ca0213

Signed-off-by: Joe Runde <[email protected]>

✅ fixup model profile test

5b13d16

Signed-off-by: Joe Runde <[email protected]>

🥅 add initial vram check

661da4e

Signed-off-by: Joe Runde <[email protected]>

joerunde force-pushed the memory_profile branch from a22fad9 to 661da4e Compare December 12, 2024 22:25

Merge remote-tracking branch 'upstream/main' into memory_profile

0072e55

joerunde closed this Dec 17, 2024
