[V1][VLM] Proper memory profiling for image language models #11210

ywang96 · 2024-12-15T10:57:03Z

This PR adds memory profiling for image language models in V1 so that peak memory usage is correctly measured to avoid potential CUDA OOM.

A few things to note:

- We still use the dummy multi_modal_data for memory profiling. This lets us reuse the existing code and is also developer-friendly, since defining a dummy PIL image is much more intuitive than defining dummy pixel values. To make this work, mm_input_mapper is added to the model runner solely for profiling purposes.
- The current version only works for images (because of a limitation in mm_input_mapper). The memory profiling section itself supports other single non-text modality models, but extending it to mixed non-text modality profiling may require some additional design.
- The encoder compute budget and cache size are added to the scheduler config, but they remain hardcoded and should be optimized in a later PR.
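
To make the relationship between the hardcoded budget and the profiling run concrete, here is a minimal sketch. It is an illustration only, not the code in this PR: `EncoderBudget`, `dummy_image_batch_size`, and `max_tokens_per_image` are hypothetical names, and taking the smaller of the two limits (with a floor of one image) is an assumption about how such a budget could be applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderBudget:
    """Hypothetical stand-in for the hardcoded scheduler config values."""
    compute_budget: int  # max encoder tokens processed in one step (assumed meaning)
    cache_size: int      # max encoder tokens held in the encoder cache (assumed meaning)


def dummy_image_batch_size(budget: EncoderBudget, max_tokens_per_image: int) -> int:
    """Return how many worst-case dummy images to push through the encoder
    during profiling, so that peak encoder memory is included in the measurement."""
    if max_tokens_per_image <= 0:
        raise ValueError("max_tokens_per_image must be positive")
    # Assumption: the tighter of the two limits bounds the encoder workload.
    usable_tokens = min(budget.compute_budget, budget.cache_size)
    # Assumption: always profile at least one image, even under a tiny budget.
    return max(1, usable_tokens // max_tokens_per_image)


if __name__ == "__main__":
    # Placeholder numbers, not values taken from this PR.
    budget = EncoderBudget(compute_budget=2048, cache_size=2048)
    print(dummy_image_batch_size(budget, max_tokens_per_image=576))
```

With a budget expressed this way, the profiling run scales with the configured limits instead of a fixed image count, which is one reason tuning those limits is left to a later PR.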

Verified with the command below (from #11196):

VLLM_USE_V1=1 VLLM_ENABLE_V1_MULTIPROCESSING=1 python3 mmmu_bench.py --model OpenGVLab/InternVL2_5-8B --trust-remote-code --gpu-memory-utilization 0.99

Main branch:

...
INFO 12-15 10:18:11 uniproc_executor.py:64] # GPU blocks: 31207
...
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 322.00 MiB. GPU 0 has a total capacity of 79.10 GiB of which 249.88 MiB is free.

This PR:

...
INFO 12-15 10:16:26 uniproc_executor.py:64] # GPU blocks: 30973
...
Request throughput: 8.13 req/s
Total generated tokens: 39100
Token generation rate: 317.89 tok/s
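
For a quick read of the two logs above, the snippet below computes how many fewer GPU blocks are allocated with this PR. It only does arithmetic on the logged block counts. Reading the difference as headroom set aside for encoder memory is an interpretation, and the log does not state the block size, so no byte figure is derived.

```python
# GPU block counts copied from the two log excerpts above.
blocks_main = 31207  # main branch (run ends in CUDA OOM)
blocks_pr = 30973    # this PR (run completes)

reserved = blocks_main - blocks_pr
fraction = reserved / blocks_main

print(f"{reserved} fewer KV-cache blocks ({fraction:.2%} of the main-branch count)")
```

In other words, giving up well under 1% of the KV-cache blocks is enough for the benchmark to complete at --gpu-memory-utilization 0.99.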

Signed-off-by: Roger Wang <[email protected]>

github-actions · 2024-12-15T10:57:15Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

- Add the ready label to the PR
- Enable auto-merge

🚀

DarkLight1337

LGTM now!

WoosukKwon

Thanks for the PR! Left some minor comments.

ywang96 · 2024-12-16T21:58:12Z

@WoosukKwon I've addressed your comments via 1720f68 - Let me know if there's any other concern!

mgoin

LGTM thanks!

ywang96 added 2 commits December 15, 2024 09:46

add

27bc692

Signed-off-by: Roger Wang <[email protected]>

comment

91a84d3

Signed-off-by: Roger Wang <[email protected]>

ywang96 requested review from WoosukKwon, robertgshaw2-redhat, njhill, comaniac and alexm-redhat as code owners December 15, 2024 10:57

ywang96 added 2 commits December 15, 2024 11:17

iterate

b40fcff

Signed-off-by: Roger Wang <[email protected]>

comment

3c4f182

Signed-off-by: Roger Wang <[email protected]>

DarkLight1337 reviewed Dec 15, 2024

vllm/multimodal/registry.py (outdated, resolved)

DarkLight1337 reviewed Dec 15, 2024

vllm/v1/core/scheduler.py (resolved)

DarkLight1337 reviewed Dec 15, 2024

vllm/v1/worker/gpu_model_runner.py (outdated, resolved)

vllm/v1/worker/gpu_model_runner.py (outdated, resolved)

address comments

72c5b57

Signed-off-by: Roger Wang <[email protected]>

This was referenced Dec 16, 2024

[RFC]: Multi-modality Support on vLLM #4194

Open

[Release]: v0.7.0 Release Tracker #11218

Open

DarkLight1337 approved these changes Dec 16, 2024

DarkLight1337 added the ready label (ONLY add when PR is ready to merge/full CI is needed) Dec 16, 2024

ywang96 and others added 3 commits December 16, 2024 04:56

fix empty tensor

5082f5b

fix empty tensor

d74df49

Signed-off-by: Roger Wang <[email protected]>

Merge branch 'mm_profiling' of https://github.com/ywang96/vllm into m…

e78b889

…m_profiling

WoosukKwon approved these changes Dec 16, 2024

vllm/v1/worker/gpu_model_runner.py (outdated, resolved)

vllm/v1/worker/gpu_model_runner.py (resolved)

vllm/v1/worker/gpu_model_runner.py (outdated, resolved)

vllm/v1/worker/gpu_model_runner.py (outdated, resolved)

address comments

1720f68

mgoin approved these changes Dec 16, 2024

ywang96 merged commit 59c9b6e into vllm-project:main Dec 17, 2024
53 checks passed

ywang96 mentioned this pull request Dec 18, 2024

[V1] Fix multimodal profiling #11308

Closed

BKitor pushed a commit to BKitor/vllm that referenced this pull request Dec 30, 2024

[V1][VLM] Proper memory profiling for image language models (vllm-pro…

789dd2a

…ject#11210) Signed-off-by: Roger Wang <[email protected]> Co-authored-by: ywang96 <[email protected]>

joennlae pushed a commit to 44ai-labs/vllm that referenced this pull request Jan 19, 2025

[V1][VLM] Proper memory profiling for image language models (vllm-pro…

7218c68

…ject#11210) Signed-off-by: Roger Wang <[email protected]> Co-authored-by: ywang96 <[email protected]>

joennlae pushed a commit to 44ai-labs/vllm that referenced this pull request Jan 19, 2025

[V1][VLM] Proper memory profiling for image language models (vllm-pro…

4c12024

…ject#11210) Signed-off-by: Roger Wang <[email protected]> Co-authored-by: ywang96 <[email protected]>

abmfy pushed a commit to abmfy/vllm-flashinfer that referenced this pull request Jan 24, 2025

[V1][VLM] Proper memory profiling for image language models (vllm-pro…

738675e

…ject#11210) Signed-off-by: Roger Wang <[email protected]> Co-authored-by: ywang96 <[email protected]>
