[torch.compile] add logging for compilation time #10941

youkaichao · 2024-12-06T03:33:21Z

No description provided.

Signed-off-by: youkaichao <[email protected]>

github-actions · 2024-12-06T03:33:36Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

Add ready label to the PR
Enable auto-merge.

🚀

youkaichao · 2024-12-06T04:32:54Z

example output (for the toy tests):

Compiling a graph for general shape takes 1.55 s

see https://buildkite.com/vllm/fastcheck/builds/9379#01939a08-6359-4659-9b3c-c07576a7efee/6465-8353

youkaichao · 2024-12-06T06:42:38Z

This PR adds three kinds of log output:

1. Engine-level initialization, e.g.

INFO 12-05 22:37:22 llm_engine.py:493] init engine (profile, create kv cache, warmup model) took 15.08 seconds
INFO 12-05 22:39:24 llm_engine.py:493] init engine (profile, create kv cache, warmup model) took 40.77 seconds
INFO 12-05 22:41:51 llm_engine.py:493] init engine (profile, create kv cache, warmup model) took 50.18 seconds

2. Graph compilation for every shape (including the symbolic shape compilation), e.g.

INFO 12-05 22:39:02 backends.py:55] Compiling a graph for general shape takes 14.73 s
INFO 12-05 22:41:51 backends.py:58] Compiling a graph for shape 1 takes 9.99 s

3. Aggregation of the numbers in 2, e.g.

INFO 12-05 22:39:04 monitor.py:13] graph compilation takes 14.73 s in total
INFO 12-05 22:41:51 monitor.py:13] graph compilation takes 24.758146286010742 s in total

How to read it:

The increase in 1 between a run with torch.compile and a run without it is the total cost of torch.compile.

2 shows the cost of compiling each shape, so that users can choose which shapes to compile according to their budget. Note that some compilation steps, such as Dynamo bytecode compilation and Triton compilation, are not counted here.

3 just aggregates 2.
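The per-shape timing and its aggregation can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `CompileTimer`, `measure`, and `report` are hypothetical names, and `time.sleep` stands in for the actual graph compilation. Only the log message formats are taken from the example output above.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("compile_timing_sketch")


class CompileTimer:
    """Hypothetical helper: times each shape compilation and keeps a running total."""

    def __init__(self):
        self.total = 0.0
        self.per_shape = {}

    @contextmanager
    def measure(self, shape=None):
        # shape=None stands for the general (symbolic) shape.
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.per_shape[shape] = elapsed
            self.total += elapsed
            if shape is None:
                logger.info("Compiling a graph for general shape takes %.2f s", elapsed)
            else:
                logger.info("Compiling a graph for shape %s takes %.2f s", shape, elapsed)

    def report(self):
        # Aggregated number, i.e. the sum of the per-shape entries.
        logger.info("graph compilation takes %.2f s in total", self.total)
        return self.total


logging.basicConfig(level=logging.INFO)
timer = CompileTimer()
with timer.measure():          # general shape
    time.sleep(0.01)           # placeholder for the real compilation work
with timer.measure(shape=1):   # one specific shape
    time.sleep(0.01)
total = timer.report()
```

As noted above, a timer like this only sees what runs inside the measured block, so steps that happen elsewhere (Dynamo bytecode compilation, Triton compilation) would not be included in the total.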

WoosukKwon

Honestly, I don't have enough familiarity with the code for a proper review. However, the code looks OK to me. Please feel free to merge.

vllm/compilation/backends.py

WoosukKwon · 2024-12-06T07:14:51Z

vllm/compilation/monitor.py

+    if compilation_config.level == CompilationLevel.PIECEWISE:
+        logger.info("graph compilation takes %.2f s in total",
+                    compilation_config.compilation_time)


Dumb question: Why does it print only for pw CUDA graphs?

CompilationLevel.PIECEWISE means piecewise compilation, not piecewise CUDA graphs. It is orthogonal to CUDA graphs.
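To illustrate the point that the summary log depends only on the compilation level, here is a small sketch. The `CompilationLevel` and `CompilationConfig` classes below are simplified stand-ins defined for this example (their values and the `use_cudagraph` flag are assumptions, not the project's definitions); only the `level == PIECEWISE` check and the message format mirror the diff above.

```python
from dataclasses import dataclass
from enum import IntEnum


class CompilationLevel(IntEnum):
    # Stand-in enum for illustration; the numeric values are assumptions.
    NO_COMPILATION = 0
    PIECEWISE = 3


@dataclass
class CompilationConfig:
    # Stand-in config for illustration.
    level: CompilationLevel
    use_cudagraph: bool  # hypothetical flag, deliberately not consulted below
    compilation_time: float = 0.0


def summary_line(config):
    """Return the summary message only when piecewise compilation is enabled."""
    if config.level == CompilationLevel.PIECEWISE:
        return "graph compilation takes %.2f s in total" % config.compilation_time
    return None


with_graphs = CompilationConfig(CompilationLevel.PIECEWISE, True, 24.758146286010742)
without_graphs = CompilationConfig(CompilationLevel.PIECEWISE, False, 24.758146286010742)
no_compile = CompilationConfig(CompilationLevel.NO_COMPILATION, True)
```

In this sketch, `summary_line` returns the same message for `with_graphs` and `without_graphs`, and `None` for `no_compile`.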

WoosukKwon · 2024-12-06T07:16:35Z

vllm/compilation/backends.py

@@ -108,6 +120,8 @@ def split_graph(graph: fx.GraphModule,
 # we share the global graph pool among all the backends
 global_graph_pool = None

+compilation_start_time = 0.0


Maybe None instead of 0.0?

Just wondering: can we somehow make this more robust? Since the code touching this variable is scattered across different places, I feel it's error-prone.

That's why I want to merge some functions in the worker/executor. However, given the current state of the code, I don't have the bandwidth to refactor those files.
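One way to make a module-level start time less error-prone, along the lines of the `None` suggestion, is to use `None` as the "not started" sentinel and fail loudly on misuse. This is only a sketch under that assumption; `start_monitoring` and `end_monitoring` are hypothetical names and this is not what the PR merges.

```python
import time
from typing import Optional

# None means "monitoring has not started"; 0.0 would silently look like a valid timestamp.
_compilation_start_time: Optional[float] = None


def start_monitoring() -> None:
    """Record the start time (hypothetical helper)."""
    global _compilation_start_time
    _compilation_start_time = time.perf_counter()


def end_monitoring() -> float:
    """Return elapsed seconds since start_monitoring(), then reset the state."""
    global _compilation_start_time
    if _compilation_start_time is None:
        raise RuntimeError("end_monitoring() called before start_monitoring()")
    elapsed = time.perf_counter() - _compilation_start_time
    _compilation_start_time = None
    return elapsed
```

With a `0.0` default, a forgotten start call would yield a large but plausible-looking duration; with `None`, the mistake surfaces as an exception at the call site.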

WoosukKwon · 2024-12-06T07:24:08Z

INFO 12-05 22:41:51 monitor.py:13] graph compilation takes 24.758146286010742 s in total

Please make sure to use %.2f for this log (if it hasn't been fixed yet).
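For reference, printf-style formatting in Python treats a precision (`%.2f`) and a minimum field width (`%2f`) differently. The snippet below uses the value from the log line above; the variable names are just for illustration.

```python
total = 24.758146286010742

# Precision of 2: two digits after the decimal point.
two_decimals = "%.2f" % total   # '24.76'

# Minimum field width of 2 with the default precision of 6.
width_only = "%2f" % total      # '24.758146'

print(two_decimals, width_only)
```

The `logging` module applies the same `%`-style rules when it merges a message with its arguments, so `%.2f` is the form that produces the rounded output.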

Co-authored-by: Woosuk Kwon <[email protected]>

Signed-off-by: youkaichao <[email protected]>

youkaichao · 2024-12-06T07:55:23Z

@WoosukKwon thanks for the review!

Signed-off-by: youkaichao <[email protected]> Co-authored-by: Woosuk Kwon <[email protected]>

youkaichao added 2 commits December 5, 2024 19:12

compute compilation time

b9a0328

Signed-off-by: youkaichao <[email protected]>

fix one graph

c7b4248

Signed-off-by: youkaichao <[email protected]>

update name

70e6b4e

Signed-off-by: youkaichao <[email protected]>

youkaichao requested a review from WoosukKwon December 6, 2024 04:33

youkaichao marked this pull request as draft December 6, 2024 05:08

youkaichao added 11 commits December 5, 2024 21:12

add context manager

3044d17

Signed-off-by: youkaichao <[email protected]>

dirty impl

8c0ac0b

Signed-off-by: youkaichao <[email protected]>

use api

0f6719b

Signed-off-by: youkaichao <[email protected]>

fi

2bce8de

Signed-off-by: youkaichao <[email protected]>

use timing at engine level

2b65379

Signed-off-by: youkaichao <[email protected]>

engine level logging

81914b1

Signed-off-by: youkaichao <[email protected]>

remove the call in the worker

7ab3f8b

Signed-off-by: youkaichao <[email protected]>

revert

10c88ee

Signed-off-by: youkaichao <[email protected]>

fix logging

6f39058

Signed-off-by: youkaichao <[email protected]>

fix symbolic only

70e3534

Signed-off-by: youkaichao <[email protected]>

fix format

fa36de5

Signed-off-by: youkaichao <[email protected]>

youkaichao marked this pull request as ready for review December 6, 2024 06:43

youkaichao requested review from robertgshaw2-redhat, njhill, ywang96, comaniac, alexm-redhat and zhuohan123 as code owners December 6, 2024 06:43

WoosukKwon approved these changes Dec 6, 2024

youkaichao and others added 2 commits December 5, 2024 23:39

Update vllm/compilation/backends.py

d87ac20

Co-authored-by: Woosuk Kwon <[email protected]>

merge code

f502121

Signed-off-by: youkaichao <[email protected]>

youkaichao enabled auto-merge (squash) December 6, 2024 08:08

github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) Dec 6, 2024

youkaichao merged commit b031a45 into vllm-project:main Dec 6, 2024
65 checks passed

youkaichao deleted the compilation_time branch December 6, 2024 16:18

sleepwalker2017 pushed a commit to sleepwalker2017/vllm that referenced this pull request Dec 13, 2024

[torch.compile] add logging for compilation time (vllm-project#10941)

ee718e1

Signed-off-by: youkaichao <[email protected]> Co-authored-by: Woosuk Kwon <[email protected]>

BKitor pushed a commit to BKitor/vllm that referenced this pull request Dec 30, 2024

[torch.compile] add logging for compilation time (vllm-project#10941)

3f78eb4

Signed-off-by: youkaichao <[email protected]> Co-authored-by: Woosuk Kwon <[email protected]>
