[torch.compile] add logging for compilation time #10941
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Example output (for the toy tests): see https://buildkite.com/vllm/fastcheck/builds/9379#01939a08-6359-4659-9b3c-c07576a7efee/6465-8353
This PR aims to add three pieces of logging information:

How to read it: the increase in (1), when using or not using (2), shows the cost of compiling each shape, so users can select shapes according to their budget. Note that some compilation steps, such as Dynamo bytecode compilation and Triton compilation, are not counted here. (3) simply aggregates (2).
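To make the accounting concrete, here is a minimal sketch of the per-shape timing pattern described above. All names (`compile_for_shape`, `compile_fn`, the `CompilationConfig` stub) are hypothetical stand-ins, not the actual vLLM internals:

```python
import logging
import time

logger = logging.getLogger(__name__)

class CompilationConfig:
    """Illustrative stand-in for the real config object."""
    def __init__(self) -> None:
        # running total of graph-compilation time, in seconds
        self.compilation_time: float = 0.0

def compile_for_shape(compile_fn, graph, shape, config: CompilationConfig):
    """Compile `graph` for one input shape and account for the elapsed time."""
    start = time.time()
    compiled = compile_fn(graph, shape)
    elapsed = time.time() - start
    # per-shape log: lets users estimate the cost of each additional shape
    logger.info("compiling for shape %s took %.2f s", shape, elapsed)
    # aggregate: summed across all shapes and reported at the end
    config.compilation_time += elapsed
    return compiled
```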
Honestly, I don't have enough familiarity with the code for a proper review. However, the code looks OK to me. Please feel free to merge.
if compilation_config.level == CompilationLevel.PIECEWISE:
    logger.info("graph compilation takes %.2f s in total",
                compilation_config.compilation_time)
Dumb question: why does it print only for piecewise CUDA graphs?
CompilationLevel.PIECEWISE means piecewise compilation, not piecewise CUDA graphs; it is orthogonal to CUDA graphs.
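To illustrate the distinction, here is a hedged sketch of the two settings modeled as independent knobs; the enum values and the `use_cudagraph` field are illustrative assumptions, not a verbatim copy of vLLM's config:

```python
import enum
from dataclasses import dataclass

class CompilationLevel(enum.IntEnum):
    # illustrative levels, not necessarily vLLM's exact definitions
    NO_COMPILATION = 0
    DYNAMO_AS_IS = 1
    DYNAMO_ONCE = 2
    PIECEWISE = 3

@dataclass
class Config:
    # how the model graph is compiled
    level: CompilationLevel = CompilationLevel.PIECEWISE
    # whether CUDA graphs are captured -- independent of `level`
    use_cudagraph: bool = False

# piecewise *compilation* with CUDA graphs disabled is a valid combination
cfg = Config(level=CompilationLevel.PIECEWISE, use_cudagraph=False)
```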
@@ -108,6 +120,8 @@ def split_graph(graph: fx.GraphModule,
# we share the global graph pool among all the backends
global_graph_pool = None

compilation_start_time = 0.0
Maybe None instead of 0.0?
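For reference, the suggestion would look roughly like this (a minimal sketch; the surrounding module context is elided):

```python
from typing import Optional

# `None` makes "not yet initialized" explicit, so an accidental read before
# the start time is set fails loudly instead of silently computing a bogus
# duration from 0.0.
compilation_start_time: Optional[float] = None
```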
Just wondering: can we somehow make this more robust? Since the code touching this variable is scattered across different places, I feel it's error-prone...
That's why I want to merge some functions in the worker/executor. However, given the current state of the code, I don't have the bandwidth to refactor those files.
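One possible direction, sketched here purely for illustration (this is not what the PR does), is to encapsulate the start time in a context manager so that no call site ever touches a module-level global:

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger(__name__)

@contextmanager
def log_elapsed(msg: str):
    """Log `msg` together with the wall-clock time of the enclosed block."""
    start = time.time()
    try:
        yield
    finally:
        logger.info("%s took %.2f s", msg, time.time() - start)

# usage: the start time never escapes the context manager
# with log_elapsed("graph compilation"):
#     compiled_graph = backend.compile(graph)
```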
@WoosukKwon thanks for the review!