[V1] Improve TP>1 Error Handling + Stack Trace #11721

Merged
merged 44 commits into from
Jan 3, 2025
2d857cd
remove shutdown from LLMEngine
robertgshaw2-redhat Dec 31, 2024
9e70c5f
format
robertgshaw2-redhat Dec 31, 2024
f34875c
no need for shutdown in asyncllm
robertgshaw2-redhat Dec 31, 2024
7a777d9
remove from asyncllm
robertgshaw2-redhat Dec 31, 2024
dfc9dee
stash
robertgshaw2-redhat Dec 31, 2024
c72b45a
update
robertgshaw2-redhat Dec 31, 2024
4e2dc00
fix
robertgshaw2-redhat Dec 31, 2024
0b4b6af
added back explicit del
robertgshaw2-redhat Dec 31, 2024
4c445af
stash
robertgshaw2-redhat Dec 31, 2024
567b424
working
robertgshaw2-redhat Jan 1, 2025
7d04b98
fix failing test
robertgshaw2-redhat Jan 3, 2025
62e1022
remove explicit shutdown calls
robertgshaw2-redhat Jan 3, 2025
0b0ca08
updated
robertgshaw2-redhat Jan 3, 2025
729938a
pdated
robertgshaw2-redhat Jan 3, 2025
0259241
update
robertgshaw2-redhat Jan 3, 2025
58e4b36
working
robertgshaw2-redhat Jan 3, 2025
cacf6b0
updated
robertgshaw2-redhat Jan 3, 2025
ccc747d
fixup
robertgshaw2-redhat Jan 3, 2025
ddc2a97
fixup
robertgshaw2-redhat Jan 3, 2025
af0d529
reduce cruft
robertgshaw2-redhat Jan 3, 2025
17e152b
updated
robertgshaw2-redhat Jan 3, 2025
37859d7
finish
robertgshaw2-redhat Jan 3, 2025
c29f329
updated
robertgshaw2-redhat Jan 3, 2025
1c4b92a
updated
robertgshaw2-redhat Jan 3, 2025
eb9b00b
stash
robertgshaw2-redhat Jan 3, 2025
1da99a8
updated
robertgshaw2-redhat Jan 3, 2025
ca7b92d
Merge branch 'main' into tp-shutdown
robertgshaw2-redhat Jan 3, 2025
2743166
updated
robertgshaw2-redhat Jan 3, 2025
8e257c1
stash
robertgshaw2-redhat Jan 3, 2025
b7c50dc
revert spurious change
robertgshaw2-redhat Jan 3, 2025
dcfd3b8
updated
robertgshaw2-redhat Jan 3, 2025
6e0e0d4
stash
robertgshaw2-redhat Jan 3, 2025
55a6195
updated
robertgshaw2-redhat Jan 3, 2025
aa6954f
updated
robertgshaw2-redhat Jan 3, 2025
1d15ae0
remove cruft
robertgshaw2-redhat Jan 3, 2025
0347baa
Update vllm/v1/executor/multiproc_executor.py
robertgshaw2-redhat Jan 3, 2025
20b8fa2
stash
robertgshaw2-redhat Jan 3, 2025
32840f2
Merge branch 'tp-shutdown' of https://github.com/neuralmagic/vllm int…
robertgshaw2-redhat Jan 3, 2025
884879a
switch to SIGUSR1
robertgshaw2-redhat Jan 3, 2025
bb86a03
updated
robertgshaw2-redhat Jan 3, 2025
405bcc1
Update vllm/v1/engine/core_client.py
robertgshaw2-redhat Jan 3, 2025
25e0fea
update message
robertgshaw2-redhat Jan 3, 2025
efd6270
updated
robertgshaw2-redhat Jan 3, 2025
a5a306e
fixed!
robertgshaw2-redhat Jan 3, 2025
remove from asyncllm
robertgshaw2-redhat committed Dec 31, 2024
commit 7a777d9ea0f83f706ac6ce3d6eff23f61d61d933
4 changes: 0 additions & 4 deletions vllm/v1/engine/async_llm.py
@@ -1,7 +1,6 @@
import asyncio
import os
import signal
-import weakref
from typing import AsyncGenerator, Dict, List, Mapping, Optional, Type, Union

from vllm.config import ModelConfig, VllmConfig
@@ -42,9 +41,6 @@ def __init__(
log_requests: bool = True,
start_engine_loop: bool = True,
) -> None:
-# Call self.shutdown at exit to clean up
-# and ensure workers will be terminated.
-# self._finalizer = weakref.finalize(self, self.shutdown)

# The child processes will send SIGQUIT when unrecoverable
Collaborator Author

NOTE: moved to CoreClient so that it can be shared across AsyncLLM and LLMEngine
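
For readers unfamiliar with the pattern, here is a minimal sketch of finalizer-based cleanup owned by a shared client, assuming the note above refers to the `weakref.finalize` call removed in this commit. The names `CoreClientSketch`, `AsyncFrontend`, `SyncFrontend`, and `_cleanup` are hypothetical and are not vLLM's actual classes. The sketch passes a static callback rather than a bound method, because the Python documentation warns that a finalizer callback holding a reference to the object keeps that object from being garbage collected.

```python
import weakref


class CoreClientSketch:
    """Hypothetical stand-in for a client that owns worker resources."""

    def __init__(self, log):
        self.log = log
        # Register cleanup once, here, so every front end that wraps this
        # client gets it for free. The callback is a staticmethod and only
        # receives `log`, so it holds no reference back to `self`.
        self._finalizer = weakref.finalize(
            self, CoreClientSketch._cleanup, log)

    @staticmethod
    def _cleanup(log):
        # Placeholder for "terminate workers"; records that cleanup ran.
        log.append("shutdown")

    def shutdown(self):
        # Calling a finalizer runs its callback at most once; later calls
        # are no-ops.
        self._finalizer()


class AsyncFrontend:
    """Hypothetical async front end; holds the client, no own finalizer."""

    def __init__(self, log):
        self.client = CoreClientSketch(log)


class SyncFrontend:
    """Hypothetical sync front end sharing the same cleanup path."""

    def __init__(self, log):
        self.client = CoreClientSketch(log)
```

Under these assumptions, neither front end needs its own `shutdown` bookkeeping: an explicit `client.shutdown()` and the finalizer firing at garbage collection or interpreter exit both go through the same callback, which runs only once.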

# errors happen. We kill the process tree here so that the