Add inline vLLM inference provider to regression tests and fix regressions #662
Open
frreiss wants to merge 7 commits into meta-llama:main from frreiss:vllm
Commits (7)
b74a051 Add inline vLLM provider to regression tests (frreiss)
9d23c06 Fix regressions in inline vLLM provider (frreiss)
8e358ec Merge branch 'vllm' into vllm-merge-1 (frreiss)
3de586a Merge pull request #5 from frreiss/vllm-merge-1 (frreiss)
6ec9eab Redo code change after merge (frreiss)
c8580d3 Apply formatting to source files (frreiss)
82c10c9 Minor change to force rerun of automatic jobs (frreiss)
Conversations
@frreiss can you explain a bit what this change does (or rather, what not having it causes)?
Sure, happy to explain.

The fixtures in `llama_stack/providers/tests/inference/fixtures.py` are all tagged with `@pytest.fixture(scope="session")`. This tag means that these fixtures are initialized once and then reused for the rest of the test session. Reusing the fixtures this way makes the tests run faster; the fixtures for inline providers need to load models from disk.

The default behavior of the `pytest-asyncio` plugin is to spawn a new asyncio event loop for every test case. The result of these two different scoping policies is that the fixtures are initialized under one event loop, and the test cases then interact with the fixtures from a different event loop (one event loop per test case). This change of event loops happens not to break the existing tests, because the inference providers they exercise are stateless at the asyncio layer.
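A minimal sketch of the mismatch, with illustrative names (the real fixtures use plain `@pytest.fixture(scope="session")`; the async decorator here is just to make the loop interaction visible):

```python
# Illustrative sketch only; fixture and test names are hypothetical.
import pytest
import pytest_asyncio


@pytest_asyncio.fixture(scope="session")  # built once, reused for the whole session
async def inference_stack():
    # Expensive setup (e.g. loading model weights from disk) happens exactly once,
    # under whatever event loop is current at fixture-initialization time.
    return object()


@pytest.mark.asyncio  # default loop scope: pytest-asyncio gives each test a fresh loop
async def test_completion(inference_stack):
    # This body runs under a different event loop than the one that existed
    # when inference_stack was initialized above.
    assert inference_stack is not None
```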
vLLM is not stateless at the asyncio layer. An idle vLLM instance has an event handler waiting for new inference requests to be added to the current batch. Switching to a different event loop drops this event handler, preventing vLLM's batching mechanism from functioning. When I added the inline vLLM provider to the test cases, the change in event loops caused the inference tests to hang.
Adding `@pytest.mark.asyncio(loop_scope="session")` to a test case prevents `pytest-asyncio` from switching event loops when running the test case. This change ensures that the asyncio event loop used during the test case is the same as the event loop that was present when any session-scoped fixtures were initialized.

The primary potential downside of scoping the event loop this way is that, if a misbehaving test case were to leave orphan event handlers, those event handlers could cause errors in later test cases instead of causing an error in the misbehaving test case. That risk seemed acceptable to me.
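In sketch form, the fix looks something like this (the test and fixture names are again hypothetical):

```python
import pytest


@pytest.mark.asyncio(loop_scope="session")  # run on the session-scoped event loop
async def test_completion(inference_stack):
    # The test now shares the event loop that the session-scoped fixtures were
    # initialized under, so loop-bound state such as vLLM's batching handler
    # stays attached for the duration of the session.
    assert inference_stack is not None
```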