Fix/correct streaming resource lock #1879
Conversation
Move locking for streaming into the `get_event_publisher` call so it is locked and unlocked in the correct task for the streaming response.
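A minimal sketch of that idea, with illustrative names rather than the exact project code (the request, channel, iterator, and lock parameters are assumptions): the lock is acquired inside the generator that the streaming response runs, so the same task both acquires and releases it.

```python
import anyio
from anyio.streams.memory import MemoryObjectSendStream
from starlette.requests import Request


async def get_event_publisher(
    request: Request,
    inner_send_chan: MemoryObjectSendStream,
    iterator,
    llama_lock: anyio.Lock,
):
    # Lock acquired here, inside the task created for the streaming response...
    async with llama_lock:
        async with inner_send_chan:
            async for chunk in iterator:
                # Stop streaming early if the client has gone away.
                if await request.is_disconnected():
                    break
                await inner_send_chan.send(chunk)
    # ...and released here, still in that same task, once streaming ends.
```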
Some general clarifications about the scope of this pull request
fix: correct type hints for body_model
Not your fault @gjpower, but the original code looks really weird. An async generator w/o the use of …
@agronholm It looks like they were making use of FastAPI's Depends-with-generator feature, which wraps the dependency resource in an async context manager itself. If it weren't for the streaming context, using Depends with a generator and a single lock would be sufficient. The explicit wrapping with asynccontextmanager is required so that it still works the expected way with Depends for the other endpoints.
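To illustrate the Depends-with-generator pattern being referred to, here is a hedged sketch; the lock, proxy object, and endpoint are stand-ins, not the project's actual code.

```python
import anyio
from fastapi import Depends, FastAPI

app = FastAPI()
llama_lock = anyio.Lock()   # stand-in for the lock guarding the shared model
llama_proxy = object()      # stand-in for the real LlamaProxy


async def get_llama_proxy():
    # FastAPI wraps a generator dependency in an async context manager itself,
    # so for ordinary (non-streaming) endpoints the lock is held for exactly
    # the duration of the request.
    async with llama_lock:
        yield llama_proxy


@app.post("/v1/embeddings")
async def create_embedding(llama=Depends(get_llama_proxy)):
    # Non-streaming endpoints can keep relying on Depends alone.
    return {"ok": True}
```

The explicit `contextlib.asynccontextmanager` wrapping only comes into play on the streaming path, sketched further below.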
@gjpower thank you, appreciate the fix!
Fixes #1861
Establishes the exit stack context inside the streaming response task itself, so it can be closed correctly, avoiding the issue of opening it in one task and trying to close it in another.
Also simplifies usage of the exit_stack by using the async context manager directly: `async with contextlib.asynccontextmanager(...)()`.
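A minimal sketch of that pattern, assuming a generator dependency like the one above (the names and the `iterator_factory` parameter are illustrative):

```python
import contextlib


async def get_llama_proxy():
    # Stand-in generator dependency for the real llama_proxy one.
    llama = object()
    try:
        yield llama
    finally:
        pass  # cleanup runs in the same task that entered the context


async def stream_chat_events(iterator_factory):
    # The dependency is entered directly as an async context manager inside
    # the generator that produces the streaming response, so it is opened and
    # closed in the same task and no exit stack crosses task boundaries.
    async with contextlib.asynccontextmanager(get_llama_proxy)() as llama:
        async for chunk in iterator_factory(llama):
            yield chunk
```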
Does a basic refactor on `create_completion` and `create_chat_completion`, moving repeated code to a separate function. Adds a check for client disconnect before the call to the LLM: if a client creates a request but closes the connection before the server acquires a lock on the llama_proxy to answer it, the server simply closes the connection instead of calling the LLM to generate a response for an already-closed connection.
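A hedged sketch of that disconnect check; the handler signature and the `run_completion` helper are hypothetical, not the exact code the PR adds.

```python
from fastapi import Request
from fastapi.responses import JSONResponse


async def create_completion(request: Request, body, llama_proxy_cm):
    async with llama_proxy_cm as llama:
        # The client may have disconnected while we waited for the llama_proxy
        # lock; if so, skip generation entirely and just end the request.
        if await request.is_disconnected():
            return JSONResponse(
                status_code=400, content={"detail": "Client closed request"}
            )
        return await run_completion(llama, body)  # hypothetical helper
```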