
Fix/correct streaming resource lock #1879

Merged · 5 commits · Jan 8, 2025

Conversation

@gjpower (Contributor) commented Dec 23, 2024

Fixes #1861

Establishes the exit stack context inside the correct streaming response task so that it can be closed correctly, avoiding the problem of opening it in one task and trying to close it in another (an anyio lock raises "The current task is not holding this lock" when released from a task other than the one that acquired it, which is the crash reported in #1861).

Also simplifies usage of the exit_stack by entering the async context manager directly: `async with contextlib.asynccontextmanager(...)()`.
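
As a rough sketch of that pattern (not the PR's exact code; the lock and `get_resource` here are stand-ins):

```python
import contextlib

import anyio

lock = anyio.Lock()


async def get_resource():
    # Generator-style dependency: acquire on entry, release on exit.
    # FastAPI's Depends(...) can consume this shape directly.
    async with lock:
        yield "resource"


async def streaming_task():
    # Wrap the same generator explicitly and enter it *inside* the task
    # that produces the streaming response, so the lock is acquired and
    # released by one and the same task.
    async with contextlib.asynccontextmanager(get_resource)() as resource:
        print("using", resource)


anyio.run(streaming_task)
```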

Does a basic refactor of create_completion and create_chat_completion, moving repeated code into a separate function.

Adds a check for client disconnect before the call to the LLM. If a client creates a request but closes the connection before the server acquires the llama_proxy lock, the server now simply closes the connection instead of asking the LLM to generate a response for an already-closed connection.
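
A minimal sketch of such a check using Starlette's `Request.is_disconnected()` (the endpoint body is illustrative, not the PR's actual code):

```python
from fastapi import FastAPI, Request

app = FastAPI()


@app.post("/v1/completions")
async def create_completion(request: Request) -> dict:
    # (in the real server, the llama_proxy lock is acquired here first)
    if await request.is_disconnected():
        # The client hung up while the request waited for the lock;
        # bail out instead of generating for a closed connection.
        return {}
    return {"choices": [{"text": "stub"}]}  # placeholder response
```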

Moves locking for streaming into the get_event_publisher call so the lock is acquired and released in the correct task for the streaming response.
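
Roughly, the shape of the publisher after the change (hypothetical signature and a stubbed get_llama_proxy; the real function in llama_cpp/server/app.py differs in detail):

```python
import contextlib
import json
import typing

import anyio
from anyio.streams.memory import MemoryObjectSendStream

llama_lock = anyio.Lock()


async def get_llama_proxy():
    # Stand-in for the real dependency guarding the shared llama instance.
    async with llama_lock:
        yield object()


async def get_event_publisher(
    inner_send_chan: MemoryObjectSendStream,
    make_iterator: typing.Callable[[typing.Any], typing.AsyncIterator[dict]],
):
    # The lock is now taken here, inside the task that produces the SSE
    # stream, so the same task both acquires and releases it.
    async with contextlib.asynccontextmanager(get_llama_proxy)() as proxy:
        async with inner_send_chan:
            async for chunk in make_iterator(proxy):
                await inner_send_chan.send(dict(data=json.dumps(chunk)))
```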
@gjpower (Contributor, Author) left a comment

Some general clarifications about the scope of this pull request

Commit: fix: correct type hints for body_model
@agronholm commented
Not your fault @gjpower but the original code looks really weird. An async generator w/o the use of @asynccontextmanager? And manual use of acquire() and release()? What was the original author thinking?

@gjpower (Contributor, Author) commented Dec 24, 2024

> Not your fault @gjpower but the original code looks really weird. An async generator w/o the use of @asynccontextmanager? And manual use of acquire() and release()? What was the original author thinking?

@agronholm It looks like they were making use of FastAPI's Depends-with-yield feature, which itself wraps the dependency resource in an async context manager.
https://fastapi.tiangolo.com/tutorial/dependencies/dependencies-with-yield/
https://github.com/fastapi/fastapi/blob/0.89.1/fastapi/dependencies/utils.py#L463

If it weren't for the streaming context, using Depends with a generator and a single lock would be sufficient. The explicit wrap with contextlib.asynccontextmanager is required so the same generator still works in the expected way with Depends for the other endpoints.
Have a look at this discussion on FastAPI fastapi/fastapi#9054 (comment)
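
For reference, the Depends-with-yield shape being discussed, in generic form (hypothetical get_db resource, not llama-cpp-python code):

```python
from fastapi import Depends, FastAPI

app = FastAPI()


async def get_db():
    db = {"connected": True}  # stand-in for acquiring a real resource
    try:
        # FastAPI enters this generator as an async context manager and
        # injects the yielded value into the endpoint.
        yield db
    finally:
        db["connected"] = False  # teardown runs after the response is sent


@app.get("/items")
async def read_items(db: dict = Depends(get_db)) -> dict:
    return {"db_connected": db["connected"]}
```

For plain endpoints this teardown timing is fine; the streaming case is what required entering the same generator manually inside the publisher task.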

@abetlen (Owner) commented Jan 8, 2025

@gjpower thank you, appreciate the fix!

@abetlen merged commit e8f14ce into abetlen:main on Jan 8, 2025
14 checks passed
Successfully merging this pull request may close these issues.

Crash due to "The current task is not holding this lock" when {"stream": true}