Add direct client implementation #15
Conversation
oh man this is beautiful <3
from llama_stack.distribution.datatypes import StackRunConfig
from llama_stack.distribution.distribution import get_provider_registry
from llama_stack.distribution.resolver import resolve_impls
Should we add `llama-stack` as a dependency for the `llama-stack-client` package?
Nope, it should be the reverse, as we talked about: this code should always be exercised when the person already has `llama-stack` in their environment (as a library or via pip).
Hmm, should this class `LlamaStackDirectClient` be inside the `llama-stack` repo instead of the `llama-stack-client-python` repo?
- Users who want to use llama-stack as a library install the `llama-stack` package (which depends on the `llama-stack-client` package). They are able to use `LlamaStackDirectClient`.
- Users who just install the `llama-stack-client` package cannot use `LlamaStackDirectClient` without also installing `llama-stack` (a minimal guard sketch follows this list).
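For illustration, a minimal sketch of how that second scenario could fail gracefully; the guard function below is hypothetical and only shows the idea of treating `llama-stack` as an optional dependency inside `llama-stack-client`:

```python
import importlib.util


def _require_llama_stack() -> None:
    """Hypothetical guard: LlamaStackDirectClient would call this before
    touching any llama_stack internals, so a plain llama-stack-client
    install fails with a clear message instead of a bare ImportError."""
    if importlib.util.find_spec("llama_stack") is None:
        raise ImportError(
            "LlamaStackDirectClient requires the `llama-stack` package; "
            "install it with `pip install llama-stack`."
        )
```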
@yanxi0830 yeah I think that makes sense to me actually.
The Llama Stack primarily operates as a client/server model. However, there are scenarios where hosting a distribution is cumbersome (e.g., testing, Jupyter notebooks), making it more desirable to use the Llama Stack as a library.
This PR introduces a clever hack that extends the Stainless Python client: it intercepts GET/POST requests that would otherwise go over HTTP and uses reflection to deserialize and route them directly to their in-process implementations.
Is this roundabout serialization the most efficient method? Certainly not. But the convenience of having a drop-in solution is significant, and the overhead is negligible compared to GPU latency.
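To make the intercept-and-route idea concrete, here is a self-contained sketch of the pattern; the class names, route table, and toy implementation below are stand-ins for illustration only, not the actual PR code, which builds its routing from the resolved `llama_stack` implementations (`StackRunConfig`, `get_provider_registry`, `resolve_impls` in the diff above):

```python
from typing import Any, Callable, Dict


class ToyInference:
    """Stand-in for a resolved in-process provider implementation."""

    def chat_completion(self, model: str, messages: list) -> Dict[str, Any]:
        return {"model": model, "content": f"echo: {messages[-1]['content']}"}


class DirectTransport:
    """Instead of serializing a request onto the wire, map the route to a
    method on an in-memory implementation and invoke it via reflection."""

    def __init__(self, impls: Dict[str, Any]):
        # Route table: HTTP path -> (implementation object, method name).
        self.routes: Dict[str, tuple] = {
            "/inference/chat_completion": (impls["inference"], "chat_completion"),
        }

    def post(self, path: str, body: Dict[str, Any]) -> Dict[str, Any]:
        impl, method_name = self.routes[path]
        fn: Callable = getattr(impl, method_name)  # reflection instead of HTTP
        return fn(**body)


if __name__ == "__main__":
    transport = DirectTransport({"inference": ToyInference()})
    print(
        transport.post(
            "/inference/chat_completion",
            {"model": "llama-3", "messages": [{"role": "user", "content": "hi"}]},
        )
    )
```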