Fix client picking embeddings model by default for chat completion
Summary:

After we added embeddings, the default model selection in the client may pick the embeddings model and return an error. See the example below:

```
llama-stack-client --endpoint http://localhost:$LLAMA_STACK_PORT models list
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ identifier                       ┃ provider_id           ┃ provider_resource_id      ┃ metadata                       ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ all-MiniLM-L6-v2                 │ sentence-transformers │ all-MiniLM-L6-v2          │ {'embedding_dimension': 384.0} │
│ meta-llama/Llama-3.2-3B-Instruct │ ollama                │ llama3.2:3b-instruct-fp16 │ {}                             │
└──────────────────────────────────┴───────────────────────┴───────────────────────────┴────────────────────────────────┘

llama-stack-client --endpoint http://localhost:$LLAMA_STACK_PORT \
  inference chat-completion \
  --message "hello, what model are you?"
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Failed to inference chat-completion                                                                                                          │
│                                                                                                                                              │
│ Error Type: BadRequestError                                                                                                                  │
│ Details: Error code: 400 - {'detail': "Invalid value: Model 'all-MiniLM-L6-v2' is an embedding model and does not support chat completions"} │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
```

Test Plan:

Run manually from the source:

```
# Make sure the server is started first, then run this
python3 -m lib.cli.llama_stack_client models list
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ identifier                       ┃ provider_id           ┃ provider_resource_id      ┃ metadata                       ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ all-MiniLM-L6-v2                 │ sentence-transformers │ all-MiniLM-L6-v2          │ {'embedding_dimension': 384.0} │
│ meta-llama/Llama-3.2-3B-Instruct │ ollama                │ llama3.2:3b-instruct-fp16 │ {}                             │
└──────────────────────────────────┴───────────────────────┴───────────────────────────┴────────────────────────────────┘

# Ok, all-MiniLM-L6-v2 is listed first; now send a request to make sure we no longer see the error
python3 -m lib.cli.llama_stack_client inference chat-completion --message "hello, what model are you?"
ChatCompletionResponse(
    completion_message=CompletionMessage(
        content="Hello! I'm an AI assistant, specifically a language model based on the transformer architecture. I was trained on a massive dataset of text from various sources, including books, articles, and conversations, which enables me to understand and generate human-like language.\n\nMy specific model is a type of transformer-based language model called BERT (Bidirectional Encoder Representations from Transformers), which is a state-of-the-art model for natural language processing tasks such as question-answering, text classification, and language translation.\n\nI'm designed to be helpful and informative, so feel free to ask me any questions or have a conversation with me on any topic you'd like!",
        role='assistant',
        stop_reason='end_of_turn',
        tool_calls=[]
    ),
    logprobs=None
)
```
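The diff itself is not shown on this page; as a rough illustration of the kind of change involved, here is a minimal Python sketch of default-model selection that skips embedding models. The `Model` dataclass and the `pick_default_chat_model` helper are hypothetical stand-ins, assuming each entry returned by `models list` carries a type distinguishing chat models from embedding models; this is not the actual patch.

```python
# A minimal illustrative sketch, NOT the actual patch in this commit.
# Assumption: each model entry exposes a field distinguishing chat/LLM
# models from embedding models; `Model`, `model_type`, and
# `pick_default_chat_model` are hypothetical names.
from dataclasses import dataclass


@dataclass
class Model:
    identifier: str
    model_type: str  # e.g. "llm" or "embedding"


def pick_default_chat_model(models: list[Model]) -> str | None:
    """Return the first model usable for chat completion, skipping
    embedding models; None if no chat-capable model is registered."""
    for model in models:
        if model.model_type != "embedding":
            return model.identifier
    return None


# Mirrors the `models list` output above: the embedding model is listed
# first, but the default falls through to the Llama instruct model.
models = [
    Model("all-MiniLM-L6-v2", "embedding"),
    Model("meta-llama/Llama-3.2-3B-Instruct", "llm"),
]
assert pick_default_chat_model(models) == "meta-llama/Llama-3.2-3B-Instruct"
```

Filtering at selection time leaves the `models list` output untouched; only the implicit default used by `inference chat-completion` changes.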