Fix client picking embeddings model by default for chat completion
Summary:
After we added embeddings, the default model selection in the client may pick an embeddings model and return an error. See the example below:

```
llama-stack-client --endpoint http://localhost:$LLAMA_STACK_PORT models list
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ identifier                       ┃ provider_id           ┃ provider_resource_id      ┃ metadata                       ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ all-MiniLM-L6-v2                 │ sentence-transformers │ all-MiniLM-L6-v2          │ {'embedding_dimension': 384.0} │
│ meta-llama/Llama-3.2-3B-Instruct │ ollama                │ llama3.2:3b-instruct-fp16 │ {}                             │
└──────────────────────────────────┴───────────────────────┴───────────────────────────┴────────────────────────────────┘
llama-stack-client --endpoint http://localhost:$LLAMA_STACK_PORT \
  inference chat-completion \
  --message "hello, what model are you?"
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Failed to inference chat-completion                                                                                                          │
│                                                                                                                                              │
│ Error Type: BadRequestError                                                                                                                  │
│ Details: Error code: 400 - {'detail': "Invalid value: Model 'all-MiniLM-L6-v2' is an embedding model and does not support chat completions"} │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
```
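
The failure comes from the CLI's default-model fallback: when no model id is passed, it takes the first entry returned by `client.models.list()`, which can now be the embedding model. A minimal sketch of the pre-fix behavior against the Python client (the base URL is an assumption):

```
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5001")  # base URL is an assumption

# Pre-fix fallback: take the first registered model, whatever its type.
available_models = [model.identifier for model in client.models.list()]
model_id = available_models[0]  # can be "all-MiniLM-L6-v2", which rejects chat completion
print(model_id)
```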

Test Plan:
Run manually from source:

```
# Make sure server is started first then run this
python3 -m lib.cli.llama_stack_client models list

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ identifier                       ┃ provider_id           ┃ provider_resource_id      ┃ metadata                       ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ all-MiniLM-L6-v2                 │ sentence-transformers │ all-MiniLM-L6-v2          │ {'embedding_dimension': 384.0} │
│ meta-llama/Llama-3.2-3B-Instruct │ ollama                │ llama3.2:3b-instruct-fp16 │ {}                             │
└──────────────────────────────────┴───────────────────────┴───────────────────────────┴────────────────────────────────┘

# OK, all-MiniLM-L6-v2 is listed first; now send a request to make sure we no longer see the error

python3 -m lib.cli.llama_stack_client inference chat-completion --message "hello, what model are you?"
ChatCompletionResponse(
    completion_message=CompletionMessage(
        content="Hello! I'm an AI assistant, specifically a language model based on the transformer architecture. I was trained on a massive dataset of text from various sources, including
books, articles, and conversations, which enables me to understand and generate human-like language.\n\nMy specific model is a type of transformer-based language model called BERT
(Bidirectional Encoder Representations from Transformers), which is a state-of-the-art model for natural language processing tasks such as question-answering, text classification, and language
translation.\n\nI'm designed to be helpful and informative, so feel free to ask me any questions or have a conversation with me on any topic you'd like!",
        role='assistant',
        stop_reason='end_of_turn',
        tool_calls=[]
    ),
    logprobs=None
)
```
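
The same check can be scripted against the Python client; a minimal sketch (the base URL is an assumption, and the filter mirrors the change below):

```
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5001")  # base URL is an assumption

# Mirror the fixed CLI: only LLM models are eligible as the default.
llm_models = [model.identifier for model in client.models.list() if model.model_type == "llm"]
assert llm_models, "expected at least one LLM model to be registered"

response = client.inference.chat_completion(
    model_id=llm_models[0],
    messages=[{"role": "user", "content": "hello, what model are you?"}],
)
print(response.completion_message.content)
```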
vladimirivic committed Dec 19, 2024
1 parent 0ea2d28 commit 3077093
Showing 1 changed file with 2 additions and 2 deletions.
src/llama_stack_client/lib/cli/inference/inference.py

```
@@ -9,8 +9,8 @@
 import click
 from rich.console import Console
 
-from ..common.utils import handle_client_errors
 from ...inference.event_logger import EventLogger
+from ..common.utils import handle_client_errors
 
 
 @click.group()
@@ -31,7 +31,7 @@ def chat_completion(ctx, message: str, stream: bool, model_id: Optional[str]):
     console = Console()
 
     if not model_id:
-        available_models = [model.identifier for model in client.models.list()]
+        available_models = [model.identifier for model in client.models.list() if model.model_type == "llm"]
         model_id = available_models[0]
 
     response = client.inference.chat_completion(
```
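
One edge case the change leaves open: if a server only registers embedding models, the filtered list is empty and `available_models[0]` raises an IndexError. A hypothetical guard, not part of this commit (the helper name and error message are made up for illustration):

```
import click
from llama_stack_client import LlamaStackClient

def pick_default_llm(client: LlamaStackClient) -> str:
    # Same filter as the fix, plus a guard for servers with no LLM models registered.
    available_models = [model.identifier for model in client.models.list() if model.model_type == "llm"]
    if not available_models:
        raise click.ClickException("No LLM models available for chat completion")
    return available_models[0]
```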
