Fix client picking embeddings model by default for chat completion
Summary:

After we added embeddings, the default model selection in the client may pick the embeddings model and return an error. See the example below:

```
llama-stack-client --endpoint http://localhost:$LLAMA_STACK_PORT models list
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ identifier                       ┃ provider_id           ┃ provider_resource_id      ┃ metadata                       ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ all-MiniLM-L6-v2                 │ sentence-transformers │ all-MiniLM-L6-v2          │ {'embedding_dimension': 384.0} │
│ meta-llama/Llama-3.2-3B-Instruct │ ollama                │ llama3.2:3b-instruct-fp16 │ {}                             │
└──────────────────────────────────┴───────────────────────┴───────────────────────────┴────────────────────────────────┘

llama-stack-client --endpoint http://localhost:$LLAMA_STACK_PORT \
  inference chat-completion \
  --message "hello, what model are you?"
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Failed to inference chat-completion                                                                                                          │
│                                                                                                                                              │
│ Error Type: BadRequestError                                                                                                                  │
│ Details: Error code: 400 - {'detail': "Invalid value: Model 'all-MiniLM-L6-v2' is an embedding model and does not support chat completions"} │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
```

Test Plan:

Run manually from the source:

```
# Make sure the server is started first, then run this
python3 -m lib.cli.llama_stack_client models list
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ identifier                       ┃ provider_id           ┃ provider_resource_id      ┃ metadata                       ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ all-MiniLM-L6-v2                 │ sentence-transformers │ all-MiniLM-L6-v2          │ {'embedding_dimension': 384.0} │
│ meta-llama/Llama-3.2-3B-Instruct │ ollama                │ llama3.2:3b-instruct-fp16 │ {}                             │
└──────────────────────────────────┴───────────────────────┴───────────────────────────┴────────────────────────────────┘

# Ok, all-MiniLM-L6-v2 is listed first; now send a request to make sure we no longer see the error
python3 -m lib.cli.llama_stack_client inference chat-completion --message "hello, what model are you?"
ChatCompletionResponse(
    completion_message=CompletionMessage(
        content="Hello! I'm an AI assistant, specifically a language model based on the transformer architecture. I was trained on a massive dataset of text from various sources, including books, articles, and conversations, which enables me to understand and generate human-like language.\n\nMy specific model is a type of transformer-based language model called BERT (Bidirectional Encoder Representations from Transformers), which is a state-of-the-art model for natural language processing tasks such as question-answering, text classification, and language translation.\n\nI'm designed to be helpful and informative, so feel free to ask me any questions or have a conversation with me on any topic you'd like!",
        role='assistant',
        stop_reason='end_of_turn',
        tool_calls=[]
    ),
    logprobs=None
)
```
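The diff itself is not shown on this page; as a rough illustration of the kind of change involved, here is a minimal Python sketch of default-model selection that skips embedding models. The `Model` dataclass and the `pick_default_chat_model` helper are hypothetical stand-ins, assuming each entry returned by `models list` carries a type distinguishing chat models from embedding models; this is not the actual patch.

```python
# A minimal illustrative sketch, NOT the actual patch in this commit.
# Assumption: each model entry exposes a field distinguishing chat/LLM
# models from embedding models; `Model`, `model_type`, and
# `pick_default_chat_model` are hypothetical names.
from dataclasses import dataclass


@dataclass
class Model:
    identifier: str
    model_type: str  # e.g. "llm" or "embedding"


def pick_default_chat_model(models: list[Model]) -> str | None:
    """Return the first model usable for chat completion, skipping
    embedding models; None if no chat-capable model is registered."""
    for model in models:
        if model.model_type != "embedding":
            return model.identifier
    return None


# Mirrors the `models list` output above: the embedding model is listed
# first, but the default falls through to the Llama instruct model.
models = [
    Model("all-MiniLM-L6-v2", "embedding"),
    Model("meta-llama/Llama-3.2-3B-Instruct", "llm"),
]
assert pick_default_chat_model(models) == "meta-llama/Llama-3.2-3B-Instruct"
```

Filtering at selection time leaves the `models list` output untouched; only the implicit default used by `inference chat-completion` changes.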