Made changes to readme and pinning to llamastack v0.0.61 #624

Open · wants to merge 2 commits into base: main
12 changes: 1 addition & 11 deletions docs/zero_to_hero_guide/00_Inference101.ipynb
@@ -358,24 +358,14 @@
" if not stream:\n",
" cprint(f'> Response: {response.completion_message.content}', 'cyan')\n",
" else:\n",
" async for log in EventLogger().log(response):\n",
" for log in EventLogger().log(response):\n",
" log.print()\n",
"\n",
"# In a Jupyter Notebook cell, use `await` to call the function\n",
"await run_main()\n",
"# To run it in a python file, use this line instead\n",
"# asyncio.run(run_main())\n"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "9399aecc",
"metadata": {},
"outputs": [],
"source": [
"#fin"
]
}
],
"metadata": {
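For readers following along, here is a minimal sketch of how the notebook's inference cell reads after this change. The port, model name, and import paths below are assumptions taken from the README in this PR rather than from the notebook itself, so adjust them to your setup.

```python
from llama_stack_client import LlamaStackClient
from llama_stack_client.lib.inference.event_logger import EventLogger
from termcolor import cprint

# Assumed endpoint and model; match these to the values exported in the README.
client = LlamaStackClient(base_url="http://localhost:5001")
MODEL_ID = "meta-llama/Llama-3.2-3B-Instruct"

async def run_main(stream: bool = True):
    response = client.inference.chat_completion(
        messages=[{"role": "user", "content": "Write me a 2-sentence poem about the moon"}],
        model_id=MODEL_ID,
        stream=stream,
    )
    if not stream:
        cprint(f"> Response: {response.completion_message.content}", "cyan")
    else:
        # After this PR: a plain for-loop, since EventLogger().log() yields synchronously.
        for log in EventLogger().log(response):
            log.print()

# In a Jupyter cell: `await run_main()`; in a plain script: `asyncio.run(run_main())`.
```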
68 changes: 35 additions & 33 deletions docs/zero_to_hero_guide/README.md
@@ -45,7 +45,7 @@ If you're looking for more specific topics, we have a [Zero to Hero Guide](#next

---

## Install Dependencies and Set Up Environment
## Install Dependencies and Set Up Environment

1. **Create a Conda Environment**:
Create a new Conda environment with Python 3.10:
@@ -73,7 +73,7 @@ If you're looking for more specific topics, we have a [Zero to Hero Guide](#next
Open a new terminal and install `llama-stack`:
```bash
conda activate ollama
pip install llama-stack==0.0.55
pip install llama-stack==0.0.61
```

---
@@ -96,42 +96,37 @@ If you're looking for more specific topics, we have a [Zero to Hero Guide](#next
3. **Set the ENV variables by exporting them to the terminal**:
```bash
export OLLAMA_URL="http://localhost:11434"
export LLAMA_STACK_PORT=5051
export LLAMA_STACK_PORT=5001
export INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct"
export SAFETY_MODEL="meta-llama/Llama-Guard-3-1B"
```

4. **Run the Llama Stack**:
Run the stack with the command shared by the API from earlier:
```bash
llama stack run ollama \
--port $LLAMA_STACK_PORT \
--env INFERENCE_MODEL=$INFERENCE_MODEL \
--env SAFETY_MODEL=$SAFETY_MODEL \
llama stack run ollama \
--port $LLAMA_STACK_PORT \
--env INFERENCE_MODEL=$INFERENCE_MODEL \
--env SAFETY_MODEL=$SAFETY_MODEL \
--env OLLAMA_URL=$OLLAMA_URL
```
Note: Every time you run a new model with `ollama run`, you will need to restart the Llama Stack server. Otherwise it won't see the new model.

The server will start and listen on `http://localhost:5051`.
The server will start and listen on `http://localhost:5001`.

---
## Test with `llama-stack-client` CLI
After setting up the server, open a new terminal window and install the llama-stack-client package.
After setting up the server, open a new terminal window and configure the llama-stack-client.

1. Install the llama-stack-client package
1. Configure the CLI to point to the llama-stack server.
```bash
conda activate ollama
pip install llama-stack-client
```
2. Configure the CLI to point to the llama-stack server.
```bash
llama-stack-client configure --endpoint http://localhost:5051
llama-stack-client configure --endpoint http://localhost:5001
```
**Expected Output:**
```bash
Done! You can now use the Llama Stack Client CLI with endpoint http://localhost:5051
Done! You can now use the Llama Stack Client CLI with endpoint http://localhost:5001
```
3. Test the CLI by running inference:
2. Test the CLI by running inference:
```bash
llama-stack-client inference chat-completion --message "Write me a 2-sentence poem about the moon"
```
@@ -153,16 +148,18 @@ After setting up the server, open a new terminal window and install the llama-st
After setting up the server, open a new terminal window and verify it's working by sending a `POST` request using `curl`:

```bash
curl http://localhost:$LLAMA_STACK_PORT/inference/chat_completion \
-H "Content-Type: application/json" \
-d '{
"model": "Llama3.2-3B-Instruct",
curl http://localhost:$LLAMA_STACK_PORT/alpha/inference/chat-completion \
-H "Content-Type: application/json" \
-d @- <<EOF
{
"model_id": "$INFERENCE_MODEL",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Write me a 2-sentence poem about the moon"}
],
"sampling_params": {"temperature": 0.7, "seed": 42, "max_tokens": 512}
}'
}
EOF
```

You can check the available models with the command `llama-stack-client models list`.
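If you would rather do the same check from Python, here is a minimal sketch; it assumes the same endpoint as above and that the client exposes the model list programmatically, mirroring the CLI command.

```python
from llama_stack_client import LlamaStackClient

# Same endpoint the CLI was configured against above.
client = LlamaStackClient(base_url="http://localhost:5001")

# Equivalent of `llama-stack-client models list`: print each registered model.
for model in client.models.list():
    print(model)
```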
@@ -186,16 +183,12 @@ You can check the available models with the command `llama-stack-client models l

You can also interact with the Llama Stack server using a simple Python script. Below is an example:

### 1. Activate Conda Environment and Install Required Python Packages
The `llama-stack-client` library offers robust and efficient Python methods for interacting with the Llama Stack server.
### 1. Activate Conda Environment

```bash
conda activate ollama
pip install llama-stack-client
```

Note: the client library is installed by default when you install the server library.

### 2. Create Python Script (`test_llama_stack.py`)
```bash
touch test_llama_stack.py
@@ -206,19 +199,28 @@
In `test_llama_stack.py`, write the following code:

```python
from llama_stack_client import LlamaStackClient
import os
from llama_stack_client import LlamaStackClient

# Get the model ID from the environment variable
INFERENCE_MODEL = os.environ.get("INFERENCE_MODEL")

# Initialize the client
client = LlamaStackClient(base_url="http://localhost:5051")
# Check if the environment variable is set
if INFERENCE_MODEL is None:
raise ValueError("The environment variable 'INFERENCE_MODEL' is not set.")

# Create a chat completion request
# Initialize the client
client = LlamaStackClient(base_url="http://localhost:5001")

# Create a chat completion request
response = client.inference.chat_completion(
messages=[
{"role": "system", "content": "You are a friendly assistant."},
{"role": "user", "content": "Write a two-sentence poem about llama."}
],
model_id=MODEL_NAME,
model_id=INFERENCE_MODEL,
)

# Print the response
print(response.completion_message.content)
```
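You can then run the script from the same `ollama` Conda environment with `python test_llama_stack.py`; if the server from the earlier steps is running, the printed response should be the model's two-sentence poem.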