From ab51a508b60fbbd820776abbd0bb190708c305eb Mon Sep 17 00:00:00 2001
From: Justin Lee
Date: Sat, 14 Dec 2024 08:40:46 +0800
Subject: [PATCH 1/2] made changes to readme and pinning to v0.0.61

---
 docs/zero_to_hero_guide/00_Inference101.ipynb | 12 +---
 docs/zero_to_hero_guide/README.md             | 70 ++++++++++---------
 2 files changed, 37 insertions(+), 45 deletions(-)

diff --git a/docs/zero_to_hero_guide/00_Inference101.ipynb b/docs/zero_to_hero_guide/00_Inference101.ipynb
index 2aced6ef9e..687f5606b1 100644
--- a/docs/zero_to_hero_guide/00_Inference101.ipynb
+++ b/docs/zero_to_hero_guide/00_Inference101.ipynb
@@ -358,7 +358,7 @@
     "    if not stream:\n",
     "        cprint(f'> Response: {response.completion_message.content}', 'cyan')\n",
     "    else:\n",
-    "        async for log in EventLogger().log(response):\n",
+    "        for log in EventLogger().log(response):\n",
     "            log.print()\n",
     "\n",
     "# In a Jupyter Notebook cell, use `await` to call the function\n",
@@ -366,16 +366,6 @@
     "# To run it in a python file, use this line instead\n",
     "# asyncio.run(run_main())\n"
    ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 11,
-   "id": "9399aecc",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "#fin"
-   ]
   }
  ],
  "metadata": {
diff --git a/docs/zero_to_hero_guide/README.md b/docs/zero_to_hero_guide/README.md
index 68c0121647..05bfbb983b 100644
--- a/docs/zero_to_hero_guide/README.md
+++ b/docs/zero_to_hero_guide/README.md
@@ -45,7 +45,7 @@ If you're looking for more specific topics, we have a [Zero to Hero Guide](#next
 ---
 
-## Install Dependencies and Set Up Environment
+## Install Dependencies and Set Up Environmen
 
 1. **Create a Conda Environment**:
    Create a new Conda environment with Python 3.10:
    ```bash
@@ -73,7 +73,7 @@ If you're looking for more specific topics, we have a [Zero to Hero Guide](#next
    Open a new terminal and install `llama-stack`:
    ```bash
    conda activate ollama
-   pip install llama-stack==0.0.55
+   pip install llama-stack==0.0.61
    ```
 
 ---
@@ -96,7 +96,7 @@ If you're looking for more specific topics, we have a [Zero to Hero Guide](#next
 3. **Set the ENV variables by exporting them to the terminal**:
    ```bash
    export OLLAMA_URL="http://localhost:11434"
-   export LLAMA_STACK_PORT=5051
+   export LLAMA_STACK_PORT=5001
    export INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct"
    export SAFETY_MODEL="meta-llama/Llama-Guard-3-1B"
    ```
@@ -104,34 +104,29 @@ If you're looking for more specific topics, we have a [Zero to Hero Guide](#next
 3. **Run the Llama Stack**:
    Run the stack with command shared by the API from earlier:
    ```bash
-   llama stack run ollama \
-   --port $LLAMA_STACK_PORT \
-   --env INFERENCE_MODEL=$INFERENCE_MODEL \
-   --env SAFETY_MODEL=$SAFETY_MODEL \
+   llama stack run ollama
+   --port $LLAMA_STACK_PORT
+   --env INFERENCE_MODEL=$INFERENCE_MODEL
+   --env SAFETY_MODEL=$SAFETY_MODEL
    --env OLLAMA_URL=$OLLAMA_URL
    ```
 
 Note: Everytime you run a new model with `ollama run`, you will need to restart the llama stack. Otherwise it won't see the new model.
 
-The server will start and listen on `http://localhost:5051`.
+The server will start and listen on `http://localhost:5001`.
 
 ---
 
 ## Test with `llama-stack-client` CLI
 
-After setting up the server, open a new terminal window and install the llama-stack-client package.
+After setting up the server, open a new terminal window and configure the llama-stack-client.
 
-1. Install the llama-stack-client package
+1. Configure the CLI to point to the llama-stack server.
    ```bash
-   conda activate ollama
-   pip install llama-stack-client
-   ```
-2. Configure the CLI to point to the llama-stack server.
-   ```bash
-   llama-stack-client configure --endpoint http://localhost:5051
+   llama-stack-client configure --endpoint http://localhost:5001
    ```
    **Expected Output:**
    ```bash
    Done! You can now use the Llama Stack Client CLI with endpoint http://localhost:5001
    ```
-3. Test the CLI by running inference:
+2. Test the CLI by running inference:
    ```bash
    llama-stack-client inference chat-completion --message "Write me a 2-sentence poem about the moon"
    ```
 
 ## Test with `curl`
 
 After setting up the server, open a new terminal window and verify it's working by sending a `POST` request using `curl`:
 
 ```bash
-curl http://localhost:$LLAMA_STACK_PORT/inference/chat_completion \
--H "Content-Type: application/json" \
--d '{
-    "model": "Llama3.2-3B-Instruct",
+curl http://localhost:$LLAMA_STACK_PORT/alpha/inference/chat-completion
+-H "Content-Type: application/json"
+-d @- <
 Date: Sat, 14 Dec 2024 08:55:38 +0800
Subject: [PATCH 2/2] fix typo

---
 docs/zero_to_hero_guide/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/zero_to_hero_guide/README.md b/docs/zero_to_hero_guide/README.md
index 05bfbb983b..b451e0af77 100644
--- a/docs/zero_to_hero_guide/README.md
+++ b/docs/zero_to_hero_guide/README.md
@@ -225,7 +225,7 @@ response = client.inference.chat_completion(
 print(response.completion_message.content)
 ```
 
-### 4. Run the Python Scrip
+### 4. Run the Python Script
 
 ```bash
 python test_llama_stack.py