diff --git a/docs/zero_to_hero_guide/00_Inference101.ipynb b/docs/zero_to_hero_guide/00_Inference101.ipynb
index 2aced6ef9e..687f5606b1 100644
--- a/docs/zero_to_hero_guide/00_Inference101.ipynb
+++ b/docs/zero_to_hero_guide/00_Inference101.ipynb
@@ -358,7 +358,7 @@
     " if not stream:\n",
     " cprint(f'> Response: {response.completion_message.content}', 'cyan')\n",
     " else:\n",
-    " async for log in EventLogger().log(response):\n",
+    " for log in EventLogger().log(response):\n",
     " log.print()\n",
     "\n",
     "# In a Jupyter Notebook cell, use `await` to call the function\n",
@@ -366,16 +366,6 @@
     "# To run it in a python file, use this line instead\n",
     "# asyncio.run(run_main())\n"
    ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 11,
-   "id": "9399aecc",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "#fin"
-   ]
   }
  ],
  "metadata": {
diff --git a/docs/zero_to_hero_guide/README.md b/docs/zero_to_hero_guide/README.md
index 68c0121647..b451e0af77 100644
--- a/docs/zero_to_hero_guide/README.md
+++ b/docs/zero_to_hero_guide/README.md
@@ -45,7 +45,7 @@ If you're looking for more specific topics, we have a [Zero to Hero Guide](#next

 ---

-## Install Dependencies and Set Up Environment
+## Install Dependencies and Set Up Environmen

 1. **Create a Conda Environment**:
    Create a new Conda environment with Python 3.10:
@@ -73,7 +73,7 @@ If you're looking for more specific topics, we have a [Zero to Hero Guide](#next
    Open a new terminal and install `llama-stack`:
    ```bash
    conda activate ollama
-   pip install llama-stack==0.0.55
+   pip install llama-stack==0.0.61
    ```

 ---
@@ -96,7 +96,7 @@ If you're looking for more specific topics, we have a [Zero to Hero Guide](#next
 3. **Set the ENV variables by exporting them to the terminal**:
    ```bash
    export OLLAMA_URL="http://localhost:11434"
-   export LLAMA_STACK_PORT=5051
+   export LLAMA_STACK_PORT=5001
    export INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct"
    export SAFETY_MODEL="meta-llama/Llama-Guard-3-1B"
    ```
@@ -104,34 +104,29 @@ If you're looking for more specific topics, we have a [Zero to Hero Guide](#next
 3. **Run the Llama Stack**:
    Run the stack with command shared by the API from earlier:
    ```bash
-   llama stack run ollama \
-      --port $LLAMA_STACK_PORT \
-      --env INFERENCE_MODEL=$INFERENCE_MODEL \
-      --env SAFETY_MODEL=$SAFETY_MODEL \
+   llama stack run ollama
+      --port $LLAMA_STACK_PORT
+      --env INFERENCE_MODEL=$INFERENCE_MODEL
+      --env SAFETY_MODEL=$SAFETY_MODEL
       --env OLLAMA_URL=$OLLAMA_URL
    ```
    Note: Everytime you run a new model with `ollama run`, you will need to restart the llama stack. Otherwise it won't see the new model.

-The server will start and listen on `http://localhost:5051`.
+The server will start and listen on `http://localhost:5001`.

 ---
 ## Test with `llama-stack-client` CLI
-After setting up the server, open a new terminal window and install the llama-stack-client package.
+After setting up the server, open a new terminal window and configure the llama-stack-client.

-1. Install the llama-stack-client package
+1. Configure the CLI to point to the llama-stack server.
    ```bash
-   conda activate ollama
-   pip install llama-stack-client
-   ```
-2. Configure the CLI to point to the llama-stack server.
-   ```bash
-   llama-stack-client configure --endpoint http://localhost:5051
+   llama-stack-client configure --endpoint http://localhost:5001
    ```
    **Expected Output:**
    ```bash
-   Done! You can now use the Llama Stack Client CLI with endpoint http://localhost:5051
+   Done! You can now use the Llama Stack Client CLI with endpoint http://localhost:5001
    ```
-3. Test the CLI by running inference:
+2. Test the CLI by running inference:
    ```bash
    llama-stack-client inference chat-completion --message "Write me a 2-sentence poem about the moon"
    ```
@@ -153,16 +148,18 @@ After setting up the server, open a new terminal window and install the llama-st
 After setting up the server, open a new terminal window and verify it's working by sending a `POST` request using `curl`:

 ```bash
-curl http://localhost:$LLAMA_STACK_PORT/inference/chat_completion \
--H "Content-Type: application/json" \
--d '{
-    "model": "Llama3.2-3B-Instruct",
+curl http://localhost:$LLAMA_STACK_PORT/alpha/inference/chat-completion
+-H "Content-Type: application/json"
+-d @- <