diff --git a/bifrost/app/blog/blogs/replaying-llm-sessions/metadata.json b/bifrost/app/blog/blogs/replaying-llm-sessions/metadata.json new file mode 100644 index 000000000..c14b5850b --- /dev/null +++ b/bifrost/app/blog/blogs/replaying-llm-sessions/metadata.json @@ -0,0 +1,10 @@ +{ + "title": "Replaying LLM Sessions for Iterative AI Agent Improvement", + "title1": "Replaying LLM Sessions for Iterative AI Agent Improvement", + "title2": "Replaying LLM Sessions for Iterative AI Agent Improvement", + "description": "Learn how to enhance your AI agents by replaying and modifying LLM sessions using Helicone. Apply changes directly to real user interactions to gain authentic context, reveal hidden effects, and accelerate iteration.", + "images": "/static/blog/replaying-llm-sessions/sessions.webp", + "time": "15 minute read", + "author": "Cole Gottdank", + "date": "September 26, 2024" +} diff --git a/bifrost/app/blog/blogs/replaying-llm-sessions/src.mdx b/bifrost/app/blog/blogs/replaying-llm-sessions/src.mdx new file mode 100644 index 000000000..91268a779 --- /dev/null +++ b/bifrost/app/blog/blogs/replaying-llm-sessions/src.mdx @@ -0,0 +1,309 @@ +![sessions](/static/blog/replaying-llm-sessions/sessions.webp) +Experimenting with prompts in isolation **limits your understanding**. To truly grasp how a prompt change impacts an entire session, you need to **apply changes directly to real user interactions**. **Replaying LLM sessions with Helicone unlocks this capability**, providing insights unattainable through isolated testing. + +**Why is this powerful?** + +- **Authentic Context**: By leveraging actual production data, you see how changes affect real user experiences. +- **Unveiling Hidden Effects**: Discover unintended consequences that only emerge over full sessions. +- **Accelerated Iteration**: Automate testing with real inputs, streamlining your optimization process. + +**Helicone empowers you to replay any complex session**—a capability no other platform offers. Due to our adaptability, more mature product teams often build bespoke solutions atop Helicone to store, aggregate, and analyze their AI workflows, enhancing performance with genuine user data without reinventing the wheel. + +In this guide, we'll **demonstrate how to leverage Helicone to replay LLM sessions**. You'll learn how to set up an initial session, query session data, and replay sessions with modifications. We'll also share tips on customizing this approach for your unique needs. + +--- + +## Overview of the Replay Process With Helicone + +The process of replaying LLM sessions with Helicone involves three main steps: + +1. **Setting Up the Initial Session**: Instrument your LLM calls to include Helicone session metadata so that they can be tracked and logged. +2. **Querying Helicone for Session Data**: Use Helicone's API to retrieve the logs of past sessions that you want to replay. +3. **Replaying the Session with Modifications**: Programmatically modify the retrieved session data as needed and send requests to the LLM to observe the effects. + +Let's explore each of these steps in detail by following an example. + +## Example: AI Debate Application + +We'll walk through an example of a debate session between a user and an assistant. Between each argument, a impartial assistant scores the argument from 1 to 10. + +### Step 1: Setting Up the Initial Session + +Before you can replay sessions, you need to log them properly in Helicone. By adding **only 3 headers** to your LLM API requests, you can tag and group them into sessions. + +#### Instrumenting Your LLM Calls + +1. **Configure the OpenAI API client with Helicone** + + Set up the session headers, configure the OpenAI API client with Helicone, and include the necessary headers when making a request. + + ```javascript + const { OpenAI } = require("openai"); + const { randomUUID } = require("crypto"); + + // Generate unique session identifiers + const sessionId = randomUUID(); + const sessionName = "AI Debate"; + const sessionPath = "/debate/climate-change"; + + // Initialize OpenAI client with Helicone baseURL and auth header + const openai = new OpenAI({ + apiKey: process.env.OPENAI_API_KEY, + baseURL: "https://oai.helicone.ai/v1", + defaultHeaders: { + "Helicone-Auth": `Bearer ${process.env.HELICONE_API_KEY}`, + }, + }); + + // Include Helicone session headers when making requests + const response = await openai.chat.completions( + { + model: "gpt-4o-mini", + messages: conversation, + }, + { + headers: { + "Helicone-Session-Id": sessionId, + "Helicone-Session-Name": sessionName, + "Helicone-Session-Path": sessionPath, + }, + } + ); + ``` + +2. **Initialize the conversation with the assistant** + + ```javascript + const topic = "The impact of climate change on global economies"; + + const conversation = [ + { + role: "system", + content: + "You're a debating professional. You're engaging in a structured debate with the user. Each of you will present arguments for or against the topic. Keep responses concise and to the point.", + }, + { + role: "assistant", + content: `Welcome to our debate! Today's topic is: "${topic}". I will argue in favor, and you will argue against. Please present your opening argument.`, + }, + ]; + ``` + +3. **Loop through the debate turns** + + ```javascript + let turn = 1; + + while (turn <= MAX_TURNS) { + // Get user's argument + const userArgument = await promptUser("Your argument: "); + conversation.push({ role: "user", content: userArgument }); + + // Score the user's argument + await evaluateArgument( + userArgument, + "Your Argument", + sessionId, + sessionName, + sessionPath + ); + + // Assistant responds with a counter-argument + const assistantResponse = await generateAssistantResponse( + conversation, + sessionId, + sessionName, + sessionPath + ); + conversation.push(assistantResponse); + + // Score the assistant's argument + await evaluateArgument( + assistantResponse.content, + "Assistant's Argument", + sessionId, + sessionName, + sessionPath + ); + + turn++; + } + ``` + + **Note:** The functions `promptUser`, `evaluateArgument`, and `generateAssistantResponse` handle user input, argument evaluation, and generating assistant responses, respectively. + +**After setting up and running your session through Helicone, you can view it in Helicone:** + +_Go fullscreen for the best experience._ + + + +_Read more about how to implement Helicone sessions [here](https://docs.helicone.ai/features/sessions)._ + +### Step 2: Querying the Session Data from Helicone + +```javascript +const response = await fetch("https://api.helicone.ai/v1/request/query", { + method: "POST", + headers: { + "Content-Type": "application/json", + Authorization: `Bearer ${HELICONE_API_KEY}`, + }, + body: JSON.stringify({ + filter: { + properties: { + "Helicone-Session-Id": { + equals: SESSION_ID_TO_REPLAY, + }, + }, + }, + }), +}); +const data = await response.json(); +``` + +Read more about Helicone's API [here](https://docs.helicone.ai/rest/request/post-v1requestquery). + +### Step 3: Processing and Modifying the Session Data + +Now that you have the session data, you'll need to process it. + +1. **Parse and sort the requests** + + **Sorting session data can be complex** because + each use case is unique and may require custom logic. For our debate session example, + we simply sort the requests by their `created_at` timestamp. + + ```javascript + const requests = data.data.map((request) => ({ + created_at: request.request_created_at, + session: request.request_properties["Helicone-Session-Id"], + signed_body_url: request.signed_body_url, + request_path: request.request_path, + path: request.request_properties["Helicone-Session-Path"], + prompt_id: request.request_properties["Helicone-Prompt-Id"], + body: request.body, + })); + requests.sort((a, b) => new Date(a.created_at) - new Date(b.created_at)); + ``` + +2. **Modify the request bodies as needed** + + For example, we can adjust the system prompts to change the assistants argument or argument evaluation response. + + ```javascript + function modifyRequestBody(request) { + if (request.prompt_id === "argument-evaluation") { + const systemMessage = request.body.messages.find( + (msg) => msg.role === "system" + ); + if (systemMessage) { + systemMessage.content += " Keep the feedback short and concise."; + } + } else if (request.prompt_id === "assistant-argument") { + const systemMessage = request.body.messages.find( + (msg) => msg.role === "system" + ); + if (systemMessage) { + systemMessage.content += + " Take the persona of a genius in this field when responding."; + } + } + return request; + } + ``` + +3. **Replay the modified session** + + ```javascript + // Create a new session for the replay + const replaySessionId = randomUUID(); + for (const request of requests) { + const modifiedRequest = modifyRequestBody(request); + + // Reuse session metadata from the original request + await handleChatCompletion(modifiedRequest); + } + + async function handleChatCompletion(modifiedRequest) { + const { body, path, prompt_id, request_path } = modifiedRequest; + + // Send the modified request to the LLM + const response = await fetch(request_path, { + method: "POST", + headers: { + Authorization: `Bearer ${OPENAI_API_KEY}`, + "Helicone-Auth": `Bearer ${HELICONE_API_KEY}`, + // Reuse the session metadata for logging + "Helicone-Session-Id": replaySessionId, + "Helicone-Session-Name": sessionName, + "Helicone-Session-Path": path, + "Helicone-Prompt-Id": prompt_id, + }, + body: JSON.stringify(body), + }); + } + ``` + + **Note:** In the `handleChatCompletion` function, we send the modified request to the LLM. By reusing the same `session-name`, `session-path`, `prompt-id`, and `request path` from the original requests, we ensure that the replayed session is logged in Helicone under the same session metadata. This allows you to see the replayed requests in Helicone, grouped under the same session, making it easier to compare and analyze the effects of your modifications. + +### After running the replay, you can view it in Helicone: + +_Go fullscreen for the best experience._ + + + +With the replayed session now visible in Helicone, you can observe how the modifications impact the AI's responses throughout the session. This visualization shows how changes in prompts or configurations affect subsequent interactions. + +--- + +## Optional: Evaluations & Prompt Versioning + +### Evaluations + +To assess the impact of your changes quantitatively, use Helicone's [evaluation features](https://docs.helicone.ai/features/evaluation) to assign scores to both the original and replayed sessions. **Comparing these scores helps you understand the effects of your modifications** and refine your prompts more effectively. + +### Prompt Versioning + +Helicone's [prompt versioning](https://docs.helicone.ai/features/prompts) feature allows you to manage and compare different versions of your prompts effectively. By maintaining multiple versions, **you can test various combinations within your sessions to identify which prompts yield the best results**. To retrieve specific prompt versions, use Helicone's [Prompt API](https://docs.helicone.ai/rest/prompt/post-v1prompt-query). + +Here's how you can iterate over prompt versions to test different compositions in your sessions: + +```pseudo +Define a list of prompt versions for each prompt. +For each combination in the cartesian product of prompt versions: + Generate a new session ID. + For each prompt in the combination: + Fetch the prompt content using Helicone's Prompt API. + Run the session using the retrieved prompts. +``` + +_Alternatively, as described above, you can manually modify the prompts after retrieving the session, and Helicone will automatically log the new prompt versions for you. In this approach, you create the prompt versions first._ + +### Conclusion + +By replaying and modifying LLM sessions with Helicone, you gain deeper insights into how changes affect the entire workflow. This method provides context-rich, real-world data that leads to more effective optimizations and a comprehensive understanding of your AI's behavior. + +--- + +### Helicone Plug 🧊 + +**Helicone is the ultimate open-source LLM observability platform built for developers to monitor, debug and improve their LLM applications**. We're fully open source and built to handle high usage use-cases seamlessly. With over **1.8 billion requests** and **1.6 trillion tokens** logged, it's proven at scale. **Check out our open-source GitHub repository [here](https://github.com/Helicone/helicone).** + +**Trusted by Industry Leaders**: Companies like **Sunrun**, **Michelin**, **QAWolf**, **Slate Magazine**, and many more rely on Helicone to enhance their AI models and workflows. + +You can view the full code used in this guide on [GitHub](https://github.com/colegottdank/helicone-replay-session-blog). + +--- diff --git a/bifrost/app/blog/page.tsx b/bifrost/app/blog/page.tsx index 69884e792..55cec1070 100644 --- a/bifrost/app/blog/page.tsx +++ b/bifrost/app/blog/page.tsx @@ -190,6 +190,11 @@ export type BlogStructure = }; const blogContent: BlogStructure[] = [ + { + dynmaicEntry: { + folderName: "replaying-llm-sessions", + }, + }, { dynmaicEntry: { folderName: "helicone-recap", diff --git a/bifrost/public/static/blog/replaying-llm-sessions/sessions.webp b/bifrost/public/static/blog/replaying-llm-sessions/sessions.webp new file mode 100644 index 000000000..39fad2321 Binary files /dev/null and b/bifrost/public/static/blog/replaying-llm-sessions/sessions.webp differ