---
title: Using LLMs in Production
description: A nod to Will Larson's post on using LLMs in production and some additional notes based on my own experience.
date: 2024-04-09 11:45:00-0400
tags:
- LLMs
- ML
- code
---

[Will Larson](https://lethain.com) just wrote about his mental models for [using LLMs in production](https://lethain.com/mental-model-for-how-to-use-llms-in-products/). I agree with much of it, particularly the re-framing of what LLMs can _really do today_ for product developers.

## On the unsupervised (no human in the loop) scenario

> Because you cannot rely on LLMs to provide correct responses, and you cannot generate a confidence score for any given response, you have to either accept potential inaccuracies (which makes sense in many cases, humans are wrong sometimes too) or keep a Human-in-the-Loop (HITL) to validate the response.

I only wish the post touched more on the unsupervised (no human in the loop) scenario. For many workflows, pairing an LLM with a human in the loop only marginally improves the workflow. To make systems that are autonomous, it's not just about accepting potential inaccuracies; it's also about accepting responsibility for _driving them down_. This is the super hard part of unsupervised LLM applications. You first have to educate customers on the trade-offs and risks they are taking, and then you have to build systems that drive those risks toward zero and optimize those trade-offs for value, so that customers become increasingly confident in the system.

## Using schemas in prompts

A tactic that wasn't mentioned in the post is using [`JSONSchema`](https://json-schema.org/) within LLM prompts. This is a great way to ensure generations are more accurate and meet your system's expectations.

<blockquote class="callout note">
You don't need to use JSONSchema if you don't want to. We have had good results from simply showing a few examples of desired output in the prompt and letting the LLM infer the schema from that.
</blockquote>

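As a sketch of that few-shot alternative (the documents, IDs, and output format below are made up for illustration, not taken from the post), the prompt simply shows the model what you want back:

```py
# A hypothetical few-shot prompt: no JSONSchema, just examples of the desired output
few_shot_prompt = """Review each document and respond with JSON in the same format as the examples.

<document id="10">The quick brown fox jumps over the lazy dog.</document>
{"document_id": 10, "review": "good"}

<document id="11">Teh quick brown fox jump over lazy dog</document>
{"document_id": 11, "review": "bad"}

<document id="12">This is the document you actually want reviewed.</document>
"""
```
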
Here's a toy example of how you can use `JSONSchema` with an LLM prompt:

```py
import os
from typing import Literal

import openai
from pydantic import BaseModel, ValidationError

# Example docs
docs = [
    {
        'id': 1,
        'content': 'This is a well formed sentence that has no errors in grammar or spelling.'
    },
    {
        'id': 2,
        'content': 'This is an exampel sentence errors in grammer and speling.'
    }
]

# Define the schema you want the LLM to follow. We're using pydantic, but there are many options for this.
class DocumentReview(BaseModel):
    """A review of a single document."""
    document_id: int
    review: Literal['good', 'bad']

# Make a prompt template that embeds the schema
prompt_template = """Given the following <document>, please review the document and provide your review using the provided JSONSchema:
<document id="{doc_id}">
{doc_content}
</document>
JSONSchema:
{schema}
Your review:
"""

# Assuming the OpenAI API key is set in environment variables
openai.api_key = os.getenv("OPENAI_API_KEY")

response = openai.Completion.create(
    engine="text-davinci-003",
    prompt=prompt_template.format(
        doc_id=docs[0]['id'],
        doc_content=docs[0]['content'],
        schema=DocumentReview.model_json_schema(),
    ),
)

# Assuming the LLM returns a JSON string that fits our schema
try:
    review = DocumentReview.model_validate_json(response.choices[0].text.strip())
except ValidationError as e:
    print(f"Error validating schema: {e}")
else:
    # The parsed review matches our schema; its document_id can also be checked
    # against the IDs in `docs` before using the result downstream.
    print(f"Document ID: {review.document_id}, Review: {review.review}")
```

**Handling `ValidationError`**

1. You can handle the `ValidationError` by re-prompting the LLM with the error and re-running the prompt, as sketched below.
2. You can also handle the `ValidationError` by dropping down to a HITL workflow, using a queue of documents for a human to review.

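Here's a rough sketch of option 1. It reuses `DocumentReview` and the `openai` setup from the example above; the function name, retry count, and re-prompt wording are my own assumptions, not something from the post:

```py
def review_with_retries(prompt: str, max_retries: int = 2) -> DocumentReview | None:
    """Ask the LLM for a review, feeding validation errors back on failure."""
    current_prompt = prompt
    for _ in range(max_retries + 1):
        response = openai.Completion.create(engine="text-davinci-003", prompt=current_prompt)
        raw = response.choices[0].text.strip()
        try:
            return DocumentReview.model_validate_json(raw)
        except ValidationError as e:
            # Re-prompt with the previous output and the validation error attached
            current_prompt = (
                f"{prompt}\n\nYour previous answer was:\n{raw}\n\n"
                f"It failed schema validation with this error:\n{e}\n\n"
                "Please answer again with JSON that matches the JSONSchema exactly."
            )
    return None  # Retries exhausted; fall back to the HITL queue (option 2)
```
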
---

Using schemas to validate generations allows you to ensure that the data generated by an LLM at least matches your data types. In addition, if the LLM is referencing passed material (such as in a RAG architecture), you can ensure the document IDs referenced in generations at least match the source documents given. To improve this further, you can perform some semantic or string-distance checks between the source documents' content and the generated output.

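As a minimal sketch of those checks (reusing `docs` and `review` from the example above; `difflib` is a stand-in for whatever string or semantic similarity measure you prefer, and the threshold is an arbitrary assumption):

```py
from difflib import SequenceMatcher

# Check that the generation references a document we actually passed in
assert review.document_id in {doc['id'] for doc in docs}

def resembles_source(generation: str, source: str, threshold: float = 0.35) -> bool:
    """Rough string-distance check that a generation stays close to its source document."""
    return SequenceMatcher(None, generation.lower(), source.lower()).ratio() >= threshold
```
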
For more on using `JSONSchema` with LLM prompts, see [this post](https://thoughtbot.com/blog/get-consistent-data-from-your-llm-with-json-schema) from ThoughtBot.