Litellm dev 2024 12 19 p3 (BerriAI#7322)

* fix(utils.py): remove unsupported optional params (if drop_params=True) before passing them into `map_openai_params`

Fixes BerriAI#7242

* test: new test for langfuse prompt management hook

Addresses BerriAI#3893 (comment)

* feat(main.py): add 'get_chat_completion_prompt' CustomLogger hook

allows for langfuse prompt management

Addresses BerriAI#3893 (comment)

* feat(langfuse_prompt_management.py): working e2e langfuse prompt management

works with `langfuse/` route

* feat(main.py): initial tracing for dynamic langfuse params

allows admin to specify langfuse keys by model in model_list

* feat(main.py): support passing langfuse credentials dynamically

* fix(langfuse_prompt_management.py): create langfuse client based on dynamic callback params

allows dynamic langfuse params to work

* fix: fix linting errors

* docs(prompt_management.md): refactor docs for sdk + proxy prompt management tutorial

* docs(prompt_management.md): cleanup doc

* docs: cleanup topnav

* docs(prompt_management.md): update docs to be easier to use

* fix: remove unused imports

* docs(prompt_management.md): add architectural overview doc

* fix(litellm_logging.py): fix dynamic param passing

* fix(langfuse_prompt_management.py): fix linting errors

* fix: fix linting errors

* fix: use typing_extensions for TypeAlias to ensure Python 3.8 compatibility

* test: use stream_options in test to account for tiktoken diff

* fix: improve import error message, and check run test earlier
krrishdholakia authored Dec 20, 2024
1 parent 2c36f25 commit 27a4d08
Showing 17 changed files with 631 additions and 243 deletions.
195 changes: 162 additions & 33 deletions docs/my-website/docs/proxy/prompt_management.md
@@ -1,83 +1,212 @@
import Image from '@theme/IdealImage';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Prompt Management

LiteLLM supports [Langfuse](https://langfuse.com/docs/prompts/get-started) for prompt management, on both the SDK and the proxy.
Run experiments or change the specific model (e.g. from gpt-4o to a gpt-4o-mini finetune) from your prompt management tool (e.g. Langfuse) instead of making changes in the application.

Supported Integrations:
- [Langfuse](https://langfuse.com/docs/prompts/get-started)

## Quick Start


<Tabs>

<TabItem value="sdk" label="SDK">

```python
import os
import litellm

os.environ["LANGFUSE_PUBLIC_KEY"] = "public_key" # [OPTIONAL] set here or in `.completion`
os.environ["LANGFUSE_SECRET_KEY"] = "secret_key" # [OPTIONAL] set here or in `.completion`

litellm.set_verbose = True # see raw request to provider

resp = litellm.completion(
model="langfuse/gpt-3.5-turbo",
prompt_id="test-chat-prompt",
prompt_variables={"user_message": "this is used"}, # [OPTIONAL]
messages=[{"role": "user", "content": "<IGNORED>"}],
)
```



</TabItem>
<TabItem value="proxy" label="PROXY">

1. Setup config.yaml

```yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: langfuse/gpt-3.5-turbo
      prompt_id: "<langfuse_prompt_id>"
      api_key: os.environ/OPENAI_API_KEY
```
2. Start the proxy
```bash
litellm --config config.yaml --detailed_debug
```

3. Test it!

<Tabs>
<TabItem value="curl" label="CURL">

```bash
curl -L -X POST 'http://0.0.0.0:4000/v1/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
    "model": "gpt-3.5-turbo",
    "messages": [
        {
            "role": "user",
            "content": "THIS WILL BE IGNORED"
        }
    ],
    "prompt_variables": {
        "key": "this is used"
    }
}'
```
</TabItem>
<TabItem value="OpenAI Python SDK" label="OpenAI Python SDK">

```python
import openai
client = openai.OpenAI(
api_key="anything",
base_url="http://0.0.0.0:4000"
)

# request sent to model set on litellm proxy, `litellm --model`
response = client.chat.completions.create(
model="gpt-3.5-turbo",
messages = [
{
"role": "user",
"content": "this is a test request, write a short poem"
}
],
extra_body={
"prompt_variables": { # [OPTIONAL]
"key": "this is used"
}
}
)

print(response)
```

</TabItem>
</Tabs>

</TabItem>
</Tabs>


**Expected Logs:**

```
POST Request Sent from LiteLLM:
curl -X POST \
https://api.openai.com/v1/ \
-d '{'model': 'gpt-3.5-turbo', 'messages': <YOUR LANGFUSE PROMPT TEMPLATE>}'
```

## How to set model

### Set the model on LiteLLM

You can do `langfuse/<litellm_model_name>`

<Tabs>
<TabItem value="sdk" label="SDK">

```python
litellm.completion(
model="langfuse/gpt-3.5-turbo", # or `langfuse/anthropic/claude-3-5-sonnet`
...
)
```

</TabItem>
<TabItem value="proxy" label="PROXY">

```yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: langfuse/gpt-3.5-turbo # OR langfuse/anthropic/claude-3-5-sonnet
      prompt_id: <langfuse_prompt_id>
      api_key: os.environ/OPENAI_API_KEY
```
</TabItem>
</Tabs>

### Set the model in Langfuse

If the model is specified in the Langfuse config, it will be used.

<Image img={require('../../img/langfuse_prompt_management_model_config.png')} />

```yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/chatgpt-v-2
      api_key: os.environ/AZURE_API_KEY
      api_base: os.environ/AZURE_API_BASE
```

## What is 'prompt_variables'?
- `prompt_variables`: A dictionary of variables used to fill the `{{variable}}` placeholders in the prompt.
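
For example, here is a minimal SDK sketch. It assumes a Langfuse prompt with id `test-chat-prompt` whose template is `You are a helpful assistant. Answer this question: {{user_message}}` (both are hypothetical values):

```python
import litellm

resp = litellm.completion(
    model="langfuse/gpt-3.5-turbo",
    prompt_id="test-chat-prompt",
    # fills the {{user_message}} placeholder in the stored template
    prompt_variables={"user_message": "What is the capital of France?"},
    messages=[{"role": "user", "content": "<IGNORED>"}],
)
```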



## What is 'prompt_id'?

- `prompt_id`: The ID of the prompt that will be used for the request.

<Image img={require('../../img/langfuse_prompt_id.png')} />

## What will the formatted prompt look like?

### `/chat/completions` messages

The `messages` field sent in by the client is ignored.

The Langfuse prompt will replace the `messages` field.

To replace parts of the prompt, use the `prompt_variables` field. [See how prompt variables are used](https://github.com/BerriAI/litellm/blob/017f83d038f85f93202a083cf334de3544a3af01/litellm/integrations/langfuse/langfuse_prompt_management.py#L127)

If the Langfuse prompt is a string, it will be sent as a user message (not all providers support system messages).

If the Langfuse prompt is a list, it will be sent as is (Langfuse chat prompts are OpenAI compatible).
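
For illustration, given the rules above, the messages LiteLLM sends would look roughly like this (a sketch: the prompt id and prompt contents are assumptions, not actual library output):

```python
# Client sends: prompt_id="my-prompt", messages=[{"role": "user", "content": "<IGNORED>"}]

# If "my-prompt" is a Langfuse TEXT prompt, e.g. "You are a helpful assistant.":
outgoing_messages = [
    {"role": "user", "content": "You are a helpful assistant."},
]

# If "my-prompt" is a Langfuse CHAT prompt, its messages are sent as-is:
outgoing_messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the attached document."},
]
```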

## Architectural Overview

<Image img={require('../../img/prompt_management_architecture_doc.png')} />

## API Reference

These are the params you can pass to `litellm.completion` in the SDK, and set under `litellm_params` in the proxy config.yaml:

```
prompt_id: str # required
prompt_variables: Optional[dict] # optional
langfuse_public_key: Optional[str] # optional
langfuse_secret: Optional[str] # optional
langfuse_secret_key: Optional[str] # optional
langfuse_host: Optional[str] # optional
```
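
For example, Langfuse credentials can also be passed dynamically per request. A sketch using the parameters above, with placeholder key values:

```python
import litellm

resp = litellm.completion(
    model="langfuse/gpt-3.5-turbo",
    prompt_id="test-chat-prompt",                  # required
    prompt_variables={"user_message": "hi"},       # optional
    langfuse_public_key="pk-lf-...",               # optional, instead of LANGFUSE_PUBLIC_KEY env var
    langfuse_secret_key="sk-lf-...",               # optional, instead of LANGFUSE_SECRET_KEY env var
    langfuse_host="https://cloud.langfuse.com",    # optional
    messages=[{"role": "user", "content": "<IGNORED>"}],
)
```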
10 changes: 1 addition & 9 deletions docs/my-website/docusaurus.config.js
@@ -130,15 +130,7 @@ const config = {
href: 'https://discord.com/invite/wuPM9dRgDw',
label: 'Discord',
position: 'right',
},
{
type: 'html',
position: 'right',
value:
`<a href=# class=navbar__link data-fr-widget>
I'm Confused
</a>`
},
}
],
},
footer: {
41 changes: 23 additions & 18 deletions docs/my-website/sidebars.js
@@ -135,7 +135,6 @@ const sidebars = {
"oidc"
]
},
"proxy/prompt_management",
"proxy/caching",
"proxy/call_hooks",
"proxy/rules",
@@ -228,6 +227,7 @@ const sidebars = {
"completion/batching",
"completion/mock_requests",
"completion/reliable_completions",
'tutorials/litellm_proxy_aporia',

]
},
@@ -309,8 +309,29 @@ const sidebars = {
label: "LangChain, LlamaIndex, Instructor Integration",
items: ["langchain/langchain", "tutorials/instructor"],
},
{
type: "category",
label: "Tutorials",
items: [

'tutorials/azure_openai',
'tutorials/instructor',
"tutorials/gradio_integration",
"tutorials/huggingface_codellama",
"tutorials/huggingface_tutorial",
"tutorials/TogetherAI_liteLLM",
"tutorials/finetuned_chat_gpt",
"tutorials/text_completion",
"tutorials/first_playground",
"tutorials/model_fallbacks",
],
},
],
},
{
type: "doc",
id: "proxy/prompt_management"
},
{
type: "category",
label: "Load Testing",
@@ -362,23 +383,7 @@ const sidebars = {
"observability/opik_integration",
],
},
{
type: "category",
label: "Tutorials",
items: [
'tutorials/litellm_proxy_aporia',
'tutorials/azure_openai',
'tutorials/instructor',
"tutorials/gradio_integration",
"tutorials/huggingface_codellama",
"tutorials/huggingface_tutorial",
"tutorials/TogetherAI_liteLLM",
"tutorials/finetuned_chat_gpt",
"tutorials/text_completion",
"tutorials/first_playground",
"tutorials/model_fallbacks",
],
},

{
type: "category",
label: "Extras",
21 changes: 21 additions & 0 deletions litellm/integrations/custom_logger.py
@@ -14,6 +14,7 @@
    EmbeddingResponse,
    ImageResponse,
    ModelResponse,
    StandardCallbackDynamicParams,
    StandardLoggingPayload,
)

@@ -60,6 +61,26 @@ async def async_log_success_event(self, kwargs, response_obj, start_time, end_ti
    async def async_log_failure_event(self, kwargs, response_obj, start_time, end_time):
        pass

    #### PROMPT MANAGEMENT HOOKS ####

    def get_chat_completion_prompt(
        self,
        model: str,
        messages: List[AllMessageValues],
        non_default_params: dict,
        headers: dict,
        prompt_id: str,
        prompt_variables: Optional[dict],
        dynamic_callback_params: StandardCallbackDynamicParams,
    ) -> Tuple[str, List[AllMessageValues], dict]:
        """
        Returns:
        - model: str - the model to use (can be pulled from prompt management tool)
        - messages: List[AllMessageValues] - the messages to use (can be pulled from prompt management tool)
        - non_default_params: dict - update with any optional params (e.g. temperature, max_tokens, etc.) to use (can be pulled from prompt management tool)
        """
        return model, messages, non_default_params

    #### PRE-CALL CHECKS - router/proxy only ####
    """
    Allows usage-based-routing-v2 to run pre-call rpm checks within the picked deployment's semaphore (concurrency-safe tpm/rpm checks).
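
For context, a custom prompt-management integration could override this hook roughly as follows. This is an illustrative sketch (the in-memory prompt store and the exact import paths are assumptions), not code from this commit:

```python
from typing import List, Optional, Tuple

from litellm.integrations.custom_logger import CustomLogger
from litellm.types.llms.openai import AllMessageValues
from litellm.types.utils import StandardCallbackDynamicParams

# Hypothetical in-memory prompt store, keyed by prompt_id.
MY_PROMPTS = {"greeting-prompt": "You are a helpful assistant. Greet {name}."}


class MyPromptManager(CustomLogger):
    def get_chat_completion_prompt(
        self,
        model: str,
        messages: List[AllMessageValues],
        non_default_params: dict,
        headers: dict,
        prompt_id: str,
        prompt_variables: Optional[dict],
        dynamic_callback_params: StandardCallbackDynamicParams,
    ) -> Tuple[str, List[AllMessageValues], dict]:
        # Look up the stored template and fill in any prompt_variables.
        template = MY_PROMPTS.get(prompt_id, "You are a helpful assistant.")
        compiled = template.format(**(prompt_variables or {}))
        # Prepend the compiled prompt as a system message; pass everything else through.
        messages = [{"role": "system", "content": compiled}] + messages
        return model, messages, non_default_params
```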