Prompt Poet streamlines and simplifies prompt design for both developers and non-technical users with its low code approach. Using a mix of YAML and Jinja2, Prompt Poet allows for flexible, dynamic prompt creation, enhancing the efficiency and quality of interactions with AI models. It saves time on engineering string manipulations, enabling everyone to focus more on crafting the optimal prompts for their users.
pip install prompt-poet
import os
import getpass
from prompt_poet import Prompt
from langchain import ChatOpenAI
# Uncomment if you need to set OPENAI_API_KEY.
# os.environ["OPENAI_API_KEY"] = getpass.getpass()
raw_template = """
- name: system instructions
role: system
content: |
Your name is {{ character_name }} and you are meant to be helpful and never harmful to humans.
- name: user query
role: user
content: |
{{ username}}: {{ user_query }}
- name: response
role: user
content: |
{{ character_name }}:
"""
template_data = {
"character_name": "Character Assistant",
"username": "Jeff",
"user_query": "Can you help me with my homework?"
}
prompt = Prompt(
raw_template=raw_template,
template_data=template_data
)
model = ChatOpenAI(model="gpt-4o-mini")
response = model.invoke(prompt.messages)
Prompt Poet templates use a mix of YAML and Jinja2. Template processing occurs in two primary stages:
- Rendering: Initially, Jinja2 processes the input data. During this phase, control flow logic is executed, data is validated and appropriately bound to variables, and functions within the template are appropriately evaluated.
- Loading: Post-rendering, the output is a structured YAML file. This YAML structure consists of repeated blocks or parts, each encapsulated into a Python data structure. These parts are characterized by several attributes:
- Name: A clear, human-readable identifier for the part.
- Content: The actual string payload that forms part of the prompt.
- Role (Optional): Specifies the role of the participant, aiding in distinguishing between different users or system components.
- Truncation Priority (Optional): Determines the order of truncation when necessary, with parts having the same priority being truncated in the order in which they appear.
- name: system instructions
role: system
content: |
Your name is {{ character_name }} and you are meant to be helpful and never harmful to humans.
- name: user query
role: user
content: |
{{ username}}: {{ user_query }}
- name: reply_prompt
role: user
content: |
{{ character_name }}:
If you have elements (e.g. messages) in a list you can parse them into your template like so.
{% for message in current_chat_messages %}
- name: chat_message
role: user
content: |
{{ message.author }}: {{ message.content }}
{% endfor %}
Context length is limited and can’t always fit the entire chat history– so we can set a truncation priority on the message parts and Prompt Poet will truncate these parts in the order in which they appear (oldest to newest).
{% for message in current_chat_messages %}
- name: chat_message
role: user
truncation_priority: 1
content: |
{{ message.author }}: {{ message.content }}
{% endfor %}
To tailor instructions based on the user's current modality (audio or text).
{% if modality == "audio" %}
- name: special audio instruction
role: system
content: |
{{ username }} is currently using audio. Keep your answers succinct.
{% endif %}
To include context-specific examples like homework help when needed.
{% if extract_user_query_topic(user_query) == "homework_help" %}
{% for homework_example in fetch_few_shot_homework_examples(username, character_name) %}
- name: homework_example_{{ loop.index }}
role: user
content: |
{{ homework_example }}
{% endfor %}
{% endif %}
Prompt Poet will strip whitespace by default to avoid unwanted newlines in your final prompt. If you want to include an explicit space use the special built-in space marker “<|space|>” to ensure proper formatting.
- name: system instructions
role: system
content: |
Your name is {{ character_name }} and you are meant to be helpful and never harmful to humans.
- name: user query
role: user
content: |
<|space|>{{ username}}: {{ user_query }}
Compositionality is a core strength of Prompt Poet templates, enabling the creation of complex, dynamic prompts.
- name: system instructions
role: system
content: |
Your name is {{ character_name }} and you are meant to be helpful and never harmful to humans.
{% if modality == "audio" %}
- name: special audio instruction
role: system
content: |
{{ username }} is currently using audio modality. Keep your answers succinct and to the point.
{% endif %}
{% if extract_user_query_topic(user_query) == "homework_help" %}
{% for homework_example in fetch_few_shot_homework_examples(username, character_name) %}
- name: homework_example_{{ loop.index }}
role: user
content: |
{{ homework_example }}
{% endfor %}
{% endif %}
{% for message in current_chat_messages %}
- name: chat_message
role: user
truncation_priority: 1
content: |
{{ message.author }}: {{ message.content }}
{% endfor %}
- name: user query
role: user
content: |
{{ username}}: {{ user_query }}
- name: reply_prompt
role: user
content: |
{{ character_name }}:
To maintain DRY principles in your templates, break them down into reusable sections that can be applied across different templates, such as when A/B testing a new prompt.
{% include 'sections/system_instruction.yml.j2' %}
{% include 'sections/audio_instruction.yml.j2' %}
{% if extract_user_query_topic(user_query) == "homework_help" %}
{% include 'sections/homework_examples.yml.j2' %}
{% endif %}
{% include 'sections/chat_messages.yml.j2' %}
{% include 'sections/user_query.yml.j2' %}
{% include 'sections/reply_prompt.yml.j2' %}
The Prompt Poet Library provides various features and settings, including prompt properties. Key features like tokenization and truncation help with efficient caching and low latency responses
prompt.tokenize()
prompt.truncate(token_limit=TOKEN_LIMIT, truncation_step=TRUNCATION_STEP)
# Inspect prompt as a raw string.
prompt.string: str
>>> "..."
# Inpsect the prompt as raw tokens.
prompt.tokens: list[int]
>>> [...]
# Inspect the prompt as LLM API message dicts.
prompt.messages: list[dict]
>>> [...]
# Inspect the prompt as first class parts.
prompt.parts: list[PromptPart]
>>> [...]
Jinja2 and YAML combine to offer an incredibly extensible and expressive templating language. Jinja2 facilitates direct data bindings, arbitrary function calls, and basic control flow within templates. YAML provides structure to our templates (with depth=1) allowing us to perform sophisticated truncation when the token limit is reached. This pairing of Jinja2 and YAML is not unique – most notably it is used by Ansible.
One standout feature of Jinja2 is the ability to invoke arbitrary Python functions directly within templates at runtime. This feature is crucial for on-the-fly data retrieval, manipulation, and validation, streamlining how prompts are constructed. Here extract_user_query_topic
can perform arbitrary processing of the user's query used in the template's control flow--perhaps by performing a round-trip to a topic classifier.
{% if extract_user_query_topic(user_query) == "homework_help" %}
{% for homework_example in fetch_few_shot_homework_examples(username, character_name) %}
- name: homework_example_{{ loop.index }}
role: user
content: |
{{ homework_example }}
{% endfor %}
{% endif %}
By default Prompt Poet will use the TikToken “o200k_base” tokenizer although alternate encoding names may be provided in the top-level tiktoken_encoding_name
. Alternatively, users can provide their own encode function with the top-level encode_func: Callable[[str], list[int]]
.
from tiktoken import get_encoding
encode_func = get_encoding("o200k_base")
prompt = Prompt(
raw_template=raw_template,
template_data=template_data,
encode_func=encode_func
)
prompt.tokenize()
prompt.tokens
>>> [...]
If your LLM provider supports GPU affinity and prefix cache, utilize Character.AI’s truncation algorithm to maximize the prefix-cache rate. The prefix cache rate is defined as the number of prompt tokens retrieved from cache over the total number of prompt tokens. Find the optimal values for truncation step and token limit for your use case. As the truncation step increases, the prefix cache rate also rises, but more tokens are truncated from the prompt.
TOKEN_LIMIT = 128000
TRUNCATION_STEP = 4000
# Tokenize and truncate the prompt.
prompt.tokenize()
prompt.truncate(token_limit=TOKEN_LIMIT, truncation_step=TRUNCATION_STEP)
response = model.invoke(prompt.messages)
In short, Cache Aware Truncation truncates up to a fixed truncation point every time it is invoked–only moving this truncation point on average every k turns. This allows your LLM provider to maximally exploit GPU prefix cache described in Optimizing Inference. If instead we simply truncated until reaching the token limit (L) this truncation point would move every turn which would cause a significant reduction in prefix cache rate. The tradeoff in this approach is that we often truncate more than we strictly need to.
A Template Registry is simply the concept of storing templates as files on disk. In using a Template Registry you can isolate template files from your python code and load these files directly from disk. In production systems, these template files can optionally be loaded from an in-memory cache on successive uses, saving on disk I/O. In the future a Template Registry may become a first-class citizen of Prompt Poet.
Filename: chat_template.yml.j2
- name: system instructions
role: system
content: |
Your name is {{ character_name }} and you are meant to be helpful and never harmful to humans.
- name: user query
role: user
content: |
{{ username}}: {{ user_query }}
- name: response
role: user
content: |
{{ character_name }}:
Run this python code from the same directory you have saved the file chat_template.yml.j2
to.
from prompt_poet import Prompt
prompt = Prompt(
template_path="chat_template.yml.j2",
template_data=template_data
)
print(prompt.string)
>>> 'Your name is Character Assistant and you are meant to be helpful and never harmful to humans.Jeff: Can you help me with my homework?Character Assistant:'
- Priompt: Priompt (priority + prompt) is a JSX-based prompting library. It uses priorities to decide what to include in the context window. This project achieves a similar goal in separating a templating layer from a logical construction layer written in and compatible with TypeScript-based usage.
- dspy: Provides a great way of automagically optimizing prompts for different models though lacks deterministic control of the prompt important for things like caching and high-throughput, low latency production systems.
- Prompt Engine: Born from a common problem of production prompt engineering requiring substantial code to manipulate and update strings this Typescript package similarly adds structure to the prompt templating process– though comes across as being somewhat opinionated making assumptions based on the use cases. With last commits being from 2 years ago it does not seem as though this package is in active development.
- llm: Allows basic prompts to be defined in YAML with the Jinja2 enabled features like dynamic control flow, function calling and data bindings.
- Raw Python f-strings: There are several projects that take slightly different approaches to prompt templating by wrapping f-strings:
- LangChain: LangChain has a much larger scope than prompt templates though it does provide some basic templating abstractions. Good for simple templating use cases then starts to get unwieldy as prompts increase in complexity.
- LlamaIndex: Like LangChain, LlamaIndex has a much larger scope than prompt templates though it also provides some basic templating abstractions.
- Mirascope: Implements a novel approach to prompt templating by encapsulating everything in a single python class and using the class’s docstring as the f-string into which to bind data.