Prompt engineering involves designing prompts, or inputs, that elicit the desired responses from the model. You can create prompts for a wide range of purposes, including information retrieval, problem solving, creative/technical writing, coding assistance, language translation, opinion generation, text summarization, conversational agents, data analysis, and more.
Here are some simple prompt examples:
How to convert inches to centimeters?
Summarize this story: once upon a time, there was a ...
Write Java code for this problem with optimal runtime complexity. Given an array of integers and an integer target, return the indices of the two numbers such that they add up to the target. Also explain the runtime complexity of the code.
Write a short story about a talking bear who lived in a treehouse.
Translate "hello world, I am a llama" to French.
Suggest innovative solutions to address climate change.
Explain the principles of quantum computing in simple terms.
What would happen if WWII had a different outcome?
Envision the world 100 years from now.
However, crafting prompts is not always simple. You must follow the prompt template syntax, which is different for each model. In fact, it is recommended to include all five components below wrapped in a prompt template. Otherwise, there could be a substantial reduction in response quality.
- Task description
- Task parameters
- Training data
- Retrieval augmented data
- Actual query
In this tutorial, we will first delve into the details of prompt templates and prompt components. Then, we will teach you how to iterate and refine prompts to achieve optimal results and harness the full potential of a language model. Finally, we will show you how to use Lamini to execute prompts.
A task description sets the context and establishes a role or scenario for the model, guiding its responses.
For example,
You are an attorney who graduated top of your class from Harvard Law, with 20 years of experience in patent law.
The task description should be hardcoded for similar types of tasks or requests.
The task parameters should include the key metadata of the subject you are querying. This is especially important if the subject is not well known or too recent, as LLMs have typically only been trained on < 1% of the public data on the internet, and may not have the recent data.
If you intend to query the model about United States v. Elizabeth A. Holmes, you may incorporate the parameters below. You are expected to generate the metadata with a program, potentially connecting to an internal database of court cases and extracting the fields with the most views.
Below are the key aspects of the court case.
Title: United States v. Elizabeth Holmes, et al.
Year: 2023
Plaintiff: United States
Main defendant: Elizabeth Holmes
Legal Issues: The case involves charges of wire fraud and conspiracy to commit wire fraud, with accusations that Holmes engaged in a years-long fraud by making false statements about Theranos' technology, business, and financial performance. The legal proceedings are related to the accuracy and legitimacy of Theranos' blood-testing technology and the alleged misrepresentation of the company's capabilities.
Outcome: Holmes was found guilty on four counts of defrauding investors – three counts of wire fraud, and one of conspiracy to commit wire fraud. She was found not guilty on four counts of defrauding patients – three counts of wire fraud and one of conspiracy to commit wire fraud.
Specifying task parameters in a prompt helps guide the model's understanding and context, ensuring it produces relevant and accurate responses tailored to the specific requirements of the task or query.
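As a rough illustration (the `case_record` dictionary below is a hypothetical stand-in for a record pulled from an internal court-case database), the task parameters component might be generated programmatically like this:

```python
# Hypothetical record pulled from an internal database of court cases.
case_record = {
    "Title": "United States v. Elizabeth Holmes, et al.",
    "Year": 2023,
    "Plaintiff": "United States",
    "Main defendant": "Elizabeth Holmes",
    "Legal Issues": "Charges of wire fraud and conspiracy to commit wire fraud ...",
    "Outcome": "Holmes was found guilty on four counts of defrauding investors ...",
}

# Render the metadata as the task parameters component of the prompt.
task_parameters = "Below are the key aspects of the court case.\n" + "\n".join(
    f"{field}: {value}" for field, value in case_record.items()
)
print(task_parameters)
```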
Few-shot learning is a machine learning approach where a model is trained to perform tasks with minimal examples, typically requiring only a small number of instances for each class or category. This allows the model to generalize and make accurate predictions even with limited labeled data. If you are interested in learning more about few-shot learning, here is a great 5-minute video tutorial.
The training data should be high-quality question-answer pairs. For now, let's assume the pairs are manually generated by a human. We will discuss how to use a model to generate the pairs in the [Data pipeline example](https://github.com/lamini-ai/sdk/tree/main/05_data_pipeline).
Suppose we plan to use the actual query below:
Conduct a succinct analysis of the legal case United States v. Elizabeth Holmes, focusing on wire fraud charges. Examine the latest developments.
The training data should include similar queries on related court cases. Since the actual query focuses on wire fraud charges, the question-answer pairs should also focus on wire fraud charges, like below:
Here are some related court cases:
Question: Conduct a succinct analysis of the legal case United States v. Bernard L. Madoff, focusing on wire fraud charges. Examine the latest developments.
Answer: Bernie Madoff, a former NASDAQ chairman, orchestrated one of the largest Ponzi schemes in history. Arrested in 2008, Madoff pleaded guilty in 2009 to 11 federal felonies, including securities and wire fraud. He received a 150-year prison sentence and was ordered to forfeit $170.8 billion. The case exposed regulatory failures, led to financial reforms, and stands as a cautionary tale of massive financial fraud. Bernard Madoff passed away on April 14, 2021, while serving his 150-year prison sentence.
Question: Conduct a succinct analysis of the legal case United States v. Stanford, focusing on wire fraud charges. Examine the latest developments.
Answer: R. Allen Stanford faced charges of securities fraud, wire fraud, and money laundering for orchestrating a massive Ponzi scheme through Stanford Financial Group. Stanford attracted investors with fictitious high-return certificates of deposit from an offshore bank. The scheme collapsed in 2009, leading to civil and criminal charges. In 2012, Stanford was found guilty on multiple counts, receiving a 110-year prison sentence. The case emphasized the risks associated with offshore banking and the importance of stringent financial oversight.
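As a minimal sketch (the `related_cases` list and `format_training_data` helper below are hypothetical), you could assemble such question-answer pairs into the training data component programmatically:

```python
# Hypothetical question-answer pairs on related court cases.
related_cases = [
    {
        "question": "Conduct a succinct analysis of the legal case United States v. "
                    "Bernard L. Madoff, focusing on wire fraud charges. Examine the latest developments.",
        "answer": "Bernie Madoff, a former NASDAQ chairman, orchestrated one of the "
                  "largest Ponzi schemes in history. ...",
    },
    # ... more pairs on related cases
]


def format_training_data(pairs):
    """Format question-answer pairs as the few-shot component of the prompt."""
    lines = ["Here are some related court cases:"]
    for pair in pairs:
        lines.append(f"Question: {pair['question']}")
        lines.append(f"Answer: {pair['answer']}")
    return "\n".join(lines)


training_data = format_training_data(related_cases)
print(training_data)
```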
As you incorporate more training data, the output quality improves, but only up to a specific threshold; beyond that point, the quality diminishes. The limitation on the amount of training data in a prompt is largely due to the practical constraint on the input size that the model can handle.
- Memory limitation - A prompt is first converted to a tensor, or a multi-dimensional array of numerical values that can be efficiently processed by machine learning models. This takes up a significant amount of memory because the tensor representation includes numerical values that capture the semantic and contextual information of the prompt.
- Position encoding limitation - Models use positional encodings to understand the order of tokens. Very long prompts may cause positional information to go out of distribution, where the model encounters data that is significantly different from what it has been trained on, potentially leading to inaccurate or unreliable predictions.
Different models have different prompt length restrictions. To identify the limit for a specific model, visit 🤗 Hugging Face and check the `max_position_embeddings` parameter in the model's `config.json` file. Note that `max_position_embeddings` is specified as a number of tokens, not a number of characters.
A token is a unit of text, which can be an individual word, subword, or character. If you are interested in learning more about tokens, please visit the Hugging Face Tokenizers tutorial.
For example, this config shows `max_position_embeddings = 32768` (tokens) for the Mistral-7B-Instruct-v0.2 model.
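As a quick sketch using the Hugging Face transformers library (separate from Lamini), you can read this limit and count a prompt's tokens before sending it:

```python
from transformers import AutoConfig, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.2"

# Read the context window (in tokens) from the model's config.json on the Hub.
config = AutoConfig.from_pretrained(model_id)
print(config.max_position_embeddings)  # 32768 for this model

# Count how many tokens a prompt actually uses.
tokenizer = AutoTokenizer.from_pretrained(model_id)
prompt = "Conduct a succinct analysis of the legal case United States v. Elizabeth Holmes."
print(len(tokenizer.encode(prompt)))
```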
Retrieval augmented data refers to additional data obtained outside the model using special information retrieval on an internal knowledge base. In the upcoming RAG tutorial, we will delve into retrieval augmented generation. For now, let's assume that we already possess some retrieval augmented data.
In March 2023, Elizabeth Holmes requested bail post-conviction, but the judge denied it, stating it wasn't an attempt to flee. In April, she appealed her conviction, seeking a delay in her prison term, which was rejected. Prosecutors sought $878 million in restitution. Her appeal against the denial was rejected in May 2023.
Conduct a succinct analysis of the legal case United States v. Elizabeth Holmes, focusing on wire fraud charges. Examine the latest developments.
Now that we have all five components, our full prompt is:
You are an attorney who graduated top of your class from Harvard Law, with 20 years of experience in patent law.
Below are the key aspects of the court case.
Title: United States v. Elizabeth Holmes, et al.
Year: 2023
Plaintiff: United States
Main defendant: Elizabeth Holmes
Legal Issues: The case involves charges of wire fraud and conspiracy to commit wire fraud, with accusations that Holmes engaged in a years-long fraud by making false statements about Theranos' technology, business, and financial performance. The legal proceedings are related to the accuracy and legitimacy of Theranos' blood-testing technology and the alleged misrepresentation of the company's capabilities.
Outcome: Holmes was found guilty on four counts of defrauding investors – three counts of wire fraud, and one of conspiracy to commit wire fraud. She was found not guilty on four counts of defrauding patients – three counts of wire fraud and one of conspiracy to commit wire fraud.
Here are some related court cases:
Question: Conduct a succinct analysis of the legal case United States v. Bernard L. Madoff, focusing on wire fraud charges. Examine the latest developments.
Answer: Bernie Madoff, a former NASDAQ chairman, orchestrated one of the largest Ponzi schemes in history. Arrested in 2008, Madoff pleaded guilty in 2009 to 11 federal felonies, including securities and wire fraud. He received a 150-year prison sentence and was ordered to forfeit $170.8 billion. The case exposed regulatory failures, led to financial reforms, and stands as a cautionary tale of massive financial fraud. Bernard Madoff passed away on April 14, 2021, while serving his 150-year prison sentence.
Question: Conduct a succinct analysis of the legal case United States v. Stanford, focusing on wire fraud charges. Examine the latest developments.
Answer: R. Allen Stanford faced charges of securities fraud, wire fraud, and money laundering for orchestrating a massive Ponzi scheme through Stanford Financial Group. Stanford attracted investors with fictitious high-return certificates of deposit from an offshore bank. The scheme collapsed in 2009, leading to civil and criminal charges. In 2012, Stanford was found guilty on multiple counts, receiving a 110-year prison sentence. The case emphasized the risks associated with offshore banking and the importance of stringent financial oversight.
In March 2023, Elizabeth Holmes requested bail post-conviction, but the judge denied it, stating it wasn't an attempt to flee. In April, she appealed her conviction, seeking a delay in her prison term, which was rejected. Prosecutors sought $878 million in restitution. Her appeal against the denial was rejected in May 2023.
Conduct a succinct analysis of the legal case United States v. Elizabeth Holmes, focusing on wire fraud charges. Examine the latest developments.
You may separate the sections in the prompt with newlines. Models are very sensitive to content but not to delimiters.
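For instance, here is a minimal sketch of assembling the full prompt in code (the component strings are abbreviated; in practice, use the full text shown above):

```python
# The five components, abbreviated; in practice use the full text shown above.
task_description = "You are an attorney who graduated top of your class from Harvard Law ..."
task_parameters = "Below are the key aspects of the court case.\nTitle: United States v. Elizabeth Holmes, et al.\n..."
training_data = "Here are some related court cases:\nQuestion: ...\nAnswer: ..."
retrieval_augmented_data = "In March 2023, Elizabeth Holmes requested bail post-conviction, ..."
actual_query = ("Conduct a succinct analysis of the legal case United States v. Elizabeth Holmes, "
                "focusing on wire fraud charges. Examine the latest developments.")

# Separate the sections with newlines to form the full prompt.
full_prompt = "\n\n".join([
    task_description,
    task_parameters,
    training_data,
    retrieval_augmented_data,
    actual_query,
])
print(full_prompt)
```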
Coming soon: results and examples showing what happens when parts of prompt are missing.
A prompt template is like a standardized format for asking questions or providing input to the model. It helps the model understand the context and type of information you're seeking, improving the chances of getting relevant and accurate responses.
WARNING. Models are trained to assume a specific prompt template syntax. If you omit the prompt template, the model's answers will be of much lower quality.
For example, Llama 3 uses the following template, where we replace `{system_prompt}` with the system prompt and `{user_prompt}` with the user prompt:
<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n{user_prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n
A query such as "Conduct a succinct analysis of the legal case United States v. Elizabeth Holmes, focusing on wire fraud charges." represents a user prompt, seeking specific information and tailored responses. On the other hand, a system prompt, an optional directive, sets the context and guides the language model's overall behavior and tone.
Here are some examples of system prompts.
You are an attorney with extensive experience in various areas of law.
You are a panelist on a legal ethics symposium. Aim to provide a comprehensive analysis suitable for an audience of legal professionals and ethicists.
Imagine you are a poet inspired by nature.
You are a patient and helpful customer service agent. Help the user troubleshoot.
For example, the Mistral 7B model does not distinguish between user and system prompts in its template. So you can replace `{prompt}` with the concatenation of the system prompt and the user prompt:
[INST] {prompt} [/INST]
To find the prompt template for each model, you can refer to the model's documentation.
It is crucial to insert the five prompt components we previously discussed into the appropriate sections of the prompt template. For example, for Llama 3, we would replace `{system_prompt}` with the task description and task parameters, and replace `{user_prompt}` with the rest of the components. For Mistral, all five prompt components would be placed in `{prompt}`.
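As a rough sketch (the helper function below is hypothetical), filling the Llama 3 template with the components might look like this:

```python
def build_llama3_prompt(system_prompt: str, user_prompt: str) -> str:
    """Wrap the system and user prompts in the Llama 3 instruct template."""
    return (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        f"{system_prompt}"
        "<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_prompt}"
        "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    )


# Task description + task parameters go in the system prompt; the training data,
# retrieval augmented data, and actual query go in the user prompt.
system_prompt = (
    "You are an attorney who graduated top of your class from Harvard Law, "
    "with 20 years of experience in patent law.\n"
    "Below are the key aspects of the court case.\n"
    "Title: United States v. Elizabeth Holmes, et al.\n"
)
user_prompt = (
    "Conduct a succinct analysis of the legal case United States v. Elizabeth Holmes, "
    "focusing on wire fraud charges. Examine the latest developments."
)
print(build_llama3_prompt(system_prompt, user_prompt))
```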
Coming soon: examples showing what happens if you don't use a template
In prompt engineering, you can manipulate different attributes of the responses. Here are some key attributes you can control:
It is not uncommon for a model to produce sentences that are too long or incomplete, influenced by various factors such as prompt complexity and training data. Similarly, the model may produce responses shorter than your desired length.
Here are a few techniques to get the desired output lengths.
- Specify desired length:
  - Explicitly state phrases like "Generate a response with up to two paragraphs." or "End the response after 50 words."
- Use keywords and constraints:
  - Include keywords or constraints like "brief", "succinct", or "Write a detailed paragraph."
Consider the output of this prompt: "Describe the impacts of Silicon Valley."
It is long and has an incomplete sentence at the end.
Silicon Valley is a region in California that is known for its high-tech industry and innovation. It is home to some of the world's largest and most successful technology companies, including Google, Facebook, Apple, and Microsoft. The region has had a significant impact on the global economy and has transformed the way we live, work, and communicate.
One of the most significant impacts of Silicon Valley is the creation of new jobs and industries. The region has become a hub for innovation and entrepreneurship, attracting talented individuals from around the world who are looking to work on cutting-edge technologies. This has led to the creation of new industries such as artificial intelligence, virtual reality, and blockchain, which have the potential to revolutionize many aspects of our lives.
Silicon Valley has also had a significant impact on the global economy. The region is home to some of the world's largest and most successful technology companies, which have created billions of dollars in revenue and have transformed the way we do business. This has led to increased competition and innovation in many industries, which has driven down costs and increased efficiency.
However, Silicon Valley has also been criticized for its impact on society. Some have argued that the region's
The prompt below produces a response that is brief and complete.
Describe the impacts of Silicon Valley in a few sentences.
Output:
Silicon Valley has had a significant impact on the world, driving innovation and technological advancements that have transformed industries and improved our lives in countless ways. The region is home to some of the world's largest and most influential tech companies, including Google, Facebook, and Apple, which have created millions of jobs and contributed billions of dollars to the global economy. However, Silicon Valley has also faced criticism for its impact on society, including concerns about privacy, social isolation, and the widening wealth gap. Overall, the region represents both incredible potential and challenges for the future.
We can get an even shorter response with
Describe the impacts of Silicon Valley in a few words.
Output:
Silicon Valley has revolutionized technology, created jobs, and transformed industries.
You can guide the model to generate responses in the desired format, whether it be a list, a table, or a customized format. Here are some key strategies:
- Instructional Clarity - use explicit language to instruct the model on the desired format.
  - Ex. Use phrases like "Generate the response as a list." or "Present the information in table format."
- Example Illustration - include a clear example of the desired output format within your prompt. Show a sample list or table and instruct the model to follow that structure, helping it understand your expectations.
  - Ex. Describe the advantages of renewable energy. Provide the response in a bulleted list format. For instance: Environmental Sustainability: Decreases carbon footprint and minimizes environmental impact. Cost-Efficiency: Long-term savings through reduced reliance on fossil fuels. Energy Independence: Reduces dependence on non-renewable resources. Follow a similar structure in your response.
The model will generally make an effort to adhere to the specific format, especially if the instructions are clear. However, there is no guarantee of adherence to the format due to model limitations and other factors.
We will show you how to use Lamini to guarantee valid JSON output in a later section.
You can control your desired level of creativity or precision using alternative phrasing. Here are some examples:
- Higher Creativity
"Generate a narrative with a more exploratory tone."
"Compose a story with a touch of unpredictability."
"Provide a response that allows for a broader range of possibilities."
- Lower Creativity
"Deliver a straightforward and concise explanation."
"Offer a focused and to-the-point response."
"Provide information with a higher level of certainty."
- Experimentation
"Explore various perspectives in your response."
"Try different approaches in your explanation."
"Adjust your writing style to see how it affects the output."
Consider this response
The decision in Nixon v. United States was that President Richard Nixon was ordered to release tapes of his conversations with his advisors, as part of a court order to comply with the Presidential Records Act.
We can ask the model to rephrase this response in the prompt.
Rewrite this: The decision in Nixon v. United States was that President Richard Nixon was ordered to release tapes of his conversations with his advisors, as part of a court order to comply with the Presidential Records Act.
Output:
The Nixon v. United States case resulted in a court order requiring President Richard Nixon to disclose recordings of his discussions with advisors, in accordance with the Presidential Records Act.
Models tend to be more sensitive to prompt content than the ordering of instructions.
However, placing instructions or constraints like "Write in one sentence." at the end of the prompt is often recommended.
Giving the model the chance to understand the context before encountering constraints can generally lead to more contextually appropriate and coherent responses.
On the other hand, adding constraints at the beginning of the prompt might lead to responses that are incomplete, inflexible (giving the same answer even for different questions), incoherent, or less accurate.
The outcome of where you position constraints may also depend on the model and your specific use case. You may get comparable responses irrespective of where the constraints are positioned.
Hallucinations in language models refer to the generation of incorrect or fictional information that is not present in the training data. While achieving complete prevention is inherently difficult due to the dynamic and complex nature of natural language, there are strategies for effectively reducing hallucinations through prompt engineering.
Clear and direct language in prompt engineering enhances model understanding, reduces ambiguity, and improves overall communication. It is advisable to avoid jargon, slang, ambiguous language, and overly complex terms.
For example, the first prompt below is ambiguous, while the second prompt clearly instructs the model to perform a specific task.
❌ "Too much diet cola."
✅ "Explore the consequences of excessive diet cola consumption."
Now consider the prompts below. The first prompt is overly complex, while the second one is concise.
❌ "Analyze the nuanced interactions among diverse technological elements and their implications on the optimization of organizational processes within the ever-changing landscape of industries, elucidating both positive outcomes and potential challenges."
✅ "Examine how the integration of diverse technologies impacts organizational processes, highlighting both positive outcomes and potential challenges in evolving industries."
Adding context provides a specific scenario, guiding the language model to produce a more detailed and tailored response. For example,
❌ "Describe San Francisco."
✅ "You are a travel blogger exploring San Francisco. Write a description highlighting its unique architecture, local cuisine, and lively atmosphere."
You can explicitly include statements like the ones below in the prompt. This can guide the model to prioritize accuracy and honesty in its responses.
"Provide accurate and truthful information."
"Generate a response based on factual and honest content."
"Please avoid generating false or misleading information."
In addition, you can instruct the model not to answer the question if it cannot generate a good response, such as:
"If you cannot generate a confident or accurate response, please reply with \"I don't know\"."
Using correct spacing, grammar and syntax is crucial in prompts to ensure the model understands the request accurately and generates coherent responses.
Bad spacing
❌ " Generate a paragraph about the importance of recycling. "
✅ "Generate a paragraph about the importance of recycling."
Bad grammar and syntax
❌ "write me a poem about sunsets beautiful red and orange."
✅ "compose a poem describing the beauty of red and orange sunsets."
The process of refining and improving prompts through successive iterations is crucial for achieving desired results. The key is to be adaptable and responsive to the performance of the model and user needs. Here is an example:
- Observation
  - Initial Prompt: "Tell me about climate change."
  - Observation: The model provides general information, but the response lacks specificity.
- Iteration 1 - Refine User Prompt:
  - Refined Prompt: "Explain the impact of human activities on rising global temperatures due to climate change."
  - Observation: The model provides more detailed and focused information.
- Iteration 2 - Test Lower Creativity:
  - Refined Prompt: "In a clear and precise manner, elucidate the widely acknowledged effects of human activities on the upward trend in global temperatures as a consequence of climate change, highlighting established factors and scientific consensus."
  - Observation: The response becomes more focused and controlled.
- Iteration 3 - Test Higher Creativity:
  - Refined Prompt: "Illuminate the story of human impact, where our actions act as the architects of a warming world, crafting a narrative that unfolds in the evolving drama of climate change."
  - Observation: The response becomes more creative and random.
- Iteration 4 - Test Diverse Scenarios:
  - Refined Prompt: "Discuss the role of renewable energy in mitigating climate change."
  - Testing Scenario: Include prompts related to different renewable energy sources.
  - Observation: Evaluate how well the model generalizes across various aspects of the topic.
- Iteration 5 - Incorporate Feedback:
  - Refined Prompt: "Examine the economic impact of climate change policies on developing nations."
  - Feedback: Users express a desire for insights into economic aspects.
  - Observation: The model adapts to provide more information on economic considerations.
- Iteration 6 - Adjust Response Length:
  - Refined Prompt: "Summarize the key findings of the latest IPCC report on climate change in a few sentences."
  - Refined Prompt: "Summarize the key findings of the latest IPCC report on climate change in three paragraphs."
  - Observation: Balance length and detail based on the specific use case and preferences.
- Iteration 7 - Explore Novel Prompts:
  - Refined Prompt: "Imagine you are a journalist reporting on climate change. Provide a news brief highlighting recent developments."
  - Observation: Assess how well the model responds to prompts that simulate real-world scenarios.
- Iteration 8 - Fine-Tune System Prompts:
  - Refined System Prompt: "You are an expert scientist responding to inquiries about climate change. Maintain a scientific and informative tone."
  - Observation: Evaluate if the refined system prompt influences the model's tone and style.
- Iteration 9 - Stay Informed:
  - Refined Prompt: "Considering recent advancements, discuss the emerging technologies for carbon capture and their potential impact on mitigating climate change."
  - Adjustment: Incorporate new keywords or concepts based on the latest information.
  - Observation: Ensure that the model stays up-to-date with evolving topics.
WARNING. A common mistake we see users make is to not iterate fast enough. Instead of spending 10 minutes or more writing a perfect prompt, spend 1-2 minutes writing a rough draft and then run it. Iterate on the prompt based on what the model does. Try to iterate as quickly as possible.
We've created a model runner, `Lamini`, to simplify the process of executing prompts. Our `Lamini` class uses the latest available instruct model by default and allows you to obtain the response with just a few lines of code, like below.
from lamini import Lamini

# Use the default instruct model.
llm = Lamini()

# Wrap the system and user prompts in the model's prompt template.
prompt = "<s>[INST] "
prompt += "You are a panelist on a legal ethics symposium. Aim to provide a comprehensive analysis suitable for an audience of legal professionals and ethicists.\n"
prompt += "What was the decision in Nixon v. United States?\n"
prompt += " [/INST]"

answer = llm.generate(prompt)
print(answer)
Behind the scenes, `Lamini` automatically wraps the user and system prompts in the model's prompt template. The default system prompt is the model's recommended system prompt:
Always assist with care, respect, and truth. Respond with utmost utility yet securely. Avoid harmful, unethical, prejudiced, or negative content. Ensure replies promote fairness and positivity.
Llama 3 example:
from lamini import Lamini

llm = Lamini(model_name="meta-llama/Meta-Llama-3.1-8B-Instruct")

# Build the prompt with the Llama 3 template: system prompt first, then user prompt.
prompt = "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
prompt += "You are a panelist on a legal ethics symposium. Aim to provide a comprehensive analysis suitable for an audience of legal professionals and ethicists.\n"
prompt += "<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
prompt += "What was the decision in Nixon v. United States?\n"
prompt += "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"

answer = llm.generate(prompt)
print(answer)
Batching involves grouping multiple input prompts together and processing them simultaneously as a batch. This approach enhances efficiency by allowing the model to handle several prompts at once, optimizing resource utilization.
In Lamini, the first argument to `llm.generate()` can be either a single prompt string or a list of prompt strings. When a single prompt is used, our system runs in non-batch mode and returns a string output. On the other hand, when a prompt list is used, the system runs in batch mode and returns a list of dictionaries as output, like below.
from lamini import Lamini

llm = Lamini()

# Wrap each question in the model's prompt template, producing a list of prompts.
system = "Provide very short responses.\n"
questions = [
    "Is pizza nutritious?",
    "Did Richard Nixon resign?",
    "Summarize the impact of global warming.",
]
prompts = ["<s>[INST] " + system + question + " [/INST]" for question in questions]

# Passing a list runs in batch mode and returns a list of dictionaries.
answers = llm.generate(prompts)
print(answers)
Output:
[{'input': 'Is pizza nutritious?',
'output': 'No, pizza is not typically considered a nutritious food due to its high calorie, carbohydrate, and fat content. However, it can be made healthier by using whole grain crust, lean protein toppings, and plenty of vegetables.'},
{'input': 'Did Richard Nixon resign?',
'output': ' Yes, Richard Nixon resigned as President of the United States on August 9, 1974.'},
{'input': 'Summarize the impact of global warming.',
'output': " Global warming has significant impacts on the Earth's environment, including rising sea levels, more frequent and intense heatwaves, droughts, and extreme weather events. It also affects wildlife, agriculture, and human health. The main cause of global warming is the increase in greenhouse gases in the atmosphere, primarily from human activities such as burning fossil fuels and deforestation. Addressing global warming requires reducing greenhouse gas emissions and transitioning to renewable energy sources."}]
While you can ask a model to output JSON in the prompt, the results may lack consistency and reliability. Hence, we've introduced a feature to guarantee valid JSON output through our web API. See our docs!
If you are interested in the technical details, see our blog post Guarantee Valid JSON Output with Lamini.
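For reference, here is a minimal sketch of what structured output can look like with the Lamini SDK, assuming the `output_type` argument described in the Lamini docs (check the docs for the exact signature and supported types):

```python
from lamini import Lamini

llm = Lamini()

# The output_type mapping (assumed here from the Lamini docs) constrains the
# response to valid JSON with the given fields and types.
result = llm.generate(
    "Summarize the outcome of United States v. Elizabeth Holmes.",
    output_type={"summary": "str", "year": "int"},
)
print(result)  # e.g. {"summary": "...", "year": 2023}
```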