diff --git a/main/search/search_index.json b/main/search/search_index.json
index 31dfe254f..529538523 100644
--- a/main/search/search_index.json
+++ b/main/search/search_index.json
@@ -1 +1 @@
-{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"installation/","title":"Installation","text":"You can install Outlines with pip
:
pip install outlines\n
Outlines supports OpenAI, transformers, Mamba, llama.cpp, exllamav2 and vLLM, but you will need to install them manually:
pip install openai\npip install transformers datasets accelerate torch\npip install llama-cpp-python\npip install exllamav2 transformers torch\npip install mamba_ssm transformers torch\npip install vllm\n
If you encounter any problems using Outlines with these libraries, take a look at their installation instructions. The installation of openai
and transformers
should be straightforward, but other libraries have specific hardware requirements.
"},{"location":"installation/#bleeding-edge","title":"Bleeding edge","text":"You can install the latest version of Outlines on the repository's main
branch:
pip install git+https://github.com/dottxt-ai/outlines.git@main\n
This can be useful, for instance, when a fix has been merged but not yet released.
"},{"location":"installation/#installing-for-development","title":"Installing for development","text":"See the contributing documentation for instructions on how to install Outlines for development.
"},{"location":"licence/","title":"Licence and citations","text":"Outlines is licenced under the Apache 2.0 licence. To comply with the licence you need to add the following notice at the top every file that uses part of Outlines' code:
Copyright 2023- The Outlines developers\n\nLicensed under the Apache License, Version 2.0 (the \"License\");\nyou may not use this file except in compliance with the License.\nYou may obtain a copy of the License at\n\n http://www.apache.org/licenses/LICENSE-2.0\n\nUnless required by applicable law or agreed to in writing, software\ndistributed under the License is distributed on an \"AS IS\" BASIS,\nWITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\nSee the License for the specific language governing permissions and\nlimitations under the License.\n
If you use Outlines in your work you can use the following citation:
@article{willard2023efficient,\n title={Efficient Guided Generation for LLMs},\n author={Willard, Brandon T and Louf, R{\\'e}mi},\n journal={arXiv preprint arXiv:2307.09702},\n year={2023}\n}\n
"},{"location":"quickstart/","title":"Quickstart","text":"After installing Outlines, the fastest way to get to up to speed with the library is to get acquainted with its few core elements. We advise you to take a quick look at this page to see everything Outlines has to offer before diving in the documentation.
"},{"location":"quickstart/#core-elements","title":"Core elements","text":""},{"location":"quickstart/#models","title":"Models","text":"The first step when writing a program with Outlines is to initialize a model. Weights will be loaded on the device at this step:
import outlines\n\nmodel = outlines.models.transformers(\n \"microsoft/Phi-3-mini-4k-instruct\",\n device=\"cuda\" # optional device argument, default is cpu\n)\n
Outlines supports a wide variety of inference engines and model weight types. More details on different models can be found in the Outlines Models documentation page.
"},{"location":"quickstart/#generation","title":"Generation","text":"Once the model is initialized you can build an outlines.generate
generator. This generator can be called with a prompt directly.
(Outlines Structured Generation Full Documentation)
generator = outlines.generate.text(model)\n\nresult = generator(\"Question: What's 2+2? Answer:\", max_tokens=100)\nprint(result)\n# The answer is 4\n\n# Outlines also supports streaming output\nstream = generator.stream(\"What's 2+2?\", max_tokens=5)\nfor i in range(5):\n token = next(stream)\n print(repr(token))\n# '2'\n# '+'\n# '2'\n# ' equals'\n# '4'\n
Along with typical language model generation behavior via outlines.generate.text
, Outlines supports structured generation, which guarantees that the tokens generated by the model will follow a predefined structure. Structures can be defined by a regex pattern, a JSON schema, a Python object type, or a Lark grammar defining a parsable language such as SQL or Python.
Example: using pydantic to enforce a JSON schema
from enum import Enum\nfrom pydantic import BaseModel, constr, conint\n\nclass Character(BaseModel):\n name: constr(max_length=10)\n age: conint(gt=18, lt=99)\n armor: (Enum('Armor', {'leather': 'leather', 'chainmail': 'chainmail', 'plate': 'plate'}))\n strength: conint(gt=1, lt=100)\n\ngenerator = outlines.generate.json(model, Character)\n\ncharacter = generator(\n \"Generate a new character for my awesome game: \"\n + \"name, age (between 1 and 99), armor and strength. \"\n )\nprint(character)\n# Character(name='Zara', age=25, armor=<Armor.leather: 'leather'>, strength=85)\n
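For the regex case, a minimal sketch assuming the outlines.generate.regex API; the pattern and printed output are only illustrative:
generator = outlines.generate.regex(model, r\"[0-9]{3}-[0-9]{4}\")\n\nresult = generator(\"Write a 7-digit phone number in the form 555-0123: \")\nprint(result)\n# 555-0199  (illustrative; the exact digits depend on the model)\n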
"},{"location":"quickstart/#deploy-using-vllm-and-fastapi","title":"Deploy using vLLM and FastAPI","text":"Outlines can be deployed as a LLM service using vLLM and FastAPI. The server supports asynchronous processing of incoming requests, and benefits from the performance of vLLM.
First start the server:
python -m outlines.serve.serve --model=\"microsoft/Phi-3-mini-4k-instruct\"\n
Or you can start the server with Outlines' official Docker image:
docker run -p 8000:8000 outlinesdev/outlines --model=\"microsoft/Phi-3-mini-4k-instruct\"\n
This will by default start a server at http://127.0.0.1:8000
(check what the console says, though). Without the --model
argument set, the OPT-125M model is used.
You can then query the model from the shell by passing a prompt and a JSON Schema specification for the structure of the output:
curl http://127.0.0.1:8000/generate \\\n -d '{\n \"prompt\": \"Question: What is a language model? Answer:\",\n \"schema\": {\"type\": \"string\"}\n }'\n
Or use the requests library from another Python program. You can read the vLLM documentation for more details.
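A minimal sketch with requests, mirroring the curl call above (endpoint and schema are the same assumptions):
import requests\n\nresponse = requests.post(\n \"http://127.0.0.1:8000/generate\",\n json={\n \"prompt\": \"Question: What is a language model? Answer:\",\n \"schema\": {\"type\": \"string\"},\n },\n)\nprint(response.json())\n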
"},{"location":"quickstart/#utilities","title":"Utilities","text":""},{"location":"quickstart/#prompt-templates","title":"Prompt templates","text":"Prompting can lead to messy code. Outlines' prompt functions are python functions that contain a template for the prompt in their docstring. We use a powerful templating language to allow you to loop over lists, dictionaries, add conditionals, etc. directly from the prompt. When called, a prompt function returns the rendered template:
import outlines\n\n@outlines.prompt\ndef few_shots(instructions, examples, question):\n \"\"\"{{ instructions }}\n\n Examples\n --------\n\n {% for example in examples %}\n Q: {{ example.question }}\n A: {{ example.answer }}\n\n {% endfor %}\n Question\n --------\n\n Q: {{ question }}\n A:\n \"\"\"\n\ninstructions = \"Please answer the following question following the examples\"\nexamples = [\n {\"question\": \"2+2=?\", \"answer\":4},\n {\"question\": \"3+3=?\", \"answer\":6}\n]\nquestion = \"4+4 = ?\"\n\nprompt = few_shots(instructions, examples, question)\nprint(prompt)\n# Please answer the following question following the examples\n\n# Examples\n# --------\n\n# Q: 2+2=?\n# A: 4\n\n# Q: 3+3=?\n# A: 6\n\n# Question\n# --------\n\n# Q: 4+4 = ?\n# A:\n
"},{"location":"quickstart/#outlines-functions","title":"Outlines functions","text":"Once you are done experimenting with a prompt and an output structure, it is useful to be able to encapsulate all of these in a single function that can be called from other parts of the program. This is what outlines.Function
allows you to do:
function.py from pydantic import BaseModel\n\nimport outlines\n\n\n@outlines.prompt\ndef tell_a_joke(topic):\n \"\"\"Tell me a joke about {{ topic }}.\"\"\"\n\nclass Joke(BaseModel):\n setup: str\n punchline: str\n\ngenerate_joke = outlines.Function(\n tell_a_joke,\n Joke,\n \"microsoft/Phi-3-mini-4k-instruct\"\n)\n
from .function import generate_joke\n\nresponse = generate_joke(\"baseball\")\n\n# haha\n# Joke(setup='Why was the baseball in a bad mood?', punchline='Because it got hit around a lot.')\n
You can load a function that is stored on a repository on GitHub directly from Outlines. Say Someone
stores a function in joke.py
at the root of the TheirRepo
repository:
import outlines\n\njoke = outlines.Function.from_github(\"Someone/TheirRepo/joke\")\nresponse = joke(\"baseball\")\n
This makes it easier for the community to collaborate on the countless use cases enabled by these models!"},{"location":"quickstart/#going-further","title":"Going further","text":"If you need more inspiration you can take a look at the cookbook or watch Remi Louf's AI Engineer World\u2019s Fair Presentation on Outlines. If you have any questions or requests for documentation, please reach out to us on GitHub, Twitter or Discord.
"},{"location":"welcome/","title":"Welcome to Outlines!","text":"Outlines is a Python library that allows you to use Large Language Model in a simple and robust way (with structured generation). It is built by .txt, and is already used in production by many companies.
"},{"location":"welcome/#what-models-do-you-support","title":"What models do you support?","text":"We support Openai, but the true power of Outlines is unleashed with Open Source models available via the transformers, llama.cpp, exllama2, mlx-lm and vllm models. If you want to build and maintain an integration with another library, get in touch.
"},{"location":"welcome/#what-are-the-main-features","title":"What are the main features?","text":" -
Make LLMs generate valid JSON
No more invalid JSON outputs, 100% guaranteed
Generate JSON
-
JSON mode for vLLM
Deploy an LLM service using Outlines' JSON structured generation and vLLM
Deploy outlines
-
Make LLMs follow a Regex
Generate text that parses correctly 100% of the time
Guide LLMs
-
Powerful Prompt Templating
Better manage your prompts' complexity with prompt templating
Learn more
"},{"location":"welcome/#why-use-outlines","title":"Why use Outlines?","text":"Outlines is built at .txt by engineers with decades of experience in software engineering, machine learning (Bayesian Statistics and NLP), and compilers. .txt is a VC-backed company fully focused on the topic of structured generation and is committed to make the community benefit from its experience.
We are also open-source veterans and have authored/maintained many libraries over the years: the Aesara and Pythological ecosystems, Blackjax, and Hy, among many others.
Outlines does not use unnecessary abstractions that tend to get in your way. We have a laser focus on reliable text generation with LLMs, a clear roadmap to push the state of the art in this area and a commitment to clean and robust code.
And last but not least, unlike alternatives, Outlines' structured generation introduces no overhead during inference.
"},{"location":"welcome/#who-is-using-outlines","title":"Who is using Outlines?","text":"Hundreds of organisations and the main LLM serving frameworks (vLLM, TGI, LoRAX, xinference, SGLang) are using Outlines. Some of the prominent companies and organizations that are using Outlines include:
Organizations are included either because they use Outlines as a dependency in a public repository, or because of direct communication between members of the Outlines team and employees at these organizations.
Still not convinced? Read what people say about us. And make sure to take a look at what the community is building!
"},{"location":"welcome/#philosophy","title":"Philosophy","text":"Outlines is a library for neural text generation. You can think of it as a more flexible replacement for the generate
method in the transformers library.
Outlines helps developers structure text generation to build robust interfaces with external systems. It provides generation methods that guarantee that the output will match a regular expression, or follow a JSON schema.
Outlines provides robust prompting primitives that separate the prompting from the execution logic and lead to simple implementations of few-shot generations, ReAct, meta-prompting, agents, etc.
Outlines is designed as a library that is meant to be compatible with the broader ecosystem, not to replace it. We use as few abstractions as possible, and generation can be interleaved with control flow, conditionals, custom Python functions and calls to other libraries.
Outlines is compatible with every auto-regressive model. It only interfaces with models via the next-token logits distribution.
"},{"location":"welcome/#outlines-people","title":"Outlines people","text":"Outlines would not be what it is today without a community of dedicated developers:
"},{"location":"welcome/#acknowledgements","title":"Acknowledgements","text":"Outlines was originally developed at @NormalComputing by @remilouf and @BrandonTWillard. It is now maintained by .txt.
"},{"location":"api/","title":"API Reference","text":""},{"location":"api/guide/","title":"Guide","text":""},{"location":"api/guide/#outlines.fsm.guide.CFGGuide","title":"CFGGuide
","text":" Bases: Guide
Guide to generate text that is in the language of a context-free Lark grammar.
Source code in outlines/fsm/guide.py
class CFGGuide(Guide):\n \"\"\"Guide to generate text that is in the language of a context-free Lark grammar.\"\"\"\n\n def __init__(self, cfg_string: str, tokenizer):\n \"\"\"\n Construct the PartialLark parser and set the empty initial_state (PartialParserState)\n \"\"\"\n warnings.warn(\n \"Outlines' public *community-contributed* CFG structured generation is experimental. \"\n \"Please review https://dottxt-ai.github.io/outlines/latest/reference/generation/cfg#disclaimer\"\n )\n\n self.cfg_string = cfg_string\n self.tokenizer = tokenizer\n self.eos_token_id = self.tokenizer.eos_token_id\n self.parser = PartialLark(\n cfg_string,\n parser=\"lalr\",\n import_paths=[grammars.GRAMMAR_PATH],\n )\n self.initial_state = CFGState(\n parser_state=self.parser.parse(\"\"), prev_token=None\n )\n\n def get_next_instruction(self, state: CFGState) -> Instruction:\n \"\"\"Return the next instruction for guided generation.\n\n Current lazy approach:\n - For each token in the vocabulary\n - create a copy of the parsers state\n - add the tokens to the parsers input text\n - if valid, add token to returned tokens\n\n Further refinements are necessary for performant text processing.\n\n Parameters\n ----------\n state\n The guides current PartialParserState, or None if complete\n\n Returns\n -------\n A `Generate` instance that contains the model and the allowed token ids.\n\n \"\"\"\n\n if state.parser_state is None:\n return Write(torch.tensor([self.eos_token_id]))\n\n valid_tokens = list(\n self.iter_valid_token_ids(state, self.tokenizer.vocabulary.values())\n )\n if len(valid_tokens) == 1:\n return Write(torch.tensor(valid_tokens))\n return Generate(torch.tensor(valid_tokens))\n\n def iter_valid_token_ids(\n self, state: CFGState, candidate_token_ids: list\n ) -> Generator[int, None, None]:\n \"\"\"\n Iterate over the given token_ids and yield those that are valid for the current parser state.\n\n Parameters\n ----------\n parser_state\n The current state of the parser, or None if complete.\n token_ids\n The list of token ids to check for validity.\n\n Yields\n ------\n int\n Valid token ids.\n \"\"\"\n if state.parser_state is None:\n yield self.eos_token_id\n return\n\n for token_id in candidate_token_ids:\n if token_id == self.eos_token_id:\n if self.can_terminate_state(state):\n yield token_id\n else:\n try:\n self._get_parser_state_token_applied(state, int(token_id))\n yield token_id\n except (\n ValueError,\n EOFError,\n UnexpectedToken,\n UnexpectedCharacters,\n DedentError,\n ):\n pass\n\n def get_next_state(self, state: CFGState, token_id: int) -> CFGState:\n \"\"\"\n Update the state of the guide.\n Decode the token_id, and calculate the new parser_state with the token applied.\n\n Parameters\n ----------\n state\n The guides current PartialParserState, or None if complete\n token_id\n The id of the token that was just generated.\n\n Returns\n -------\n The guides new PartialParserState\n\n \"\"\"\n if state.parser_state is None or token_id == self.eos_token_id:\n parser_state = None\n else:\n parser_state = self._get_parser_state_token_applied(state, int(token_id))\n return CFGState(parser_state=parser_state, prev_token=token_id)\n\n def _get_parser_state_token_applied(\n self, state: CFGState, token_id: int\n ) -> PartialParserState:\n \"\"\"\n Don't mutate `parser_state`, copy to protect\n\n Get the token string\n - if first token in generation: tokenizer.decode (no leading whitespace)\n - else: normalized (with possibly leading whitespace)\n\n Don't allow empty (\"\") tokens, raise 
ValueError\n \"\"\"\n parser_state = copy.copy(state.parser_state) # prevent side effects\n\n # normalize\n if state.prev_token is None:\n new_token_str = self.tokenizer.decode([token_id])[0]\n else:\n prev_token_str = self.tokenizer.decode([[state.prev_token]])[0]\n combined_token_str = self.tokenizer.decode([[state.prev_token, token_id]])[\n 0\n ]\n new_token_str = combined_token_str[len(prev_token_str) :]\n\n if new_token_str == \"\":\n raise ValueError(\"empty next token\")\n\n # update parser with new token\n parser_state.lexer.state.text += new_token_str\n self.parser.parse_from_state(parser_state, is_end=False)\n\n return parser_state\n\n def is_final_state(self, state: CFGState) -> bool:\n # TODO: remove this method, use can_terminate_state and must_terminate_state\n # here and in RegexGuide per https://github.com/dottxt-ai/outlines/issues/885\n return self.can_terminate_state(state)\n\n def can_terminate_state(self, state: CFGState) -> bool:\n \"\"\"Generation is allowed to terminate\"\"\"\n if state.parser_state is not None:\n try:\n copy.copy(state.parser_state).feed_eof()\n except UnexpectedToken:\n return False\n return True\n\n def must_terminate_state(self, state: CFGState) -> bool:\n \"\"\"Generation must terminate, no legal continuations\"\"\"\n return state.parser_state is None or set(state.parser_state.accepts()).issubset(\n {\"$END\"}\n )\n\n def copy(self) -> \"CFGGuide\":\n \"\"\"Create a copy of the Guide.\"\"\"\n return CFGGuide(self.cfg_string, self.tokenizer)\n
"},{"location":"api/guide/#outlines.fsm.guide.CFGGuide.__init__","title":"__init__(cfg_string, tokenizer)
","text":"Construct the PartialLark parser and set the empty initial_state (PartialParserState)
Source code in outlines/fsm/guide.py
def __init__(self, cfg_string: str, tokenizer):\n \"\"\"\n Construct the PartialLark parser and set the empty initial_state (PartialParserState)\n \"\"\"\n warnings.warn(\n \"Outlines' public *community-contributed* CFG structured generation is experimental. \"\n \"Please review https://dottxt-ai.github.io/outlines/latest/reference/generation/cfg#disclaimer\"\n )\n\n self.cfg_string = cfg_string\n self.tokenizer = tokenizer\n self.eos_token_id = self.tokenizer.eos_token_id\n self.parser = PartialLark(\n cfg_string,\n parser=\"lalr\",\n import_paths=[grammars.GRAMMAR_PATH],\n )\n self.initial_state = CFGState(\n parser_state=self.parser.parse(\"\"), prev_token=None\n )\n
"},{"location":"api/guide/#outlines.fsm.guide.CFGGuide.can_terminate_state","title":"can_terminate_state(state)
","text":"Generation is allowed to terminate
Source code in outlines/fsm/guide.py
def can_terminate_state(self, state: CFGState) -> bool:\n \"\"\"Generation is allowed to terminate\"\"\"\n if state.parser_state is not None:\n try:\n copy.copy(state.parser_state).feed_eof()\n except UnexpectedToken:\n return False\n return True\n
"},{"location":"api/guide/#outlines.fsm.guide.CFGGuide.copy","title":"copy()
","text":"Create a copy of the Guide.
Source code in outlines/fsm/guide.py
def copy(self) -> \"CFGGuide\":\n \"\"\"Create a copy of the Guide.\"\"\"\n return CFGGuide(self.cfg_string, self.tokenizer)\n
"},{"location":"api/guide/#outlines.fsm.guide.CFGGuide.get_next_instruction","title":"get_next_instruction(state)
","text":"Return the next instruction for guided generation.
Current lazy approach: - For each token in the vocabulary - create a copy of the parsers state - add the tokens to the parsers input text - if valid, add token to returned tokens
Further refinements are necessary for performant text processing.
"},{"location":"api/guide/#outlines.fsm.guide.CFGGuide.get_next_instruction--parameters","title":"Parameters","text":"state The guides current PartialParserState, or None if complete
"},{"location":"api/guide/#outlines.fsm.guide.CFGGuide.get_next_instruction--returns","title":"Returns","text":"A Generate
instance that contains the model and the allowed token ids.
Source code in outlines/fsm/guide.py
def get_next_instruction(self, state: CFGState) -> Instruction:\n \"\"\"Return the next instruction for guided generation.\n\n Current lazy approach:\n - For each token in the vocabulary\n - create a copy of the parsers state\n - add the tokens to the parsers input text\n - if valid, add token to returned tokens\n\n Further refinements are necessary for performant text processing.\n\n Parameters\n ----------\n state\n The guides current PartialParserState, or None if complete\n\n Returns\n -------\n A `Generate` instance that contains the model and the allowed token ids.\n\n \"\"\"\n\n if state.parser_state is None:\n return Write(torch.tensor([self.eos_token_id]))\n\n valid_tokens = list(\n self.iter_valid_token_ids(state, self.tokenizer.vocabulary.values())\n )\n if len(valid_tokens) == 1:\n return Write(torch.tensor(valid_tokens))\n return Generate(torch.tensor(valid_tokens))\n
"},{"location":"api/guide/#outlines.fsm.guide.CFGGuide.get_next_state","title":"get_next_state(state, token_id)
","text":"Update the state of the guide. Decode the token_id, and calculate the new parser_state with the token applied.
"},{"location":"api/guide/#outlines.fsm.guide.CFGGuide.get_next_state--parameters","title":"Parameters","text":"state The guides current PartialParserState, or None if complete token_id The id of the token that was just generated.
"},{"location":"api/guide/#outlines.fsm.guide.CFGGuide.get_next_state--returns","title":"Returns","text":"The guides new PartialParserState
Source code in outlines/fsm/guide.py
def get_next_state(self, state: CFGState, token_id: int) -> CFGState:\n \"\"\"\n Update the state of the guide.\n Decode the token_id, and calculate the new parser_state with the token applied.\n\n Parameters\n ----------\n state\n The guides current PartialParserState, or None if complete\n token_id\n The id of the token that was just generated.\n\n Returns\n -------\n The guides new PartialParserState\n\n \"\"\"\n if state.parser_state is None or token_id == self.eos_token_id:\n parser_state = None\n else:\n parser_state = self._get_parser_state_token_applied(state, int(token_id))\n return CFGState(parser_state=parser_state, prev_token=token_id)\n
"},{"location":"api/guide/#outlines.fsm.guide.CFGGuide.iter_valid_token_ids","title":"iter_valid_token_ids(state, candidate_token_ids)
","text":"Iterate over the given token_ids and yield those that are valid for the current parser state.
"},{"location":"api/guide/#outlines.fsm.guide.CFGGuide.iter_valid_token_ids--parameters","title":"Parameters","text":"parser_state The current state of the parser, or None if complete. token_ids The list of token ids to check for validity.
"},{"location":"api/guide/#outlines.fsm.guide.CFGGuide.iter_valid_token_ids--yields","title":"Yields","text":"int Valid token ids.
Source code in outlines/fsm/guide.py
def iter_valid_token_ids(\n self, state: CFGState, candidate_token_ids: list\n) -> Generator[int, None, None]:\n \"\"\"\n Iterate over the given token_ids and yield those that are valid for the current parser state.\n\n Parameters\n ----------\n parser_state\n The current state of the parser, or None if complete.\n token_ids\n The list of token ids to check for validity.\n\n Yields\n ------\n int\n Valid token ids.\n \"\"\"\n if state.parser_state is None:\n yield self.eos_token_id\n return\n\n for token_id in candidate_token_ids:\n if token_id == self.eos_token_id:\n if self.can_terminate_state(state):\n yield token_id\n else:\n try:\n self._get_parser_state_token_applied(state, int(token_id))\n yield token_id\n except (\n ValueError,\n EOFError,\n UnexpectedToken,\n UnexpectedCharacters,\n DedentError,\n ):\n pass\n
"},{"location":"api/guide/#outlines.fsm.guide.CFGGuide.must_terminate_state","title":"must_terminate_state(state)
","text":"Generation must terminate, no legal continuations
Source code in outlines/fsm/guide.py
def must_terminate_state(self, state: CFGState) -> bool:\n \"\"\"Generation must terminate, no legal continuations\"\"\"\n return state.parser_state is None or set(state.parser_state.accepts()).issubset(\n {\"$END\"}\n )\n
"},{"location":"api/guide/#outlines.fsm.guide.Guide","title":"Guide
","text":" Bases: Guide
Base definition of a generation guide.
A generation guide defines the behavior of a finite-state machine that guides a text generation procedure. Unlike the DFAs built from regular expressions guides can also emit a Write
instructions which tells the model that it can append a sequence of tokens (or token word) instead of generating it.
Source code in outlines/fsm/guide.py
class Guide(CoreGuide):\n \"\"\"Base definition of a generation guide.\n\n A generation guide defines the behavior of a finite-state machine that guides\n a text generation procedure. Unlike the DFAs built from regular expressions\n guides can also emit a `Write` instructions which tells the model that it can\n append a sequence of tokens (or token word) instead of generating it.\n\n \"\"\"\n\n initial_state: Any\n
"},{"location":"api/guide/#outlines.fsm.guide.RegexGuide","title":"RegexGuide
","text":" Bases: RegexGuide
Guide to generate text in the language of a regular expression. CoreRegexGuide with outlines cache
Source code in outlines/fsm/guide.py
class RegexGuide(CoreRegexGuide):\n \"\"\"\n Guide to generate text in the language of a regular expression.\n CoreRegexGuide with outlines cache\n \"\"\"\n\n @classmethod\n def from_regex(\n cls,\n regex_string: str,\n tokenizer,\n **kwargs,\n ):\n return super().from_regex(\n regex_string,\n tokenizer,\n _create_states_mapping=cached_create_states_mapping,\n **kwargs,\n )\n
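A brief construction sketch, assuming a tokenizer taken from an outlines transformers model as documented later in this reference; the model name is only illustrative:
import outlines\nfrom outlines.fsm.guide import RegexGuide\n\nmodel = outlines.models.transformers(\"gpt2\")  # any causal LM; \"gpt2\" is illustrative\nguide = RegexGuide.from_regex(r\"[0-9]+\", model.tokenizer)\n# guide.get_next_instruction(...) and guide.get_next_state(...) then drive guided decoding,\n# following the Guide interface documented above.\n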
"},{"location":"api/guide/#outlines.fsm.guide.StopAtEOSGuide","title":"StopAtEOSGuide
","text":" Bases: Guide
Guide to generate tokens until the EOS token has been generated.
Source code in outlines/fsm/guide.py
class StopAtEOSGuide(Guide):\n \"\"\"Guide to generate tokens until the EOS token has been generated.\"\"\"\n\n final_state = 1\n start_state = 0 # TODO: remove start_state, use only initial_state\n initial_state = 0\n\n def __init__(self, tokenizer: \"Tokenizer\"):\n \"\"\"Initialize the generation guide.\n\n model\n The logit generator used to generate the next token.\n\n \"\"\"\n self.eos_token_id = tokenizer.eos_token_id\n self.vocabulary = tokenizer.vocabulary.values()\n\n def get_next_instruction(self, state: int) -> Instruction:\n if self.is_final_state(state):\n return Write([self.eos_token_id])\n return Generate(None)\n\n def get_next_state(self, state: int, token_id: int) -> int:\n if token_id == self.eos_token_id or state == self.final_state:\n return self.final_state\n\n return self.initial_state\n\n def is_final_state(self, state: int):\n return state == self.final_state\n\n def copy(self):\n return self\n
"},{"location":"api/guide/#outlines.fsm.guide.StopAtEOSGuide.__init__","title":"__init__(tokenizer)
","text":"Initialize the generation guide.
model The logit generator used to generate the next token.
Source code in outlines/fsm/guide.py
def __init__(self, tokenizer: \"Tokenizer\"):\n \"\"\"Initialize the generation guide.\n\n model\n The logit generator used to generate the next token.\n\n \"\"\"\n self.eos_token_id = tokenizer.eos_token_id\n self.vocabulary = tokenizer.vocabulary.values()\n
"},{"location":"api/json_schema/","title":"Json schema","text":""},{"location":"api/json_schema/#outlines.fsm.json_schema.convert_json_schema_to_str","title":"convert_json_schema_to_str(json_schema)
","text":"Convert a JSON schema to a string.
"},{"location":"api/json_schema/#outlines.fsm.json_schema.convert_json_schema_to_str--parameters","title":"Parameters","text":"json_schema The JSON schema.
"},{"location":"api/json_schema/#outlines.fsm.json_schema.convert_json_schema_to_str--returns","title":"Returns","text":"str The JSON schema converted to a string.
"},{"location":"api/json_schema/#outlines.fsm.json_schema.convert_json_schema_to_str--raises","title":"Raises","text":"ValueError If the schema is not a dictionary, a string or a Pydantic class.
Source code in outlines/fsm/json_schema.py
def convert_json_schema_to_str(json_schema: Union[dict, str, Type[BaseModel]]) -> str:\n \"\"\"Convert a JSON schema to a string.\n\n Parameters\n ----------\n json_schema\n The JSON schema.\n\n Returns\n -------\n str\n The JSON schema converted to a string.\n\n Raises\n ------\n ValueError\n If the schema is not a dictionary, a string or a Pydantic class.\n \"\"\"\n if isinstance(json_schema, dict):\n schema_str = json.dumps(json_schema)\n elif isinstance(json_schema, str):\n schema_str = json_schema\n elif issubclass(json_schema, BaseModel):\n schema_str = json.dumps(json_schema.model_json_schema())\n else:\n raise ValueError(\n f\"Cannot parse schema {json_schema}. The schema must be either \"\n + \"a Pydantic class, a dictionary or a string that contains the JSON \"\n + \"schema specification\"\n )\n return schema_str\n
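A short usage sketch exercising the dictionary and Pydantic branches shown above; the User model is a hypothetical example:
from pydantic import BaseModel\n\nfrom outlines.fsm.json_schema import convert_json_schema_to_str\n\nclass User(BaseModel):\n name: str\n\nprint(convert_json_schema_to_str({\"type\": \"string\"}))  # '{\"type\": \"string\"}'\nprint(convert_json_schema_to_str(User))  # the User model's JSON schema, serialized to a string\n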
"},{"location":"api/json_schema/#outlines.fsm.json_schema.get_schema_from_signature","title":"get_schema_from_signature(fn)
","text":"Turn a function signature into a JSON schema.
Every JSON object valid to the output JSON Schema can be passed to fn
using the ** unpacking syntax.
Source code in outlines/fsm/json_schema.py
def get_schema_from_signature(fn: Callable) -> dict:\n \"\"\"Turn a function signature into a JSON schema.\n\n Every JSON object valid to the output JSON Schema can be passed\n to `fn` using the ** unpacking syntax.\n\n \"\"\"\n signature = inspect.signature(fn)\n arguments = {}\n for name, arg in signature.parameters.items():\n if arg.annotation == inspect._empty:\n raise ValueError(\"Each argument must have a type annotation\")\n else:\n arguments[name] = (arg.annotation, ...)\n\n try:\n fn_name = fn.__name__\n except Exception as e:\n fn_name = \"Arguments\"\n warnings.warn(\n f\"The function name could not be determined. Using default name 'Arguments' instead. For debugging, here is exact error:\\n{e}\",\n category=UserWarning,\n )\n model = create_model(fn_name, **arguments)\n\n return model.model_json_schema()\n
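A hedged example of the signature-to-schema conversion; add is a hypothetical function and the printed schema is approximate:
from outlines.fsm.json_schema import get_schema_from_signature\n\ndef add(a: int, b: int) -> int:\n return a + b\n\nschema = get_schema_from_signature(add)\nprint(schema[\"properties\"])\n# roughly: {'a': {'title': 'A', 'type': 'integer'}, 'b': {'title': 'B', 'type': 'integer'}}\n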
"},{"location":"api/models/","title":"Models","text":"Integration with OpenAI's API.
"},{"location":"api/models/#outlines.models.transformers.TransformerTokenizer","title":"TransformerTokenizer
","text":" Bases: Tokenizer
Represents a tokenizer for models in the transformers
library.
Source code in outlines/models/transformers.py
class TransformerTokenizer(Tokenizer):\n \"\"\"Represents a tokenizer for models in the `transformers` library.\"\"\"\n\n def __init__(self, tokenizer: \"PreTrainedTokenizer\", **kwargs):\n self.tokenizer = tokenizer\n self.eos_token_id = self.tokenizer.eos_token_id\n self.eos_token = self.tokenizer.eos_token\n\n if self.tokenizer.pad_token_id is None:\n self.tokenizer.pad_token_id = self.tokenizer.eos_token_id\n self.pad_token_id = self.eos_token_id\n else:\n self.pad_token_id = self.tokenizer.pad_token_id\n self.pad_token = self.tokenizer.pad_token\n\n self.special_tokens = set(self.tokenizer.all_special_tokens)\n\n self.vocabulary = self.tokenizer.get_vocab()\n self.is_llama = isinstance(self.tokenizer, get_llama_tokenizer_types())\n\n def encode(\n self, prompt: Union[str, List[str]], **kwargs\n ) -> Tuple[\"torch.LongTensor\", \"torch.LongTensor\"]:\n kwargs[\"padding\"] = True\n kwargs[\"return_tensors\"] = \"pt\"\n output = self.tokenizer(prompt, **kwargs)\n return output[\"input_ids\"], output[\"attention_mask\"]\n\n def decode(self, token_ids: \"torch.LongTensor\") -> List[str]:\n text = self.tokenizer.batch_decode(token_ids, skip_special_tokens=True)\n return text\n\n def convert_token_to_string(self, token: str) -> str:\n from transformers.file_utils import SPIECE_UNDERLINE\n\n string = self.tokenizer.convert_tokens_to_string([token])\n\n if self.is_llama:\n # A hack to handle missing spaces to HF's Llama tokenizers\n if token.startswith(SPIECE_UNDERLINE) or token == \"<0x20>\":\n return \" \" + string\n\n return string\n\n def __eq__(self, other):\n if isinstance(other, type(self)):\n if hasattr(self, \"model_name\") and hasattr(self, \"kwargs\"):\n return (\n other.model_name == self.model_name and other.kwargs == self.kwargs\n )\n else:\n return other.tokenizer == self.tokenizer\n return NotImplemented\n\n def __hash__(self):\n from datasets.fingerprint import Hasher\n\n return hash(Hasher.hash(self.tokenizer))\n\n def __getstate__(self):\n state = {\"tokenizer\": self.tokenizer}\n return state\n\n def __setstate__(self, state):\n self.__init__(state[\"tokenizer\"])\n
"},{"location":"api/models/#outlines.models.transformers.Transformers","title":"Transformers
","text":"Represents a transformers
model.
Source code in outlines/models/transformers.py
class Transformers:\n \"\"\"Represents a `transformers` model.\"\"\"\n\n def __init__(\n self,\n model: \"PreTrainedModel\",\n tokenizer: \"PreTrainedTokenizer\",\n ):\n self.model = model\n self.tokenizer = TransformerTokenizer(tokenizer)\n\n def forward(\n self,\n input_ids: \"torch.LongTensor\",\n attention_mask: \"torch.LongTensor\",\n past_key_values: Optional[Tuple] = None,\n ) -> Tuple[\"torch.FloatTensor\", Optional[KVCacheType]]:\n \"\"\"Compute a forward pass through the transformer model.\n\n Parameters\n ----------\n input_ids\n The input token ids. Must be one or two dimensional.\n attention_mask\n The attention mask. Must be one or two dimensional.\n past_key_values\n A tuple of tuples containing the cached key and value tensors for each\n attention head.\n\n Returns\n -------\n The computed logits and the new cached key and value tensors.\n\n \"\"\"\n try:\n import torch\n except ImportError:\n ImportError(\n \"The `torch` library needs to be installed to use `transformers` models.\"\n )\n assert 0 < input_ids.ndim < 3\n\n if past_key_values:\n input_ids = input_ids[..., -1].unsqueeze(-1)\n\n with torch.inference_mode():\n output = self.model(\n input_ids,\n attention_mask=attention_mask,\n return_dict=True,\n output_attentions=False,\n output_hidden_states=False,\n past_key_values=past_key_values,\n )\n\n return output.logits, output.past_key_values\n\n def __call__(\n self,\n input_ids: \"torch.LongTensor\",\n attention_mask: \"torch.LongTensor\",\n past_key_values: Optional[Tuple] = None,\n ) -> \"torch.FloatTensor\":\n logits, kv_cache = self.forward(input_ids, attention_mask, past_key_values)\n next_token_logits = logits[..., -1, :]\n\n return next_token_logits, kv_cache\n\n def generate(\n self,\n prompts: Union[str, List[str]],\n generation_parameters: GenerationParameters,\n logits_processor: Optional[\"OutlinesLogitsProcessor\"],\n sampling_parameters: SamplingParameters,\n ) -> Union[str, List[str], List[List[str]]]:\n \"\"\"Generate text using `transformers`.\n\n Arguments\n ---------\n prompts\n A prompt or list of prompts.\n generation_parameters\n An instance of `GenerationParameters` that contains the prompt,\n the maximum number of tokens, stop sequences and seed. 
All the\n arguments to `SequenceGeneratorAdapter`'s `__cal__` method.\n logits_processor\n The logits processor to use when generating text.\n sampling_parameters\n An instance of `SamplingParameters`, a dataclass that contains\n the name of the sampler to use and related parameters as available\n in Outlines.\n\n Returns\n -------\n The generated text\n \"\"\"\n if isinstance(prompts, str):\n # convert to 2d\n input_ids, attention_mask = self.tokenizer.encode([prompts])\n else:\n input_ids, attention_mask = self.tokenizer.encode(prompts)\n\n inputs = {\n \"input_ids\": input_ids.to(self.model.device),\n \"attention_mask\": attention_mask.to(self.model.device),\n }\n if (\n \"attention_mask\"\n not in inspect.signature(self.model.forward).parameters.keys()\n ):\n del inputs[\"attention_mask\"]\n\n generation_kwargs = self._get_generation_kwargs(\n prompts,\n generation_parameters,\n logits_processor,\n sampling_parameters,\n )\n generated_ids = self._generate_output_seq(prompts, inputs, **generation_kwargs)\n\n # if single str input and single sample per input, convert to a 1D output\n if isinstance(prompts, str):\n generated_ids = generated_ids.squeeze(0)\n\n return self._decode_generation(generated_ids)\n\n def stream(\n self,\n prompts: Union[str, List[str]],\n generation_parameters: GenerationParameters,\n logits_processor: Optional[\"OutlinesLogitsProcessor\"],\n sampling_parameters: SamplingParameters,\n ) -> Iterator[Union[str, List[str]]]:\n \"\"\"\n Temporary stream stand-in which implements stream() signature\n and equivalent behaviour but isn't yielded until generation completes.\n\n TODO: implement following completion of https://github.com/huggingface/transformers/issues/30810\n \"\"\"\n if isinstance(prompts, str):\n # convert to 2d\n input_ids, attention_mask = self.tokenizer.encode([prompts])\n else:\n input_ids, attention_mask = self.tokenizer.encode(prompts)\n inputs = {\n \"input_ids\": input_ids.to(self.model.device),\n \"attention_mask\": attention_mask.to(self.model.device),\n }\n if (\n \"attention_mask\"\n not in inspect.signature(self.model.forward).parameters.keys()\n ):\n del inputs[\"attention_mask\"]\n\n generation_kwargs = self._get_generation_kwargs(\n prompts,\n generation_parameters,\n logits_processor,\n sampling_parameters,\n )\n generated_ids = self._generate_output_seq(prompts, inputs, **generation_kwargs)\n\n # if single str input and single sample per input, convert to a 1D output\n if isinstance(prompts, str):\n generated_ids = generated_ids.squeeze(0)\n\n for i in range(generated_ids.size(-1)):\n output_group_ids = generated_ids.select(-1, i).unsqueeze(-1)\n yield self._decode_generation(output_group_ids)\n\n def _get_generation_kwargs(\n self,\n prompts: Union[str, List[str]],\n generation_parameters: GenerationParameters,\n logits_processor: Optional[\"OutlinesLogitsProcessor\"],\n sampling_parameters: SamplingParameters,\n ) -> dict:\n \"\"\"\n Conert outlines generation parameters into model.generate kwargs\n \"\"\"\n from transformers import GenerationConfig, LogitsProcessorList, set_seed\n\n max_new_tokens, stop_at, seed = dataclasses.astuple(generation_parameters)\n sampler, num_samples, top_p, top_k, temperature = dataclasses.astuple(\n sampling_parameters\n )\n if max_new_tokens is None:\n max_new_tokens = int(2**30)\n\n # global seed, not desirable\n if seed is not None:\n set_seed(seed)\n\n if logits_processor is not None:\n logits_processor_list = LogitsProcessorList([logits_processor])\n else:\n logits_processor_list = None\n\n 
generation_config = GenerationConfig(\n max_new_tokens=max_new_tokens,\n stop_strings=stop_at,\n num_return_sequences=(num_samples or 1),\n top_p=top_p,\n top_k=top_k,\n temperature=temperature,\n do_sample=(sampler == \"multinomial\"),\n num_beams=(num_samples if sampler == \"beam_search\" else 1),\n eos_token_id=self.tokenizer.eos_token_id,\n pad_token_id=self.tokenizer.pad_token_id,\n )\n\n return dict(\n logits_processor=logits_processor_list,\n generation_config=generation_config,\n tokenizer=self.tokenizer.tokenizer,\n )\n\n def _generate_output_seq(\n self, prompts, inputs, generation_config, **generation_kwargs\n ):\n input_ids = inputs[\"input_ids\"]\n output_ids = self.model.generate(\n **inputs, generation_config=generation_config, **generation_kwargs\n )\n\n # encoder-decoder returns output_ids only, decoder-only returns full seq ids\n if self.model.config.is_encoder_decoder:\n generated_ids = output_ids\n else:\n generated_ids = output_ids[:, input_ids.shape[1] :]\n\n # if batch list inputs AND multiple samples per input, convert generated_id to 3D view\n num_samples = generation_config.num_return_sequences or 1\n\n if num_samples > 1 and isinstance(prompts, list):\n batch_size = input_ids.size(0)\n num_return_sequences = generation_config.num_return_sequences or 1\n generated_ids = generated_ids.view(batch_size, num_return_sequences, -1)\n\n return generated_ids\n\n def _decode_generation(self, generated_ids: \"torch.Tensor\"):\n if len(generated_ids.shape) == 1:\n return self.tokenizer.decode([generated_ids])[0]\n elif len(generated_ids.shape) == 2:\n return self.tokenizer.decode(generated_ids)\n elif len(generated_ids.shape) == 3:\n return [\n self.tokenizer.decode(generated_ids[i])\n for i in range(len(generated_ids))\n ]\n else:\n raise TypeError(\n f\"Generated outputs aren't 1D, 2D or 3D, but instead are {generated_ids.shape}\"\n )\n
"},{"location":"api/models/#outlines.models.transformers.Transformers.forward","title":"forward(input_ids, attention_mask, past_key_values=None)
","text":"Compute a forward pass through the transformer model.
"},{"location":"api/models/#outlines.models.transformers.Transformers.forward--parameters","title":"Parameters","text":"input_ids The input token ids. Must be one or two dimensional. attention_mask The attention mask. Must be one or two dimensional. past_key_values A tuple of tuples containing the cached key and value tensors for each attention head.
"},{"location":"api/models/#outlines.models.transformers.Transformers.forward--returns","title":"Returns","text":"The computed logits and the new cached key and value tensors.
Source code in outlines/models/transformers.py
def forward(\n self,\n input_ids: \"torch.LongTensor\",\n attention_mask: \"torch.LongTensor\",\n past_key_values: Optional[Tuple] = None,\n) -> Tuple[\"torch.FloatTensor\", Optional[KVCacheType]]:\n \"\"\"Compute a forward pass through the transformer model.\n\n Parameters\n ----------\n input_ids\n The input token ids. Must be one or two dimensional.\n attention_mask\n The attention mask. Must be one or two dimensional.\n past_key_values\n A tuple of tuples containing the cached key and value tensors for each\n attention head.\n\n Returns\n -------\n The computed logits and the new cached key and value tensors.\n\n \"\"\"\n try:\n import torch\n except ImportError:\n ImportError(\n \"The `torch` library needs to be installed to use `transformers` models.\"\n )\n assert 0 < input_ids.ndim < 3\n\n if past_key_values:\n input_ids = input_ids[..., -1].unsqueeze(-1)\n\n with torch.inference_mode():\n output = self.model(\n input_ids,\n attention_mask=attention_mask,\n return_dict=True,\n output_attentions=False,\n output_hidden_states=False,\n past_key_values=past_key_values,\n )\n\n return output.logits, output.past_key_values\n
"},{"location":"api/models/#outlines.models.transformers.Transformers.generate","title":"generate(prompts, generation_parameters, logits_processor, sampling_parameters)
","text":"Generate text using transformers
.
"},{"location":"api/models/#outlines.models.transformers.Transformers.generate--arguments","title":"Arguments","text":"prompts A prompt or list of prompts. generation_parameters An instance of GenerationParameters
that contains the prompt, the maximum number of tokens, stop sequences and seed. All the arguments to SequenceGeneratorAdapter
's __cal__
method. logits_processor The logits processor to use when generating text. sampling_parameters An instance of SamplingParameters
, a dataclass that contains the name of the sampler to use and related parameters as available in Outlines.
"},{"location":"api/models/#outlines.models.transformers.Transformers.generate--returns","title":"Returns","text":"The generated text
Source code in outlines/models/transformers.py
def generate(\n self,\n prompts: Union[str, List[str]],\n generation_parameters: GenerationParameters,\n logits_processor: Optional[\"OutlinesLogitsProcessor\"],\n sampling_parameters: SamplingParameters,\n) -> Union[str, List[str], List[List[str]]]:\n \"\"\"Generate text using `transformers`.\n\n Arguments\n ---------\n prompts\n A prompt or list of prompts.\n generation_parameters\n An instance of `GenerationParameters` that contains the prompt,\n the maximum number of tokens, stop sequences and seed. All the\n arguments to `SequenceGeneratorAdapter`'s `__cal__` method.\n logits_processor\n The logits processor to use when generating text.\n sampling_parameters\n An instance of `SamplingParameters`, a dataclass that contains\n the name of the sampler to use and related parameters as available\n in Outlines.\n\n Returns\n -------\n The generated text\n \"\"\"\n if isinstance(prompts, str):\n # convert to 2d\n input_ids, attention_mask = self.tokenizer.encode([prompts])\n else:\n input_ids, attention_mask = self.tokenizer.encode(prompts)\n\n inputs = {\n \"input_ids\": input_ids.to(self.model.device),\n \"attention_mask\": attention_mask.to(self.model.device),\n }\n if (\n \"attention_mask\"\n not in inspect.signature(self.model.forward).parameters.keys()\n ):\n del inputs[\"attention_mask\"]\n\n generation_kwargs = self._get_generation_kwargs(\n prompts,\n generation_parameters,\n logits_processor,\n sampling_parameters,\n )\n generated_ids = self._generate_output_seq(prompts, inputs, **generation_kwargs)\n\n # if single str input and single sample per input, convert to a 1D output\n if isinstance(prompts, str):\n generated_ids = generated_ids.squeeze(0)\n\n return self._decode_generation(generated_ids)\n
"},{"location":"api/models/#outlines.models.transformers.Transformers.stream","title":"stream(prompts, generation_parameters, logits_processor, sampling_parameters)
","text":"Temporary stream stand-in which implements stream() signature and equivalent behaviour but isn't yielded until generation completes.
TODO: implement following completion of https://github.com/huggingface/transformers/issues/30810
Source code in outlines/models/transformers.py
def stream(\n self,\n prompts: Union[str, List[str]],\n generation_parameters: GenerationParameters,\n logits_processor: Optional[\"OutlinesLogitsProcessor\"],\n sampling_parameters: SamplingParameters,\n) -> Iterator[Union[str, List[str]]]:\n \"\"\"\n Temporary stream stand-in which implements stream() signature\n and equivalent behaviour but isn't yielded until generation completes.\n\n TODO: implement following completion of https://github.com/huggingface/transformers/issues/30810\n \"\"\"\n if isinstance(prompts, str):\n # convert to 2d\n input_ids, attention_mask = self.tokenizer.encode([prompts])\n else:\n input_ids, attention_mask = self.tokenizer.encode(prompts)\n inputs = {\n \"input_ids\": input_ids.to(self.model.device),\n \"attention_mask\": attention_mask.to(self.model.device),\n }\n if (\n \"attention_mask\"\n not in inspect.signature(self.model.forward).parameters.keys()\n ):\n del inputs[\"attention_mask\"]\n\n generation_kwargs = self._get_generation_kwargs(\n prompts,\n generation_parameters,\n logits_processor,\n sampling_parameters,\n )\n generated_ids = self._generate_output_seq(prompts, inputs, **generation_kwargs)\n\n # if single str input and single sample per input, convert to a 1D output\n if isinstance(prompts, str):\n generated_ids = generated_ids.squeeze(0)\n\n for i in range(generated_ids.size(-1)):\n output_group_ids = generated_ids.select(-1, i).unsqueeze(-1)\n yield self._decode_generation(output_group_ids)\n
"},{"location":"api/models/#outlines.models.transformers.get_llama_tokenizer_types","title":"get_llama_tokenizer_types()
","text":"Get all the Llama tokenizer types/classes that need work-arounds.
When they can't be imported, a dummy class is created.
Source code in outlines/models/transformers.py
def get_llama_tokenizer_types():\n \"\"\"Get all the Llama tokenizer types/classes that need work-arounds.\n\n When they can't be imported, a dummy class is created.\n\n \"\"\"\n try:\n from transformers.models.llama import LlamaTokenizer\n except ImportError:\n\n class LlamaTokenizer: # type: ignore\n pass\n\n try:\n from transformers.models.llama import LlamaTokenizerFast\n except ImportError:\n\n class LlamaTokenizerFast: # type: ignore\n pass\n\n try:\n from transformers.models.code_llama import CodeLlamaTokenizer\n except ImportError:\n\n class CodeLlamaTokenizer: # type: ignore\n pass\n\n try:\n from transformers.models.code_llama import CodeLlamaTokenizerFast\n except ImportError:\n\n class CodeLlamaTokenizerFast: # type: ignore\n pass\n\n return (\n LlamaTokenizer,\n LlamaTokenizerFast,\n CodeLlamaTokenizer,\n CodeLlamaTokenizerFast,\n )\n
"},{"location":"api/models/#outlines.models.transformers.transformers","title":"transformers(model_name, device=None, model_kwargs={}, tokenizer_kwargs={}, model_class=None, tokenizer_class=None)
","text":"Instantiate a model from the transformers
library and its tokenizer.
"},{"location":"api/models/#outlines.models.transformers.transformers--parameters","title":"Parameters","text":"model_name The name of the model as listed on Hugging Face's model page. device The device(s) on which the model should be loaded. This overrides the device_map
entry in model_kwargs
when provided. model_kwargs A dictionary that contains the keyword arguments to pass to the from_pretrained
method when loading the model. tokenizer_kwargs A dictionary that contains the keyword arguments to pass to the from_pretrained
method when loading the tokenizer.
"},{"location":"api/models/#outlines.models.transformers.transformers--returns","title":"Returns","text":"A TransformersModel
model instance.
Source code in outlines/models/transformers.py
def transformers(\n model_name: str,\n device: Optional[str] = None,\n model_kwargs: dict = {},\n tokenizer_kwargs: dict = {},\n model_class=None,\n tokenizer_class=None,\n):\n \"\"\"Instantiate a model from the `transformers` library and its tokenizer.\n\n Parameters\n ----------\n model_name\n The name of the model as listed on Hugging Face's model page.\n device\n The device(s) on which the model should be loaded. This overrides\n the `device_map` entry in `model_kwargs` when provided.\n model_kwargs\n A dictionary that contains the keyword arguments to pass to the\n `from_pretrained` method when loading the model.\n tokenizer_kwargs\n A dictionary that contains the keyword arguments to pass to the\n `from_pretrained` method when loading the tokenizer.\n\n Returns\n -------\n A `TransformersModel` model instance.\n\n \"\"\"\n if model_class is None or tokenizer_class is None:\n try:\n from transformers import AutoModelForCausalLM, AutoTokenizer\n except ImportError:\n raise ImportError(\n \"The `transformers` library needs to be installed in order to use `transformers` models.\"\n )\n if model_class is None:\n model_class = AutoModelForCausalLM\n if tokenizer_class is None:\n tokenizer_class = AutoTokenizer\n\n if device is not None:\n model_kwargs[\"device_map\"] = device\n\n model = model_class.from_pretrained(model_name, **model_kwargs)\n\n tokenizer_kwargs.setdefault(\"padding_side\", \"left\")\n tokenizer = tokenizer_class.from_pretrained(model_name, **tokenizer_kwargs)\n\n return Transformers(model, tokenizer)\n
"},{"location":"api/models/#outlines.models.openai.OpenAI","title":"OpenAI
","text":"An object that represents the OpenAI API.
Source code in outlines/models/openai.py
class OpenAI:\n \"\"\"An object that represents the OpenAI API.\"\"\"\n\n def __init__(\n self,\n client,\n config,\n system_prompt: Optional[str] = None,\n ):\n \"\"\"Create an `OpenAI` instance.\n\n This class supports the standard OpenAI API, the Azure OpeanAI API as\n well as compatible APIs that rely on the OpenAI client.\n\n Parameters\n ----------\n client\n An instance of the API's async client.\n config\n An instance of `OpenAIConfig`. Can be useful to specify some\n parameters that cannot be set by calling this class' methods.\n \"\"\"\n\n self.client = client\n self.config = config\n\n # We count the total number of prompt and generated tokens as returned\n # by the OpenAI API, summed over all the requests performed with this\n # model instance.\n self.prompt_tokens = 0\n self.completion_tokens = 0\n\n self.format_sequence = lambda seq: seq\n\n def __call__(\n self,\n prompt: Union[str, List[str]],\n max_tokens: Optional[int] = None,\n stop_at: Optional[Union[List[str], str]] = None,\n *,\n system_prompt: Optional[str] = None,\n temperature: Optional[float] = None,\n samples: Optional[int] = None,\n ) -> np.ndarray:\n \"\"\"Call the OpenAI API to generate text.\n\n Parameters\n ----------\n prompt\n A string or list of strings that will be used to prompt the model\n max_tokens\n The maximum number of tokens to generate\n stop_at\n A string or array of strings which, such that the generation stops\n when they are generated.\n system_prompt\n The content of the system message that precedes the user's prompt.\n temperature\n The value of the temperature used to sample tokens\n samples\n The number of completions to generate for each prompt\n stop_at\n Up to 4 words where the API will stop the completion.\n\n \"\"\"\n if max_tokens is None:\n max_tokens = self.config.max_tokens\n if stop_at is None:\n stop_at = self.config.stop\n if temperature is None:\n temperature = self.config.temperature\n if samples is None:\n samples = self.config.n\n\n config = replace(self.config, max_tokens=max_tokens, temperature=temperature, n=samples, stop=stop_at) # type: ignore\n\n response, prompt_tokens, completion_tokens = generate_chat(\n prompt, system_prompt, self.client, config\n )\n self.prompt_tokens += prompt_tokens\n self.completion_tokens += completion_tokens\n\n return self.format_sequence(response)\n\n def stream(self, *args, **kwargs):\n raise NotImplementedError(\n \"Streaming is currently not supported for the OpenAI API\"\n )\n\n def new_with_replacements(self, **kwargs):\n new_instance = copy.copy(self)\n new_instance.config = replace(new_instance.config, **kwargs)\n return new_instance\n\n def __str__(self):\n return self.__class__.__name__ + \" API\"\n\n def __repr__(self):\n return str(self.config)\n
"},{"location":"api/models/#outlines.models.openai.OpenAI.__call__","title":"__call__(prompt, max_tokens=None, stop_at=None, *, system_prompt=None, temperature=None, samples=None)
","text":"Call the OpenAI API to generate text.
"},{"location":"api/models/#outlines.models.openai.OpenAI.__call__--parameters","title":"Parameters","text":"prompt A string or list of strings that will be used to prompt the model max_tokens The maximum number of tokens to generate stop_at A string or array of strings which, such that the generation stops when they are generated. system_prompt The content of the system message that precedes the user's prompt. temperature The value of the temperature used to sample tokens samples The number of completions to generate for each prompt stop_at Up to 4 words where the API will stop the completion.
Source code in outlines/models/openai.py
def __call__(\n self,\n prompt: Union[str, List[str]],\n max_tokens: Optional[int] = None,\n stop_at: Optional[Union[List[str], str]] = None,\n *,\n system_prompt: Optional[str] = None,\n temperature: Optional[float] = None,\n samples: Optional[int] = None,\n) -> np.ndarray:\n \"\"\"Call the OpenAI API to generate text.\n\n Parameters\n ----------\n prompt\n A string or list of strings that will be used to prompt the model\n max_tokens\n The maximum number of tokens to generate\n stop_at\n A string or array of strings which, such that the generation stops\n when they are generated.\n system_prompt\n The content of the system message that precedes the user's prompt.\n temperature\n The value of the temperature used to sample tokens\n samples\n The number of completions to generate for each prompt\n stop_at\n Up to 4 words where the API will stop the completion.\n\n \"\"\"\n if max_tokens is None:\n max_tokens = self.config.max_tokens\n if stop_at is None:\n stop_at = self.config.stop\n if temperature is None:\n temperature = self.config.temperature\n if samples is None:\n samples = self.config.n\n\n config = replace(self.config, max_tokens=max_tokens, temperature=temperature, n=samples, stop=stop_at) # type: ignore\n\n response, prompt_tokens, completion_tokens = generate_chat(\n prompt, system_prompt, self.client, config\n )\n self.prompt_tokens += prompt_tokens\n self.completion_tokens += completion_tokens\n\n return self.format_sequence(response)\n
"},{"location":"api/models/#outlines.models.openai.OpenAI.__init__","title":"__init__(client, config, system_prompt=None)
","text":"Create an OpenAI
instance.
This class supports the standard OpenAI API, the Azure OpenAI API, as well as compatible APIs that rely on the OpenAI client.
"},{"location":"api/models/#outlines.models.openai.OpenAI.__init__--parameters","title":"Parameters","text":"client An instance of the API's async client. config An instance of OpenAIConfig
. Can be useful to specify some parameters that cannot be set by calling this class' methods.
Source code in outlines/models/openai.py
def __init__(\n self,\n client,\n config,\n system_prompt: Optional[str] = None,\n):\n \"\"\"Create an `OpenAI` instance.\n\n This class supports the standard OpenAI API, the Azure OpeanAI API as\n well as compatible APIs that rely on the OpenAI client.\n\n Parameters\n ----------\n client\n An instance of the API's async client.\n config\n An instance of `OpenAIConfig`. Can be useful to specify some\n parameters that cannot be set by calling this class' methods.\n \"\"\"\n\n self.client = client\n self.config = config\n\n # We count the total number of prompt and generated tokens as returned\n # by the OpenAI API, summed over all the requests performed with this\n # model instance.\n self.prompt_tokens = 0\n self.completion_tokens = 0\n\n self.format_sequence = lambda seq: seq\n
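As a sketch, the class can also be constructed directly from an async client and a config; this assumes the openai Python package v1 or later, and the model name is illustrative:
from openai import AsyncOpenAI\n\nfrom outlines.models.openai import OpenAI, OpenAIConfig\n\n# The async client reads OPENAI_API_KEY from the environment by default\nclient = AsyncOpenAI()\nconfig = OpenAIConfig(model=\"gpt-4o-mini\", temperature=0.7)\nmodel = OpenAI(client, config)\n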
"},{"location":"api/models/#outlines.models.openai.OpenAIConfig","title":"OpenAIConfig
dataclass
","text":"Represents the parameters of the OpenAI API.
The information was last fetched on 2023/11/20. We document below the properties that are specific to the OpenAI API. Not all these properties are supported by Outlines.
"},{"location":"api/models/#outlines.models.openai.OpenAIConfig--properties","title":"Properties","text":"model The name of the model. Available models can be found on OpenAI's website. frequence_penalty Number between 2.0 and -2.0. Positive values penalize new tokens based on their existing frequency in the text, logit_bias Modifies the likelihood of specified tokens to appear in the completion. Number between -100 (forbid) and +100 (only allows). n The number of completions to return for each prompt. presence_penalty Similar to frequency penalty. response_format Specifies the format the model must output. {\"type\": \"json_object\"}
enables JSON mode. seed Two completions with the same seed
value should return the same completion. This is, however, not guaranteed. stop Up to 4 sequences where the API will stop the completion. temperature Number between 0 and 2. Higher values make the output more random, while lower values make it more deterministic. top_p Number between 0 and 1. Parameter for nucleus sampling. user A unique identifier for the end-user.
Source code in outlines/models/openai.py
@dataclass(frozen=True)\nclass OpenAIConfig:\n \"\"\"Represents the parameters of the OpenAI API.\n\n The information was last fetched on 2023/11/20. We document below the\n properties that are specific to the OpenAI API. Not all these properties are\n supported by Outlines.\n\n Properties\n ----------\n model\n The name of the model. Available models can be found on OpenAI's website.\n frequence_penalty\n Number between 2.0 and -2.0. Positive values penalize new tokens based on\n their existing frequency in the text,\n logit_bias\n Modifies the likelihood of specified tokens to appear in the completion.\n Number between -100 (forbid) and +100 (only allows).\n n\n The number of completions to return for each prompt.\n presence_penalty\n Similar to frequency penalty.\n response_format\n Specifies the format the model must output. `{\"type\": \"json_object\"}`\n enables JSON mode.\n seed\n Two completions with the same `seed` value should return the same\n completion. This is however not guaranteed.\n stop\n Up to 4 words where the API will stop the completion.\n temperature\n Number between 0 and 2. Higher values make the output more random, while\n lower values make it more deterministic.\n top_p\n Number between 0 and 1. Parameter for nucleus sampling.\n user\n A unique identifier for the end-user.\n\n \"\"\"\n\n model: str = \"\"\n frequency_penalty: float = 0\n logit_bias: Dict[int, int] = field(default_factory=dict)\n max_tokens: Optional[int] = None\n n: int = 1\n presence_penalty: float = 0\n response_format: Optional[Dict[str, str]] = None\n seed: Optional[int] = None\n stop: Optional[Union[str, List[str]]] = None\n temperature: float = 1.0\n top_p: int = 1\n user: str = field(default_factory=str)\n
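Because the dataclass is frozen, per-call overrides are applied with dataclasses.replace, which is what __call__ and new_with_replacements do internally. A small sketch (the model name is illustrative):
from dataclasses import replace\n\nfrom outlines.models.openai import OpenAIConfig\n\nconfig = OpenAIConfig(model=\"gpt-4o-mini\", temperature=1.0, n=1)\n\n# Build a modified copy without mutating the original frozen instance\ndeterministic = replace(config, temperature=0.0, seed=42)\n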
"},{"location":"api/models/#outlines.models.openai.error_handler","title":"error_handler(api_call_fn)
","text":"Handle OpenAI API errors and missing API key.
Source code in outlines/models/openai.py
def error_handler(api_call_fn: Callable) -> Callable:\n \"\"\"Handle OpenAI API errors and missing API key.\"\"\"\n\n def call(*args, **kwargs):\n import openai\n\n try:\n return api_call_fn(*args, **kwargs)\n except (\n openai.APITimeoutError,\n openai.InternalServerError,\n openai.RateLimitError,\n ) as e:\n raise OSError(f\"Could not connect to the OpenAI API: {e}\")\n except (\n openai.AuthenticationError,\n openai.BadRequestError,\n openai.ConflictError,\n openai.PermissionDeniedError,\n openai.NotFoundError,\n openai.UnprocessableEntityError,\n ) as e:\n raise e\n\n return call\n
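The decorator can wrap any function that performs an OpenAI API call, so that connection problems surface as OSError while authentication and request errors are re-raised unchanged. A hypothetical sketch, assuming the openai Python package v1 or later:
import openai\n\nfrom outlines.models.openai import error_handler\n\n@error_handler\ndef list_models():\n    # Any call made with the OpenAI client; connection errors become OSError\n    client = openai.OpenAI()\n    return client.models.list()\n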
"},{"location":"api/models/#outlines.models.openai.generate_chat","title":"generate_chat(prompt, system_prompt, client, config)
async
","text":"Call OpenAI's Chat Completion API.
"},{"location":"api/models/#outlines.models.openai.generate_chat--parameters","title":"Parameters","text":"prompt The prompt we use to start the generation. Passed to the model with the \"user\" role. system_prompt The system prompt, passed to the model with the \"system\" role before the prompt. client The API client config An OpenAIConfig
instance.
"},{"location":"api/models/#outlines.models.openai.generate_chat--returns","title":"Returns","text":"A tuple that contains the model's response(s) and usage statistics.
Source code in outlines/models/openai.py
@functools.partial(vectorize, signature=\"(),(),(),()->(s),(),()\")\nasync def generate_chat(\n prompt: str,\n system_prompt: Union[str, None],\n client,\n config: OpenAIConfig,\n) -> Tuple[np.ndarray, int, int]:\n \"\"\"Call OpenAI's Chat Completion API.\n\n Parameters\n ----------\n prompt\n The prompt we use to start the generation. Passed to the model\n with the \"user\" role.\n system_prompt\n The system prompt, passed to the model with the \"system\" role\n before the prompt.\n client\n The API client\n config\n An `OpenAIConfig` instance.\n\n Returns\n -------\n A tuple that contains the model's response(s) and usage statistics.\n\n \"\"\"\n\n @error_handler\n @cache()\n async def call_api(prompt, system_prompt, config):\n responses = await client.chat.completions.create(\n messages=system_message + user_message,\n **asdict(config), # type: ignore\n )\n return responses.model_dump()\n\n system_message = (\n [{\"role\": \"system\", \"content\": system_prompt}] if system_prompt else []\n )\n user_message = [{\"role\": \"user\", \"content\": prompt}]\n\n responses = await call_api(prompt, system_prompt, config)\n\n results = np.array(\n [responses[\"choices\"][i][\"message\"][\"content\"] for i in range(config.n)]\n )\n usage = responses[\"usage\"]\n\n return results, usage[\"prompt_tokens\"], usage[\"completion_tokens\"]\n
"},{"location":"api/parsing/","title":"Parsing","text":""},{"location":"api/parsing/#outlines.fsm.parsing.PartialIndenter","title":"PartialIndenter
","text":" Bases: Indenter
An Indenter
that doesn't reset its state every time process
is called.
Source code in outlines/fsm/parsing.py
class PartialIndenter(Indenter):\n \"\"\"An `Indenter` that doesn't reset its state every time `process` is called.\"\"\"\n\n def process(self, stream):\n return self._process(stream)\n\n def _process(self, stream):\n for token in stream:\n # These were previously *after* the `yield`, but that makes the\n # state tracking unnecessarily convoluted.\n if token.type in self.OPEN_PAREN_types:\n self.paren_level += 1\n elif token.type in self.CLOSE_PAREN_types:\n self.paren_level -= 1\n if self.paren_level < 0:\n raise UnexpectedToken(token, [])\n\n if token.type == self.NL_type:\n yield from self.handle_NL(token)\n else:\n yield token\n\n # TODO: What do we want to do here?\n # while len(self.indent_level) > 1:\n # self.indent_level.pop()\n # yield Token(self.DEDENT_type, \"\")\n\n def accepts_token_type(self, token_type):\n if token_type in self.CLOSE_PAREN_types and self.paren_level - 1 < 0:\n return False\n\n # TODO:\n # if token_type == self.NL_type and self.paren_level == 0:\n # ...\n # return False\n\n return True\n\n def __copy__(self):\n res = type(self)()\n res.paren_level = self.paren_level\n res.indent_level = copy(self.indent_level)\n return res\n\n def __repr__(self):\n return f\"{type(self).__name__}(paren_level={self.paren_level!r}, indent_level={self.indent_level!r})\"\n
"},{"location":"api/parsing/#outlines.fsm.parsing.PartialParserState","title":"PartialParserState
","text":" Bases: ParserState
Source code in outlines/fsm/parsing.py
class PartialParserState(ParserState):\n __slots__ = \"use_value_stack\"\n\n def __init__(\n self,\n parse_conf,\n lexer,\n state_stack=None,\n value_stack=None,\n use_value_stack=False,\n ):\n super().__init__(\n parse_conf, lexer, state_stack=state_stack, value_stack=value_stack\n )\n self.use_value_stack = use_value_stack\n\n def feed_token(self, token, is_end=False):\n if token.type == \"partial\":\n # If none of the potential terminals can transition, we need to know now\n current_state = self.state_stack[-1]\n current_lexer = get_contextual_lexer(self.lexer).lexers[current_state]\n\n # We have to feed the token and determine whether or not at least\n # one terminal is consistent with the stack; otherwise, we'll miss\n # invalid REDUCE cases.\n # TODO: We should track separate parses conditional on possible\n # token/symbol types, then we can coherently reuse the following\n # results instead of recomputing it later.\n can_transition = False\n for terminal_info in token.value.terminals_and_info:\n if terminal_info.terminal_name not in current_lexer.ignore_types:\n test_token = Token.new_borrow_pos(\n terminal_info.terminal_name, \"\", token\n )\n\n stack = copy(self.state_stack)\n try:\n self.feed_token_no_stack(test_token, is_end=is_end)\n can_transition = True\n break\n except UnexpectedToken:\n continue\n finally:\n self.state_stack = stack\n else:\n can_transition = True\n\n if not can_transition:\n expected = {\n s\n for s in self.parse_conf.states[current_state].keys()\n if s.isupper()\n }\n raise UnexpectedToken(\n token, expected, state=self, interactive_parser=None\n )\n\n elif self.use_value_stack:\n super().feed_token(token, is_end=is_end)\n else:\n self.feed_token_no_stack(token, is_end=is_end)\n\n def feed_token_no_stack(self, token, is_end=False):\n \"\"\"\n This is a copy of `ParserState.feed_token` with all the value stack\n steps removed. 
Since we're not exactly parsing in order to obtain a\n CST or anything similar, we can avoid the growing expense of tracking\n the parse tree.\n \"\"\"\n state_stack = self.state_stack\n states = self.parse_conf.states\n end_state = self.parse_conf.end_state\n\n while True:\n state = state_stack[-1]\n try:\n action, arg = states[state][token.type]\n except KeyError:\n expected = {s for s in states[state].keys() if s.isupper()}\n raise UnexpectedToken(\n token, expected, state=self, interactive_parser=None\n )\n\n assert arg != end_state\n\n if action is Shift:\n # shift once and return\n assert not is_end\n state_stack.append(arg)\n return\n else:\n # reduce+shift as many times as necessary\n rule = arg\n size = len(rule.expansion)\n if size:\n del state_stack[-size:]\n\n _action, new_state = states[state_stack[-1]][rule.origin.name]\n assert _action is Shift\n state_stack.append(new_state)\n\n if is_end and state_stack[-1] == end_state:\n return\n\n def feed_eof(self):\n last_token = self.lexer.state.last_token\n\n if last_token is None:\n eof_token = self.lexer._Token(\"$END\", \"\", 0, 1, 1)\n else:\n eof_token = Token.new_borrow_pos(\"$END\", \"\", last_token)\n\n new_token_is_legal = (\n last_token is None\n or last_token.type != \"partial\"\n or any(ti.is_final for ti in last_token.value.terminals_and_info)\n )\n if new_token_is_legal:\n self.feed_token(eof_token, is_end=True)\n else:\n raise UnexpectedToken(eof_token, [], state=self, interactive_parser=None)\n\n def choices(self):\n return self.parse_conf.parse_table.states[self.position]\n\n def accepts(self):\n \"\"\"\n Adapted from https://github.com/lark-parser/lark/blob/be542c2ff6d968817df019b8bf03f37b3111c08c/lark/parsers/lalr_interactive_parser.py#L95\n Returns the set of possible tokens that will advance the parser into a new valid state.\n \"\"\"\n accepts = set()\n conf_no_callbacks = copy(self.parse_conf)\n # We don't want to call callbacks here since those might have arbitrary side effects\n # and are unnecessarily slow.\n conf_no_callbacks.callbacks = {}\n for t in self.choices():\n if t.isupper(): # is terminal?\n new_state = copy(self)\n new_state.parse_conf = conf_no_callbacks\n try:\n new_state.feed_token(new_state.lexer._Token(t, \"\"))\n except UnexpectedToken:\n pass\n else:\n accepts.add(t)\n return accepts\n\n def __copy__(self):\n return type(self)(\n self.parse_conf,\n copy(self.lexer),\n copy(self.state_stack),\n deepcopy(self.value_stack),\n use_value_stack=self.use_value_stack,\n )\n\n def __repr__(self):\n return f\"{type(self).__name__}(lexer={self.lexer!r}, state_stack={self.state_stack!r})\"\n
"},{"location":"api/parsing/#outlines.fsm.parsing.PartialParserState.accepts","title":"accepts()
","text":"Adapted from https://github.com/lark-parser/lark/blob/be542c2ff6d968817df019b8bf03f37b3111c08c/lark/parsers/lalr_interactive_parser.py#L95 Returns the set of possible tokens that will advance the parser into a new valid state.
Source code in outlines/fsm/parsing.py
def accepts(self):\n \"\"\"\n Adapted from https://github.com/lark-parser/lark/blob/be542c2ff6d968817df019b8bf03f37b3111c08c/lark/parsers/lalr_interactive_parser.py#L95\n Returns the set of possible tokens that will advance the parser into a new valid state.\n \"\"\"\n accepts = set()\n conf_no_callbacks = copy(self.parse_conf)\n # We don't want to call callbacks here since those might have arbitrary side effects\n # and are unnecessarily slow.\n conf_no_callbacks.callbacks = {}\n for t in self.choices():\n if t.isupper(): # is terminal?\n new_state = copy(self)\n new_state.parse_conf = conf_no_callbacks\n try:\n new_state.feed_token(new_state.lexer._Token(t, \"\"))\n except UnexpectedToken:\n pass\n else:\n accepts.add(t)\n return accepts\n
"},{"location":"api/parsing/#outlines.fsm.parsing.PartialParserState.feed_token_no_stack","title":"feed_token_no_stack(token, is_end=False)
","text":"This is a copy of ParserState.feed_token
with all the value stack steps removed. Since we're not exactly parsing in order to obtain a CST or anything similar, we can avoid the growing expense of tracking the parse tree.
Source code in outlines/fsm/parsing.py
def feed_token_no_stack(self, token, is_end=False):\n \"\"\"\n This is a copy of `ParserState.feed_token` with all the value stack\n steps removed. Since we're not exactly parsing in order to obtain a\n CST or anything similar, we can avoid the growing expense of tracking\n the parse tree.\n \"\"\"\n state_stack = self.state_stack\n states = self.parse_conf.states\n end_state = self.parse_conf.end_state\n\n while True:\n state = state_stack[-1]\n try:\n action, arg = states[state][token.type]\n except KeyError:\n expected = {s for s in states[state].keys() if s.isupper()}\n raise UnexpectedToken(\n token, expected, state=self, interactive_parser=None\n )\n\n assert arg != end_state\n\n if action is Shift:\n # shift once and return\n assert not is_end\n state_stack.append(arg)\n return\n else:\n # reduce+shift as many times as necessary\n rule = arg\n size = len(rule.expansion)\n if size:\n del state_stack[-size:]\n\n _action, new_state = states[state_stack[-1]][rule.origin.name]\n assert _action is Shift\n state_stack.append(new_state)\n\n if is_end and state_stack[-1] == end_state:\n return\n
"},{"location":"api/parsing/#outlines.fsm.parsing.PartialParsingFrontend","title":"PartialParsingFrontend
","text":" Bases: ParsingFrontend
Source code in outlines/fsm/parsing.py
class PartialParsingFrontend(ParsingFrontend):\n def __init__(self, lexer_conf, parser_conf, options, parser=None):\n assert parser_conf.parser_type == \"lalr\"\n\n options._plugins[\"LALR_Parser\"] = PartialLALRParser\n options._plugins[\"BasicLexer\"] = PartialBasicLexer\n options._plugins[\"ContextualLexer\"] = PartialContextualLexer\n options._plugins[\"LexerThread\"] = PartialLexerThread\n\n super().__init__(lexer_conf, parser_conf, options, parser=parser)\n\n if lexer_conf.postlex:\n self.lexer = PartialPostLexConnector(self.lexer.lexer, lexer_conf.postlex)\n\n self._termset_fsm_info = None\n self._symbols_to_states: Optional[\n Dict[str, Set[Tuple[ParseStateType, Action]]]\n ] = None\n self._reverse_shifts: Optional[\n Dict[ParseStateType, Dict[str, Set[ParseStateType]]]\n ] = None\n # self._state_transition_map: Optional[\n # Dict[Tuple[ParseStateType, str], Set[ParseStateType]]\n # ] = None\n\n def _compute_maps(\n self,\n ):\n \"\"\"Compute state transition and symbols-to-states maps.\"\"\"\n self._reverse_shifts = {}\n self._symbols_to_states = {}\n\n parse_table = self.parser.parser.parse_table\n\n for from_state, symbols_to_ops in parse_table.states.items():\n for symbol, op in symbols_to_ops.items():\n if op[0] == Shift:\n symbols_to_from_states = self._reverse_shifts.setdefault(op[1], {})\n symbols_to_from_states.setdefault(symbol, set()).add(from_state)\n self._symbols_to_states.setdefault(symbol, set()).add((from_state, op))\n\n # # TODO: This approach is very wasteful.\n # context_lexer = get_contextual_lexer(self)\n # self._state_transition_map = {}\n #\n # for from_state, transitions in parse_table.states.items():\n # for symbol, action in transitions.items():\n # # TODO: Filter non-terminals\n # if symbol not in context_lexer.root_lexer.terminals_by_name:\n # continue\n #\n # if action[0] is Shift:\n # self._state_transition_map.setdefault(\n # (from_state, symbol), set()\n # ).add(action[1])\n # continue\n #\n # antecedent_state_seqs = parse_to_terminal(self, [(from_state,)], symbol)\n #\n # for antecedent_state_seq in antecedent_state_seqs:\n # antecedent_state = antecedent_state_seq[-1]\n # self._state_transition_map.setdefault(\n # (from_state, symbol), set()\n # ).add(antecedent_state)\n\n def _compute_termset_fsm_info(self):\n \"\"\"Collect and return information about terminal symbol sets and their FSMs.\n\n Terminal symbol sets (or \"termsets\") are ordered sequences of terminal\n symbols that are used by each parser state. 
Associated with each is a\n collection of FSMs for each terminal and a single parse state FSM that is\n the union of each terminal's FSM.\n\n This constructs a list of tuples containing the termset, the set of\n parse states that use the termsets, parse state FSMs, and information\n mapping the components of the parse state FSMs to their terminal symbol\n FSMs.\n\n \"\"\"\n context_lexer = get_contextual_lexer(self)\n termsets_to_fsms = {}\n termsets_to_parse_states: Dict[Tuple[str, ...], Set[ParseStateType]] = {}\n for parse_state, lexer in context_lexer.lexers.items():\n scanner = lexer.scanner\n key = tuple(term.name for term in scanner.terminals)\n termsets_to_fsms[key] = (scanner.fsm, scanner.fsms_to_trans_finals)\n termsets_to_parse_states.setdefault(key, set()).add(parse_state)\n\n self._termset_fsm_info = [\n (\n termset,\n frozenset(termsets_to_parse_states[termset]),\n fsm,\n fsms_to_trans_finals,\n )\n for termset, (fsm, fsms_to_trans_finals) in termsets_to_fsms.items()\n ]\n\n @property\n def termset_fsm_info(self):\n if self._termset_fsm_info is None:\n self._compute_termset_fsm_info()\n return self._termset_fsm_info\n\n @property\n def symbols_to_states(self):\n if self._symbols_to_states is None:\n self._compute_maps()\n return self._symbols_to_states\n\n @property\n def reverse_shifts(self):\n if self._reverse_shifts is None:\n self._compute_maps()\n return self._reverse_shifts\n
"},{"location":"api/parsing/#outlines.fsm.parsing.PartialScanner","title":"PartialScanner
","text":" Bases: Scanner
Source code in outlines/fsm/parsing.py
class PartialScanner(Scanner):\n @classmethod\n @lru_cache\n def construct_terminal_fsm(cls, terminal):\n # TODO: This should really be done at the lexer/parser level so that\n # the lifetime of these objects is tied to the parser itself.\n regex_str = terminal.pattern.to_regexp()\n pattern = interegular.parse_pattern(regex_str)\n fsm, _ = make_deterministic_fsm(pattern.to_fsm().reduce())\n return fsm, pattern.prefix_postfix\n\n def __init__(self, terminals, g_regex_flags, re_, use_bytes, match_whole=False):\n self.terminals = terminals\n self.g_regex_flags = g_regex_flags\n self.use_bytes = use_bytes\n self.match_whole = match_whole\n self.allowed_types = {t.name for t in self.terminals}\n self._mres = None\n\n fsms = []\n for t in self.terminals:\n fsm, prefix_postfix = self.construct_terminal_fsm(t)\n\n # TODO FIXME: We don't support this right now.\n assert prefix_postfix == (0, 0)\n\n fsms.append(fsm)\n\n self.fsm, self.fsms_to_trans_finals = fsm_union(fsms)\n\n def get_terminals_info(\n self, fsm_state_seq\n ) -> Tuple[Tuple[PartialTerminalInfo, ...], Tuple[PartialTerminalInfo, ...]]:\n \"\"\"Get the possible terminal symbols for an FSM state sequence.\"\"\"\n terminals_and_info: Tuple[PartialTerminalInfo, ...] = ()\n final_terminals_and_info: Tuple[PartialTerminalInfo, ...] = ()\n for i, (fsm_id, fsm_reads_more, in_final) in enumerate(\n get_sub_fsms_from_seq(fsm_state_seq, self.fsms_to_trans_finals)\n ):\n terminal_name = self.terminals[fsm_id].name\n info = PartialTerminalInfo(i, terminal_name, fsm_reads_more, in_final)\n terminals_and_info += (info,)\n if in_final:\n final_terminals_and_info += (info,)\n\n return terminals_and_info, final_terminals_and_info\n\n def match(self, text, pos, last_fsm_state_seq: Optional[Tuple[int, ...]] = None):\n \"\"\"Determine an FSM match over `text` starting at `pos` and continuing `last_fsm_state_seq`.\"\"\"\n\n start_pos = pos\n\n if last_fsm_state_seq:\n assert len(last_fsm_state_seq) > 1\n start_pos += len(last_fsm_state_seq) - 1\n start_state = last_fsm_state_seq[-1]\n else:\n start_state = self.fsm.initial\n\n text_part = text[start_pos:]\n\n text_transitions = get_token_transition_keys(\n self.fsm.fsm_info.alphabet_symbol_mapping,\n self.fsm.fsm_info.alphabet_anything_value,\n text_part,\n )\n\n state_seq = walk_fsm(\n self.fsm,\n text_transitions,\n start_state,\n full_match=self.match_whole,\n )\n\n if not state_seq:\n return None\n\n if last_fsm_state_seq:\n res = last_fsm_state_seq + tuple(state_seq)\n else:\n res = (start_state,) + tuple(state_seq)\n\n return res\n
"},{"location":"api/parsing/#outlines.fsm.parsing.PartialScanner.get_terminals_info","title":"get_terminals_info(fsm_state_seq)
","text":"Get the possible terminal symbols for an FSM state sequence.
Source code in outlines/fsm/parsing.py
def get_terminals_info(\n self, fsm_state_seq\n) -> Tuple[Tuple[PartialTerminalInfo, ...], Tuple[PartialTerminalInfo, ...]]:\n \"\"\"Get the possible terminal symbols for an FSM state sequence.\"\"\"\n terminals_and_info: Tuple[PartialTerminalInfo, ...] = ()\n final_terminals_and_info: Tuple[PartialTerminalInfo, ...] = ()\n for i, (fsm_id, fsm_reads_more, in_final) in enumerate(\n get_sub_fsms_from_seq(fsm_state_seq, self.fsms_to_trans_finals)\n ):\n terminal_name = self.terminals[fsm_id].name\n info = PartialTerminalInfo(i, terminal_name, fsm_reads_more, in_final)\n terminals_and_info += (info,)\n if in_final:\n final_terminals_and_info += (info,)\n\n return terminals_and_info, final_terminals_and_info\n
"},{"location":"api/parsing/#outlines.fsm.parsing.PartialScanner.match","title":"match(text, pos, last_fsm_state_seq=None)
","text":"Determine an FSM match over text
starting at pos
and continuing last_fsm_state_seq
.
Source code in outlines/fsm/parsing.py
def match(self, text, pos, last_fsm_state_seq: Optional[Tuple[int, ...]] = None):\n \"\"\"Determine an FSM match over `text` starting at `pos` and continuing `last_fsm_state_seq`.\"\"\"\n\n start_pos = pos\n\n if last_fsm_state_seq:\n assert len(last_fsm_state_seq) > 1\n start_pos += len(last_fsm_state_seq) - 1\n start_state = last_fsm_state_seq[-1]\n else:\n start_state = self.fsm.initial\n\n text_part = text[start_pos:]\n\n text_transitions = get_token_transition_keys(\n self.fsm.fsm_info.alphabet_symbol_mapping,\n self.fsm.fsm_info.alphabet_anything_value,\n text_part,\n )\n\n state_seq = walk_fsm(\n self.fsm,\n text_transitions,\n start_state,\n full_match=self.match_whole,\n )\n\n if not state_seq:\n return None\n\n if last_fsm_state_seq:\n res = last_fsm_state_seq + tuple(state_seq)\n else:\n res = (start_state,) + tuple(state_seq)\n\n return res\n
"},{"location":"api/parsing/#outlines.fsm.parsing.fsm_union","title":"fsm_union(fsms)
","text":"Construct an FSM representing the union of the FSMs in fsms
.
This is an updated version of interegular.fsm.FSM.union
made to return an extra map of component FSMs to the sets of state transitions that correspond to them in the new FSM.
Source code in outlines/fsm/parsing.py
def fsm_union(\n fsms: Sequence[FSM],\n) -> Tuple[FSM, Dict[int, Tuple[Set[Tuple[int, int]], Set[int], Dict[int, Set[int]]]]]:\n \"\"\"Construct an FSM representing the union of the FSMs in `fsms`.\n\n This is an updated version of `interegular.fsm.FSM.union` made to return an\n extra map of component FSMs to the sets of state transitions that\n correspond to them in the new FSM.\n\n \"\"\"\n\n alphabet, new_to_old = Alphabet.union(*[fsm.alphabet for fsm in fsms])\n\n indexed_fsms = tuple(enumerate(fsms))\n\n initial = {i: fsm.initial for (i, fsm) in indexed_fsms}\n\n # Dedicated function accepting a \"superset\" and returning the next\n # \"superset\" obtained by following this transition in the new FSM\n def follow(current_state, new_transition: int):\n next = {}\n for i, f in indexed_fsms:\n old_transition = new_to_old[i][new_transition]\n if (\n i in current_state\n and current_state[i] in f.map\n and old_transition in f.map[current_state[i]]\n ):\n next[i] = f.map[current_state[i]][old_transition]\n if not next:\n raise OblivionError\n return next\n\n states = [initial]\n finals: Set[int] = set()\n map: Dict[int, Dict[int, int]] = {}\n\n # Map component FSMs to their new state-to-state transitions, finals, and a\n # map translating component FSM states to aggregate FSM states\n fsms_to_trans_finals: Dict[\n int, Tuple[Set[Tuple[int, int]], Set[int], Dict[int, Set[int]]]\n ] = {}\n\n i = 0\n while i < len(states):\n state = states[i]\n\n # Add to the finals of the aggregate FSM whenever we hit a final in a\n # component FSM\n if any(state.get(j, -1) in fsm.finals for (j, fsm) in indexed_fsms):\n finals.add(i)\n\n # Compute the map for this state\n map[i] = {}\n for transition in alphabet.by_transition:\n try:\n next = follow(state, transition)\n except OblivionError:\n # Reached an oblivion state; don't list it\n continue\n else:\n try:\n # TODO: Seems like this could--and should--be avoided\n j = states.index(next)\n except ValueError:\n j = len(states)\n states.append(next)\n\n map[i][transition] = j\n\n for fsm_id, fsm_state in next.items():\n (\n fsm_transitions,\n fsm_finals,\n fsm_old_to_new,\n ) = fsms_to_trans_finals.setdefault(fsm_id, (set(), set(), {}))\n old_from = state[fsm_id]\n old_to = fsm_state\n fsm_old_to_new.setdefault(old_from, set()).add(i)\n fsm_old_to_new.setdefault(old_to, set()).add(j)\n fsm_transitions.add((i, j))\n if fsm_state in fsms[fsm_id].finals:\n fsm_finals.add(j)\n\n i += 1\n\n fsm = FSM(\n alphabet=alphabet,\n states=range(len(states)),\n initial=0,\n finals=finals,\n map=map,\n __no_validation__=True,\n )\n\n fsm, old_to_new_states = make_deterministic_fsm(fsm)\n _fsms_to_trans_finals = {\n fsm_id: (\n {(old_to_new_states[s1], old_to_new_states[s2]) for s1, s2 in transitions},\n {old_to_new_states[s] for s in finals},\n {\n old_state: {old_to_new_states[new_state] for new_state in new_states}\n for old_state, new_states in old_to_new.items()\n },\n )\n for fsm_id, (transitions, finals, old_to_new) in sorted(\n fsms_to_trans_finals.items(), key=lambda x: x[0]\n )\n }\n\n return (\n fsm,\n _fsms_to_trans_finals,\n )\n
"},{"location":"api/parsing/#outlines.fsm.parsing.get_sub_fsms_from_seq","title":"get_sub_fsms_from_seq(state_seq, fsms_to_trans_finals)
","text":"Get the indices of the sub-FSMs in fsm
that could have matched the state sequence state_seq
.
"},{"location":"api/parsing/#outlines.fsm.parsing.get_sub_fsms_from_seq--parameters","title":"Parameters","text":"state_seq A state sequence. fsms_to_trans_finals A map from FSM indices to tuples containing sets of their state transitions and sets of the final/accept states.
"},{"location":"api/parsing/#outlines.fsm.parsing.get_sub_fsms_from_seq--returns","title":"Returns","text":"A generator returning tuples containing each sub-FSM index (in the order they were union-ed to construct fsm
) and two booleans: whether there is another valid transition from the last state in the sequence for the associated sub-FSM (i.e. whether the sub-FSM can continue accepting/matching), and whether the sequence ends in a final state of the sub-FSM.
Source code in outlines/fsm/parsing.py
def get_sub_fsms_from_seq(\n state_seq: Sequence[int],\n fsms_to_trans_finals: Dict[\n int, Tuple[Set[Tuple[int, int]], Set[int], Dict[int, Set[int]]]\n ],\n) -> Generator[Tuple[int, bool, bool], None, None]:\n \"\"\"Get the indices of the sub-FSMs in `fsm` that could have matched the state sequence `state_seq`.\n\n Parameters\n ----------\n state_seq\n A state sequence.\n fsms_to_trans_finals\n A map from FSM indices to tuples containing sets of their state transitions\n and sets of the final/accept states.\n\n Returns\n -------\n A generator returning tuples containing each sub-FSM index (in the order\n they were union-ed to construct `fsm`) and booleans indicating whether or\n not there is another valid transition from the last state in the sequence\n for the associated sub-FSM (i.e. if the FSM can continue\n accepting/matching) and whether or not the sequence ends in a final state\n of the sub-FSM.\n \"\"\"\n state_seq_transitions = set(zip(state_seq[:-1], state_seq[1:]))\n last_fsm_state = state_seq[-1]\n yield from (\n (\n # The sub-FMS index\n fsm_idx,\n # Is there another possible transition in this sub-FSM?\n any(last_fsm_state == from_s for (from_s, to_s) in transitions),\n # Is this sub-FSM in a final state?\n state_seq[-1] in finals,\n )\n for fsm_idx, (transitions, finals, _) in fsms_to_trans_finals.items()\n if state_seq_transitions.issubset(transitions)\n )\n
"},{"location":"api/parsing/#outlines.fsm.parsing.terminals_to_fsms","title":"terminals_to_fsms(lp)
","text":"Construct a dict
mapping terminal symbol names to their finite state machines.
Source code in outlines/fsm/parsing.py
def terminals_to_fsms(lp: PartialLark) -> Dict[str, FSM]:\n \"\"\"Construct a ``dict`` mapping terminal symbol names to their finite state machines.\"\"\"\n\n symbol_names_and_fsms = {}\n for terminal in lp.terminals:\n pattern = interegular.parse_pattern(terminal.pattern.to_regexp())\n # TODO: Use `pyparser.terminals[0].pattern.flags`?\n try:\n fsm, _ = make_deterministic_fsm(pattern.to_fsm().reduce())\n except Unsupported:\n fsm = None\n\n symbol_names_and_fsms[terminal.name] = fsm\n\n return symbol_names_and_fsms\n
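A heavily hedged sketch of how this might be exercised, assuming PartialLark accepts the same arguments as lark.Lark and is restricted to the LALR parser; the grammar below is illustrative only:
from outlines.fsm.parsing import PartialLark, terminals_to_fsms\n\ngrammar = \"\"\"\nstart: NAME \"=\" NUMBER\nNAME: /[a-zA-Z_][a-zA-Z0-9_]*/\nNUMBER: /[0-9]+/\n%import common.WS\n%ignore WS\n\"\"\"\n\nlp = PartialLark(grammar, parser=\"lalr\")\n\n# Map each terminal symbol name to its deterministic FSM (None if unsupported)\nsymbol_fsms = terminals_to_fsms(lp)\n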
"},{"location":"api/prompts/","title":"Prompts","text":""},{"location":"api/prompts/#outlines.prompts.Prompt","title":"Prompt
dataclass
","text":"Represents a prompt function.
We return a Prompt
class instead of a simple function so the template defined in prompt functions can be accessed.
Source code in outlines/prompts.py
@dataclass\nclass Prompt:\n \"\"\"Represents a prompt function.\n\n We return a `Prompt` class instead of a simple function so the\n template defined in prompt functions can be accessed.\n\n \"\"\"\n\n template: str\n signature: inspect.Signature\n\n def __post_init__(self):\n self.parameters: List[str] = list(self.signature.parameters.keys())\n self.jinja_environment = create_jinja_template(self.template)\n\n def __call__(self, *args, **kwargs) -> str:\n \"\"\"Render and return the template.\n\n Returns\n -------\n The rendered template as a Python ``str``.\n\n \"\"\"\n bound_arguments = self.signature.bind(*args, **kwargs)\n bound_arguments.apply_defaults()\n return self.jinja_environment.render(**bound_arguments.arguments)\n\n def __str__(self):\n return self.template\n
"},{"location":"api/prompts/#outlines.prompts.Prompt.__call__","title":"__call__(*args, **kwargs)
","text":"Render and return the template.
"},{"location":"api/prompts/#outlines.prompts.Prompt.__call__--returns","title":"Returns","text":"The rendered template as a Python str
.
Source code in outlines/prompts.py
def __call__(self, *args, **kwargs) -> str:\n \"\"\"Render and return the template.\n\n Returns\n -------\n The rendered template as a Python ``str``.\n\n \"\"\"\n bound_arguments = self.signature.bind(*args, **kwargs)\n bound_arguments.apply_defaults()\n return self.jinja_environment.render(**bound_arguments.arguments)\n
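Because prompt functions return a Prompt instance rather than a plain function, both the rendered text and the raw template are available. A small sketch:
import outlines\n\n@outlines.prompt\ndef greet(name):\n    \"\"\"Hello {{name}}, how are you?\"\"\"\n\n# Calling the instance renders the Jinja2 template\nprint(greet(\"Alice\"))\n\n# The raw template (the function's docstring) remains accessible\nprint(greet.template)\n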
"},{"location":"api/prompts/#outlines.prompts.get_fn_args","title":"get_fn_args(fn)
","text":"Returns the arguments of a function with annotations and default values if provided.
Source code in outlines/prompts.py
def get_fn_args(fn: Callable):\n \"\"\"Returns the arguments of a function with annotations and default values if provided.\"\"\"\n if not callable(fn):\n raise TypeError(\"The `args` filter only applies to callables.\")\n\n arg_str_list = []\n signature = inspect.signature(fn)\n arg_str_list = [str(param) for param in signature.parameters.values()]\n arg_str = \", \".join(arg_str_list)\n return arg_str\n
"},{"location":"api/prompts/#outlines.prompts.get_fn_description","title":"get_fn_description(fn)
","text":"Returns the first line of a callable's docstring.
Source code in outlines/prompts.py
def get_fn_description(fn: Callable):\n \"\"\"Returns the first line of a callable's docstring.\"\"\"\n if not callable(fn):\n raise TypeError(\"The `description` filter only applies to callables.\")\n\n docstring = inspect.getdoc(fn)\n if docstring is None:\n description = \"\"\n else:\n description = docstring.split(\"\\n\")[0].strip()\n\n return description\n
"},{"location":"api/prompts/#outlines.prompts.get_fn_name","title":"get_fn_name(fn)
","text":"Returns the name of a callable.
Source code in outlines/prompts.py
def get_fn_name(fn: Callable):\n \"\"\"Returns the name of a callable.\"\"\"\n if not callable(fn):\n raise TypeError(\"The `name` filter only applies to callables.\")\n\n if not hasattr(fn, \"__name__\"):\n name = type(fn).__name__\n else:\n name = fn.__name__\n\n return name\n
"},{"location":"api/prompts/#outlines.prompts.get_fn_signature","title":"get_fn_signature(fn)
","text":"Return the signature of a callable.
Source code in outlines/prompts.py
def get_fn_signature(fn: Callable):\n \"\"\"Return the signature of a callable.\"\"\"\n if not callable(fn):\n raise TypeError(\"The `source` filter only applies to callables.\")\n\n source = textwrap.dedent(inspect.getsource(fn))\n re_search = re.search(re.compile(r\"\\(([^)]+)\\)\"), source)\n if re_search is None:\n signature = \"\"\n else:\n signature = re_search.group(1)\n\n return signature\n
"},{"location":"api/prompts/#outlines.prompts.get_fn_source","title":"get_fn_source(fn)
","text":"Return the source code of a callable.
Source code in outlines/prompts.py
def get_fn_source(fn: Callable):\n \"\"\"Return the source code of a callable.\"\"\"\n if not callable(fn):\n raise TypeError(\"The `source` filter only applies to callables.\")\n\n source = textwrap.dedent(inspect.getsource(fn))\n re_search = re.search(re.compile(r\"(\\bdef\\b.*)\", re.DOTALL), source)\n if re_search is not None:\n source = re_search.group(0)\n else:\n raise TypeError(\"Could not read the function's source code\")\n\n return source\n
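These helpers back the Jinja2 filters that can be used inside prompt templates; the error messages above name the name, description, args and source filters. A sketch of a tool-description prompt using them (the add function is only an example):
import outlines\n\ndef add(a: int, b: int) -> int:\n    \"\"\"Return the sum of two integers.\"\"\"\n    return a + b\n\n@outlines.prompt\ndef tool_prompt(fn):\n    \"\"\"Tool: {{ fn | name }}\n    Description: {{ fn | description }}\n    Arguments: {{ fn | args }}\n    Source:\n    {{ fn | source }}\n    \"\"\"\n\nprint(tool_prompt(add))\n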
"},{"location":"api/prompts/#outlines.prompts.get_schema_dict","title":"get_schema_dict(model)
","text":"Return a pretty-printed dictionary
Source code in outlines/prompts.py
@get_schema.register(dict)\ndef get_schema_dict(model: Dict):\n \"\"\"Return a pretty-printed dictionary\"\"\"\n return json.dumps(model, indent=2)\n
"},{"location":"api/prompts/#outlines.prompts.get_schema_pydantic","title":"get_schema_pydantic(model)
","text":"Return the schema of a Pydantic model.
Source code in outlines/prompts.py
@get_schema.register(type(BaseModel))\ndef get_schema_pydantic(model: Type[BaseModel]):\n \"\"\"Return the schema of a Pydantic model.\"\"\"\n if not type(model) == type(BaseModel):\n raise TypeError(\"The `schema` filter only applies to Pydantic models.\")\n\n if hasattr(model, \"model_json_schema\"):\n def_key = \"$defs\"\n raw_schema = model.model_json_schema()\n else: # pragma: no cover\n def_key = \"definitions\"\n raw_schema = model.schema()\n\n definitions = raw_schema.get(def_key, None)\n schema = parse_pydantic_schema(raw_schema, definitions)\n\n return json.dumps(schema, indent=2)\n
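The schema filter named in the error message above can inline a Pydantic model's simplified schema into a prompt. A sketch (the model is illustrative):
import outlines\nfrom pydantic import BaseModel, Field\n\nclass Character(BaseModel):\n    name: str = Field(description=\"The character's name\")\n    age: int\n\n@outlines.prompt\ndef describe(model):\n    \"\"\"Answer with JSON that follows this schema:\n    {{ model | schema }}\n    \"\"\"\n\n# Field descriptions are used where available, placeholders otherwise\nprint(describe(Character))\n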
"},{"location":"api/prompts/#outlines.prompts.parse_pydantic_schema","title":"parse_pydantic_schema(raw_schema, definitions)
","text":"Parse the output of Basemodel.[schema|model_json_schema]()
.
This recursively follows the references to other schemas in case of nested models. Other schemas are stored under the \"definitions\" key in the schema of the top-level model.
Source code in outlines/prompts.py
def parse_pydantic_schema(raw_schema, definitions):\n \"\"\"Parse the output of `Basemodel.[schema|model_json_schema]()`.\n\n This recursively follows the references to other schemas in case\n of nested models. Other schemas are stored under the \"definitions\"\n key in the schema of the top-level model.\n\n \"\"\"\n simple_schema = {}\n for name, value in raw_schema[\"properties\"].items():\n if \"description\" in value:\n simple_schema[name] = value[\"description\"]\n elif \"$ref\" in value:\n refs = value[\"$ref\"].split(\"/\")\n simple_schema[name] = parse_pydantic_schema(\n definitions[refs[2]], definitions\n )\n else:\n simple_schema[name] = f\"<{name}>\"\n\n return simple_schema\n
"},{"location":"api/prompts/#outlines.prompts.prompt","title":"prompt(fn)
","text":"Decorate a function that contains a prompt template.
This allows you to define prompts in the docstring of a function and simplifies their manipulation by providing some degree of encapsulation. It uses the render
function internally to render templates.
import outlines
@outlines.prompt def build_prompt(question): ... \"I have a {{question}}\" ... prompt = build_prompt(\"How are you?\")
This API can also be helpful in an \"agent\" context where parts of the prompt are set when the agent is initialized and never modified later. In this situation we can partially apply the prompt function at initialization.
import outlines import functools as ft ... @outlines.prompt ... def solve_task(name: str, objective: str, task: str): ... '''Your name is {{name}}. .. Your overall objective is to {{objective}}. ... Please solve the following task: {{task}} ... ''' ... hal = ft.partial(solve_task, \"HAL\", \"Travel to Jupiter\")
"},{"location":"api/prompts/#outlines.prompts.prompt--returns","title":"Returns","text":"A Prompt
callable class which will render the template when called.
Source code in outlines/prompts.py
def prompt(fn: Callable) -> Prompt:\n \"\"\"Decorate a function that contains a prompt template.\n\n This allows to define prompts in the docstring of a function and simplify their\n manipulation by providing some degree of encapsulation. It uses the `render`\n function internally to render templates.\n\n >>> import outlines\n >>>\n >>> @outlines.prompt\n >>> def build_prompt(question):\n ... \"I have a ${question}\"\n ...\n >>> prompt = build_prompt(\"How are you?\")\n\n This API can also be helpful in an \"agent\" context where parts of the prompt\n are set when the agent is initialized and never modified later. In this situation\n we can partially apply the prompt function at initialization.\n\n >>> import outlines\n >>> import functools as ft\n ...\n >>> @outlines.prompt\n ... def solve_task(name: str, objective: str, task: str):\n ... '''Your name is {{name}}.\n .. Your overall objective is to {{objective}}.\n ... Please solve the following task: {{task}}\n ... '''\n ...\n >>> hal = ft.partial(solve_task, \"HAL\", \"Travel to Jupiter\")\n\n Returns\n -------\n A `Prompt` callable class which will render the template when called.\n\n \"\"\"\n\n signature = inspect.signature(fn)\n\n # The docstring contains the template that will be rendered to be used\n # as a prompt to the language model.\n docstring = fn.__doc__\n if docstring is None:\n raise TypeError(\"Could not find a template in the function's docstring.\")\n\n template = cast(str, docstring)\n\n return Prompt(template, signature)\n
"},{"location":"api/prompts/#outlines.prompts.render","title":"render(template, **values)
","text":"Parse a Jinaj2 template and translate it into an Outlines graph.
This function removes extra whitespace and linebreaks from templates to allow users to enter prompts more naturally than if they used Python's constructs directly. See the examples for a detailed explanation.
"},{"location":"api/prompts/#outlines.prompts.render--examples","title":"Examples","text":"Outlines follow Jinja2's syntax
import outlines outline = outlines.render(\"I like {{food}} and {{sport}}\", food=\"tomatoes\", sport=\"tennis\") I like tomatoes and tennis
If the first line of the template is empty, render
removes it
from outlines import render
tpl = ''' ... A new string''' tpl ... '\nA new string' render(tpl) ... 'A new string'
Similarly, render
ignores linebreaks introduced by placing the closing quotes underneath the text:
tpl = ''' ... A new string ... ''' tpl ... '\\nA new string\\n' render(tpl) ... 'A new string'
If you want to insert a linebreak at the end of the rendered template, you will need to leave an empty line at the end of the template:
tpl = ''' ... A new string ... ... ''' tpl ... '\\nA new string\\n\\n' render(tpl) ... 'A new string\\n'
render
removes the indentation in docstrings. This is particularly important when using prompt functions
tpl = ''' ... a string ... and another string''' tpl ... '\\n a string\\n and another string' render(tpl) ... 'a string\\nand another string'
The indentation of the first line is assumed to be the same as the second line's
tpl = '''a string ... and another''' tpl ... 'a string\\n and another' render(tpl) ... 'a string\\nand another'
To get a different indentation for the first and the second line, we can start the prompt on the string's second line:
tpl = ''' ... First line ... Second line''' render(tpl) ... 'First line\n Second line'
"},{"location":"api/prompts/#outlines.prompts.render--parameters","title":"Parameters","text":"template A string that contains a template written with the Jinja2 syntax. **values Map from the variables in the template to their value.
"},{"location":"api/prompts/#outlines.prompts.render--returns","title":"Returns","text":"A string that contains the rendered template.
Source code in outlines/prompts.py
def render(template: str, **values: Optional[Dict[str, Any]]) -> str:\n r\"\"\"Parse a Jinaj2 template and translate it into an Outlines graph.\n\n This function removes extra whitespaces and linebreaks from templates to\n allow users to enter prompts more naturally than if they used Python's\n constructs directly. See the examples for a detailed explanation.\n\n Examples\n --------\n\n Outlines follow Jinja2's syntax\n\n >>> import outlines\n >>> outline = outlines.render(\"I like {{food}} and {{sport}}\", food=\"tomatoes\", sport=\"tennis\")\n I like tomatoes and tennis\n\n If the first line of the template is empty, `render` removes it\n\n >>> from outlines import render\n >>>\n >>> tpl = '''\n ... A new string'''\n >>> tpl\n ... '\\nA new string'\n >>> render(tpl)\n ... 'a new string'\n\n Similarly, `render` ignores linebreaks introduced by placing the closing quotes\n underneath the text:\n\n >>> tpl = '''\n ... A new string\n ... '''\n >>> tpl\n ... '\\nA new string\\n'\n >>> render(tpl)\n ... 'A new string'\n\n If you want to insert a linebreak at the end of the rendered template, you will\n need to leave an empty line at the end of the template:\n\n >>> tpl = '''\n ... A new string\n ...\n ... '''\n >>> tpl\n ... '\\nA new string\\n\\n'\n >>> render(tpl)\n ... 'A new string\\n'\n\n `render` removes the identation in docstrings. This is particularly important\n when using prompt functions\n\n >>> tpl = '''\n ... a string\n ... and another string'''\n >>> tpl\n ... '\\n a string\\n and another string'\n >>> render(tpl)\n ... 'a string\\nand another string'\n\n The indentation of the first line is assumed to be the same as the second line's\n\n >>> tpl = '''a string\n ... and another'''\n >>> tpl\n ... 'a string\\n and another'\n >>> render(tpl)\n ... 'a string\\nand another'\n\n To get a different indentation for the first and the second line, we can start the\n prompt on the string's second line:\n\n >>> tpl = '''\n ... First line\n ... Second line'''\n >>> render(tpl)\n ... 'First Line\\n Second Line'\n\n Parameters\n ----------\n template\n A string that contains a template written with the Jinja2 syntax.\n **values\n Map from the variables in the template to their value.\n\n Returns\n -------\n A string that contains the rendered template.\n\n \"\"\"\n jinja_template = create_jinja_template(template)\n return jinja_template.render(**values)\n
"},{"location":"api/regex/","title":"Regex","text":""},{"location":"api/regex/#outlines.generate.regex.regex","title":"regex(model, regex_str, sampler=multinomial())
","text":"Generate structured text in the language of a regular expression.
"},{"location":"api/regex/#outlines.generate.regex.regex--parameters","title":"Parameters","text":"model: An instance of Transformer
that represents a model from the transformers
library. regex_str: The regular expression that the output must follow. sampler: The sampling algorithm to use to generate token ids from the logits distribution.
"},{"location":"api/regex/#outlines.generate.regex.regex--returns","title":"Returns","text":"A SequenceGeneratorAdapter
instance that generates text constrained by the regular expression.
Source code in outlines/generate/regex.py
@singledispatch\ndef regex(model, regex_str: str, sampler: Sampler = multinomial()):\n \"\"\"Generate structured text in the language of a regular expression.\n\n Parameters\n ----------\n model:\n An instance of `Transformer` that represents a model from the\n `transformers` library.\n regex_str:\n The regular expression that the output must follow.\n sampler:\n The sampling algorithm to use to generate token ids from the logits\n distribution.\n\n Returns\n -------\n A `SequenceGeneratorAdapter` instance that generates text constrained by the\n regular expression.\n\n \"\"\"\n from outlines.processors import RegexLogitsProcessor\n\n logits_processor = RegexLogitsProcessor(regex_str, tokenizer=model.tokenizer)\n return SequenceGeneratorAdapter(model, logits_processor, sampler)\n
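A typical use with a transformers model; the model name, regular expression and prompt are illustrative:
import outlines\n\nmodel = outlines.models.transformers(\"microsoft/Phi-3-mini-4k-instruct\")\n\n# The generator only produces strings that match the regular expression\ngenerator = outlines.generate.regex(model, r\"[0-9]{3}-[0-9]{2}-[0-9]{4}\")\nresult = generator(\"Give me a (fake) social security number: \")\nprint(result)\n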
"},{"location":"api/samplers/","title":"Samplers","text":""},{"location":"api/samplers/#outlines.samplers.BeamSearchSampler","title":"BeamSearchSampler
","text":"Beam Search sampling algorithm.
"},{"location":"api/samplers/#outlines.samplers.BeamSearchSampler--attributes","title":"Attributes","text":"samples The number of samples taken for each input sequence. Equivalent to the number of beams.
Source code in outlines/samplers.py
class BeamSearchSampler:\n \"\"\"Beam Search sampling algorithm.\n\n Attributes\n ----------\n samples\n The number of samples taken for each input sequence. Equivalent to the\n number of beams.\n \"\"\"\n\n def __init__(self, beams: int = 1):\n self.samples = beams\n\n def __call__(\n self,\n next_token_logits: \"torch.DoubleTensor\",\n sequence_weights: \"torch.DoubleTensor\",\n _,\n ) -> Tuple[\"torch.DoubleTensor\", \"torch.DoubleTensor\", \"torch.DoubleTensor\"]:\n \"\"\"Call the beam search sampler.\n\n Parameters\n ----------\n next_token_logits\n A tensor of shape ``(n_seqs, vocab_size,)`` that represents the\n probability distribution of the next token over the vocabulary.\n sequence_weights\n A tensor of shape ``(n_seqs,)`` that represents the cumulative\n weight of each sequence.\n rng\n A random number generator.\n\n Returns\n -------\n A tuple with an array that contains the ids of the sampled tokens of\n shape ``(n_seqs, 1)``, an array that contains the ancestors of each\n sampled id of shape ``(n_seqs,)`` and an array that contains the updated\n cumulative weights of each sequence of shape ``(n_seqs,)``.\n\n \"\"\"\n import torch\n\n logprobs = torch.nn.functional.log_softmax(next_token_logits, dim=-1)\n weights = logprobs + sequence_weights.unsqueeze(1).expand_as(next_token_logits)\n\n # Flatten scores to (n_batch, n_samples * vocab_size)\n # and find the top-k weights for each batch.\n batch_size = next_token_logits.shape[0] // self.samples\n vocab_size = next_token_logits.shape[-1]\n weights = weights.view(batch_size, self.samples * vocab_size)\n\n # If the weights are all equal to 0 we are at the beginning of the search\n # and thus only need to sample from one set of token logits for each\n # batch.\n if torch.all(sequence_weights == 0):\n weights = weights[:, :vocab_size]\n\n weights, indices = torch.topk(\n weights, self.samples, dim=1, largest=True, sorted=True\n )\n\n ancestors = torch.div(indices, vocab_size, rounding_mode=\"floor\")\n next_token_ids = indices % vocab_size\n\n # Re-shape the weights, next_token_ids and ancestors to (n_batch * n_samples, 1)\n first_batch_idx = torch.arange(\n 0, batch_size * self.samples, self.samples, device=next_token_logits.device\n ).unsqueeze(1)\n ancestors = ancestors + first_batch_idx\n\n ancestors = ancestors.view(self.samples * batch_size)\n weights = weights.view(self.samples * batch_size)\n next_token_ids = next_token_ids.view(self.samples * batch_size, 1)\n\n return next_token_ids, ancestors, weights\n\n @property\n def sampling_params(self):\n return SamplingParameters(\"beam_search\", self.samples, None, None, 1.0)\n
"},{"location":"api/samplers/#outlines.samplers.BeamSearchSampler.__call__","title":"__call__(next_token_logits, sequence_weights, _)
","text":"Call the beam search sampler.
"},{"location":"api/samplers/#outlines.samplers.BeamSearchSampler.__call__--parameters","title":"Parameters","text":"next_token_logits A tensor of shape (n_seqs, vocab_size,)
that represents the probability distribution of the next token over the vocabulary. sequence_weights A tensor of shape (n_seqs,)
that represents the cumulative weight of each sequence. rng A random number generator.
"},{"location":"api/samplers/#outlines.samplers.BeamSearchSampler.__call__--returns","title":"Returns","text":"A tuple with an array that contains the ids of the sampled tokens of shape (n_seqs, 1)
, an array that contains the ancestors of each sampled id of shape (n_seqs,)
and an array that contains the updated cumulative weights of each sequence of shape (n_seqs,)
.
Source code in outlines/samplers.py
def __call__(\n self,\n next_token_logits: \"torch.DoubleTensor\",\n sequence_weights: \"torch.DoubleTensor\",\n _,\n) -> Tuple[\"torch.DoubleTensor\", \"torch.DoubleTensor\", \"torch.DoubleTensor\"]:\n \"\"\"Call the beam search sampler.\n\n Parameters\n ----------\n next_token_logits\n A tensor of shape ``(n_seqs, vocab_size,)`` that represents the\n probability distribution of the next token over the vocabulary.\n sequence_weights\n A tensor of shape ``(n_seqs,)`` that represents the cumulative\n weight of each sequence.\n rng\n A random number generator.\n\n Returns\n -------\n A tuple with an array that contains the ids of the sampled tokens of\n shape ``(n_seqs, 1)``, an array that contains the ancestors of each\n sampled id of shape ``(n_seqs,)`` and an array that contains the updated\n cumulative weights of each sequence of shape ``(n_seqs,)``.\n\n \"\"\"\n import torch\n\n logprobs = torch.nn.functional.log_softmax(next_token_logits, dim=-1)\n weights = logprobs + sequence_weights.unsqueeze(1).expand_as(next_token_logits)\n\n # Flatten scores to (n_batch, n_samples * vocab_size)\n # and find the top-k weights for each batch.\n batch_size = next_token_logits.shape[0] // self.samples\n vocab_size = next_token_logits.shape[-1]\n weights = weights.view(batch_size, self.samples * vocab_size)\n\n # If the weights are all equal to 0 we are at the beginning of the search\n # and thus only need to sample from one set of token logits for each\n # batch.\n if torch.all(sequence_weights == 0):\n weights = weights[:, :vocab_size]\n\n weights, indices = torch.topk(\n weights, self.samples, dim=1, largest=True, sorted=True\n )\n\n ancestors = torch.div(indices, vocab_size, rounding_mode=\"floor\")\n next_token_ids = indices % vocab_size\n\n # Re-shape the weights, next_token_ids and ancestors to (n_batch * n_samples, 1)\n first_batch_idx = torch.arange(\n 0, batch_size * self.samples, self.samples, device=next_token_logits.device\n ).unsqueeze(1)\n ancestors = ancestors + first_batch_idx\n\n ancestors = ancestors.view(self.samples * batch_size)\n weights = weights.view(self.samples * batch_size)\n next_token_ids = next_token_ids.view(self.samples * batch_size, 1)\n\n return next_token_ids, ancestors, weights\n
"},{"location":"api/samplers/#outlines.samplers.GreedySampler","title":"GreedySampler
","text":"Greedy Sampling algorithm.
Greedy sampling consists in choosing the token with the largest likelihood at every step.
We don't allow more than one sample. We could attribute this a meaning, for instance the k-th sample represents the k-th most likely token. In which case it would be equivalent to beam search without the sequence weights.
"},{"location":"api/samplers/#outlines.samplers.GreedySampler--attributes","title":"Attributes","text":"samples The number of samples taken for each input sequence.
Source code in outlines/samplers.py
class GreedySampler:\n \"\"\"Greedy Sampling algorithm.\n\n Greedy sampling consists in choosing the token with the largest\n likelihood at every step.\n\n We don't allow more than one sample. We could attribute this a meaning, for\n instance the k-th sample represents the k-th most likely token. In which\n case it would be equivalent to beam search without the sequence weights.\n\n Attributes\n ----------\n samples\n The number of samples taken for each input sequence.\n\n \"\"\"\n\n def __init__(self):\n self.samples = 1\n\n def __call__(\n self,\n next_token_logits: \"torch.DoubleTensor\",\n sequence_weights: \"torch.DoubleTensor\",\n _,\n ) -> \"torch.DoubleTensor\":\n \"\"\"Call the greedy sampler.\n\n Parameters\n ----------\n next_token_logits\n A tensor of shape ``(n_seqs, vocab_size,)`` that represents the\n probability distribution of the next token over the vocabulary.\n sequence_weights\n A tensor of shape ``(n_seqs,)`` that represents the cumulative\n weight of each sequence.\n rng\n A random number generator.\n\n Returns\n -------\n A tuple with an array that contains the ids of the sampled tokens of\n shape ``(n_seqs, 1)``, an array that contains the ancestors of each\n sampled id of shape ``(n_seqs,)`` and an array that contains the updated\n cumulative weights of each sequence of shape ``(n_seqs,)``.\n\n \"\"\"\n import torch\n\n logprobs = torch.nn.functional.log_softmax(next_token_logits, dim=-1)\n next_token_ids = torch.argmax(logprobs, dim=-1, keepdim=True)\n\n ancestors = torch.arange(\n next_token_logits.shape[0], device=next_token_logits.device\n )\n weights = sequence_weights + torch.gather(logprobs, 1, next_token_ids).squeeze()\n\n return next_token_ids, ancestors, weights\n\n @property\n def sampling_params(self):\n return SamplingParameters(\"greedy\", self.samples, None, None, 0.0)\n
"},{"location":"api/samplers/#outlines.samplers.GreedySampler.__call__","title":"__call__(next_token_logits, sequence_weights, _)
","text":"Call the greedy sampler.
"},{"location":"api/samplers/#outlines.samplers.GreedySampler.__call__--parameters","title":"Parameters","text":"next_token_logits A tensor of shape (n_seqs, vocab_size,)
that represents the probability distribution of the next token over the vocabulary. sequence_weights A tensor of shape (n_seqs,)
that represents the cumulative weight of each sequence. rng A random number generator.
"},{"location":"api/samplers/#outlines.samplers.GreedySampler.__call__--returns","title":"Returns","text":"A tuple with an array that contains the ids of the sampled tokens of shape (n_seqs, 1)
, an array that contains the ancestors of each sampled id of shape (n_seqs,)
and an array that contains the updated cumulative weights of each sequence of shape (n_seqs,)
.
Source code in outlines/samplers.py
def __call__(\n self,\n next_token_logits: \"torch.DoubleTensor\",\n sequence_weights: \"torch.DoubleTensor\",\n _,\n) -> \"torch.DoubleTensor\":\n \"\"\"Call the greedy sampler.\n\n Parameters\n ----------\n next_token_logits\n A tensor of shape ``(n_seqs, vocab_size,)`` that represents the\n probability distribution of the next token over the vocabulary.\n sequence_weights\n A tensor of shape ``(n_seqs,)`` that represents the cumulative\n weight of each sequence.\n rng\n A random number generator.\n\n Returns\n -------\n A tuple with an array that contains the ids of the sampled tokens of\n shape ``(n_seqs, 1)``, an array that contains the ancestors of each\n sampled id of shape ``(n_seqs,)`` and an array that contains the updated\n cumulative weights of each sequence of shape ``(n_seqs,)``.\n\n \"\"\"\n import torch\n\n logprobs = torch.nn.functional.log_softmax(next_token_logits, dim=-1)\n next_token_ids = torch.argmax(logprobs, dim=-1, keepdim=True)\n\n ancestors = torch.arange(\n next_token_logits.shape[0], device=next_token_logits.device\n )\n weights = sequence_weights + torch.gather(logprobs, 1, next_token_ids).squeeze()\n\n return next_token_ids, ancestors, weights\n
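A minimal usage sketch (the model id is a placeholder): greedy decoding is deterministic, which makes it a natural default for classification-style tasks:
import outlines\nfrom outlines.samplers import greedy\n\n# Placeholder model id.\nmodel = outlines.models.transformers(\"TheBloke/Mistral-7B-OpenOrca-AWQ\", device=\"cuda\")\n\n# Always pick the most likely token, so repeated calls return the same label.\ngenerator = outlines.generate.choice(model, [\"Positive\", \"Negative\"], sampler=greedy())\nlabel = generator(\"Review: I loved this film. Sentiment:\")\n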
"},{"location":"api/samplers/#outlines.samplers.MultinomialSampler","title":"MultinomialSampler
","text":"Multinomial sampling algorithm.
Multinomial sampling consists in randomly sampling the next token assuming its distribution is a Categorical distribution parametrized by the next-token logits.
"},{"location":"api/samplers/#outlines.samplers.MultinomialSampler--attributes","title":"Attributes","text":"samples The number of samples taken for each input sequence.
Source code in outlines/samplers.py
class MultinomialSampler:\n \"\"\"Multinomial sampling algorithm.\n\n Multinomial sampling consists in randomly sampling the next token assuming\n its distribution is a Categorical distribution parametrized by the\n next-token logits.\n\n\n Attributes\n ----------\n samples\n The number of samples taken for each input sequence.\n\n \"\"\"\n\n def __init__(\n self,\n samples: int = 1,\n *,\n top_k: Optional[int] = None,\n top_p: Optional[float] = None,\n temperature: Optional[float] = None,\n ):\n self.samples = samples\n self.top_k = top_k\n self.top_p = top_p\n self.temperature = temperature\n\n self.logits_processors = []\n if top_k is not None:\n self.logits_processors.append(keep_top_k_logits(top_k))\n elif top_p is not None:\n self.logits_processors.append(keep_top_p_logits(top_p))\n\n if temperature is not None:\n self.logits_processors.append(rescale_logits(temperature))\n\n def __call__(\n self,\n next_token_logits: \"torch.DoubleTensor\",\n sequence_weights: \"torch.DoubleTensor\",\n rng: \"torch.Generator\",\n ) -> Tuple[\"torch.DoubleTensor\", \"torch.DoubleTensor\", \"torch.DoubleTensor\"]:\n \"\"\"Call the multinomial sampler.\n\n Parameters\n ----------\n next_token_logits\n A tensor of shape ``(n_seqs, vocab_size,)`` that represents the\n probability distribution of the next token over the vocabulary.\n sequence_weights\n A tensor of shape ``(n_seqs,)`` that represents the cumulative\n weight of each sequence.\n rng\n A random number generator.\n\n Returns\n -------\n A tuple with an array that contains the ids of the sampled tokens of\n shape ``(n_seqs, 1)``, an array that contains the ancestors of each\n sampled id of shape ``(n_seqs,)`` and an array that contains the updated\n cumulative weights of each sequence of shape ``(n_seqs,)``.\n\n \"\"\"\n import torch\n\n altered_next_token_logits = next_token_logits\n for logit_processor in self.logits_processors:\n altered_next_token_logits = logit_processor(next_token_logits)\n\n probs = torch.nn.functional.softmax(altered_next_token_logits, dim=-1)\n next_token_ids = torch.multinomial(probs, num_samples=1, generator=rng)\n\n logprobs = torch.nn.functional.log_softmax(altered_next_token_logits, dim=-1)\n ancestors = torch.arange(\n altered_next_token_logits.shape[0], device=next_token_logits.device\n )\n weights = sequence_weights + torch.gather(logprobs, 1, next_token_ids).squeeze()\n\n return next_token_ids, ancestors, weights\n\n @property\n def sampling_params(self):\n return SamplingParameters(\n \"multinomial\",\n self.samples,\n self.top_p,\n self.top_k,\n self.temperature,\n )\n
"},{"location":"api/samplers/#outlines.samplers.MultinomialSampler.__call__","title":"__call__(next_token_logits, sequence_weights, rng)
","text":"Call the multinomial sampler.
"},{"location":"api/samplers/#outlines.samplers.MultinomialSampler.__call__--parameters","title":"Parameters","text":"next_token_logits A tensor of shape (n_seqs, vocab_size,)
that represents the probability distribution of the next token over the vocabulary. sequence_weights A tensor of shape (n_seqs,)
that represents the cumulative weight of each sequence. rng A random number generator.
"},{"location":"api/samplers/#outlines.samplers.MultinomialSampler.__call__--returns","title":"Returns","text":"A tuple with an array that contains the ids of the sampled tokens of shape (n_seqs, 1)
, an array that contains the ancestors of each sampled id of shape (n_seqs,)
and an array that contains the updated cumulative weights of each sequence of shape (n_seqs,)
.
Source code in outlines/samplers.py
def __call__(\n self,\n next_token_logits: \"torch.DoubleTensor\",\n sequence_weights: \"torch.DoubleTensor\",\n rng: \"torch.Generator\",\n) -> Tuple[\"torch.DoubleTensor\", \"torch.DoubleTensor\", \"torch.DoubleTensor\"]:\n \"\"\"Call the multinomial sampler.\n\n Parameters\n ----------\n next_token_logits\n A tensor of shape ``(n_seqs, vocab_size,)`` that represents the\n probability distribution of the next token over the vocabulary.\n sequence_weights\n A tensor of shape ``(n_seqs,)`` that represents the cumulative\n weight of each sequence.\n rng\n A random number generator.\n\n Returns\n -------\n A tuple with an array that contains the ids of the sampled tokens of\n shape ``(n_seqs, 1)``, an array that contains the ancestors of each\n sampled id of shape ``(n_seqs,)`` and an array that contains the updated\n cumulative weights of each sequence of shape ``(n_seqs,)``.\n\n \"\"\"\n import torch\n\n altered_next_token_logits = next_token_logits\n for logit_processor in self.logits_processors:\n altered_next_token_logits = logit_processor(next_token_logits)\n\n probs = torch.nn.functional.softmax(altered_next_token_logits, dim=-1)\n next_token_ids = torch.multinomial(probs, num_samples=1, generator=rng)\n\n logprobs = torch.nn.functional.log_softmax(altered_next_token_logits, dim=-1)\n ancestors = torch.arange(\n altered_next_token_logits.shape[0], device=next_token_logits.device\n )\n weights = sequence_weights + torch.gather(logprobs, 1, next_token_ids).squeeze()\n\n return next_token_ids, ancestors, weights\n
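As a sketch (the model id is a placeholder, and we assume the multinomial alias exported next to the class), the constructor arguments shown above can be combined, for instance to draw several samples with top-k filtering and a temperature:
import outlines\nfrom outlines.samplers import multinomial\n\n# Placeholder model id.\nmodel = outlines.models.transformers(\"TheBloke/Mistral-7B-OpenOrca-AWQ\", device=\"cuda\")\n\n# Draw 3 samples per prompt, keep the 40 most likely tokens, and rescale with temperature 0.7.\nsampler = multinomial(3, top_k=40, temperature=0.7)\ngenerator = outlines.generate.text(model, sampler=sampler)\nanswers = generator(\"Give me a one-line fun fact.\", max_tokens=32)\n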
"},{"location":"api/samplers/#outlines.samplers.SamplingParameters","title":"SamplingParameters
dataclass
","text":"Sampling parameters available in Outlines.
Source code in outlines/samplers.py
@dataclass(frozen=True)\nclass SamplingParameters:\n \"\"\"Sampling parameters available in Outlines.\"\"\"\n\n sampler: str\n num_samples: int = 1\n top_p: Optional[float] = None\n top_k: Optional[int] = None\n temperature: Optional[float] = None\n
"},{"location":"api/samplers/#outlines.samplers.keep_top_k_logits","title":"keep_top_k_logits(k)
","text":"Build a function that masks logits values smaller than the top k
ones.
"},{"location":"api/samplers/#outlines.samplers.keep_top_k_logits--parameters","title":"Parameters","text":"k The ranking below which logit values are replaced by -math.inf
.
Source code in outlines/samplers.py
def keep_top_k_logits(k: int) -> Callable[[\"torch.Tensor\"], \"torch.Tensor\"]:\n \"\"\"Build a function that masks logits values smaller than the top `k` ones.\n\n Parameters\n ----------\n k\n The ranking below which logit values are replaced by `-math.inf`.\n\n \"\"\"\n import torch\n\n if not isinstance(k, int) or k < 1:\n raise ValueError(f\"`k` must be a strictly positive integers, got {k} instead.\")\n\n def logits_processor(logits: torch.Tensor) -> torch.Tensor:\n num_to_keep = min(k, logits.size(-1))\n mask_idx = logits < torch.topk(logits, num_to_keep)[0][..., -1, None]\n return logits.masked_fill(mask_idx, -math.inf)\n\n return logits_processor\n
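A small, self-contained illustration of the returned processor (the logit values are made up for the example):
import torch\nfrom outlines.samplers import keep_top_k_logits\n\nprocessor = keep_top_k_logits(2)\nlogits = torch.tensor([[1.0, 3.0, 2.0, 0.5]])\n# Only the two largest logits survive; the others are replaced by -inf.\nprint(processor(logits))\n# tensor([[-inf, 3., 2., -inf]])\n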
"},{"location":"api/samplers/#outlines.samplers.keep_top_p_logits","title":"keep_top_p_logits(p)
","text":"Build a function that masks the lowest probability tokens whose cumulative probability is below a certain threshold.
"},{"location":"api/samplers/#outlines.samplers.keep_top_p_logits--parameters","title":"Parameters","text":"p The value of the threshold. We keep the highest probability tokens whose cumulative distribution is greater than or equal to p
and mask the others. Its value must be between 0 (excluded) and 1 (included).
Source code in outlines/samplers.py
def keep_top_p_logits(p: float) -> Callable[[\"torch.Tensor\"], \"torch.Tensor\"]:\n \"\"\"Build a function that masks the lowest probability tokens whose\n cumulative probability is below a certain threshold.\n\n Parameters\n ----------\n p\n The value of the threshold. We keep the highest probability tokens whose\n cumulative distribution is greater than or equal to `p` and mask the\n others. Its value must be between 0 (excluded) and 1 (included).\n\n \"\"\"\n import torch\n\n if p <= 0.0 or p > 1.0:\n raise ValueError(\n f\"`p` must be a floating point number between 0 (excluded) and 1 (included), got {p} instead.\"\n )\n\n def logits_processor(logits: torch.Tensor) -> torch.Tensor:\n sorted_logits, sorted_idx = torch.sort(logits, descending=False)\n cumulative_probabilties = torch.nn.functional.softmax(\n sorted_logits, dim=-1\n ).cumsum(dim=-1)\n\n sorted_masked_idx = cumulative_probabilties <= (1 - p)\n mask_idx = torch.scatter(sorted_masked_idx, 1, sorted_idx, sorted_masked_idx)\n return logits.masked_fill(mask_idx, -math.inf)\n\n return logits_processor\n
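A small illustration with made-up values: with p=0.9 the two most likely tokens already cover more than 90% of the probability mass, so the two least likely ones are masked:
import torch\nfrom outlines.samplers import keep_top_p_logits\n\nprocessor = keep_top_p_logits(0.9)\nlogits = torch.tensor([[4.0, 3.0, 1.0, 0.0]])\n# The two lowest-probability tokens fall outside the nucleus and are set to -inf.\nprint(processor(logits))\n# tensor([[4., 3., -inf, -inf]])\n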
"},{"location":"api/samplers/#outlines.samplers.rescale_logits","title":"rescale_logits(temperature)
","text":"Build a function that rescales the token probabilities exponentially.
"},{"location":"api/samplers/#outlines.samplers.rescale_logits--parameters","title":"Parameters","text":"temperature The value by which we rescale the logits.
Source code in outlines/samplers.py
def rescale_logits(temperature: float) -> Callable[[\"torch.Tensor\"], \"torch.Tensor\"]:\n \"\"\"Build a function that rescales the token probabilities exponentially.\n\n Parameters\n ----------\n temperature\n The value by which we rescale the logits.\n\n \"\"\"\n\n if not isinstance(temperature, float) or temperature < 0.0:\n raise ValueError(\n f\"`temperature` must be a strictly positive floating point number, got {temperature} instead.\"\n )\n elif temperature == 0.0:\n raise ValueError(\n \"Please use the greedy sampler instead of setting the temperature to 0.\"\n )\n\n def logits_processor(logits: \"torch.Tensor\") -> \"torch.Tensor\":\n return logits / temperature\n\n return logits_processor\n
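A quick illustration of the effect on made-up values: temperatures below 1 sharpen the distribution, temperatures above 1 flatten it:
import torch\nfrom outlines.samplers import rescale_logits\n\nlogits = torch.tensor([2.0, 1.0, 0.0])\ncold = rescale_logits(0.5)(logits)  # dividing by 0.5 doubles the logits\nwarm = rescale_logits(2.0)(logits)  # dividing by 2 halves the logits\nprint(torch.softmax(cold, dim=-1))  # more peaked than softmax(logits)\nprint(torch.softmax(warm, dim=-1))  # closer to uniform\n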
"},{"location":"blog/","title":"Blog","text":""},{"location":"blog/2024/01/10/roadmap-for-2024/","title":"Roadmap for 2024","text":"Outlines is not even one year old and it's already gone a long way! As we just reached 4000 stars, and before laying out the roadmap for the following year, we would like to pause and thank all of you for supporting us, using and contributing to the library!
"},{"location":"blog/2024/01/10/roadmap-for-2024/#thoughts","title":"Thoughts","text":"Before delving into the detailed roadmap, let me share a few thoughts and explain the general direction of the library. These thoughts are informed with my multiple interactions with users, either on Twitter or in our Discord server.
Outlines currently differentiates itself from other libraries with its efficient JSON- and regex-constrained generation. A user-facing interface for grammar-structured generation (it had previously been hidden in the repository) was also recently added. But there is much more we can do along these lines. In 2024 we will keep pushing in the direction of more accurate, faster constrained generation.
Outlines also supports many model providers: transformers
, mamba
, llama.cpp
and exllama2
. Those integrations represent a lot of maintenance, and we will need to simplify them. For instance, transformers
now supports quantized models, and we will soon deprecate the support for autoawq
and autogptq
. Thanks to a refactor of the library, it is now possible to use our constrained generation method through a logits processor with all other libraries, except mamba
. We will look for libraries that provide state-space models and allow passing a logits processor during inference. We will interface with llama.cpp
and exllama2
using logits processors.
We would like to expand our work to the whole sampling layer and add new sampling methods that should make structured generation more accurate. This means we will keep the transformers
integration as it is today and will expand our text generation logic around this library.
Making workflows re-usable and easy to share is difficult today. That is why we are big believers in outlines functions. We will keep improving the interface and adding examples.
Finally, we want to add a CLI tool, outlines serve
. This will allow you either to serve an API that performs general constrained generation, or to serve Outlines functions.
"},{"location":"blog/2024/01/10/roadmap-for-2024/#detailed-roadmap","title":"Detailed roadmap","text":"Here is a more detailed roadmap for the next 12 months. Outlines is a community effort, and we invite you to pick either topic and contribute to the library. I will progressively add related issues in the repository.
"},{"location":"blog/2024/01/10/roadmap-for-2024/#many-more-examples-and-tutorials","title":"Many more examples and tutorials","text":"Let's be honest, Outlines is lacking clear and thorough examples. We want to change this!
- How does Outlines work? What can you do with it?
- What can you do with Outlines that is harder or impossible to do with other libraries?
- How can you perform standard LLM workflows, for instance Chain of Thought, Tree of Thoughts, etc.?
- How does Outlines integrate with the larger ecosystem, for instance other libraries like LangChain and LlamaIndex?
"},{"location":"blog/2024/01/10/roadmap-for-2024/#simplify-the-integrations","title":"Simplify the integrations","text":"We want to keep the current integrations but lower the maintenance cost so we can focus on what we bring to the table.
- Deprecate every obsolete integration:
transformers
has recently integrated autoawq
and autogptq
for instance. (PR) - See if we can integrate with a library that provides state-space models via a logit processing function;
- Integrate with llama.cpp via a logits processor;
- Integrate with exllamav2 via a logits processor;
"},{"location":"blog/2024/01/10/roadmap-for-2024/#push-structured-generation-further","title":"Push structured generation further","text":"We're just getting started!
- Improve the performance of existing structured generation algorithms;
- Improve the correctness of structured generation algorithms;
- Add ready-to-use grammars in the grammars repository or in a submodule in Outlines.
"},{"location":"blog/2024/01/10/roadmap-for-2024/#keep-developing-outlines-functions","title":"Keep developing Outlines functions","text":"Functions are awesome, use them!
- Implement a CLI
outlines serve
that allows serving Outlines functions locally; - Add more functions to the functions repository.
"},{"location":"blog/2024/01/10/roadmap-for-2024/#serve-structured-generation","title":"Serve structured generation","text":"We want to make it easier to serve structured generation and outlines functions.
- Implement the outlines serve CLI
outlines serve
- Serve local APIs that perform structured generation;
- Serve Outlines functions.
"},{"location":"blog/2024/01/10/roadmap-for-2024/#improve-the-generation-layer","title":"Improve the generation layer","text":" - Use
transformers
's private API to prepare inputs for generation inside the Transformers
class; - Support successions of model generation and text infilling for methods like Beam Search and SMC;
- Differentiate by adding new caching methods: attention sink, trie-based caching, etc;
- Differentiate by implementing SMC;
- Implement Beam Search;
- Add token healing.
"},{"location":"blog/2024/01/10/roadmap-for-2024/#a-more-seamless-integration-with-openai","title":"A more seamless integration with OpenAI","text":" - Provide the same user interface for OpenAI and open source models so they are easily interchangeable;
- Integrate the function calling API.
"},{"location":"blog/2024/01/10/roadmap-for-2024/#last-word","title":"Last word","text":"This roadmap was influenced by the expressed interests of the community. If it doesn't reflect your needs please come and share your experience with us.
"},{"location":"community/","title":"Community","text":"Outlines exists for a community of users who believe software doesn't need to be complicated. Who share the same passion for Large Language Models but don't want to compromise on robustness. Together, we are bringing these powerful models back to the world of software.
"},{"location":"community/#connect-on-discord","title":"Connect on Discord","text":"The Outlines community lives on our Discord server. There you can ask questions, share ideas or just chat with people like you. Don't be a stranger and join us.
"},{"location":"community/contribute/","title":"Contribute","text":""},{"location":"community/contribute/#what-contributions","title":"What contributions?","text":" - Documentation contributions are very valuable to us!
- Examples. Show us what you did with Outlines :)
- Bug reports with a minimal working example in the issue tracker
- Bug fixes are always a pleasure to review.
- New features. Please start a new discussion, or come chat with us beforehand!
Note that the issue tracker is only intended for actionable items. If in doubt, open a discussion or come talk to us.
"},{"location":"community/contribute/#how-to-contribute","title":"How to contribute?","text":""},{"location":"community/contribute/#setup","title":"Setup","text":"First, fork the repository on GitHub and clone the fork locally:
git clone git@github.com:YourUserName/outlines.git\ncd outlines\n
Create a new virtual environment. If you are using conda:
conda env create -f environment.yml\n
If you are using venv:
python -m venv .venv\nsource .venv/bin/activate\n
Then install the dependencies in editable mode, and install the pre-commit hooks:
pip install -e \".[test]\"\npre-commit install\n
"},{"location":"community/contribute/#before-pushing-your-code","title":"Before pushing your code","text":"Run the tests:
pytest\n
And run the code style checks:
pre-commit run --all-files\n
"},{"location":"community/contribute/#benchmarking","title":"Benchmarking","text":"Outlines uses asv for automated benchmark testing. Benchmarks are run automatically before pull requests are merged to prevent performance degredation.
You can run the benchmark test suite locally with the following command:
asv run --config benchmarks/asv.conf.json\n
Caveats: - If you're on a device with CUDA, you must add the argument --launch-method spawn
- Uncommitted code will not be benchmarked, you must first commit your changes.
"},{"location":"community/contribute/#run-a-specific-test","title":"Run a specific test:","text":"asv run --config benchmarks/asv.conf.json -b bench_json_schema.JsonSchemaBenchmark.time_json_schema_to_fsm\n
"},{"location":"community/contribute/#profile-a-specific-test","title":"Profile a specific test:","text":"asv run --config benchmarks/asv.conf.json --profile -b bench_json_schema.JsonSchemaBenchmark.time_json_schema_to_fsm\n
"},{"location":"community/contribute/#compare-to-originmain","title":"Compare to origin/main
","text":"get fetch origin\nasv continuous origin/main HEAD --config benchmarks/asv.conf.json\n
"},{"location":"community/contribute/#asv-pr-behavior","title":"ASV PR Behavior","text":" - View ASV Benchmark Results: Open the workflow, view
BENCHMARK RESULTS
section. - Merging is blocked unless benchmarks are run for the latest commit.
- Benchmarks fail if performance degrades by more than 10% for any individual benchmark.
- The \"Benchmark PR\" workflow runs when its manually dispatched, or if the
run_benchmarks
label is added to the PR, for every commit.
"},{"location":"community/contribute/#contribute-to-the-documentation","title":"Contribute to the documentation","text":"To work on the documentation you will need to install the related dependencies:
pip install -r requirements-doc.txt\n
To build the documentation and serve it locally, run the following command in the repository's root folder:
mkdocs serve\n
By following these instructions you will be able to view the documentation locally. It will be updated every time you make a change.
"},{"location":"community/contribute/#open-a-pull-request","title":"Open a Pull Request","text":"Create a new branch on your fork, commit and push the changes:
git checkout -b new-branch\ngit add .\ngit commit -m \"Changes I made\"\ngit push origin new-branch\n
Then you can open a pull request on GitHub. It should prompt you to do so. Every subsequent change that you make on your branch will update the pull request.
Do not hesitate to open a draft PR before your contribution is ready, especially if you have questions and/or need feedback. If you need help, come tell us on Discord.
"},{"location":"community/examples/","title":"Community projects and articles","text":"Publishing examples and articles about Outlines are a meaningful way to contrinute to the community. Here is a list of projects we are aware of. Drop us a line if we forgot yours!
MMSG is a Python library for generating interleaved text and image content in a structured format you can directly pass to downstream APIs.
Multimodal Structured Generation: CVPR's 2nd MMFM Challenge Technical Report shows that Structured Generation can outperform finetuning, and maybe even multimodality, in document-image understanding tasks as part of CVPR's 2nd MMFM Challenge.
Chess LLM Arena is a HuggingFace Space where you can make LLMs compete in a chess match.
LLM Data Gen is a HuggingFace Space that generates synthetic dataset files in JSONLines format.
Fast, High-Fidelity LLM Decoding with Regex Constraints presents an efficient alternative to Outlines's structured generation.
gigax is an open-source library that lets you create real-time LLM-powered NPCs for video games.
Improving Prompt Consistency with Structured Generations shows how structured generation can improve consistency of evaluation runs by reducing sensitivity to changes in prompt format.
AskNews is a news curation service processing 300k news articles per day in a structured way, with Outlines.
"},{"location":"community/feedback/","title":"Feedback","text":"If Outlines has been helpful to you, let us know on Discord or give us a shoutout on Twitter! It's always heartwarming \u2764\ufe0f
I am once again reminding you that structured extraction using LLMs is going to transform every single industry in the next 10 years https://t.co/xQ3tcWnrZ8
\u2014 Sam Hogan (@0xSamHogan) April 17, 2024 outline's growth is insane, using is an understatement! https://t.co/rHCNWhZdCs
\u2014 jason liu (@jxnlco) April 17, 2024 Outlines is an amazing lib and more popular than @remilouf\u2019s modesty will admit. https://t.co/DfHbMPIlX1 https://t.co/mDHIWJrD0C
\u2014 Delip Rao e/\u03c3 (@deliprao) April 18, 2024 Impressive implementation of a true regex / json / grammar guided text generation pic.twitter.com/RX5RVYaVIx
\u2014 Rohan Paul (@rohanpaul_ai) December 30, 2023 Most underrated Github Repo in AI + LLM JSON guided Generation: https://t.co/lSB8KIet1H
\u2014 \ud83c\udf99Jean-Louis Queguiner (@JiliJeanlouis) December 18, 2023 Nice and useful. https://t.co/LX72AE0lgt
\u2014 Dan Roy (@roydanroy) August 15, 2023 HUGE dub for open source AI https://t.co/bYKuiEUZ1j
\u2014 kenneth \ud83d\udd87 (@k3nnethfrancis) August 15, 2023 This is amazing - glad to see more outp guidance modules! Will try this out soon I'm wondering how they translate from regex automatons to token boundariesAlso why Open Source will succeed. Even today I don't see any guided output functionality from the big providers. https://t.co/Ity2H25Klf
\u2014 Hrishi (@hrishioa) August 14, 2023 Outlines - a library to help LLM developers guide text generation in a fast and reliable way.\"Provides generation methods that guarantee that the output will match a regular expressions, or follow a JSON schema.\"Need to check this out. Reliable JSON output is a common use\u2026 pic.twitter.com/Bkbh8vKogN
\u2014 elvis (@omarsar0) August 14, 2023 Woah this is cool! Makes open source models more usable.Give any LLM Function Call capability (and more) with Outlines: https://t.co/PtPykR5ZGR https://t.co/RRQjWHnIxv pic.twitter.com/BwNnH8SMwv
\u2014 Yohei (@yoheinakajima) August 14, 2023 This is awesome! Being able to guarantee the output's structure unblocks so many applications. This is a great milestone and a fundamental building block for more advanced AI apps. https://t.co/WdwMOc7hE8
\u2014 Guilherme Castro (@skastr052) August 15, 2023 Juggling with the unpredictable outputs of ChatGPT API lately while building my product. \ud83d\ude13 Tried prompt engineering to channel its wisdom into a neat JSON, but it's like asking a cat to fetch. \ud83d\udc31Luckily, stumbled upon \"Outlines\" \u2013 looks like a promising way to tame the LLM\u2026 pic.twitter.com/oYQ6q8exAS
\u2014 Charlie (@14435635Sun) August 15, 2023 A complex system of LLM input-outputs interacting with non-LLM agents and models benefits immeasurably from structured outputs. The outlines package saves so much time, https://t.co/NhVQ6NpKDR
\u2014 Amir Sani (@amirsani) November 26, 2023"},{"location":"community/feedback/#let-us-know","title":"Let us know!","text":"We highly value the insights of our users, and we would love to hear from you. If you are using Outlines for your projects and would like to share your experience with us, let's connect:
- What are you building with it?
- What do you like about it?
- What challenges are you facing?
- What do you think could be improved?
To schedule an appointment follow this link. This is exclusively intended to share your experience, please go on Discord or GitHub for support.
"},{"location":"community/versioning/","title":"Versioning Guide","text":"The Outlines project follows a structured versioning scheme designed to provide clarity and minimize risk for downstream dependents.
Each part of the version number (major.minor.patch
) conveys information about the nature and impact of the changes included in the release.
- Major Releases includes compatibility-breaking changes to core interfaces, such as
LogitsProcessor
s and Guides
. - Minor Releases introduce changes of substance to internal or unexposed functionality. These changes are well tested and intended to maintain compatibility with existing use of core interfaces.
- Patch Releases address bug fixes and incorporate low-risk changes to improve stability and performance.
"},{"location":"community/versioning/#releases","title":"Releases","text":"Releases along with release notes can be found on the Outlines Releases GitHub Page.
"},{"location":"community/versioning/#version-pinning-recommendations","title":"Version Pinning Recommendations","text":"Here are our recommendations for managing dependencies on the Outlines package:
Small, Risk-Tolerant Projects: Pin to a specific major version.
Large, Conservative Projects: Pin to a specific minor version.
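For example, with pip's compatible-release operator (the version numbers below are placeholders; substitute the release you actually depend on):
# requirements.txt -- pick one of the two strategies\noutlines~=0.1    # pin the major version: accepts new minor and patch releases\noutlines~=0.1.0  # pin the minor version: accepts only new patch releases\n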
"},{"location":"cookbook/","title":"Examples","text":"This part of the documentation provides a few cookbooks that you can browse to get acquainted with the library and get some inspiration about what you could do with structured generation. Remember that you can easily change the model that is being used!
- Classification: Classify customer requests.
- Named Entity Extraction: Extract information from pizza orders.
- Dating Profile: Build dating profiles from descriptions using prompt templating and JSON-structured generation.
- Chain Of Density: Summarize documents using chain of density prompting and JSON-structured generation.
- Playing Chess: Make Phi-3 Mini play chess against itself using regex-structured generation.
- SimToM: Improve LLMs' Theory of Mind capabilities with perspective-taking prompting and JSON-structured generation.
- Q&A with Citations: Answer questions and provide citations using JSON-structured generation.
- Knowledge Graph Generation: Generate a Knowledge Graph from unstructured text using JSON-structured generation.
- Chain Of Thought (CoT): Generate a series of intermediate reasoning steps using regex-structured generation.
- ReAct Agent: Build an agent with open weights models using regex-structured generation.
- Earnings reports to CSV: Extract data from earnings reports to CSV using regex-structured generation.
- Vision-Language Models: Use Outlines with vision-language models for tasks like image captioning and visual reasoning.
- Receipt Digitization: Extract information from a picture of a receipt using structured generation.
- Structured Generation from PDFs: Use Outlines with vision-language models to read PDFs and produce structured output.
"},{"location":"cookbook/atomic_caption/","title":"Vision-Language Models with Outlines","text":"This guide demonstrates how to use Outlines with vision-language models, leveraging the new transformers_vision module. Vision-language models can process both text and images, allowing for tasks like image captioning, visual question answering, and more.
We will be using the Pixtral-12B model from Mistral to take advantage of its visual reasoning capabilities, together with a workflow that generates a multistage, atomic caption.
"},{"location":"cookbook/atomic_caption/#setup","title":"Setup","text":"First, we need to install the necessary dependencies. In addition to Outlines, we'll need to install the transformers library and any specific requirements for the vision-language model we'll be using.
pip install outlines transformers torch\n
"},{"location":"cookbook/atomic_caption/#initializing-the-model","title":"Initializing the Model","text":"We'll use the transformers_vision function to initialize our vision-language model. This function is specifically designed to handle models that can process both text and image inputs. Today we'll be using the Pixtral model with the llama tokenizer. (Currently the mistral tokenizer is pending support).
import torch\nimport outlines\nfrom transformers import (\n    LlavaForConditionalGeneration,\n)\n\nmodel_name = \"mistral-community/pixtral-12b\" # original magnet model is able to be loaded without issue\nmodel_class = LlavaForConditionalGeneration\n\ndef get_vision_model(model_name: str, model_class: type):\n    model_kwargs = {\n        \"torch_dtype\": torch.bfloat16,\n        \"attn_implementation\": \"flash_attention_2\",\n        \"device_map\": \"auto\",\n    }\n    processor_kwargs = {\n        \"device\": \"cuda\",\n    }\n\n    # Load the vision-language model through Outlines' transformers_vision integration.\n    model = outlines.models.transformers_vision(\n        model_name,\n        model_class=model_class,\n        model_kwargs=model_kwargs,\n        processor_kwargs=processor_kwargs,\n    )\n    return model\n\nmodel = get_vision_model(model_name, model_class)\n
"},{"location":"cookbook/atomic_caption/#defining-the-schema","title":"Defining the Schema","text":"Next, we'll define a schema for the output we expect from our vision-language model. This schema will help structure the model's responses.
from pydantic import BaseModel, Field, confloat, constr\nfrom pydantic.types import StringConstraints, PositiveFloat\nfrom typing import List\nfrom typing_extensions import Annotated\n\nfrom enum import StrEnum\nclass TagType(StrEnum):\n ENTITY = \"Entity\"\n RELATIONSHIP = \"Relationship\"\n STYLE = \"Style\"\n ATTRIBUTE = \"Attribute\"\n COMPOSITION = \"Composition\"\n CONTEXTUAL = \"Contextual\"\n TECHNICAL = \"Technical\"\n SEMANTIC = \"Semantic\"\n\nclass ImageTag(BaseModel):\n tag: Annotated[\n constr(min_length=1, max_length=30),\n Field(\n description=(\n \"Descriptive keyword or phrase representing the tag.\"\n )\n )\n ]\n category: TagType\n confidence: Annotated[\n confloat(le=1.0),\n Field(\n description=(\n \"Confidence score for the tag, between 0 (exclusive) and 1 (inclusive).\"\n )\n )\n ]\n\nclass ImageData(BaseModel):\n tags_list: List[ImageTag] = Field(..., min_items=8, max_items=20)\n short_caption: Annotated[str, StringConstraints(min_length=10, max_length=150)]\n dense_caption: Annotated[str, StringConstraints(min_length=100, max_length=2048)]\n\nimage_data_generator = outlines.generate.json(model, ImageData)\n
This schema defines the structure for image tags, including categories like Entity, Relationship, Style, etc., as well as short and dense captions.
"},{"location":"cookbook/atomic_caption/#preparing-the-prompt","title":"Preparing the Prompt","text":"We'll create a prompt that instructs the model on how to analyze the image and generate the structured output:
pixtral_instruction = \"\"\"\n<s>[INST]\n<Task>You are a structured image analysis agent. Generate comprehensive tag list, caption, and dense caption for an image classification system.</Task>\n<TagCategories requirement=\"You should generate a minimum of 1 tag for each category.\" confidence=\"Confidence score for the tag, between 0 (exclusive) and 1 (inclusive).\">\n- Entity : The content of the image, including the objects, people, and other elements.\n- Relationship : The relationships between the entities in the image.\n- Style : The style of the image, including the color, lighting, and other stylistic elements.\n- Attribute : The most important attributes of the entities and relationships in the image.\n- Composition : The composition of the image, including the arrangement of elements.\n- Contextual : The contextual elements of the image, including the background, foreground, and other elements.\n- Technical : The technical elements of the image, including the camera angle, lighting, and other technical details.\n- Semantic : The semantic elements of the image, including the meaning of the image, the symbols, and other semantic details.\n<Examples note=\"These show the expected format as an abstraction.\">\n{\n \"tags_list\": [\n {\n \"tag\": \"subject 1\",\n \"category\": \"Entity\",\n \"confidence\": 0.98\n },\n {\n \"tag\": \"subject 2\",\n \"category\": \"Entity\",\n \"confidence\": 0.95\n },\n {\n \"tag\": \"subject 1 runs from subject 2\",\n \"category\": \"Relationship\",\n \"confidence\": 0.90\n },\n }\n</Examples>\n</TagCategories>\n<ShortCaption note=\"The short caption should be a concise single sentence caption of the image content with a maximum length of 100 characters.\">\n<DenseCaption note=\"The dense caption should be a descriptive but grounded narrative paragraph of the image content with high quality narrative prose. It should incorporate elements from each of the tag categories to provide a broad dense caption\">\\n[IMG][/INST]\n\"\"\".strip()\n
This prompt provides detailed instructions to the model on how to generate comprehensive tag lists, captions, and dense captions for image analysis. Because of the ordering of the instructions the original tag generation serves as a sort of visual grounding for the captioning task, reducing the amount of manual post processing required.
"},{"location":"cookbook/atomic_caption/#generating-structured-output","title":"Generating Structured Output","text":"Now we can use our model to generate structured output based on an input image:
from io import BytesIO\nfrom urllib.request import urlopen\n\nfrom PIL import Image\n\n# Download an image and convert it to RGB so it can be passed to the processor.\ndef img_from_url(url):\n    img_byte_stream = BytesIO(urlopen(url).read())\n    return Image.open(img_byte_stream).convert(\"RGB\")\n\nimage_url = \"https://upload.wikimedia.org/wikipedia/commons/9/98/Aldrin_Apollo_11_original.jpg\"\nimage = img_from_url(image_url)\nresult = image_data_generator(\n    pixtral_instruction,\n    [image]\n)\nprint(result)\n
This code loads an image from a URL, passes it to our vision-language model along with the instruction prompt, and generates a structured output based on the defined schema. We end up with an output like this, ready to be used for the next stage in your pipeline:
{'tags_list': [{'tag': 'astronaut',\n 'category': <TagType.ENTITY: 'Entity'>,\n 'confidence': 0.99},\n {'tag': 'moon', 'category': <TagType.ENTITY: 'Entity'>, 'confidence': 0.98},\n {'tag': 'space suit',\n 'category': <TagType.ATTRIBUTE: 'Attribute'>,\n 'confidence': 0.97},\n {'tag': 'lunar module',\n 'category': <TagType.ENTITY: 'Entity'>,\n 'confidence': 0.95},\n {'tag': 'shadow of astronaut',\n 'category': <TagType.COMPOSITION: 'Composition'>,\n 'confidence': 0.95},\n {'tag': 'footprints in moon dust',\n 'category': <TagType.CONTEXTUAL: 'Contextual'>,\n 'confidence': 0.93},\n {'tag': 'low angle shot',\n 'category': <TagType.TECHNICAL: 'Technical'>,\n 'confidence': 0.92},\n {'tag': 'human first steps on the moon',\n 'category': <TagType.SEMANTIC: 'Semantic'>,\n 'confidence': 0.95}],\n 'short_caption': 'First man on the Moon',\n 'dense_caption': \"The figure clad in a pristine white space suit, emblazoned with the American flag, stands powerfully on the moon's desolate and rocky surface. The lunar module, a workhorse of space engineering, looms in the background, its metallic legs sinking slightly into the dust where footprints and tracks from the mission's journey are clearly visible. The photograph captures the astronaut from a low angle, emphasizing his imposing presence against the desolate lunar backdrop. The stark contrast between the blacks and whiteslicks of lost light and shadow adds dramatic depth to this seminal moment in human achievement.\"}\n
"},{"location":"cookbook/atomic_caption/#conclusion","title":"Conclusion","text":"The transformers_vision module in Outlines provides a powerful way to work with vision-language models. It allows for structured generation of outputs that combine image analysis with natural language processing, opening up possibilities for complex tasks like detailed image captioning, visual question answering, and more.
By leveraging the capabilities of models like Pixtral-12B and the structured output generation of Outlines, you can create sophisticated applications that understand and describe visual content in a highly structured and customizable manner.
"},{"location":"cookbook/chain_of_density/","title":"Summarize documents using Chain of Density prompting","text":"A good summary should be informative, concise and clear. While large language models are generally good at summarizing documents, their summaries tend to be long and contain redundant information; their information density tends to be on the lower end. This is where chain of Density, a new prompting technique, comes in. In this example we will show how one can implement chain of density with a few lines of code using Outlines, leveraging both Outline's prompt templating and its structured generation capabilities.
The article we will try to summarize is the first three paragraphs of the Alan Turing page on Wikipedia:
article = \"\"\"\nAlan Mathison Turing OBE FRS (/\u02c8tj\u028a\u0259r\u026a\u014b/; 23 June 1912 \u2013 7 June 1954) was an English mathematician, computer scientist, logician, cryptanalyst, philosopher and theoretical biologist.[5] Turing was highly influential in the development of theoretical computer science, providing a formalisation of the concepts of algorithm and computation with the Turing machine, which can be considered a model of a general-purpose computer.[6][7][8] He is widely considered to be the father of theoretical computer science and artificial intelligence.[9]\n\nBorn in Maida Vale, London, Turing was raised in southern England. He graduated at King's College, Cambridge, with a degree in mathematics. Whilst he was a fellow at Cambridge, he published a proof demonstrating that some purely mathematical yes\u2013no questions can never be answered by computation. He defined a Turing machine and proved that the halting problem for Turing machines is undecidable. In 1938, he obtained his PhD from the Department of Mathematics at Princeton University. During the Second World War, Turing worked for the Government Code and Cypher School at Bletchley Park, Britain's codebreaking centre that produced Ultra intelligence. For a time he led Hut 8, the section that was responsible for German naval cryptanalysis. Here, he devised a number of techniques for speeding the breaking of German ciphers, including improvements to the pre-war Polish bomba method, an electromechanical machine that could find settings for the Enigma machine. Turing played a crucial role in cracking intercepted coded messages that enabled the Allies to defeat the Axis powers in many crucial engagements, including the Battle of the Atlantic.[10][11]\n\nAfter the war, Turing worked at the National Physical Laboratory, where he designed the Automatic Computing Engine, one of the first designs for a stored-program computer. In 1948, Turing joined Max Newman's Computing Machine Laboratory at the Victoria University of Manchester, where he helped develop the Manchester computers[12] and became interested in mathematical biology. He wrote a paper on the chemical basis of morphogenesis[1] and predicted oscillating chemical reactions such as the Belousov\u2013Zhabotinsky reaction, first observed in the 1960s. Despite these accomplishments, Turing was never fully recognised in Britain during his lifetime because much of his work was covered by the Official Secrets Act.[13]\n\"\"\"\n
"},{"location":"cookbook/chain_of_density/#how-chain-of-density-works","title":"How Chain Of Density works","text":"Chain Of Density starts with asking the model to generate a first long and non-specific summary. Then it asks the model to generate 4 extra summaries by proceeding in the following way:
- Identify 1-3 entities missing in the previous summary;
- Add all entities marked as missing in the previous step, while not dropping entities;
- Make the summary more concise;
The prompt also asks the model to return a list of JSON objects that contain the missing entities and the new summary. This is where structured generation will come in handy :) The paper provides the prompt and an example:
We can now implement the prompt provided in the paper:
import outlines\n\n@outlines.prompt\ndef chain_of_density(article):\n \"\"\"Article: {{ article }}\n\n You will generate increasingly concise, entity-dense summaries of the above Article.\n\n Repeat the following 2 steps 5 times.\n\n Step 1. Identify 1-3 informative Entities (\"; \" delimited) from the Article which are missing from the previously generated summary.\n Step 2. Write a new, denser summary of identical length which covers every entity and detail from the previous summary plus the Missing Entities.\n\n A Missing Entity is:\n - Relevant: to the main story.\n - Specific: descriptive yet concise (5 words or fewer).\n - Novel: not in the previous summary.\n - Faithful: present in the Article.\n - Anywhere: located anywhere in the Article.\n\n Guidelines:\n - The first summary should be long (4-5 sentences, ~80 words) yet highly non-specific, containing little information beyond the entities marked as missing. Use overly verbose language and fillers (e.g., \"this article discusses\") to reach ~80 words.\n - Make every word count: rewrite the previous summary to improve flow and make space for additional entities.\n - Make space with fusion, compression, and removal of uninformative phrases like \"the article discusses\".\n - The summaries should become highly dense and concise yet self-contained, e.g., easily understood without the Article.\n - Missing entities can appear anywhere in the new summary.\n - Never drop entities from the previous summary. If space cannot be made, add fewer new entities.\n\n Remember, use the exact same number of words for each summary.\n\n Answer in JSON. The JSON should be a a dictionary with key \"summaries\" that contains a list (length 5) of dictionaries whose keys are \"Missing_Entities\" and \"Denser_Summary\".\n \"\"\"\n
Note Note that we modified the prompt slightly so it returns a JSON object that contains the summaries, instead of a list of summaries.
"},{"location":"cookbook/chain_of_density/#outlines-implementation","title":"Outlines implementation","text":"We will use Outline's JSON-structured generation to ensure that the model's output is consistent with the format specified in the prompt. We start with defining the JSON objects that the model is asked to return using Pydantic. One JSON object that contains a list of Summary
objects that contain the missing entities and new summary:
from pydantic import BaseModel, conlist\n\nclass Summary(BaseModel):\n missing_entities: str\n denser_summary: str\n\nclass Summaries(BaseModel):\n summaries: conlist(Summary, max_length=5, min_length=5)\n
We now generate the prompt by passing the article we want to summarize to the template. We load a quantized version of Mistral-7B using the AutoAWQ library, and then use JSON-structured generation to generate the summaries:
model = outlines.models.transformers(\"TheBloke/Mistral-7B-OpenOrca-AWQ\")\n\nprompt = chain_of_density(article)\nresult = outlines.generate.json(model, Summaries)(prompt)\n
We can now check the results:
print(result.model_dump())\n# {'summaries': [\n# {\n# 'missing_entities': 'English mathematician, cryptanalyst, philosopher',\n# 'denser_summary': 'Alan Mathison Turing was an English mathematician, cryptanalyst, philosopher.'\n# },\n# {\n# 'missing_entities': '',\n# 'denser_summary': \"Alan Mathison Turing was an English mathematician who was a crucial figure in WW2's Bletchley Park codebreaking centre and designed one of the first computers.\"\n# },\n# {\n# 'missing_entities': 'cryptanalyst, studied, biology, father',\n# 'denser_summary': 'Alan Mathison Turing was an English cryptanalyst, studied theoretical computer science, and contributed to mathematical biology.'\n# },\n# {\n# 'missing_entities': 'biology, morphogenesis, chemical',\n# 'denser_summary': 'Alan Mathison Turing was an English cryptanalyst, studied theoretical computer science, and predicted chemical reactions in morphogenesis.\n# '},\n# {\n# 'missing_entities': '',\n# 'denser_summary': 'Alan Mathison Turing was an English cryptanalyst, developed computer science, and made strides in mathematical biology research.'\n# }\n# ]}\n
Not bad, considering we used a smallish model to generate the summary! Chain of Density seems to be a very effective prompting technique to generate dense summaries, even with small quantized models. Its implementation in Outlines is also very short.
Note that this is the first article I tried and it worked out of the box. Try it out on other articles, and please share the results on Twitter, or by opening a new discussion on the Outlines repository!
"},{"location":"cookbook/chain_of_thought/","title":"Chain of thought","text":"Chain of thought is a prompting technique introduced in the paper \"Chain-of-Thought Prompting Elicits Reasoning in Large Language Models\" where throught prompting the authors generate a series of intermediate reasoning steps which improves the ability of LLMs to perform complex reasoning.
In this guide, we use outlines to apply chain of thought through structured output.
We use llama.cpp through the llama-cpp-python library. Outlines supports llama-cpp-python, but we need to install it ourselves:
pip install llama-cpp-python\n
We download the model weights by passing the name of the repository on the HuggingFace Hub, and the filenames (or glob pattern):
import llama_cpp\nfrom outlines import generate, models\n\nmodel = models.llamacpp(\"NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF\",\n \"Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf\",\n tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained(\n \"NousResearch/Hermes-2-Pro-Llama-3-8B\"\n ),\n n_gpu_layers=-1,\n flash_attn=True,\n n_ctx=8192,\n verbose=False)\n
(Optional) Store the model weights in a custom folder By default the model weights are downloaded to the hub cache, but if we want to store the weights in a custom folder, we pull a quantized GGUF model Hermes-2-Pro-Llama-3-8B by NousResearch from HuggingFace:
wget https://hf.co/NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF/resolve/main/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf\n
We initialize the model:
import llama_cpp\nfrom llama_cpp import Llama\nfrom outlines import generate, models\n\nllm = Llama(\n \"/path/to/model/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf\",\n tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained(\n \"NousResearch/Hermes-2-Pro-Llama-3-8B\"\n ),\n n_gpu_layers=-1,\n flash_attn=True,\n n_ctx=8192,\n verbose=False\n)\n
"},{"location":"cookbook/chain_of_thought/#chain-of-thought_1","title":"Chain of thought","text":"We first define our Pydantic class for a reasoning step:
from pydantic import BaseModel, Field\n\nclass Reasoning_Step(BaseModel):\n reasoning_step: str = Field(..., description=\"Reasoning step\")\n
We then define the Pydantic class for reasoning, which will consist of a list of reasoning steps and a conclusion, and we get its JSON schema:
from typing import List\n\nclass Reasoning(BaseModel):\n reasoning: List[Reasoning_Step] = Field(..., description=\"List of reasoning steps\")\n conclusion: str = Field(..., description=\"Conclusion\")\n\njson_schema = Reasoning.model_json_schema()\n
We could generate a response using the json schema but for a change we will use the regex:
from outlines.integrations.utils import convert_json_schema_to_str\nfrom outlines.fsm.json_schema import build_regex_from_schema\n\nschema_str = convert_json_schema_to_str(json_schema=json_schema)\nregex_str = build_regex_from_schema(schema_str)\n
We then need to adapt our prompt to the Hermes prompt format for JSON schema:
def generate_hermes_prompt(user_prompt):\n return (\n \"<|im_start|>system\\n\"\n \"You are a world class AI model who answers questions in JSON \"\n f\"Here's the json schema you must adhere to:\\n<schema>\\n{json_schema}\\n</schema><|im_end|>\\n\"\n \"<|im_start|>user\\n\"\n + user_prompt\n + \"<|im_end|>\"\n + \"\\n<|im_start|>assistant\\n\"\n \"<schema>\"\n )\n
For a given user prompt:
user_prompt = \"9.11 and 9.9 -- which is bigger?\"\n
we can use generate.regex
by passing the Pydantic class we previously defined, and call the generator with the Hermes prompt:
generator = generate.regex(model, regex_str)\nprompt = generate_hermes_prompt(user_prompt)\nresponse = generator(prompt, max_tokens=1024, temperature=0, seed=42)\n
We obtain a series of intermediate reasoning steps as well as the conclusion:
import json\n\njson_response = json.loads(response)\n\nprint(json_response[\"reasoning\"])\nprint(json_response[\"conclusion\"])\n# [{'reasoning_step': 'Both 9.11 and 9.9 are decimal numbers.'},\n# {'reasoning_step': 'When comparing decimal numbers, we look at the numbers after the decimal point.'},\n# {'reasoning_step': 'In this case, 9.11 has the number 1 after the decimal point, while 9.9 has the number 9.'},\n# {'reasoning_step': 'Since 1 is greater than 9, 9.11 is greater than 9.9.'}]\n# '9.11 is bigger.'\n
We notice that the 4th reasoning step is wrong (\"Since 1 is greater than 9, 9.11 is greater than 9.9.\"), so we should probably give the model some examples for this particular task.
This example was originally contributed by Alonso Silva.
"},{"location":"cookbook/classification/","title":"Classification","text":"Classification is a classic problem in NLP and finds many applications: spam detection, sentiment analysis, triaging of incoming requests, etc. We will use the example of a company that wants to sort support requests between those that require immediate attention (URGENT
), those that can wait a little (STANDARD
). You could easily extend the example by adding new labels.
This tutorial shows how one can implement multi-label classification using Outlines. We will use two functionalities of the library: generate.choice
and generate.json
.
As always, we start with initializing the model. Since we are GPU poor, we will be using a quantized version of Mistral-7B-v0.1:
import outlines\n\nmodel = outlines.models.transformers(\"TheBloke/Mistral-7B-OpenOrca-AWQ\", device=\"cuda\")\n
We will use the following prompt template:
@outlines.prompt\ndef customer_support(request):\n \"\"\"You are an experienced customer success manager.\n\n Given a request from a client, you need to determine when the\n request is urgent using the label \"URGENT\" or when it can wait\n a little with the label \"STANDARD\".\n\n # Examples\n\n Request: \"How are you?\"\n Label: STANDARD\n\n Request: \"I need this fixed immediately!\"\n Label: URGENT\n\n # TASK\n\n Request: {{ request }}\n Label: \"\"\"\n
"},{"location":"cookbook/classification/#choosing-between-multiple-choices","title":"Choosing between multiple choices","text":"Outlines provides a shortcut to do multi-label classification, using the outlines.generate.choice
function to initialize a generator. Outlines uses multinomial sampling by default, here we will use the greedy sampler to get the label with the highest probability:
from outlines.samplers import greedy\n\ngenerator = outlines.generate.choice(model, [\"URGENT\", \"STANDARD\"], sampler=greedy())\n
Outlines supports batched requests, so we will pass two requests to the model: requests = [\n    \"My hair is on fire! Please help me!!!\",\n    \"Just wanted to say hi\"\n]\n\nprompts = [customer_support(request) for request in requests]\n
We can now ask the model to classify the requests:
labels = generator(prompts)\nprint(labels)\n# ['URGENT', 'STANDARD']\n
Now, you might be in a hurry and don't want to wait until the model finishes completion. After all, you only need to see the first letter of the response to know whether the request is urgent or standard. You can instead stream the response:
tokens = generator.stream(prompts)\nlabels = [\"URGENT\" if \"U\" in token else \"STANDARD\" for token in next(tokens)]\nprint(labels)\n# ['URGENT', 'STANDARD']\n
"},{"location":"cookbook/classification/#using-json-structured-generation","title":"Using JSON-structured generation","text":"Another (convoluted) way to do multi-label classification is to JSON-structured generation in Outlines. We first need to define our Pydantic schema that contains the labels:
from enum import Enum\nfrom pydantic import BaseModel\n\n\nclass Label(str, Enum):\n urgent = \"URGENT\"\n standard = \"STANDARD\"\n\n\nclass Classification(BaseModel):\n label: Label\n
and we can use generate.json
by passing this Pydantic model we just defined, and call the generator:
generator = outlines.generate.json(model, Classification, sampler=greedy())\nlabels = generator(prompts)\nprint(labels)\n# [Classification(label=<Label.urgent: 'URGENT'>), Classification(label=<Label.standard: 'STANDARD'>)]\n
"},{"location":"cookbook/dating_profiles/","title":"Generate a synthetic dating profile from a description","text":"In this example we will see how we can use Outlines to generate synthetic data for a dating application. This example was originally contributed by Vibhor Kumar.
from dataclasses import dataclass\nfrom enum import Enum\n\nimport torch\nimport transformers\nfrom pydantic import BaseModel, conlist, constr\n\nimport outlines\n
"},{"location":"cookbook/dating_profiles/#defining-the-profile-with-pydantic","title":"Defining the profile with Pydantic","text":"Here a dating profile will consist in a biography, a job, a list of interests and two question-answer pairs. The questions are written in advance by the team, and the users are asked to provide an answer:
class QuestionChoice(str, Enum):\n A = \"The key to my heart is\"\n B = \"The first item on my bucket list is\"\n C = \"Perks of dating me\"\n D = \"Message me if you also love\"\n E = \"People would describe me as\"\n F = \"I can beat you in a game of\"\n\n@dataclass\nclass QuestionAnswer:\n question: QuestionChoice\n answer: str\n
Users need to provide a short biography, with a minimum of 10 and a maximum of 300 characters. The application also limits job descriptions to 50 characters. In addition to the question-answer pairs, the user is required to provide a list of between 1 and 5 interests:
class DatingProfile(BaseModel):\n    bio: constr(min_length=10, max_length=300)\n    job: constr(max_length=50)\n    interests: conlist(str, min_length=1, max_length=5) # type: ignore\n    qna1: QuestionAnswer\n    qna2: QuestionAnswer\n
"},{"location":"cookbook/dating_profiles/#prompt-template-and-examples","title":"Prompt template and examples","text":"We will ask the model to generate profiles from a high-level description:
@dataclass\nclass Example:\n description: str\n profile: DatingProfile\n
We will use Outlines' prompt templating abilities to generate the prompt for us. This helps clearly separate the general prompting logic from what is specific to an example.
@outlines.prompt\ndef dating_profile_prompt(description: str, examples: list[Example]):\n \"\"\"\n You are a world-renowned matchmaker who understands the modern dating\n market. Your job is to generate dating app profiles for male clients\n interested in women based on a provided description. The profiles should be\n authentic, show off their strengths, and maximize their likelihood of\n getting matches on dating apps. Here are some examples of past clients that\n you have successfully created profiles for:\n\n {% for example in examples %}\n Description:\n {{ example.description }}\n Profile:\n {{ example.profile }}\n {% endfor %}\n\n Here is the new client who you need to create a profile for:\n Description: {{ description }}\n Profile:\n \"\"\"\n
We will provide the model with several few-shot examples:
samples: list[Example] = [\n Example(\n description=\"I'm an author and former professional soccer player living in Seattle who publishes popular fiction books. A typical day for me starts by hanging out with my cat, drinking a coffee, and reading as much as I can in a few hours. Then, I'll prepare a quick smoothie before starting to write for a few hours, take a break with soccer or running a few miles, and finally meet friends for dinner at a new, hip restaurant in the evening. Sometimes we go axe-throwing afterwards, or play poker, or watch a comedy show, or visit a dive bar. On my vacations, I travel extensively to countries South America, Europe, and Asia, with the goal of visiting them all!\",\n profile=DatingProfile(\n bio=\"Adventurer, dreamer, author, and soccer enthusiast. Life\u2019s too short to waste time so I make the most of each day by exploring new places and playing with my friends on the pitch. What\u2019s your favorite way to get out and have fun?\",\n job=\"Famous Soccer Player -> Famous Author\",\n interests=[\"Soccer\", \"Travel\", \"Friends\", \"Books\", \"Fluffy Animals\"],\n qna1=QuestionAnswer(\n question=QuestionChoice.B, answer=\"swim in all seven oceans!\"\n ),\n qna2=QuestionAnswer(\n question=QuestionChoice.E,\n answer=\"fun-loving, adventurous, and a little bit crazy\",\n ),\n ),\n ),\n Example(\n description=\"I run my company and build houses for a living. I'm a big fan of the outdoors and love to go hiking, camping, and fishing. I don't like video games, but do like to watch movies. My love language is home-cooked food, and I'm looking for someone who isn't afraid to get their hands dirty.\",\n profile=DatingProfile(\n bio=\"If you're looking for a Montana man who loves to get outdoors and hunt, and who's in-tune with his masculinity then I'm your guy!\",\n job=\"House Construction Manager / Entrepreneur\",\n interests=[\"Hunting\", \"Hiking\", \"The outdoors\", \"Home-cooked food\"],\n qna1=QuestionAnswer(question=QuestionChoice.A, answer=\"food made at home\"),\n qna2=QuestionAnswer(\n question=QuestionChoice.C,\n answer=\"having a man in your life who can fix anything\",\n ),\n ),\n ),\n Example(\n description=\"I run my own Youtube channel with 10M subscribers. I love working with kids, and my audience skews pretty young too. In my free time, I play Fortnite and Roblox. I'm looking for someone who is also a gamer and likes to have fun. I'm learning Japanese in my free time as well as how to cook.\",\n profile=DatingProfile(\n bio=\"Easy on the eyes (find me on Youtube!) and great with kids. What more do you need?\",\n job=\"Youtuber 10M+ subscribers\",\n interests=[\"Kids\", \"Gaming\", \"Japanese\"],\n qna1=QuestionAnswer(question=QuestionChoice.D, answer=\"anime and gaming!\"),\n qna2=QuestionAnswer(question=QuestionChoice.F, answer=\"Fortnite, gg ez\"),\n ),\n ),\n]\n
"},{"location":"cookbook/dating_profiles/#load-the-model","title":"Load the model","text":"We will use Mosaic's MPT-7B model (requires 13GB of GPU memory) which can fit on a single GPU with a reasonable context window. We initialize it with Outlines:
config = transformers.AutoConfig.from_pretrained(\n \"mosaicml/mpt-7b-8k-instruct\", trust_remote_code=True\n)\nconfig.init_device = \"meta\"\nmodel = outlines.models.transformers(\n model_name=\"mosaicml/mpt-7b-8k-instruct\",\n device=\"cuda\",\n model_kwargs={\n \"config\": config,\n \"trust_remote_code\": True,\n \"torch_dtype\": torch.bfloat16,\n \"device_map\": {\"\": 0},\n },\n)\n
"},{"location":"cookbook/dating_profiles/#json-structured-generation-of-profiles","title":"JSON-structured generation of profiles","text":"We will now generate a dating profile from a textual description of oneself:
new_description = \"\"\"I'm a laid-back lawyer who spends a lot of his free-time\ngaming. I work in a corporate office, but ended up here after the start-up I\ncofounded got acquired, so still play ping pong with my cool coworkers every\nday. I have a bar at home where I make cocktails, which is great for\nentertaining friends. I secretly like to wear suits and get a new one tailored\nevery few months. I also like weddings because I get to wear those suits, and\nit's a good excuse for a date. I watch the latest series because I'm paying,\nwith my hard-earned money, for every streaming service.\"\"\"\n\nprompt = dating_profile_prompt(new_description, samples)\nprofile = outlines.generate.json(model, DatingProfile)(prompt)\nparsed_profile = DatingProfile.model_validate_json(profile)\n
"},{"location":"cookbook/dating_profiles/#results","title":"Results","text":"Here are a couple of results:
{\n \"bio\": \"\"\"I'm an ambitious lawyer with a casual and fashionable style. I love\n games and sports, but my true passion is preparing refreshing cocktails at\n home and dressing to the nines at weddings. I'm currently looking for a woman\n to show a good time to and get a kiss on the opulent suit I just had made.\n Send resume to this inbox.\"\"\",\n \"job\": \"Lawyer\",\n \"interests\":\n [\n \"Stylish guys\",\n \"Gaming\",\n \"Ping pong\",\n \"Cocktails\",\n \"Weddings\"\n ],\n \"qna1\":\n {\n \"question\": \"The first item on my bucket list is\",\n \"answer\": \"be married and have a family.\"\n },\n \"qna2\":\n {\n \"question\": \"People would describe me as\",\n \"answer\": \"charming, stylish, and funny.\"\n }\n}\n
{\n \"bio\": \"\"\"I\u2019m a sexy lawyer with time on my hands. I love to game and\n play ping pong, but the real reason you should swipe to the right\n is because I look great in a suit. Who doesn\u2019t love a man in a\n suit? Just saying. Send me a message if you think it\u2019s time to take\n your dating life to the next level.\"\"\",\n \"job\": \"Lawyer\",\n \"interests\":\n [\n \"Gaming\",\n \"Ping Pong\",\n \"Tailored Suits\",\n \"Weddings\",\n \"Streaming Services\"\n ],\n \"qna1\":\n {\n \"question\": \"The first item on my bucket list is\",\n \"answer\": \"simulate space but stay alive for as long as possible\"\n },\n \"qna2\":\n {\n \"question\": \"People would describe me as\",\n \"answer\": \"easy-going, a little nerdy but with a mature essence\"\n }\n}\n
"},{"location":"cookbook/deploy-using-bentoml/","title":"Run Outlines using BentoML","text":"BentoML is an open-source model serving library for building performant and scalable AI applications with Python. It comes with tools that you need for serving optimization, model packaging, and production deployment.
In this guide, we will show you how to use BentoML to run programs written with Outlines on GPU locally and in BentoCloud, an AI Inference Platform for enterprise AI teams. The example source code in this guide is also available in the examples/bentoml/ directory.
"},{"location":"cookbook/deploy-using-bentoml/#import-a-model","title":"Import a model","text":"First we need to download an LLM (Mistral-7B-v0.1 in this example and you can use any other LLM) and import the model into BentoML's Model Store. Let's install BentoML and other dependencies from PyPi (preferably in a virtual environment):
pip install -r requirements.txt\n
Then save the code snippet below as import_model.py
and run python import_model.py
.
Note: You need to accept related conditions on Hugging Face first to gain access to Mistral-7B-v0.1.
import bentoml\n\nMODEL_ID = \"mistralai/Mistral-7B-v0.1\"\nBENTO_MODEL_TAG = MODEL_ID.lower().replace(\"/\", \"--\")\n\ndef import_model(model_id, bento_model_tag):\n\n import torch\n from transformers import AutoModelForCausalLM, AutoTokenizer\n\n tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)\n model = AutoModelForCausalLM.from_pretrained(\n MODEL_ID,\n torch_dtype=torch.float16,\n low_cpu_mem_usage=True,\n )\n\n with bentoml.models.create(bento_model_tag) as bento_model_ref:\n tokenizer.save_pretrained(bento_model_ref.path)\n model.save_pretrained(bento_model_ref.path)\n\n\nif __name__ == \"__main__\":\n import_model(MODEL_ID, BENTO_MODEL_TAG)\n
You can verify the download is successful by running:
$ bentoml models list\n\nTag Module Size Creation Time\nmistralai--mistral-7b-v0.1:m7lmf5ac2cmubnnz 13.49 GiB 2024-04-25 06:52:39\n
"},{"location":"cookbook/deploy-using-bentoml/#define-a-bentoml-service","title":"Define a BentoML Service","text":"As the model is ready, we can define a BentoML Service to wrap the capabilities of the model.
We will run the JSON-structured generation example in the README, with the following schema:
DEFAULT_SCHEMA = \"\"\"{\n \"title\": \"Character\",\n \"type\": \"object\",\n \"properties\": {\n \"name\": {\n \"title\": \"Name\",\n \"maxLength\": 10,\n \"type\": \"string\"\n },\n \"age\": {\n \"title\": \"Age\",\n \"type\": \"integer\"\n },\n \"armor\": {\"$ref\": \"#/definitions/Armor\"},\n \"weapon\": {\"$ref\": \"#/definitions/Weapon\"},\n \"strength\": {\n \"title\": \"Strength\",\n \"type\": \"integer\"\n }\n },\n \"required\": [\"name\", \"age\", \"armor\", \"weapon\", \"strength\"],\n \"definitions\": {\n \"Armor\": {\n \"title\": \"Armor\",\n \"description\": \"An enumeration.\",\n \"enum\": [\"leather\", \"chainmail\", \"plate\"],\n \"type\": \"string\"\n },\n \"Weapon\": {\n \"title\": \"Weapon\",\n \"description\": \"An enumeration.\",\n \"enum\": [\"sword\", \"axe\", \"mace\", \"spear\", \"bow\", \"crossbow\"],\n \"type\": \"string\"\n }\n }\n}\"\"\"\n
First, we need to define a BentoML service by decorating an ordinary class (Outlines
here) with @bentoml.service
decorator. We pass this decorator some configuration and the GPU on which we want this Service to run in BentoCloud (here an L4 with 24GB of memory):
import typing as t\nimport bentoml\n\nfrom import_model import BENTO_MODEL_TAG\n\n@bentoml.service(\n traffic={\n \"timeout\": 300,\n },\n resources={\n \"gpu\": 1,\n \"gpu_type\": \"nvidia-l4\",\n },\n)\nclass Outlines:\n\n bento_model_ref = bentoml.models.get(BENTO_MODEL_TAG)\n\n def __init__(self) -> None:\n\n import outlines\n import torch\n self.model = outlines.models.transformers(\n self.bento_model_ref.path,\n device=\"cuda\",\n model_kwargs={\"torch_dtype\": torch.float16},\n )\n\n ...\n
We then need to define an HTTP endpoint using @bentoml.api
to decorate the method generate
of the Outlines
class:
...\n\n @bentoml.api\n async def generate(\n self,\n prompt: str = \"Give me a character description.\",\n json_schema: t.Optional[str] = DEFAULT_SCHEMA,\n ) -> t.Dict[str, t.Any]:\n\n import outlines\n\n generator = outlines.generate.json(self.model, json_schema)\n character = generator(prompt)\n\n return character\n
Here, the @bentoml.api
decorator defines generate
as an HTTP endpoint that accepts a JSON request body with two fields: prompt
and json_schema
(optional, which allows HTTP clients to provide their own JSON schema). The type hints in the function signature will be used to validate incoming JSON requests. You can define as many HTTP endpoints as you want by using @bentoml.api
to decorate other methods of the Outlines
class.
Now you can save the above code to service.py
(or use this implementation), and run the code using the BentoML CLI.
"},{"location":"cookbook/deploy-using-bentoml/#run-locally-for-testing-and-debugging","title":"Run locally for testing and debugging","text":"Then you can run a server locally by:
bentoml serve .\n
The server is now active at http://localhost:3000. You can interact with it using the Swagger UI, or in other ways:
CURL curl -X 'POST' \\\n 'http://localhost:3000/generate' \\\n -H 'accept: application/json' \\\n -H 'Content-Type: application/json' \\\n -d '{\n \"prompt\": \"Give me a character description.\"\n}'\n
Python client import bentoml\n\nwith bentoml.SyncHTTPClient(\"http://localhost:3000\") as client:\n response = client.generate(\n prompt=\"Give me a character description\"\n )\n print(response)\n
Expected output:
{\n \"name\": \"Aura\",\n \"age\": 15,\n \"armor\": \"plate\",\n \"weapon\": \"sword\",\n \"strength\": 20\n}\n
"},{"location":"cookbook/deploy-using-bentoml/#deploy-to-bentocloud","title":"Deploy to BentoCloud","text":"After the Service is ready, you can deploy it to BentoCloud for better management and scalability. Sign up if you haven't got a BentoCloud account.
Make sure you have logged in to BentoCloud, then run the following command to deploy it.
bentoml deploy .\n
Once the application is up and running on BentoCloud, you can access it via the exposed URL.
Note: For custom deployment in your own infrastructure, use BentoML to generate an OCI-compliant image.
"},{"location":"cookbook/deploy-using-cerebrium/","title":"Run Outlines using Cerebrium","text":"Cerebrium is a serverless AI infrastructure platform that makes it easier for companies to build and deploy AI based applications. They offer Serverless GPU's\u00a0with low cold start times with over 12 varieties of GPU chips that auto scale and you only pay for the compute you use.
In this guide we will show you how you can use Cerebrium to run programs written with Outlines on GPUs in the cloud.
"},{"location":"cookbook/deploy-using-cerebrium/#setup-cerebrium","title":"Setup Cerebrium","text":"First, we install Cerebrium and login to get authenticated.
pip install cerebrium\ncerebrium login\n
Then let's create our first project:
cerebrium init outlines-project\n
"},{"location":"cookbook/deploy-using-cerebrium/#setup-environment-and-hardware","title":"Setup Environment and Hardware","text":"You set up your environment and hardware in the cerebrium.toml file that was created using the init function above.
[cerebrium.deployment]\ndocker_base_image_url = \"nvidia/cuda:12.1.1-runtime-ubuntu22.04\"\n\n[cerebrium.hardware]\ncpu = 2\nmemory = 14.0\ngpu = \"AMPERE A10\"\ngpu_count = 1\nprovider = \"aws\"\nregion = \"us-east-1\"\n\n[cerebrium.dependencies.pip]\noutlines = \"==0.0.37\"\ntransformers = \"==4.38.2\"\ndatasets = \"==2.18.0\"\naccelerate = \"==0.27.2\"\n
"},{"location":"cookbook/deploy-using-cerebrium/#setup-inference","title":"Setup inference","text":"Running code in Cerebrium is like writing normal python with no special syntax. In a main.py
file specify the following:
import outlines\n\n\nmodel = outlines.models.transformers(\"microsoft/Phi-3-mini-4k-instruct\")\n\nschema = \"\"\"{\n \"title\": \"Character\",\n \"type\": \"object\",\n \"properties\": {\n \"name\": {\n \"title\": \"Name\",\n \"maxLength\": 10,\n \"type\": \"string\"\n },\n \"age\": {\n \"title\": \"Age\",\n \"type\": \"integer\"\n },\n \"armor\": {\"$ref\": \"#/definitions/Armor\"},\n \"weapon\": {\"$ref\": \"#/definitions/Weapon\"},\n \"strength\": {\n \"title\": \"Strength\",\n \"type\": \"integer\"\n }\n },\n \"required\": [\"name\", \"age\", \"armor\", \"weapon\", \"strength\"],\n \"definitions\": {\n \"Armor\": {\n \"title\": \"Armor\",\n \"description\": \"An enumeration.\",\n \"enum\": [\"leather\", \"chainmail\", \"plate\"],\n \"type\": \"string\"\n },\n \"Weapon\": {\n \"title\": \"Weapon\",\n \"description\": \"An enumeration.\",\n \"enum\": [\"sword\", \"axe\", \"mace\", \"spear\", \"bow\", \"crossbow\"],\n \"type\": \"string\"\n }\n }\n}\"\"\"\n\ngenerator = outlines.generate.json(model, schema)\n
On the first deploy, Cerebrium will download the model and store it on disk; for subsequent calls it will load the model from disk.
Every function in Cerebrium is callable through an API endpoint. Code at the top-most level (i.e. not in a function) is executed only when the container is spun up for the first time, so subsequent calls will simply run the code defined in the function you call.
To deploy an API that creates a new character when called with a prompt you can add the following code to main.py
:
def generate(\n prompt: str = \"Amiri, a 53 year old warrior woman with a sword and leather armor.\",\n):\n\n character = generator(\n f\"<s>[INST]Give me a character description. Describe {prompt}.[/INST]\"\n )\n\n return character\n
"},{"location":"cookbook/deploy-using-cerebrium/#run-on-the-cloud","title":"Run on the cloud","text":"cerebrium deploy\n
You will see your application deploy, install pip packages, and download the model. Once completed, it will output a cURL request you can use to call your endpoint. Just remember to end the URL with the function you would like to call - in this case /generate. You should see your response returned!
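As an illustration only - the endpoint URL and authentication header below are placeholders to be replaced with the values printed by cerebrium deploy, not a documented Cerebrium format - calling the endpoint from Python could look like this:
import requests\n\n# Placeholders: copy the real endpoint URL and auth token from the `cerebrium deploy` output.\nENDPOINT_URL = \"<endpoint-url-from-deploy-output>/generate\"\nAUTH_TOKEN = \"<your-cerebrium-auth-token>\"\n\nresponse = requests.post(\n    ENDPOINT_URL,\n    headers={\"Authorization\": AUTH_TOKEN},\n    json={\"prompt\": \"Amiri, a 53 year old warrior woman with a sword and leather armor.\"},\n)\nprint(response.json())\n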
"},{"location":"cookbook/deploy-using-modal/","title":"Run Outlines using Modal","text":"Modal is a serverless platform that allows you to easily run code on the cloud, including GPUs. It can come very handy for those of us who don't have a monster GPU at home and want to be able to quickly and easily provision, configure and orchestrate cloud infrastructure.
In this guide we will show you how you can use Modal to run programs written with Outlines on GPU in the cloud.
"},{"location":"cookbook/deploy-using-modal/#requirements","title":"Requirements","text":"We recommend installing modal
and outlines
in a virtual environment. You can create one with:
python -m venv venv\nsource venv/bin/activate\n
Then install the required packages:
pip install modal outlines\n
"},{"location":"cookbook/deploy-using-modal/#build-the-image","title":"Build the image","text":"First we need to define our container image. If you need to access a gated model, you will need to provide an access token. See the .env
call below for how to provide a HuggingFace token.
Setting a token is best done by setting an environment variable HF_TOKEN
with your token. If you do not wish to do this, we provide a commented-out line in the code below that sets the token directly.
from modal import Image, App, gpu\nimport os\n\n# This creates a modal App object. Here we set the name to \"outlines-app\".\n# There are other optional parameters like modal secrets, schedules, etc.\n# See the documentation here: https://modal.com/docs/reference/modal.App\napp = App(name=\"outlines-app\")\n\n# Specify a language model to use.\n# Another good model to use is \"NousResearch/Hermes-2-Pro-Mistral-7B\"\nlanguage_model = \"mistral-community/Mistral-7B-v0.2\"\n\n# Please set an environment variable HF_TOKEN with your Hugging Face API token.\n# The code below (the .env({...}) part) will copy the token from your local\n# environment to the container.\n# More info on Image here: https://modal.com/docs/reference/modal.Image\noutlines_image = Image.debian_slim(python_version=\"3.11\").pip_install(\n \"outlines\",\n \"transformers\",\n \"datasets\",\n \"accelerate\",\n \"sentencepiece\",\n).env({\n # This will pull in your HF_TOKEN environment variable if you have one.\n 'HF_TOKEN':os.environ['HF_TOKEN']\n\n # To set the token directly in the code, uncomment the line below and replace\n # 'YOUR_TOKEN' with the HuggingFace access token.\n # 'HF_TOKEN':'YOUR_TOKEN'\n})\n
"},{"location":"cookbook/deploy-using-modal/#setting-the-container-up","title":"Setting the container up","text":"When running longer Modal apps, it's recommended to download your language model when the container starts, rather than when the function is called. This will cache the model for future runs.
# This function imports the model from Hugging Face. The modal container\n# will call this function when it starts up. This is useful for\n# downloading models, setting up environment variables, etc.\ndef import_model():\n import outlines\n outlines.models.transformers(language_model)\n\n# This line tells the container to run the import_model function when it starts.\noutlines_image = outlines_image.run_function(import_model)\n
"},{"location":"cookbook/deploy-using-modal/#define-a-schema","title":"Define a schema","text":"We will run the JSON-structured generation example in the README, with the following schema:
# Specify a schema for the character description. In this case,\n# we want to generate a character with a name, age, armor, weapon, and strength.\nschema = \"\"\"{\n \"title\": \"Character\",\n \"type\": \"object\",\n \"properties\": {\n \"name\": {\n \"title\": \"Name\",\n \"maxLength\": 10,\n \"type\": \"string\"\n },\n \"age\": {\n \"title\": \"Age\",\n \"type\": \"integer\"\n },\n \"armor\": {\"$ref\": \"#/definitions/Armor\"},\n \"weapon\": {\"$ref\": \"#/definitions/Weapon\"},\n \"strength\": {\n \"title\": \"Strength\",\n \"type\": \"integer\"\n }\n },\n \"required\": [\"name\", \"age\", \"armor\", \"weapon\", \"strength\"],\n \"definitions\": {\n \"Armor\": {\n \"title\": \"Armor\",\n \"description\": \"An enumeration.\",\n \"enum\": [\"leather\", \"chainmail\", \"plate\"],\n \"type\": \"string\"\n },\n \"Weapon\": {\n \"title\": \"Weapon\",\n \"description\": \"An enumeration.\",\n \"enum\": [\"sword\", \"axe\", \"mace\", \"spear\", \"bow\", \"crossbow\"],\n \"type\": \"string\"\n }\n }\n}\"\"\"\n
To make the inference work on Modal we need to wrap the corresponding function in a @app.function
decorator. We pass to this decorator the image and GPU on which we want this function to run.
Let's choose an A100 with 80GB memory. Valid GPUs can be found here.
# Define a function that uses the image we chose, and specify the GPU\n# and memory we want to use.\n@app.function(image=outlines_image, gpu=gpu.A100(size='80GB'))\ndef generate(\n prompt: str = \"Amiri, a 53 year old warrior woman with a sword and leather armor.\",\n):\n # Remember, this function is being executed in the container,\n # so we need to import the necessary libraries here. You should\n # do this with any other libraries you might need.\n import outlines\n\n # Load the model into memory. The import_model function above\n # should have already downloaded the model, so this call\n # only loads the model into GPU memory.\n model = outlines.models.transformers(\n language_model, device=\"cuda\"\n )\n\n # Generate a character description based on the prompt.\n # We use the .json generation method -- we provide the\n # - model: the model we loaded above\n # - schema: the JSON schema we defined above\n generator = outlines.generate.json(model, schema)\n\n # Make sure you wrap your prompt in instruction tags ([INST] and [/INST])\n # to indicate that the prompt is an instruction. Instruction tags can vary\n # by models, so make sure to check the model's documentation.\n character = generator(\n f\"<s>[INST]Give me a character description. Describe {prompt}.[/INST]\"\n )\n\n # Print out the generated character.\n print(character)\n
We then need to define a local_entrypoint
to call our function generate
remotely.
@app.local_entrypoint()\ndef main(\n prompt: str = \"Amiri, a 53 year old warrior woman with a sword and leather armor.\",\n):\n # We use the \"generate\" function defined above -- note too that we are calling\n # .remote() on the function. This tells modal to run the function in our cloud\n # machine. If you want to run the function locally, you can call .local() instead,\n # though this will require additional setup.\n generate.remote(prompt)\n
Here, the @app.local_entrypoint()
decorator defines main
as the function to start from locally when using the Modal CLI. You can save the above code to example.py
(or use this implementation). Let's now see how to run the code on the cloud using the Modal CLI.
"},{"location":"cookbook/deploy-using-modal/#run-on-the-cloud","title":"Run on the cloud","text":"First install the Modal client from PyPi, if you have not already:
pip install modal\n
You then need to obtain a token from Modal. Run the following command:
modal setup\n
Once that is set you can run inference on the cloud using:
modal run example.py\n
You should see the Modal app initialize, and soon after see the result of the print
function in your terminal. That's it!
"},{"location":"cookbook/earnings-reports/","title":"Extracting financial data from earnings reports","text":"A common task in finance is to extract financial data from earnings reports. Earnings reports are infamously poorly formatted, as the SEC does not have requirements for producing machine-readable documents.
Earnings reports are often provided as HTML documents, which can be difficult to parse. Investors often use complicated parsing systems or manual review to extract data. Entire companies are built around automating this task.
This cookbook is a proof of concept showing how we can use LLMs to extract financial data directly into CSV. Comma-separated values are well-structured and can be defined by a regular expression, which Outlines can use to guide the LLM's output.
The example is a smaller subset of a full demo found here. The demo contains the full set of pre-processing steps needed to convert raw HTML into a structured CSV file, and tests the results across three companies' 10-K reports.
"},{"location":"cookbook/earnings-reports/#setup","title":"Setup","text":"Install outlines and required dependencies:
# Later versions of torch can have difficulty with certain CUDA drivers.\n# We recommend using 2.4.0 for now, but you may wish to experiment with\n# other versions.\npip install outlines pandas transformers torch==2.4.0 accelerate\n
"},{"location":"cookbook/earnings-reports/#load-the-model","title":"Load the model","text":"Choose your language model. We'll use Phi-3 mini, which is small enough to run on reasonably small machines.
import outlines\nimport torch\n\nmodel_name = 'microsoft/Phi-3-mini-4k-instruct'\nmodel = outlines.models.transformers(\n model_name,\n device='auto',\n model_kwargs={\n # To reduce memory usage, we'll use bfloat16\n \"torch_dtype\": torch.bfloat16,\n },\n)\n
"},{"location":"cookbook/earnings-reports/#set-up-the-data","title":"Set up the data","text":"For brevity, we've attached the markdown version of Nvidia's 10k report. The full demonstration processes the raw HTML version of the report to these markdown tables. Pages are filtered by whether they seem to contain income statements, and then compacted into the string you see below.
income_statement = \"\"\"\nTable of ContentsNVIDIA Corporation and SubsidiariesConsolidated Statements of Income(In millions, except per share data)\n\n| | | | | | | | | | | | | | | | | | |\n| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |\n| | | | Year Ended | | | | | | | | | | | | | | |\n| | | | Jan 28, 2024 | | | | | | Jan 29, 2023 | | | | | | Jan 30, 2022 | | |\n| Revenue | | | $ | 60,922 | | | | | $ | 26,974 | | | | | $ | 26,914 | |\n| Cost of revenue | | | 16,621 | | | | | | 11,618 | | | | | | 9,439 | | |\n| Gross profit | | | 44,301 | | | | | | 15,356 | | | | | | 17,475 | | |\n| Operating expenses | | | | | | | | | | | | | | | | | |\n| Research and development | | | 8,675 | | | | | | 7,339 | | | | | | 5,268 | | |\n| Sales, general and administrative | | | 2,654 | | | | | | 2,440 | | | | | | 2,166 | | |\n| Acquisition termination cost | | | \u0097 | | | | | | 1,353 | | | | | | \u0097 | | |\n| Total operating expenses | | | 11,329 | | | | | | 11,132 | | | | | | 7,434 | | |\n| Operating income | | | 32,972 | | | | | | 4,224 | | | | | | 10,041 | | |\n| Interest income | | | 866 | | | | | | 267 | | | | | | 29 | | |\n| Interest expense | | | (257) | | | | | | (262) | | | | | | (236) | | |\n| Other, net | | | 237 | | | | | | (48) | | | | | | 107 | | |\n| Other income (expense), net | | | 846 | | | | | | (43) | | | | | | (100) | | |\n| Income before income tax | | | 33,818 | | | | | | 4,181 | | | | | | 9,941 | | |\n| Income tax expense (benefit) | | | 4,058 | | | | | | (187) | | | | | | 189 | | |\n| Net income | | | $ | 29,760 | | | | | $ | 4,368 | | | | | $ | 9,752 | |\n| | | | | | | | | | | | | | | | | | |\n| Net income per share: | | | | | | | | | | | | | | | | | |\n| Basic | | | $ | 12\\.05 | | | | | $ | 1\\.76 | | | | | $ | 3\\.91 | |\n| Diluted | | | $ | 11\\.93 | | | | | $ | 1\\.74 | | | | | $ | 3\\.85 | |\n| | | | | | | | | | | | | | | | | | |\n| Weighted average shares used in per share computation: | | | | | | | | | | | | | | | | | |\n| Basic | | | 2,469 | | | | | | 2,487 | | | | | | 2,496 | | |\n| Diluted | | | 2,494 | | | | | | 2,507 | | | | | | 2,535 | | |\n\"\"\"\n
The markdown tables extracted from the earnings reports can vary widely in row names, column counts, data types, etc. The advantage of LLMs here is that we can define the data we want in terms of the data types, and the LLM will output the data in the desired format.
For comparison, here is how the income statement looks in the original HTML:
"},{"location":"cookbook/earnings-reports/#define-the-data-we-want","title":"Define the data we want","text":"Outlines is often used for JSON output, but it can also be used for CSV. We know the columns we want to extract, and we know the data types of the columns. Year for example is always a four-digit number, revenue is a number with commas, and so on.
We can define a regex pattern for each column type:
# Define the column type regex patterns\ncolumn_types = {\n # Year is always a four-digit number\n \"year\": r\"\\d{4}\",\n\n # Revenue, operating income, and net income are always numbers with commas.\n # This regex permits integers that may begin with a minus sign, and may have\n # commas separating the thousands, millions, etc.\n \"integer_comma\": r\"((-?\\d+),?\\d+|(-?\\d+))\",\n # Number is currently not used, but it represents a number with up to two decimal places.\n \"number\": r\"(-?\\d+(?:\\.\\d{1,2})?)\",\n}\n
Next, let's choose the columns we want to extract. We want
- Year, always a four-digit number
- Revenue, a number with commas
- Operating income, a number with commas
- Net income, a number with commas
# Define the columns to extract, and their data types.\ncolumns_to_extract = {\n \"year\": \"year\",\n \"revenue\": \"integer_comma\",\n \"operating_income\": \"integer_comma\",\n \"net_income\": \"integer_comma\",\n}\n
You can modify column_types
to match the data types of the columns you want to extract. Adding a new financial metric to extract is as simple as adding a new key/value pair to columns_to_extract
:
columns_to_extract[\"diluted_earnings_per_share\"] = \"number\"\n
Additional columns are not well tested for accuracy, so use with caution.
"},{"location":"cookbook/earnings-reports/#create-the-regex-describing-the-data-we-want","title":"Create the regex describing the data we want","text":"# Create the header line. This is the requested column names\n# separated by commas, i.e. \"year,revenue,...\"\nheader = \",\".join(columns_to_extract.keys())\n\n# Create the data capture patterns. These are the regex patterns\n# that will be used to capture the data in each column\ndata_patterns = [column_types[dtype] for dtype in columns_to_extract.values()]\ndata_line = \",\".join(data_patterns)\n\n# Our final regex pattern.\nmax_rows = 3 # We expect 3 rows of data, firms usually report 3 years of income statements\ncsv_regex = f\"{header}(\\n{data_line}){{,{max_rows}}}\\n\\n\"\n\nprint(csv_regex)\n
which gives us
year,revenue,operating_income,net_income(\n\\d{4},((-?\\d+),?\\d+|(-?\\d+)),((-?\\d+),?\\d+|(-?\\d+)),((-?\\d+),?\\d+|(-?\\d+))){,3}\n
Pretty hairy, right? Thankfully, we have a simple function to construct this regex for you. The regex defines a header line, followed by a data line that repeats for each row of data we want to extract. Passing the regex to outlines.generate.regex
will produce a function that will always produce a CSV string that is consistent with the regex.
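As a quick sanity check of our own (assuming the four base columns defined above), we can verify with Python's re module that a hand-written data row matches the per-row pattern:
import re\n\n# One row of the expected output should fully match the data-line pattern built above.\nsample_row = \"2024,60922,32972,29760\"\nassert re.fullmatch(data_line, sample_row) is not None\n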
"},{"location":"cookbook/earnings-reports/#prompting-the-model","title":"Prompting the model","text":"Outlines does not add system or instruction tokens by default, so we need to use transformers.AutoTokenizer
to add them for whatever model we're using.
from transformers import AutoTokenizer\n\ntokenizer = AutoTokenizer.from_pretrained(model_name)\n\ndef add_instruction(prompt):\n return tokenizer.apply_chat_template([{\"role\": \"user\", \"content\": prompt}], tokenize=False, add_generation_prompt=True)\n\nprint(add_instruction(\"Howdy\"))\n
<|user|>\nHowdy<|end|>\n<|assistant|>\n
Our prompt roughly describes the task we want the model to perform, and provides a few pieces of information it may need to know about income statements.
def extract_financial_data_prompt(columns_to_extract, income_statement):\n user_prompt = f\"\"\"\n Extract annual financial data from this set of pages. Pages\n are from a 10k filing and were chosen because they may contain\n a comprehensive income statement. Note that selected pages may\n be incorrectly extracted, so you should verify that you are extracting\n from the comprehensive income statement and not some other financial\n statement.\n\n Create a row for each year available in the income statement with the\n following columns: {', '.join(columns_to_extract.keys())}. Firms typically report the\n most recent 3 years of data, but this can vary.\n\n Each column has types: {', '.join(columns_to_extract.values())}.\n\n # Relevant pages:\n\n {income_statement}\n\n # Key instructions:\n\n 1. Look ONLY at the \"Consolidated Statements of Income\" table\n 2. For operating income, look for \"Income from operations\" or \"Operating income\"\n 3. For net income, use the TOTAL net income figure, not amounts allocated to specific share classes\n 4. Use NULL for missing values\n 5. Operating income must be less than revenue\n 6. Net income must be less than operating income\n 7. Ignore segment breakdowns, quarterly data, or per-share amounts\n\n # Output format:\n\n - CSV format with headers: {','.join(columns_to_extract.keys())}\n - Use NULL for missing values\n - If no data are found, do not create a row.\n - Enter two newline characters to terminate the CSV when no more data are found.\n\n # Definitions:\n - Revenue: Total sales of goods and services. Usually this is at the top of the\n income statement.\n - Operating income: Revenue minus operating expenses for the entire company. This is revenue\n minus costs. Operating income is also called operating profit, EBIT, or income from\n operations.\n - Net income: Operating income minus taxes. This is the bottom line of the\n income statement.\n \"\"\"\n\n return add_instruction(user_prompt)\n
"},{"location":"cookbook/earnings-reports/#running-the-model","title":"Running the model","text":"Now that we have our prompt and regular expression, we can run the model.
Construct our regex extractor function. We'll use a greedy sampler, which samples the most likely next token at each step. It's a simple sampler that is more reproducible than multinomial sampling.
csv_extractor = outlines.generate.regex(\n model, csv_regex, sampler=outlines.samplers.greedy()\n)\n
Provide the prompt to the model and run it:
csv_data = csv_extractor(\n extract_financial_data_prompt(columns_to_extract, income_statement),\n max_tokens=1024,\n)\n\nprint(csv_data)\n
year,revenue,operating_income,net_income\n2024,60922,32972,29760\n2023,26974,4224,4368\n2022,26914,10041,9752\n
Voila! We've extracted the financial data from the income statement, and it's correct upon inspection.
You can even load this into a pandas
DataFrame for further analysis:
import pandas as pd\nfrom io import StringIO\n\ndf = pd.read_csv(StringIO(csv_data))\nprint(df)\n
year revenue operating_income net_income\n0 2024 60922 32972 29760\n1 2023 26974 4224 4368\n2 2022 26914 10041 9752\n
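As an illustrative follow-up (not part of the original demo), you can then compute simple ratios from the extracted figures:
# Net margin = net income / revenue, computed from the extracted columns.\ndf[\"net_margin\"] = df[\"net_income\"] / df[\"revenue\"]\nprint(df[[\"year\", \"net_margin\"]])\n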
"},{"location":"cookbook/extract_event_details/","title":"Extract events details from text","text":"This recipe demonstrates how to use the outlines
library to extract structured event details from a text message. We will extract the title, location, and start date and time from messages like the following:
Hello Kitty, my grandmother will be here, I think it's better to postpone\nour appointment to review math lessons to next Friday at 2pm at the same\nplace, 3 avenue des tanneurs, one hour will be enough see you \ud83d\ude18\n
Let's see how to extract the event details from the message using the MLX library, which is dedicated to Apple Silicon processors (M series).
from datetime import datetime\n\nfrom pydantic import BaseModel, Field\n\nfrom outlines import generate, models\n\n# Load the model\nmodel = models.mlxlm(\"mlx-community/Hermes-3-Llama-3.1-8B-8bit\")\n\n\n# Define the event schema using Pydantic\nclass Event(BaseModel):\n title: str = Field(description=\"title of the event\")\n location: str\n start: datetime = Field(\n default=None, description=\"date of the event if available in iso format\"\n )\n\n\n# Get the current date and time\nnow = datetime.now().strftime(\"%A %d %B %Y and it's %H:%M\")\n\n# Define the prompt\nprompt = f\"\"\"\nToday's date and time are {now}\nGiven a user message, extract information of the event like date and time in iso format, location and title.\nIf the given date is relative, think step by step to find the right date.\nHere is the message:\n\"\"\"\n\n# Sample message\nmessage = \"\"\"Hello Kitty, my grandmother will be here , I think it's better to postpone our\nappointment to review math lessons to next Friday at 2pm at the same place, 3 avenue des tanneurs, I think that one hour will be enough\nsee you \ud83d\ude18 \"\"\"\n\n# Create the generator\ngenerator = generate.json(model, Event)\n\n# Extract the event information\nevent = generator(prompt + message)\n\n# Print the current date and time\nprint(f\"Today: {now}\")\n\n# Print the extracted event information in JSON format\nprint(event.json())\n
The output will be:
Today: Saturday 16 November 2024 and it's 10:55\n
and the extracted event information will be:
{\n \"title\":\"Math Review\",\n \"location\":\"3 avenue des tanneurs\",\n \"start\":\"2024-11-22T14:00:00Z\"\n}\n
To find out more about this use case, we recommend the project developed by Joseph Rudoler: the ICS Generator
"},{"location":"cookbook/extraction/","title":"Named entity extraction","text":"Named Entity Extraction is a fundamental problem in NLP. It involves identifying and categorizing named entities within a document: people, organization, dates, places, etc. It is usually the first step in a more complex NLP worklow. Here we will use the example of a pizza restaurant that receives orders via their website and need to identify the number and types of pizzas that are being ordered.
Getting LLMs to output the extracted entities in a structured format can be challenging. In this tutorial we will see how we can use Outlines' JSON-structured generation to extract entities from a document and return them in a valid JSON data structure 100% of the time.
As always, we start by initializing the model. We will be using a quantized version of Mistral-7B-v0.1 (we're GPU poor):
import outlines\n\nmodel = outlines.models.transformers(\"TheBloke/Mistral-7B-OpenOrca-AWQ\", device=\"cuda\")\n
And we will be using the following prompt template:
@outlines.prompt\ndef take_order(order):\n \"\"\"You are the owner of a pizza parlor. Customers \\\n send you orders from which you need to extract:\n\n 1. The pizza that is ordered\n 2. The number of pizzas\n\n # EXAMPLE\n\n ORDER: I would like one Margherita pizza\n RESULT: {\"pizza\": \"Margherita\", \"number\": 1}\n\n # OUTPUT INSTRUCTIONS\n\n Answer in valid JSON. Here are the different objects relevant for the output:\n\n Order:\n pizza (str): name of the pizza\n number (int): number of pizzas\n\n Return a valid JSON of type \"Order\"\n\n # OUTPUT\n\n ORDER: {{ order }}\n RESULT: \"\"\"\n
We now define our data model using Pydantic:
from enum import Enum\nfrom pydantic import BaseModel\n\nclass Pizza(str, Enum):\n margherita = \"Margherita\"\n pepperonni = \"Pepperoni\"\n calzone = \"Calzone\"\n\nclass Order(BaseModel):\n pizza: Pizza\n number: int\n
We can now define our generator and call it on several incoming orders:
orders = [\n \"Hi! I would like to order two pepperonni pizzas and would like them in 30mins.\",\n \"Is it possible to get 12 margheritas?\"\n]\nprompts = [take_order(order) for order in orders]\n\ngenerator = outlines.generate.json(model, Order)\n\nresults = generator(prompts)\nprint(results)\n# [Order(pizza=<Pizza.pepperonni: 'Pepperoni'>, number=2),\n# Order(pizza=<Pizza.margherita: 'Margherita'>, number=12)]\n
There are several ways you could improve this example:
- Clients may order several types of pizzas.
- Clients may order drinks as well.
- If the pizza place has a delivery service, we need to extract the client's address and phone number.
- Clients may specify the time for which they want the pizza. We could then check against a queuing system and reply to them with the estimated delivery time.
How would you change the Pydantic model to account for these use cases?
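One possible direction - a sketch of ours rather than the reference answer - is to make the order a list of items and add optional delivery and drink fields:
from typing import List, Optional\n\n# Reuses the Pizza enum and the BaseModel import from the snippets above.\nclass PizzaOrderItem(BaseModel):\n    pizza: Pizza\n    number: int\n\nclass ExtendedOrder(BaseModel):\n    items: List[PizzaOrderItem]\n    drinks: Optional[List[str]] = None\n    delivery_address: Optional[str] = None\n    phone_number: Optional[str] = None\n    requested_time: Optional[str] = None\n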
"},{"location":"cookbook/knowledge_graph_extraction/","title":"Knowledge Graph Extraction","text":"In this guide, we use outlines to extract a knowledge graph from unstructured text.
We will use llama.cpp via the llama-cpp-python library. Outlines supports llama-cpp-python, but we need to install it ourselves:
pip install llama-cpp-python\n
We download the model weights by passing the name of the repository on the HuggingFace Hub, and the filenames (or glob pattern):
import llama_cpp\nfrom outlines import generate, models\n\nmodel = models.llamacpp(\"NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF\",\n \"Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf\",\n tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained(\n \"NousResearch/Hermes-2-Pro-Llama-3-8B\"\n ),\n n_gpu_layers=-1,\n flash_attn=True,\n n_ctx=8192,\n verbose=False)\n
(Optional) Store the model weights in a custom folder: by default the model weights are downloaded to the hub cache, but if we want to store the weights in a custom folder, we pull a quantized GGUF model, Hermes-2-Pro-Llama-3-8B by NousResearch, from HuggingFace:
wget https://hf.co/NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF/resolve/main/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf\n
We initialize the model:
import llama_cpp\nfrom llama_cpp import Llama\nfrom outlines import generate, models\n\nllm = Llama(\n \"/path/to/model/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf\",\n tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained(\n \"NousResearch/Hermes-2-Pro-Llama-3-8B\"\n ),\n n_gpu_layers=-1,\n flash_attn=True,\n n_ctx=8192,\n verbose=False\n)\n
"},{"location":"cookbook/knowledge_graph_extraction/#knowledge-graph-extraction_1","title":"Knowledge Graph Extraction","text":"We first need to define our Pydantic class for each node and each edge of the knowledge graph:
from pydantic import BaseModel, Field\n\nclass Node(BaseModel):\n \"\"\"Node of the Knowledge Graph\"\"\"\n\n id: int = Field(..., description=\"Unique identifier of the node\")\n label: str = Field(..., description=\"Label of the node\")\n property: str = Field(..., description=\"Property of the node\")\n\n\nclass Edge(BaseModel):\n \"\"\"Edge of the Knowledge Graph\"\"\"\n\n source: int = Field(..., description=\"Unique source of the edge\")\n target: int = Field(..., description=\"Unique target of the edge\")\n label: str = Field(..., description=\"Label of the edge\")\n property: str = Field(..., description=\"Property of the edge\")\n
We then define the Pydantic class for the knowledge graph and get its JSON schema:
from typing import List\n\nclass KnowledgeGraph(BaseModel):\n \"\"\"Generated Knowledge Graph\"\"\"\n\n nodes: List[Node] = Field(..., description=\"List of nodes of the knowledge graph\")\n edges: List[Edge] = Field(..., description=\"List of edges of the knowledge graph\")\n\nschema = KnowledgeGraph.model_json_schema()\n
We then need to adapt our prompt to the Hermes prompt format for JSON schema:
def generate_hermes_prompt(user_prompt):\n return (\n \"<|im_start|>system\\n\"\n \"You are a world class AI model who answers questions in JSON \"\n f\"Here's the json schema you must adhere to:\\n<schema>\\n{schema}\\n</schema><|im_end|>\\n\"\n \"<|im_start|>user\\n\"\n + user_prompt\n + \"<|im_end|>\"\n + \"\\n<|im_start|>assistant\\n\"\n \"<schema>\"\n )\n
For a given user prompt, for example:
user_prompt = \"Alice loves Bob and she hates Charlie.\"\n
We can use generate.json
by passing the Pydantic class we previously defined, and call the generator with the Hermes prompt:
from outlines import generate, models\n\nmodel = models.LlamaCpp(llm)\ngenerator = generate.json(model, KnowledgeGraph)\nprompt = generate_hermes_prompt(user_prompt)\nresponse = generator(prompt, max_tokens=1024, temperature=0, seed=42)\n
We obtain the nodes and edges of the knowledge graph:
print(response.nodes)\nprint(response.edges)\n# [Node(id=1, label='Alice', property='Person'),\n# Node(id=2, label='Bob', property='Person'),\n# Node(id=3, label='Charlie', property='Person')]\n# [Edge(source=1, target=2, label='love', property='Relationship'),\n# Edge(source=1, target=3, label='hate', property='Relationship')]\n
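If you want to work with the graph programmatically, a small illustrative addition builds an adjacency list keyed by node label:
# Map node ids to labels, then group edges by their source node.\nnode_labels = {node.id: node.label for node in response.nodes}\nadjacency = {}\nfor edge in response.edges:\n    adjacency.setdefault(node_labels[edge.source], []).append((edge.label, node_labels[edge.target]))\nprint(adjacency)\n# {'Alice': [('love', 'Bob'), ('hate', 'Charlie')]}\n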
"},{"location":"cookbook/knowledge_graph_extraction/#optional-visualizing-the-knowledge-graph","title":"(Optional) Visualizing the Knowledge Graph","text":"We can use the Graphviz library to visualize the generated knowledge graph. For detailed installation instructions, see here.
from graphviz import Digraph\n\ndot = Digraph()\nfor node in response.nodes:\n dot.node(str(node.id), node.label, shape='circle', width='1', height='1')\nfor edge in response.edges:\n dot.edge(str(edge.source), str(edge.target), label=edge.label)\n\ndot.render('knowledge-graph.gv', view=True)\n
This example was originally contributed by Alonso Silva.
"},{"location":"cookbook/models_playing_chess/","title":"Large language models playing chess","text":"In this example we will make a Phi-2 model play chess against itself. On its own the model easily generates invalid moves, so we will give it a little help. At each step we will generate a regex that only matches valid move, and use it to help the model only generating valid moves.
"},{"location":"cookbook/models_playing_chess/#the-chessboard","title":"The chessboard","text":"The game will be played on a standard checkboard. We will use the chess
library to track the opponents' moves, and check that the moves are valid.
%pip install outlines -q\n%pip install chess -q\n%pip install transformers accelerate einops -q\n\nimport chess\n\nboard = chess.Board(\"rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1\")\n
"},{"location":"cookbook/models_playing_chess/#the-opponents","title":"The opponents","text":"Phi-2 will be playing against itself:
from outlines import models\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-4k-instruct\")\n
"},{"location":"cookbook/models_playing_chess/#a-little-help-for-the-language-model","title":"A little help for the language model","text":"To make sure Phi-2 generates valid chess moves we will use Outline's regex-structured generation. We define a function that takes the current state of the board and returns a regex that matches all possible legal moves:
import re\n\ndef legal_moves_regex(board):\n    \"\"\"Build a regex that only matches valid moves.\"\"\"\n    legal_moves = list(board.legal_moves)\n    legal_moves_str = [board.san(move) for move in legal_moves]\n    legal_moves_str = [re.sub(r\"[+#]\", \"\", move) for move in legal_moves_str]\n    regex_pattern = \"|\".join(re.escape(move) for move in legal_moves_str)\n    regex_pattern = f\"{regex_pattern}\"\n    return regex_pattern\n
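As a quick illustration (our own check, not in the original example), the starting position has 20 legal moves, so the pattern is an alternation of 20 SAN strings such as e4 and Nf3:
pattern = legal_moves_regex(board)\nprint(len(pattern.split(\"|\")))\n# 20\n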
"},{"location":"cookbook/models_playing_chess/#prompting-the-language-model","title":"Prompting the language model","text":"The prompt corresponds to the current state of the board, so we start with:
prompt = \"Let's play Chess. Moves: \"\n
We update the prompt at each step so it reflects the state of the board after the previous move.
"},{"location":"cookbook/models_playing_chess/#lets-play","title":"Let's play","text":"from outlines import generate\n\nboard_state = \" \"\nturn_number = 0\nwhile not board.is_game_over():\n regex_pattern = legal_moves_regex(board)\n structured = generate.regex(model, regex_pattern)(prompt + board_state)\n move = board.parse_san(structured)\n\n if turn_number % 2 == 0 : # It's White's turn\n board_state += board.san(move) + \" \"\n else:\n board_state += board.san(move) + \" \" + str(turn_number) + \".\"\n\n turn_number += 1\n\n board.push(move)\n\n print(board_state)\n
Interestingly enough, Phi-3 hates capturing.
e4 e5 1.Nf3 Ne7 3.b4 Nf5 5.Nc3 Ne7 7.Bb5 a6 9.Na4 b6 11.c3 Nec6 13.c4 a5 15.d4 Qg5 17.Nd2 Bb7 19.dxe5\n
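Once the loop finishes, you can inspect the final position with python-chess utilities (a small illustrative addition):
# Render the final position as ASCII and report the game result.\nprint(board)\nprint(board.result())\n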
This example was originally authored by @903124S in this gist.
"},{"location":"cookbook/qa-with-citations/","title":"Generate Synthetic Data and Q&A with Citations","text":"This tutorial is adapted from the instructor-ollama notebook. We start with a simple example to generate synthetic data and then we approach the problem of question answering by providing citations.
We will use llama.cpp via the llama-cpp-python library. Outlines supports llama-cpp-python, but we need to install it ourselves:
pip install llama-cpp-python\n
We download the model weights by passing the name of the repository on the HuggingFace Hub, and the filenames (or glob pattern):
import llama_cpp\nfrom outlines import generate, models\n\nmodel = models.llamacpp(\"NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF\",\n \"Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf\",\n tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained(\n \"NousResearch/Hermes-2-Pro-Llama-3-8B\"\n ),\n n_gpu_layers=-1,\n flash_attn=True,\n n_ctx=8192,\n verbose=False)\n
(Optional) Store the model weights in a custom folder: by default the model weights are downloaded to the hub cache, but if we want to store the weights in a custom folder, we pull a quantized GGUF model, Hermes-2-Pro-Llama-3-8B by NousResearch, from HuggingFace:
wget https://hf.co/NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF/resolve/main/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf\n
We initialize the model:
import llama_cpp\nfrom llama_cpp import Llama\nfrom outlines import generate, models\n\nllm = Llama(\n \"/path/to/model/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf\",\n tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained(\n \"NousResearch/Hermes-2-Pro-Llama-3-8B\"\n ),\n n_gpu_layers=-1,\n flash_attn=True,\n n_ctx=8192,\n verbose=False\n)\n
"},{"location":"cookbook/qa-with-citations/#generate-synthetic-data","title":"Generate Synthetic Data","text":"We first need to define our Pydantic class for a user:
from pydantic import BaseModel, Field\n\nclass UserDetail(BaseModel):\n id: int = Field(..., description=\"Unique identifier\") # so the model keeps track of the number of users\n first_name: str\n last_name: str\n age: int\n
We then define a Pydantic class for a list of users:
from typing import List\n\nclass Users(BaseModel):\n users: List[UserDetail]\n
We can use generate.json
by passing this Pydantic class we just defined, and call the generator:
model = models.LlamaCpp(llm)\ngenerator = generate.json(model, Users)\nresponse = generator(\"Create 5 fake users\", max_tokens=1024, temperature=0, seed=42)\nprint(response.users)\n# [UserDetail(id=1, first_name='John', last_name='Doe', age=25),\n# UserDetail(id=2, first_name='Jane', last_name='Doe', age=30),\n# UserDetail(id=3, first_name='Bob', last_name='Smith', age=40),\n# UserDetail(id=4, first_name='Alice', last_name='Smith', age=35),\n# UserDetail(id=5, first_name='John', last_name='Smith', age=20)]\n
for user in response.users:\n    print(user.first_name)\n    print(user.last_name)\n    print(user.age)\n    print(\"#####\")\n# John\n# Doe\n# 25\n# #####\n# Jane\n# Doe\n# 30\n# #####\n# Bob\n# Smith\n# 40\n# #####\n# Alice\n# Smith\n# 35\n# #####\n# John\n# Smith\n# 20\n# #####\n
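As an optional illustration of our own, the generated users can also be loaded into a pandas DataFrame for further inspection:
import pandas as pd\n\n# model_dump() turns each Pydantic object into a plain dictionary.\ndf = pd.DataFrame([user.model_dump() for user in response.users])\nprint(df)\n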
"},{"location":"cookbook/qa-with-citations/#qa-with-citations","title":"QA with Citations","text":"We first need to define our Pydantic class for QA with citations:
from typing import List\nfrom pydantic import BaseModel\n\nclass QuestionAnswer(BaseModel):\n question: str\n answer: str\n citations: List[str]\n\nschema = QuestionAnswer.model_json_schema()\n
We then need to adapt our prompt to the Hermes prompt format for JSON schema:
def generate_hermes_prompt(question, context, schema=schema):\n return (\n \"<|im_start|>system\\n\"\n \"You are a world class AI model who answers questions in JSON with correct and exact citations \"\n \"extracted from the `Context`. \"\n f\"Here's the json schema you must adhere to:\\n<schema>\\n{schema}\\n</schema><|im_end|>\\n\"\n \"<|im_start|>user\\n\"\n + \"`Context`: \"\n + context\n + \"\\n`Question`: \"\n + question + \"<|im_end|>\"\n + \"\\n<|im_start|>assistant\\n\"\n \"<schema>\"\n )\n
We can use generate.json
by passing the Pydantic class we previously defined, and call the generator with the Hermes prompt:
question = \"What did the author do during college?\"\ncontext = \"\"\"\nMy name is Jason Liu, and I grew up in Toronto Canada but I was born in China.\nI went to an arts high school but in university I studied Computational Mathematics and physics.\nAs part of coop I worked at many companies including Stitchfix, Facebook.\nI also started the Data Science club at the University of Waterloo and I was the president of the club for 2 years.\n\"\"\"\ngenerator = generate.json(model, QuestionAnswer)\nprompt = generate_hermes_prompt(question, context)\nresponse = generator(prompt, max_tokens=1024, temperature=0, seed=42)\nprint(response)\n# QuestionAnswer(question='What did the author do during college?', answer='The author studied Computational Mathematics and physics in university and was also involved in starting the Data Science club, serving as its president for 2 years.', citations=['I went to an arts high school but in university I studied Computational Mathematics and physics.', 'I also started the Data Science club at the University of Waterloo and I was the president of the club for 2 years.'])\n
We can do the same for a list of question-context pairs:
question1 = \"Where was John born?\"\ncontext1 = \"\"\"\nJohn Doe is a software engineer who was born in New York, USA.\nHe studied Computer Science at the Massachusetts Institute of Technology.\nDuring his studies, he interned at Google and Microsoft.\nHe also founded the Artificial Intelligence club at his university and served as its president for three years.\n\"\"\"\n\nquestion2 = \"What did Emily study in university?\"\ncontext2 = \"\"\"\nEmily Smith is a data scientist from London, England.\nShe attended the University of Cambridge where she studied Statistics and Machine Learning.\nShe interned at IBM and Amazon during her summer breaks.\nEmily was also the head of the Women in Tech society at her university.\n\"\"\"\n\nquestion3 = \"Which companies did Robert intern at?\"\ncontext3 = \"\"\"\nRobert Johnson, originally from Sydney, Australia, is a renowned cybersecurity expert.\nHe studied Information Systems at the University of Melbourne.\nRobert interned at several cybersecurity firms including NortonLifeLock and McAfee.\nHe was also the leader of the Cybersecurity club at his university.\n\"\"\"\n\nquestion4 = \"What club did Alice start at her university?\"\ncontext4 = \"\"\"\nAlice Williams, a native of Dublin, Ireland, is a successful web developer.\nShe studied Software Engineering at Trinity College Dublin.\nAlice interned at several tech companies including Shopify and Squarespace.\nShe started the Web Development club at her university and was its president for two years.\n\"\"\"\n\nquestion5 = \"What did Michael study in high school?\"\ncontext5 = \"\"\"\nMichael Brown is a game developer from Tokyo, Japan.\nHe attended a specialized high school where he studied Game Design.\nHe later attended the University of Tokyo where he studied Computer Science.\nMichael interned at Sony and Nintendo during his university years.\nHe also started the Game Developers club at his university.\n\"\"\"\n\nfor question, context in [\n (question1, context1),\n (question2, context2),\n (question3, context3),\n (question4, context4),\n (question5, context5),\n]:\n final_prompt = my_final_prompt(question, context)\n generator = generate.json(model, QuestionAnswer)\n response = generator(final_prompt, max_tokens=1024, temperature=0, seed=42)\n display(question)\n display(response.answer)\n display(response.citations)\n print(\"\\n\\n\")\n\n# 'Where was John born?'\n# 'John Doe was born in New York, USA.'\n# ['John Doe is a software engineer who was born in New York, USA.']\n#\n#\n# 'What did Emily study in university?'\n# 'Emily studied Statistics and Machine Learning in university.'\n# ['She attended the University of Cambridge where she studied Statistics and Machine Learning.']\n#\n#\n# 'Which companies did Robert intern at?'\n# 'Robert interned at NortonLifeLock and McAfee.'\n# ['Robert Johnson, originally from Sydney, Australia, is a renowned cybersecurity expert. He interned at several cybersecurity firms including NortonLifeLock and McAfee.']\n#\n#\n# 'What club did Alice start at her university?'\n# 'Alice started the Web Development club at her university.'\n# ['Alice Williams, a native of Dublin, Ireland, is a successful web developer. She started the Web Development club at her university and was its president for two years.']\n#\n#\n# 'What did Michael study in high school?'\n# 'Michael studied Game Design in high school.'\n# ['Michael Brown is a game developer from Tokyo, Japan. He attended a specialized high school where he studied Game Design.']\n
This example was originally contributed by Alonso Silva.
"},{"location":"cookbook/react_agent/","title":"ReAct Agent","text":"This example shows how to use outlines to build your own agent with open weights local models and structured outputs. It is inspired by the blog post A simple Python implementation of the ReAct pattern for LLMs by Simon Willison.
The ReAct pattern (for Reason+Act) is described in the paper ReAct: Synergizing Reasoning and Acting in Language Models. It's a pattern where you implement additional actions that an LLM can take - searching Wikipedia or running calculations, for example - teach it how to request the execution of those actions, and then feed the results back into the LLM.
Additionally, we give the LLM the option of using a scratchpad, described in the paper Show Your Work: Scratchpads for Intermediate Computation with Language Models, which improves the ability of LLMs to perform multi-step computations.
We use llama.cpp via the llama-cpp-python library. Outlines supports llama-cpp-python, but we need to install it ourselves:
pip install llama-cpp-python\n
We download the model weights by passing the name of the repository on the HuggingFace Hub and the filename (or a glob pattern):
import llama_cpp\nfrom outlines import generate, models\n\nmodel = models.llamacpp(\"NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF\",\n \"Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf\",\n tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained(\n \"NousResearch/Hermes-2-Pro-Llama-3-8B\"\n ),\n n_gpu_layers=-1,\n flash_attn=True,\n n_ctx=8192,\n verbose=False)\n
(Optional) Store the model weights in a custom folder. By default the model weights are downloaded to the hub cache, but if we want to store the weights in a custom folder, we can download the quantized GGUF model Hermes-2-Pro-Llama-3-8B by NousResearch directly from HuggingFace:
wget https://hf.co/NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF/resolve/main/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf\n
We initialize the model:
import llama_cpp\nfrom llama_cpp import Llama\nfrom outlines import generate, models\n\nllm = Llama(\n \"/path/to/model/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf\",\n tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained(\n \"NousResearch/Hermes-2-Pro-Llama-3-8B\"\n ),\n n_gpu_layers=-1,\n flash_attn=True,\n n_ctx=8192,\n verbose=False\n)\n
"},{"location":"cookbook/react_agent/#build-a-react-agent","title":"Build a ReAct agent","text":"In this example, we use two tools:
- wikipedia: \<search term> - searches Wikipedia and returns the snippet of the first result
- calculate: \<expression> - evaluates an expression using Python's eval() function
import httpx\n\ndef wikipedia(q):\n return httpx.get(\"https://en.wikipedia.org/w/api.php\", params={\n \"action\": \"query\",\n \"list\": \"search\",\n \"srsearch\": q,\n \"format\": \"json\"\n }).json()[\"query\"][\"search\"][0][\"snippet\"]\n\n\ndef calculate(numexp):\n return eval(numexp)\n
We define the logic of the agent through a Pydantic class. First, we want the LLM to decide only between the two previously defined tools:
from enum import Enum\n\nclass Action(str, Enum):\n wikipedia = \"wikipedia\"\n calculate = \"calculate\"\n
Our agent will loop through Thought and Action. We explicitly include the Action Input field so the model doesn't forget to provide the arguments of the Action. We also add an optional scratchpad.
from pydantic import BaseModel, Field\n\nclass Reason_and_Act(BaseModel):\n Scratchpad: str = Field(..., description=\"Information from the Observation useful to answer the question\")\n Thought: str = Field(..., description=\"It describes your thoughts about the question you have been asked\")\n Action: Action\n Action_Input: str = Field(..., description=\"The arguments of the Action.\")\n
Our agent will reach a Final Answer. We also add a scratchpad (optional).
class Final_Answer(BaseModel):\n Scratchpad: str = Field(..., description=\"Information from the Observation useful to answer the question\")\n Final_Answer: str = Field(..., description=\"Answer to the question grounded on the Observation\")\n
Our agent will decide when it has reached a Final Answer and should therefore stop the loop of Thought and Action.
from typing import Union\n\nclass Decision(BaseModel):\n Decision: Union[Reason_and_Act, Final_Answer]\n
We could generate a response directly from the JSON schema, but here we will build the corresponding regex and check that everything is working as expected:
from outlines.integrations.utils import convert_json_schema_to_str\nfrom outlines.fsm.json_schema import build_regex_from_schema\n\njson_schema = Decision.model_json_schema()\nschema_str = convert_json_schema_to_str(json_schema=json_schema)\nregex_str = build_regex_from_schema(schema_str)\nprint(regex_str)\n# '\\\\{[ ]?\"Decision\"[ ]?:[ ]?(\\\\{[ ]?\"Scratchpad\"[ ]?:[ ]?\"([^\"\\\\\\\\\\\\x00-\\\\x1F\\\\x7F-\\\\x9F]|\\\\\\\\[\"\\\\\\\\])*\"[ ]?,[ ]?\"Thought\"[ ]?:[ ]?\"([^\"\\\\\\\\\\\\x00-\\\\x1F\\\\x7F-\\\\x9F]|\\\\\\\\[\"\\\\\\\\])*\"[ ]?,[ ]?\"Action\"[ ]?:[ ]?(\"wikipedia\"|\"calculate\")[ ]?,[ ]?\"Action_Input\"[ ]?:[ ]?\"([^\"\\\\\\\\\\\\x00-\\\\x1F\\\\x7F-\\\\x9F]|\\\\\\\\[\"\\\\\\\\])*\"[ ]?\\\\}|\\\\{[ ]?\"Scratchpad\"[ ]?:[ ]?\"([^\"\\\\\\\\\\\\x00-\\\\x1F\\\\x7F-\\\\x9F]|\\\\\\\\[\"\\\\\\\\])*\"[ ]?,[ ]?\"Final_Answer\"[ ]?:[ ]?\"([^\"\\\\\\\\\\\\x00-\\\\x1F\\\\x7F-\\\\x9F]|\\\\\\\\[\"\\\\\\\\])*\"[ ]?\\\\})[ ]?\\\\}'\n
We then need to adapt our prompt to the Hermes prompt format for JSON schema and explain the agent logic:
import datetime\n\ndef generate_hermes_prompt(question, schema=\"\"):\n return (\n \"<|im_start|>system\\n\"\n \"You are a world class AI model who answers questions in JSON with correct Pydantic schema. \"\n f\"Here's the json schema you must adhere to:\\n<schema>\\n{schema}\\n</schema>\\n\"\n \"Today is \" + datetime.datetime.today().strftime('%Y-%m-%d') + \".\\n\" +\n \"You run in a loop of Scratchpad, Thought, Action, Action Input, PAUSE, Observation. \"\n \"At the end of the loop you output a Final Answer. \"\n \"Use Scratchpad to store the information from the Observation useful to answer the question \"\n \"Use Thought to describe your thoughts about the question you have been asked \"\n \"and reflect carefully about the Observation if it exists. \"\n \"Use Action to run one of the actions available to you. \"\n \"Use Action Input to input the arguments of the selected action - then return PAUSE. \"\n \"Observation will be the result of running those actions. \"\n \"Your available actions are:\\n\"\n \"calculate:\\n\"\n \"e.g. calulate: 4**2 / 3\\n\"\n \"Runs a calculation and returns the number - uses Python so be sure to use floating point syntax if necessary\\n\"\n \"wikipedia:\\n\"\n \"e.g. wikipedia: Django\\n\"\n \"Returns a summary from searching Wikipedia\\n\"\n \"DO NOT TRY TO GUESS THE ANSWER. Begin! <|im_end|>\"\n \"\\n<|im_start|>user\\n\" + question + \"<|im_end|>\"\n \"\\n<|im_start|>assistant\\n\"\n )\n
We define a ChatBot class:
class ChatBot:\n def __init__(self, prompt=\"\"):\n self.prompt = prompt\n\n def __call__(self, user_prompt):\n self.prompt += user_prompt\n result = self.execute()\n return result\n\n def execute(self):\n generator = generate.regex(model, regex_str)\n result = generator(self.prompt, max_tokens=1024, temperature=0, seed=42)\n return result\n
We define a query function:
import json\n\ndef query(question, max_turns=5):\n i = 0\n next_prompt = (\n \"\\n<|im_start|>user\\n\" + question + \"<|im_end|>\"\n \"\\n<|im_start|>assistant\\n\"\n )\n previous_actions = []\n while i < max_turns:\n i += 1\n prompt = generate_hermes_prompt(question=question, schema=Decision.model_json_schema())\n bot = ChatBot(prompt=prompt)\n result = bot(next_prompt)\n json_result = json.loads(result)['Decision']\n if \"Final_Answer\" not in list(json_result.keys()):\n scratchpad = json_result['Scratchpad'] if i == 0 else \"\"\n thought = json_result['Thought']\n action = json_result['Action']\n action_input = json_result['Action_Input']\n print(f\"\\x1b[34m Scratchpad: {scratchpad} \\x1b[0m\")\n print(f\"\\x1b[34m Thought: {thought} \\x1b[0m\")\n print(f\"\\x1b[36m -- running {action}: {str(action_input)}\\x1b[0m\")\n if action + \": \" + str(action_input) in previous_actions:\n observation = \"You already run that action. **TRY A DIFFERENT ACTION INPUT.**\"\n else:\n if action==\"calculate\":\n try:\n observation = eval(str(action_input))\n except Exception as e:\n observation = f\"{e}\"\n elif action==\"wikipedia\":\n try:\n observation = wikipedia(str(action_input))\n except Exception as e:\n observation = f\"{e}\"\n print()\n print(f\"\\x1b[33m Observation: {observation} \\x1b[0m\")\n print()\n previous_actions.append(action + \": \" + str(action_input))\n next_prompt += (\n \"\\nScratchpad: \" + scratchpad +\n \"\\nThought: \" + thought +\n \"\\nAction: \" + action +\n \"\\nAction Input: \" + action_input +\n \"\\nObservation: \" + str(observation)\n )\n else:\n scratchpad = json_result[\"Scratchpad\"]\n final_answer = json_result[\"Final_Answer\"]\n print(f\"\\x1b[34m Scratchpad: {scratchpad} \\x1b[0m\")\n print(f\"\\x1b[34m Final Answer: {final_answer} \\x1b[0m\")\n return final_answer\n print(f\"\\nFinal Answer: I am sorry, but I am unable to answer your question. Please provide more information or a different question.\")\n return \"No answer found\"\n
We can now test our ReAct agent:
print(query(\"What's 2 to the power of 10?\"))\n# Scratchpad:\n# Thought: I need to perform a mathematical calculation to find the result of 2 to the power of 10.\n# -- running calculate: 2**10\n#\n# Observation: 1024\n#\n# Scratchpad: 2 to the power of 10 is 1024.\n# Final Answer: 2 to the power of 10 is 1024.\n# 2 to the power of 10 is 1024.\n
print(query(\"What does England share borders with?\"))\n# Scratchpad:\n# Thought: To answer this question, I will use the 'wikipedia' action to gather information about England's geographical location and its borders.\n# -- running wikipedia: England borders\n#\n# Observation: Anglo-Scottish <span class=\"searchmatch\">border</span> (Scottish Gaelic: Cr\u00ecochan Anglo-Albannach) is an internal <span class=\"searchmatch\">border</span> of the United Kingdom separating Scotland and <span class=\"searchmatch\">England</span> which runs for\n#\n# Scratchpad: Anglo-Scottish border (Scottish Gaelic: Cr\u00ecochan Anglo-Albannach) is an internal border of the United Kingdom separating Scotland and England which runs for\n# Final Answer: England shares a border with Scotland.\n# England shares a border with Scotland.\n
As mentioned in Simon's blog post, this is not a very robust implementation at all, and there's a ton of room for improvement. But it is lovely how simple it is, with a few lines of Python, to make these extra capabilities available to the LLM. And now you can run it locally with an open-weights LLM.
This example was originally contributed by Alonso Silva.
"},{"location":"cookbook/read-pdfs/","title":"PDF to structured output with vision language models","text":"A common task with language models is to ask language models questions about a PDF file.
Typically, the output is unstructured text, i.e. \"talking\" to your PDF.
In some cases, you may wish to extract structured information from the PDF, like tables, lists, citations, etc.
PDFs are difficult for machines to read directly. However, you can simply convert the PDF to images and then use a vision language model to extract structured information from the images.
This cookbook demonstrates how to
- Convert a PDF to a list of images
- Use a vision language model to extract structured information from the images
"},{"location":"cookbook/read-pdfs/#dependencies","title":"Dependencies","text":"You'll need to install these dependencies:
pip install outlines pillow transformers torch==2.4.0 pdf2image\n\n# Optional, but makes the output look nicer\npip install rich\n
"},{"location":"cookbook/read-pdfs/#import-the-necessary-libraries","title":"Import the necessary libraries","text":"from PIL import Image\nimport outlines\nimport torch\nfrom transformers import AutoProcessor\nfrom pydantic import BaseModel\nfrom typing import List, Optional\nfrom pdf2image import convert_from_path\nimport os\nfrom rich import print\nimport requests\n
"},{"location":"cookbook/read-pdfs/#choose-a-model","title":"Choose a model","text":"We've tested this example with Pixtral 12b and Qwen2-VL-7B-Instruct.
To use Pixtral:
from transformers import LlavaForConditionalGeneration\nmodel_name=\"mistral-community/pixtral-12b\"\nmodel_class=LlavaForConditionalGeneration\n
To use Qwen-2-VL:
from transformers import Qwen2VLForConditionalGeneration\nmodel_name = \"Qwen/Qwen2-VL-7B-Instruct\"\nmodel_class = Qwen2VLForConditionalGeneration\n
You can load your model into memory with:
# This loads the model into memory. On your first run,\n# it will have to download the model, which might take a while.\nmodel = outlines.models.transformers_vision(\n model_name,\n model_class=model_class,\n model_kwargs={\n \"device_map\": \"auto\",\n \"torch_dtype\": torch.bfloat16,\n },\n processor_kwargs={\n \"device\": \"auto\",\n },\n)\n
"},{"location":"cookbook/read-pdfs/#convert-the-pdf-to-images","title":"Convert the PDF to images","text":"We'll use the pdf2image
library to convert each page of the PDF to an image.
convert_pdf_to_images
is a convenience function that converts each page of the PDF to an image, and optionally saves the images to disk when output_dir
is provided.
Note: the dpi
argument is important. It controls the resolution of the images. High DPI images are higher quality and may yield better results, but they are also larger, slower to process, and require more memory.
from pdf2image import convert_from_path\nfrom PIL import Image\nimport os\nfrom typing import List, Optional\n\ndef convert_pdf_to_images(\n pdf_path: str,\n output_dir: Optional[str] = None,\n dpi: int = 120,\n fmt: str = 'PNG'\n) -> List[Image.Image]:\n \"\"\"\n Convert a PDF file to a list of PIL Image objects.\n\n Args:\n pdf_path: Path to the PDF file\n output_dir: Optional directory to save the images\n dpi: Resolution for the conversion. High DPI is high quality, but also slow and memory intensive.\n fmt: Output format (PNG recommended for quality)\n\n Returns:\n List of PIL Image objects\n \"\"\"\n # Convert PDF to list of images\n images = convert_from_path(\n pdf_path,\n dpi=dpi,\n fmt=fmt\n )\n\n # Optionally save images\n if output_dir:\n os.makedirs(output_dir, exist_ok=True)\n for i, image in enumerate(images):\n image.save(os.path.join(output_dir, f'page_{i+1}.{fmt.lower()}'))\n\n return images\n
We're going to use the Louf & Willard paper that describes the method Outlines uses for structured generation.
To download the PDF, run:
# Download the PDF file\npdf_url = \"https://arxiv.org/pdf/2307.09702\"\nresponse = requests.get(pdf_url)\n\n# Save the PDF locally\nwith open(\"louf-willard.pdf\", \"wb\") as f:\n f.write(response.content)\n
Now, we can convert the PDF to a list of images:
# Load the pdf\nimages = convert_pdf_to_images(\n \"louf-willard.pdf\",\n dpi=120,\n output_dir=\"output_images\"\n)\n
"},{"location":"cookbook/read-pdfs/#extract-structured-information-from-the-images","title":"Extract structured information from the images","text":"The structured output you can extract is exactly the same as everywhere else in Outlines -- you can use regular expressions, JSON schemas, selecting from a list of options, etc.
"},{"location":"cookbook/read-pdfs/#extracting-data-into-json","title":"Extracting data into JSON","text":"Suppose you wished to go through each page of the PDF, and extract the page description, key takeaways, and page number.
You can do this by defining a JSON schema, and then using outlines.generate.json
to extract the data.
First, define the structure you want to extract:
class PageSummary(BaseModel):\n description: str\n key_takeaways: List[str]\n page_number: int\n
Second, we need to set up the prompt. Adding special tokens can be tricky, so we use the transformers AutoProcessor
to apply the special tokens for us. To do so, we specify a list of messages, where each message is a dictionary with a role
and content
key.
Images are denoted with type: \"image\"
, and text is denoted with type: \"text\"
.
messages = [\n {\n \"role\": \"user\",\n \"content\": [\n # The text you're passing to the model --\n # this is where you do your standard prompting.\n {\"type\": \"text\", \"text\": f\"\"\"\n Describe the page in a way that is easy for a PhD student to understand.\n\n Return the information in the following JSON schema:\n {PageSummary.model_json_schema()}\n\n Here is the page:\n \"\"\"\n },\n\n # Don't need to pass in an image, since we do this\n # when we call the generator function down below.\n {\"type\": \"image\", \"image\": \"\"},\n ],\n }\n]\n\n# Convert the messages to the final prompt\nprocessor = AutoProcessor.from_pretrained(model_name)\ninstruction = processor.apply_chat_template(\n messages, tokenize=False, add_generation_prompt=True\n)\n
Now we iterate through each image, and extract the structured information:
# Page summarizer function\npage_summary_generator = outlines.generate.json(model, PageSummary)\n\nfor image in images:\n result = page_summary_generator(instruction, [image])\n print(result)\n
"},{"location":"cookbook/read-pdfs/#regular-expressions-to-extract-the-arxiv-paper-identifier","title":"Regular expressions to extract the arxiv paper identifier","text":"The arXiv paper identifier is a unique identifier for each paper. These identifiers have the format arXiv:YYMM.NNNNN
(five end digits) or arXiv:YYMM.NNNN
(four end digits). arXiv identifiers are typically watermarked on papers uploaded to arXiv.
arXiv identifiers are optionally followed by a version number, i.e. arXiv:YYMM.NNNNNvX
.
We can use a regular expression to define this pattern:
paper_regex = r'arXiv:\\d{2}[01]\\d\\.\\d{4,5}(v\\d)?'\n
We can build an extractor function from the regex:
id_extractor = outlines.generate.regex(model, paper_regex)\n
Now, we can extract the arxiv paper identifier from the first image:
arxiv_instruction = processor.apply_chat_template(\n [\n {\n \"role\": \"user\",\n \"content\": [\n {\"type\": \"text\", \"text\": f\"\"\"\n Extract the arxiv paper identifier from the page.\n\n Here is the page:\n \"\"\"},\n {\"type\": \"image\", \"image\": \"\"},\n ],\n }\n ],\n tokenize=False,\n add_generation_prompt=True\n)\n\n# Extract the arxiv paper identifier\npaper_id = id_extractor(arxiv_instruction, [images[0]])\n
As of the time of this writing, the arxiv paper identifier is
arXiv:2307.09702v4\n
Your version number may be different, but the part before vX
should match.
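If you want to check this programmatically, here is a minimal sketch using Python's re module; the expected value below is simply the identifier of the Louf & Willard paper, and nothing in this snippet is part of the Outlines API:
import re\n\n# Strip an optional trailing version number (e.g. v4) before comparing\nextracted = re.sub(r'v\\d+$', '', paper_id)\nassert extracted == 'arXiv:2307.09702', f'Unexpected identifier: {paper_id}'\n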
"},{"location":"cookbook/read-pdfs/#categorize-the-paper-into-one-of-several-categories","title":"Categorize the paper into one of several categories","text":"outlines.generate.choice
allows the model to select one of several options. Suppose we want to categorize the paper as being about \"llms\", \"cell biology\", or \"other\".
Let's define a few categories we might be interested in:
categories = [\n \"llms\",\n \"cell biology\",\n \"other\"\n]\n
Now we can construct the prompt:
categorization_instruction = processor.apply_chat_template(\n [\n {\n \"role\": \"user\",\n \"content\": [\n {\"type\": \"text\", \"text\": f\"\"\"\n Please choose one of the following categories\n that best describes the paper.\n\n {categories}\n\n Here is the paper:\n \"\"\"},\n\n {\"type\": \"image\", \"image\": \"\"},\n ],\n }\n ],\n tokenize=False,\n add_generation_prompt=True\n)\n
Now we can show the model the first page and extract the category:
# Build the choice extractor\ncategorizer = outlines.generate.choice(\n model,\n categories\n)\n\n# Categorize the paper\ncategory = categorizer(categorization_instruction, [images[0]])\nprint(category)\n
Which should return:
llms\n
"},{"location":"cookbook/read-pdfs/#additional-notes","title":"Additional notes","text":"You can provide multiple images to the model by
- Adding additional image messages
- Providing a list of images to the
generate
function
For example, to have two images, you can do:
two_image_prompt = processor.apply_chat_template(\n [\n {\n \"role\": \"user\",\n \"content\": [\n {\"type\": \"text\", \"text\": \"are both of these images of hot dogs?\"},\n\n # Tell the model there are two images\n {\"type\": \"image\", \"image\": \"\"},\n {\"type\": \"image\", \"image\": \"\"},\n ],\n }\n ],\n tokenize=False,\n add_generation_prompt=True\n)\n\n# Pass two images to the model\ngenerator = outlines.generate.choice(\n model,\n [\"hot dog\", \"not hot dog\"]\n)\n\nresult = generator(\n two_image_prompt,\n\n # Pass two images to the model\n [images[0], images[1]]\n)\nprint(result)\n
Using the first two pages of the paper (they are not images of hot dogs), we should get
not hot dog\n
"},{"location":"cookbook/receipt-digitization/","title":"Receipt Data Extraction with VLMs","text":""},{"location":"cookbook/receipt-digitization/#setup","title":"Setup","text":"You'll need to install the dependencies:
pip install outlines torch==2.4.0 transformers accelerate pillow rich\n
"},{"location":"cookbook/receipt-digitization/#import-libraries","title":"Import libraries","text":"Load all the necessary libraries:
# LLM stuff\nimport outlines\nimport torch\nfrom transformers import AutoProcessor\nfrom pydantic import BaseModel, Field\nfrom typing import Literal, Optional, List\n\n# Image stuff\nfrom PIL import Image\nimport requests\n\n# Rich for pretty printing\nfrom rich import print\n
"},{"location":"cookbook/receipt-digitization/#choose-a-model","title":"Choose a model","text":"This example has been tested with mistral-community/pixtral-12b
(HF link) and Qwen/Qwen2-VL-7B-Instruct
(HF link).
We recommend Qwen-2-VL as we have found it to be more accurate than Pixtral.
If you want to use Qwen-2-VL, you can do the following:
# To use Qwen-2-VL:\nfrom transformers import Qwen2VLForConditionalGeneration\nmodel_name = \"Qwen/Qwen2-VL-7B-Instruct\"\nmodel_class = Qwen2VLForConditionalGeneration\n
If you want to use Pixtral, you can do the following:
# To use Pixtral:\nfrom transformers import LlavaForConditionalGeneration\nmodel_name=\"mistral-community/pixtral-12b\"\nmodel_class=LlavaForConditionalGeneration\n
"},{"location":"cookbook/receipt-digitization/#load-the-model","title":"Load the model","text":"Load the model into memory:
model = outlines.models.transformers_vision(\n model_name,\n model_class=model_class,\n model_kwargs={\n \"device_map\": \"auto\",\n \"torch_dtype\": torch.bfloat16,\n },\n processor_kwargs={\n \"device\": \"cuda\", # set to \"cpu\" if you don't have a GPU\n },\n)\n
"},{"location":"cookbook/receipt-digitization/#image-processing","title":"Image processing","text":"Images can be quite large. In GPU-poor environments, you may need to resize the image to a smaller size.
Here's a helper function to do that:
def load_and_resize_image(image_path, max_size=1024):\n \"\"\"\n Load and resize an image while maintaining aspect ratio\n\n Args:\n image_path: Path to the image file\n max_size: Maximum dimension (width or height) of the output image\n\n Returns:\n PIL Image: Resized image\n \"\"\"\n image = Image.open(image_path)\n\n # Get current dimensions\n width, height = image.size\n\n # Calculate scaling factor\n scale = min(max_size / width, max_size / height)\n\n # Only resize if image is larger than max_size\n if scale < 1:\n new_width = int(width * scale)\n new_height = int(height * scale)\n image = image.resize((new_width, new_height), Image.Resampling.LANCZOS)\n\n return image\n
You can change the resolution of the image by changing the max_size
argument. Small max sizes will make the image more blurry, but processing will be faster and require less memory.
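For example, assuming a local file named receipt.png (the receipt image is only downloaded in the next section, so treat the path as a placeholder), a smaller, faster-to-process version could be loaded like this:
# Roughly half the default resolution: blurrier, but faster and lighter on memory\nsmall_image = load_and_resize_image(\"receipt.png\", max_size=512)\n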
"},{"location":"cookbook/receipt-digitization/#load-an-image","title":"Load an image","text":"Load an image and resize it. We've provided a sample image of a Trader Joe's receipt, but you can use any image you'd like.
Here's what the image looks like:
# Path to the image\nimage_path = \"https://raw.githubusercontent.com/dottxt-ai/outlines/refs/heads/main/docs/cookbook/images/trader-joes-receipt.jpg\"\n\n# Download the image\nresponse = requests.get(image_path)\nwith open(\"receipt.png\", \"wb\") as f:\n f.write(response.content)\n\n# Load + resize the image\nimage = load_and_resize_image(\"receipt.png\")\n
"},{"location":"cookbook/receipt-digitization/#define-the-output-structure","title":"Define the output structure","text":"We'll define a Pydantic model to describe the data we want to extract from the image.
In our case, we want to extract the following information:
- The store name
- The store address
- The store number
- A list of items, including the name, quantity, price per unit, and total price
- The tax
- The total
- The date
- The payment method
Most fields are optional, as not all receipts contain all information.
class Item(BaseModel):\n name: str\n quantity: Optional[int]\n price_per_unit: Optional[float]\n total_price: Optional[float]\n\nclass ReceiptSummary(BaseModel):\n store_name: str\n store_address: str\n store_number: Optional[int]\n items: List[Item]\n tax: Optional[float]\n total: Optional[float]\n # Date is in the format YYYY-MM-DD. We can apply a regex pattern to ensure it's formatted correctly.\n date: Optional[str] = Field(pattern=r'\\d{4}-\\d{2}-\\d{2}', description=\"Date in the format YYYY-MM-DD\")\n payment_method: Literal[\"cash\", \"credit\", \"debit\", \"check\", \"other\"]\n
"},{"location":"cookbook/receipt-digitization/#prepare-the-prompt","title":"Prepare the prompt","text":"We'll use the AutoProcessor
to convert the image and the text prompt into a format that the model can understand. Practically, this is the code that adds user, system, assistant, and image tokens to the prompt.
# Set up the content you want to send to the model\nmessages = [\n {\n \"role\": \"user\",\n \"content\": [\n {\n # The image is provided as a PIL Image object\n \"type\": \"image\",\n \"image\": image,\n },\n {\n \"type\": \"text\",\n \"text\": f\"\"\"You are an expert at extracting information from receipts.\n Please extract the information from the receipt. Be as detailed as possible --\n missing or misreporting information is a crime.\n\n Return the information in the following JSON schema:\n {ReceiptSummary.model_json_schema()}\n \"\"\"},\n ],\n }\n]\n\n# Convert the messages to the final prompt\nprocessor = AutoProcessor.from_pretrained(model_name)\nprompt = processor.apply_chat_template(\n messages, tokenize=False, add_generation_prompt=True\n)\n
If you are curious, the final prompt that is sent to the model looks (roughly) like this:
<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n<|vision_start|><|image_pad|><|vision_end|>\nYou are an expert at extracting information from receipts.\nPlease extract the information from the receipt. Be as detailed as\npossible -- missing or misreporting information is a crime.\n\nReturn the information in the following JSON schema:\n\n<JSON SCHEMA OMITTED>\n<|im_end|>\n<|im_start|>assistant\n
"},{"location":"cookbook/receipt-digitization/#run-the-model","title":"Run the model","text":"# Prepare a function to process receipts\nreceipt_summary_generator = outlines.generate.json(\n model,\n ReceiptSummary,\n\n # Greedy sampling is a good idea for numeric\n # data extraction -- no randomness.\n sampler=outlines.samplers.greedy()\n)\n\n# Generate the receipt summary\nresult = receipt_summary_generator(prompt, [image])\nprint(result)\n
"},{"location":"cookbook/receipt-digitization/#output","title":"Output","text":"The output should look like this:
ReceiptSummary(\n store_name=\"Trader Joe's\",\n store_address='401 Bay Street, San Francisco, CA 94133',\n store_number=0,\n items=[\n Item(name='BANANA EACH', quantity=7, price_per_unit=0.23, total_price=1.61),\n Item(name='BAREBELLS CHOCOLATE DOUG', quantity=1, price_per_unit=2.29, total_price=2.29),\n Item(name='BAREBELLS CREAMY CRISP', quantity=1, price_per_unit=2.29, total_price=2.29),\n Item(name='BAREBELLS CHOCOLATE DOUG', quantity=1, price_per_unit=2.29, total_price=2.29),\n Item(name='BAREBELLS CARAMEL CASHEW', quantity=2, price_per_unit=2.29, total_price=4.58),\n Item(name='BAREBELLS CREAMY CRISP', quantity=1, price_per_unit=2.29, total_price=2.29),\n Item(name='SPINDRIFT ORANGE MANGO 8', quantity=1, price_per_unit=7.49, total_price=7.49),\n Item(name='Bottle Deposit', quantity=8, price_per_unit=0.05, total_price=0.4),\n Item(name='MILK ORGANIC GALLON WHOL', quantity=1, price_per_unit=6.79, total_price=6.79),\n Item(name='CLASSIC GREEK SALAD', quantity=1, price_per_unit=3.49, total_price=3.49),\n Item(name='COBB SALAD', quantity=1, price_per_unit=5.99, total_price=5.99),\n Item(name='PEPPER BELL RED XL EACH', quantity=1, price_per_unit=1.29, total_price=1.29),\n Item(name='BAG FEE.', quantity=1, price_per_unit=0.25, total_price=0.25),\n Item(name='BAG FEE.', quantity=1, price_per_unit=0.25, total_price=0.25)\n ],\n tax=0.68,\n total=41.98,\n date='2023-11-04',\n payment_method='debit',\n\n)\n
Voila! You've successfully extracted information from a receipt using an LLM.
"},{"location":"cookbook/receipt-digitization/#bonus-roasting-the-user-for-their-receipt","title":"Bonus: roasting the user for their receipt","text":"You can roast the user for their receipt by adding a roast
field to the end of the ReceiptSummary
model.
class ReceiptSummary(BaseModel):\n ...\n roast: str\n
which gives you a result like
ReceiptSummary(\n ...\n roast=\"You must be a fan of Trader Joe's because you bought enough\n items to fill a small grocery bag and still had to pay for a bag fee.\n Maybe you should start using reusable bags to save some money and the\n environment.\"\n)\n
Qwen is not particularly funny, but worth a shot.
"},{"location":"cookbook/simtom/","title":"Build perspective-taking agents with SimToM","text":"Prompting strategies like Chain-of-Thought (CoT) can improve LLMs' reasoning capabilities. However, they underwhelm in tasks that require keeping track of inconsistent world states. SimToM proposes a simple, two-stage prompting framework for LLMs inspired by Simulation Theory. The authors showed that this approach outperforms zero-shot prompting and CoT on ToMI and BigToM, two benchmarks with Theory of Mind questions.
In this example, we will implement SimToM with a few lines of code using Outlines' prompt templating and structured generation capabilities.
"},{"location":"cookbook/simtom/#how-simtom-works","title":"How SimToM works","text":"SimToM calls an LLM with two consecutive prompts:
- Perspective-taking: The first prompt receives a
story
and a character
. The goal is to understand the situation based on the character's point of view and filter out the rest of the story. - Question-Answering: The second prompt receives the character's point of view from the previous step and tasks the LLM to answer a question using that context.
"},{"location":"cookbook/simtom/#outlines-implementation","title":"Outlines implementation","text":"To implement SimToM with Outlines, we will need to:
- Write the prompts with prompt functions.
- Define the JSON object each prompt will return using Pydantic.
- Generate responses with a Mistral model using the transformers integration.
Let's dive into it!
"},{"location":"cookbook/simtom/#using-prompt-functions","title":"Using Prompt Functions","text":"With Outlines, you can write your prompts as Python functions by adding the @outlines.prompt
decorator. The prompt template is contained in their docstring, and their arguments correspond to variables used in the prompt.
The authors have shared their code, prompts and data in this GitHub repository. Below, we define in Outlines the prompts they used for the ToMI dataset:
import outlines\n\n\n@outlines.prompt\ndef perspective_taking(story: str, character: str) -> None:\n \"\"\"<s>[INST] The following is a sequence of events about some characters, that takes place in multiple locations.\n Your job is to output only the events that the specified character, {{character}}, knows about.\n\n Here are a few rules:\n 1. A character knows about all events that they do.\n 2. If a character is in a certain room/location, that character knows about all other events that happens in the room. This includes other characters leaving or exiting the location, the locations of objects in that location, and whether somebody moves an object to another place.\n 3. If a character leaves a location, and is NOT in that location, they no longer know about any events that happen within that location. However, they can re-enter the location.\n\n Story: {{story}}\n What events does {{character}} know about? Only output the events according to the above rules, do not provide an explanation. [/INST]\"\"\" # noqa\n\n@outlines.prompt\ndef simulation(events: list, name: str, question: str) -> None:\n \"\"\"<s>[INST] {% for event in events %}\n {{event}}\n {% endfor %}\n You are {{name}}.\n Based on the above information, answer the following question:\n {{question}}\n You must choose one of the above choices, do not say there is not enough information. Answer with a single word, do not output anything else. [/INST]\"\"\" # noqa\n
"},{"location":"cookbook/simtom/#json-structured-generation","title":"JSON Structured Generation","text":"Outlines guarantees that the LLM will return a valid JSON object, which we can specify as a Pydantic model.
We will need two Pydantic models for SimToM, one for each prompt:
from pydantic import BaseModel, Field\nfrom typing import List\n\n\nclass PerspectiveTaking(BaseModel):\n \"\"\"This is for the first prompt.\"\"\"\n character: str = Field(description=\"The character we extract the events for.\")\n events: List[str] = Field(description=\"All events that the character knows about.\")\n\n\nclass Simulation(BaseModel):\n \"\"\"This is for the second prompt.\"\"\"\n answer: str\n
"},{"location":"cookbook/simtom/#calling-an-llm","title":"Calling an LLM","text":"Let's try SimToM with an example from the ToMI dataset:
story = \"\"\"\n1 Aria entered the front_yard.\n2 Aiden entered the front_yard.\n3 The grapefruit is in the green_bucket.\n4 Aria moved the grapefruit to the blue_container.\n5 Aiden exited the front_yard.\n6 Noah entered the playroom.\n\"\"\"\nquestion = \"7 Where was the grapefruit at the beginning?\"\ncharacter = \"Aria\"\n
We load Mistral-7B-Instruct-v0.3
, create the prompt using the template we defined earlier, and generate a structured response. As a reminder, the goal of the first call is to get all the events a character, Aria
, knows about.
# Load an LLM from Hugging Face\nMODEL_NAME = \"mistral-community/Mistral-7B-Instruct-v0.3\"\nmodel = outlines.models.transformers(MODEL_NAME, device=\"cuda\")\n\nperspective_prompt = perspective_taking(story=story, character=character)\n\n# Call Mistral 7B with the first prompt\ngenerator = outlines.generate.json(model, PerspectiveTaking)\nperspective = generator(perspective_prompt)\n\nprint(perspective.model_dump())\n# {'character': 'Aria', 'events': ['1 Aria entered the front_yard.', '3 The grapefruit is in the green_bucket.', '4 Aria moved the grapefruit to the blue_container.']}\n
Not bad! We will now generate the second prompt with those events.
sim_prompt = simulation(events=perspective.events, name=character, question=question)\n\n# Call Mistral 7B with the second prompt\ngenerator = outlines.generate.json(model, Simulation)\nresult = generator(sim_prompt)\n\nprint(result.model_dump())\n# {'answer': 'green_bucket'}\n
And this is it! SimToM could be useful in agentic workflows, where agents must act based on what they know, not all available information. One caveat of SimToM is that the perspective-taking step may remove important information, leading to wrong results. As the authors note in their paper, it can serve as a simple and effective baseline for evaluating LLMs on Theory of Mind reasoning tasks.
"},{"location":"cookbook/structured_generation_workflow/","title":"Structured Generation Workflow: Generating Synthetic Phone Numbers","text":"This is a condensed version of Coding for Structured Generation with LLMs.
For this example we're going to be building an LLM program to generate synthetic data in the form of realistic looking phone numbers for Washington State. Using an LLM for this task is a bit overkill since we could just as easily accomplish this with a tool like Faker, but this example still serves as a useful way to demonstrate a workflow for using structured generation.
"},{"location":"cookbook/structured_generation_workflow/#unstructured-approach","title":"Unstructured approach","text":"Before diving into how to use structure generation for this task let's start with an unstructured example. We begin by loading our model:
import outlines\n\nmodel_name = 'microsoft/Phi-3-mini-4k-instruct'\nmodel = outlines.models.transformers(model_name)\n
Next we need a prompt for this model. Since we're focusing on structured generation, we won't be engaging in any form of \"prompt hacking\" and will be leaving this prompt untouched for the rest of this example.
from transformers import AutoTokenizer\n\ntokenizer = AutoTokenizer.from_pretrained(model_name)\n\nmessages_phone = [\n    {\"role\": \"user\", \"content\": \"\"\"\n    Please generate a realistic phone number for Washington State in the following format\n\n    (555) 555-5555\n\n    \"\"\"}\n]\n\n# This allows us to properly format our prompt for\n# Phi-3 Mini's 'Instruct' interface.\nprompt_phone = tokenizer.apply_chat_template(messages_phone, tokenize=False)\n
With our prompt ready, we can now generate 10 example phone numbers:
phone_generator_unstruct = outlines.generate.text(model)\nfor _ in range(10):\n print(phone_generator_unstruct(prompt_phone,max_tokens=12))\n
I'd be happy to help you generate a realistic phone\\ I cannot generate a real phone number as I'm just\\ I'm an AI and don't have the ability\\ Sure! Here is a randomly generated phone number in the format\\ Here's a phone number that fits the format for a\\ In Washington State, phone numbers typically have a three-dig\\ Here are a few examples of phone numbers that could be considered\\ I'd be happy to help generate a realistic phone number\\ I'd be happy to help you generate a random phone\\ Based on the format you provided, a realistic phone number for\\
As we can see, none of these outputs are even phone numbers!
Let's see if we can improve this using structured generation.
"},{"location":"cookbook/structured_generation_workflow/#the-structured-generation-workflow","title":"The Structured Generation Workflow","text":"In order to solve this problem we're going to introduce a Structured Generation Workflow outlined in this image:
Let's step through this:
"},{"location":"cookbook/structured_generation_workflow/#real-example","title":"Real example","text":"We start with a real example phone number, in this case for the Seattle Public Library, that we can use to verify the structure we are creating.
phone_number = \"(206) 386-4636\"\n
For a simple example like this, we'll just use a single phone number; for more complex cases it can be helpful to have several examples (see the short sketch after the validation step below).
"},{"location":"cookbook/structured_generation_workflow/#draft-structure","title":"Draft Structure","text":"The next step in the process is for use to define a simple regex that we feel correctly models our real data.
phone_regex_1 = r'\\([0-9]{3}\\) [0-9]{3}-[0-9]{4}'\n
Next we need to validate this regex against our real data.
"},{"location":"cookbook/structured_generation_workflow/#validate-by-matching-examples","title":"Validate by matching examples","text":"Whenever writing non-trivial code with structured generation it is essential that you first validate the code against your real data example(s).
We'll start with a simple method of validation: just checking that our regex matches the data.
import re\nre.match(phone_regex_1, phone_number)\n\n# <re.Match object; span=(0, 14), match='(206) 386-4636'>\n
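If you have collected several real numbers (as suggested above), the same check extends naturally. A small sketch; the second number below is made up purely for illustration:
# Validate the draft regex against every example we have\nexample_numbers = [phone_number, \"(509) 555-0100\"]  # the second number is hypothetical\nassert all(re.fullmatch(phone_regex_1, n) for n in example_numbers)\n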
Now that we have a match, we can move on to generating structured output!
"},{"location":"cookbook/structured_generation_workflow/#generate-structure","title":"Generate Structure","text":"We're ready to see if structured generation can make an improvement over our initial unstructured approach:
phone_generator_v1 = outlines.generate.regex(model, phone_regex_1)\nfor _ in range(10):\n print(phone_generator_v1(prompt_phone))\n
(206) 555-1234\\ (206) 555-1234\\ (206) 555-1234\\ (206) 555-1234\\ (206) 555-1234\\ (206) 555-1234\\ (206) 123-4567\\ (206) 555-1234\\ (206) 555-1234\\ (206) 555-1234
At least we have phone numbers! But I think we can do better!
"},{"location":"cookbook/structured_generation_workflow/#inspect-output","title":"Inspect output","text":"In this case the model did create phone numbers and, impressively, got the area code correct. So using structured generation did improve things. However these numbers are pretty boring. Let's improve that structure!
"},{"location":"cookbook/structured_generation_workflow/#iteration","title":"Iteration","text":"We've walked through the loop once, so we can go quickly now through each iteration.
We start by improving our structure:
phone_regex_2 = r'\\([0-9]{3}\\) [2-46-9]{3}-[02-9]{4}'\n
Before rushing to another round of generation, let's validate this new regex. We'll add just a bit more sophistication over our last check:
re.match(phone_regex_2, phone_number)[0] == phone_number\n# True\n
Now that we've validated, let's generate with this new regex!
phone_generator_v2 = outlines.generate.regex(model, phone_regex_2)\nfor _ in range(10):\n    print(phone_generator_v2(prompt_phone))\n
(206) 867-5309\\ (206) 666-7777\\ (206) 444-3333\\ (206) 444-3333\\ (206) 943-2222\\ (206) 323-6789\\ (206) 444-3333\\ (206) 867-5309\\ (206) 466-2255\\ (206) 222-3333
Better, but I don't like those repeated sequences. Like good software developers, let's iterate again!
"},{"location":"cookbook/structured_generation_workflow/#reiteration-with-debugging","title":"Reiteration - with debugging","text":"Here's a fancier regex that should give us more interesting results:
phone_regex_3_error = r'\\([0-9]{3}\\) [2-4][7-9][4-6]-[3-6][2-8][1-4]'\n
This looks good to me, but there's a subtle bug, which is why we always need to validate our structure against real data. This time we'll make our validator do a bit more work to verify that the correct string is matched:
if not re.match(phone_regex_3_error, phone_number):\n print(\"Regex fails match\")\nelse:\n matched_string = re.match(phone_regex_3_error, phone_number)[0]\n if matched_string == phone_number:\n print(\"Successful match\")\n else:\n print(f\"Error {matched_string} != {phone_number}\")\n
This prints out: Error (206) 386-463 != (206) 386-4636
Ah! We were missing the last digit. Let's fix that and regenerate:
phone_regex_3_fixed = r'\\([0-9]{3}\\) [2-4][7-9][4-6]-[3-6][2-8][1-4][6-9]'\nphone_generator_v3 = outlines.generate.regex(model,\n phone_regex_3_fixed)\nfor _ in range(10):\n print(phone_generator_v3(prompt_phone))\n
(206) 494-3216\\ (206) 374-6218\\ (206) 494-3337\\ (206) 476-3216\\ (206) 484-3548\\ (206) 495-3218\\ (206) 494-5517\\ (206) 375-4636\\ (206) 384-6216\\ (206) 385-6218
Much better!
Now you've seen a quick example of the structured generation workflow that can be used as the basis for building and iterating on much larger structured generation tasks!
"},{"location":"reference/","title":"Reference","text":""},{"location":"reference/#structured-generation","title":"Structured generation","text":"While LLM capabilities are increasingly impressive, we can make their output more reliable by steering the generation. Outlines thus offers mechanisms to specify high level constraints on text completions by generative language models.
Stopping sequence By default, language models stop generating tokens after an end-of-sequence (EOS) token has been generated, or after a set maximum number of tokens. Their output can be verbose, and for practical purposes it is often necessary to stop the generation after a given sequence has been found instead. You can use the stop_at keyword argument when calling the model with a prompt:
import outlines.models as models\n\ncomplete = models.openai(\"gpt-4o-mini\")\nexpert = complete(\"Name an expert in quantum gravity.\", stop_at=[\"\\n\", \".\"])\n
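The same idea applies to generators built on local models; here is a short sketch, assuming the transformers integration used elsewhere in these docs and that the stop_at keyword argument is accepted at call time by the generator:
import outlines\n\nmodel = outlines.models.transformers(\"microsoft/Phi-3-mini-4k-instruct\")\ngenerator = outlines.generate.text(model)\n\n# Stop as soon as a newline or a period is generated\nexpert = generator(\"Name an expert in quantum gravity.\", stop_at=[\"\\n\", \".\"])\n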
"},{"location":"reference/functions/","title":"Outlines functions","text":""},{"location":"reference/prompting/","title":"Prompt templating","text":"Outlines provides a powerful domain-specific language to write and manage prompts, via what we call prompt functions. Prompt functions are Python functions that contain a template for the prompt in their docstring, and their arguments correspond to the variables used in the prompt. When called, a prompt function returns the template rendered with the values of the arguments.
The aim of prompt functions is to solve several recurrent problems with prompting:
- Building complex prompts quickly leads to messy code. This problem has already been solved in the web development community by using templating, so why not use it here?
- Composing prompts is difficult. Why not just compose functions?
- Separating prompts from code. Encapsulation in functions allows a clean separation between prompts and code. Moreover, like any function, prompt functions can be imported from other modules.
Outlines uses the Jinja templating engine to render prompts, which makes it easy to compose complex prompts.
Prompt rendering
Prompt functions are opinionated when it comes to prompt rendering. These opinions are meant to avoid common prompting errors, but they can have unintended consequences if you are doing something unusual. We advise you to always print the prompt before using it. You can also read the reference section if you want to know more.
"},{"location":"reference/prompting/#your-first-prompt","title":"Your first prompt","text":"The following snippet showcases a very simple prompt. The variables between curly brackets {{ }}
are placeholders for the values of the arguments you will pass to the prompt function.
CodeOutput import outlines\n\n@outlines.prompt\ndef greetings(name, question):\n \"\"\"Hello, {{ name }}!\n {{ question }}\n \"\"\"\n\nprompt = greetings(\"user\", \"How are you?\")\nprint(prompt)\n
Hello, user!\nHow are you?\n
If a variable is missing in the function's arguments, Jinja2 will throw an UndefinedError
exception:
CodeOutput import outlines\n\n@outlines.prompt\ndef greetings(name):\n \"\"\"Hello, {{ surname }}!\"\"\"\n\nprompt = greetings(\"user\")\n
Traceback (most recent call last):\n File \"<stdin>\", line 9, in <module>\n File \"/home/remi/projects/normal/outlines/outlines/prompts.py\", line 38, in __call__\n return render(self.template, **bound_arguments.arguments)\n File \"/home/remi/projects/normal/outlines/outlines/prompts.py\", line 213, in render\n return jinja_template.render(**values)\n File \"/home/remi/micromamba/envs/outlines/lib/python3.9/site-packages/jinja2/environment.py\", line 1301, in render\n self.environment.handle_exception()\n File \"/home/remi/micromamba/envs/outlines/lib/python3.9/site-packages/jinja2/environment.py\", line 936, in handle_exception\n raise rewrite_traceback_stack(source=source)\n File \"<template>\", line 1, in top-level template code\n jinja2.exceptions.UndefinedError: 'surname' is undefined\n
"},{"location":"reference/prompting/#importing-prompt-functions","title":"Importing prompt functions","text":"Prompt functions are functions, and thus can be imported from other modules:
prompts.pygenerate.pyOutput import outlines\n\n@outlines.prompt\ndef greetings(name, question):\n \"\"\"Hello, {{ name }}!\n {{ question }}\n \"\"\"\n
from .prompts import greetings\n\nprompt = greetings(\"John Doe\", \"How are you today?\")\n
Hello, John Doe!\nHow are you today?\n
"},{"location":"reference/prompting/#few-shot-prompting","title":"Few-shot prompting","text":"Few-shot prompting can lead to messy code. Prompt functions allow you to loop over lists or dictionaries from the template. In the following example we demonstrate how we can generate a prompt by passing a list of dictionaries with keys question
and answer
to the prompt function:
CodeOutput import outlines\n\n@outlines.prompt\ndef few_shots(instructions, examples, question):\n \"\"\"{{ instructions }}\n\n Examples\n --------\n\n {% for example in examples %}\n Q: {{ example.question }}\n A: {{ example.answer }}\n\n {% endfor %}\n Question\n --------\n\n Q: {{ question }}\n A:\n \"\"\"\n\ninstructions = \"Please answer the following question following the examples\"\nexamples = [\n {\"question\": \"2+2=?\", \"answer\":4},\n {\"question\": \"3+3=?\", \"answer\":6}\n]\nquestion = \"4+4 = ?\"\n\nprompt = few_shots(instructions, examples, question)\nprint(prompt)\n
Please answer the following question following the examples\n\nExamples\n--------\n\nQ: 2+2=?\nA: 4\n\nQ: 3+3=?\nA: 6\n\nQuestion\n--------\n\nQ: 4+4 = ?\nA:\n
"},{"location":"reference/prompting/#conditionals-filters-etc","title":"Conditionals, filters, etc.","text":"Jinja2 has many features beyond looping that are not described here: conditionals, filtering, formatting, etc. Please refer to the Jinja documentation for more information about the syntax of the templating language. The Jinja syntax is powerful, and we recommend you take some time to read their documentation if you are building complex prompts.
"},{"location":"reference/prompting/#tools","title":"Tools","text":"Several projects (e.g.Toolformer, ViperGPT, AutoGPT, etc.) have shown that we can \"teach\" language models to use external functions by describing what these functions do in the prompt. In these projects the same information is often repeated twice: the function implementation, name, docstring, or arguments are copy-pasted in the prompt. This is cumbersome and error prone; you can directly pull this information from within an Outlines prompt function:
CodeOutput import outlines\n\ndef my_tool(arg1: str, arg2: int):\n \"\"\"Tool description.\n\n The rest of the docstring\n \"\"\"\n pass\n\n@outlines.prompt\ndef tool_prompt(question, tool):\n \"\"\"{{ question }}\n\n COMMANDS\n 1. {{ tool | name }}: {{ tool | description }}, args: {{ tool | args }}\n\n {{ tool | source }}\n \"\"\"\n\nprompt = tool_prompt(\"Can you do something?\", my_tool)\nprint(prompt)\n
Can you do something?\n\nCOMMANDS\n1. my_tool: Tool description., args: arg1: str, arg2: int\n\ndef my_tool(arg1: str, arg2: int):\n \"\"\"Tool description.\n\n The rest of the docstring\n \"\"\"\n pass\n
"},{"location":"reference/prompting/#json-response-format","title":"JSON response format","text":"To build reliable chains with language models we often need to instruct them the format in which we would like them to return their response.
Without prompt templating, the same information is duplicated between the parsing function (e.g. a Pydantic model) and the desired schema written in the prompt. This can lead to errors that are hard to debug.
Outlines allows you to directly pull the JSON schema of a Pydantic model, or pretty-print a dictionary, from within an Outlines prompt function:
CodeOutput from pydantic import BaseModel, Field\n\nimport outlines\n\nclass MyResponse(BaseModel):\n field1: int = Field(description=\"an int\")\n field2: str\n\n@outlines.prompt\ndef my_prompt(response_model):\n \"\"\"{{ response_model | schema }}\"\"\"\n\nprompt = my_prompt(MyResponse)\nprint(prompt)\n# {\n# \"field1\": \"an int\",\n# \"field2\": \"<field2>\"\n# }\n
response = {\n    \"field1\": \"<field1>\",\n    \"field2\": \"a string\"\n}\n\nmy_prompt(response)\n# {\n#   \"field1\": \"<field1>\",\n#   \"field2\": \"a string\"\n# }\n
"},{"location":"reference/prompting/#formatting-conventions","title":"Formatting conventions","text":"Prompt functions are opinionated when it comes to rendering, and these opinions are meant to avoid prompting mistakes and help with formatting.
"},{"location":"reference/prompting/#whitespaces","title":"Whitespaces","text":"If you have experience working with strings between triple quotes you know that indenting has an influence on the string's formatting. Prompt functions adopt a few conventions so you don't have to think about indents when writing prompt.
First, whether you start the prompt right after the triple quotes or on the line below does not matter for formatting:
CodeOutput import outlines\n\n@outlines.prompt\ndef prompt1():\n \"\"\"My prompt\n \"\"\"\n\n@outlines.prompt\ndef prompt2():\n \"\"\"\n My prompt\n \"\"\"\n\nprint(prompt1())\nprint(prompt2())\n
My prompt\nMy prompt\n
Indentation is relative to the second line of the docstring, and leading spaces are removed:
CodeOutput import outlines\n\n@outlines.prompt\ndef example1():\n \"\"\"First line\n Second line\n \"\"\"\n\n@outlines.prompt\ndef example2():\n \"\"\"\n Second line\n Third line\n \"\"\"\n\n@outlines.prompt\ndef example3():\n \"\"\"\n Second line\n Third line\n \"\"\"\n\nprint(example1())\nprint(example2())\nprint(example3())\n
First line\nSecond line\n\nSecond line\nThird line\n\nSecond line\n Third line\n
Trailing whitespaces are not removed, unless they follow a linebreak symbol \\
(see linebreaks).
"},{"location":"reference/prompting/#linebreaks","title":"Linebreaks","text":"You can use the backslash \\
to break a long line of text. It will render as a single line:
CodeOutput import outlines\n\n@outlines.prompt\ndef example():\n \"\"\"\n Break in \\\n several lines \\\n But respect the indentation\n on line breaks.\n And after everything \\\n Goes back to normal\n \"\"\"\n\nprint(example())\n
Break in several lines But respect the indentation\n on line breaks.\nAnd after everything Goes back to normal\n
"},{"location":"reference/samplers/","title":"Samplers","text":"Outlines offers different sequence sampling algorithms, and we will integrate more in the future. You can read this blog post for an overview of the different sampling algorithm.
Samplers provide control over the sampling process, allowing you to influence the output of the model. This can include controlling randomness (temperature), biasing towards certain tokens (top-k, top-p), or exploring multiple candidate sequences (beam search).
"},{"location":"reference/samplers/#multinomial-sampling","title":"Multinomial sampling","text":"Multinomial sampling is the default sampling algorithm in Outlines.
As an example, suppose we have only two possible tokens: \"H\" and \"T\". For a fixed prompt such as \"Flip a coin, did you get heads or tails?\", the language model computes a probability for each token:
| Token | Probability |
|-------|-------------|
| \"H\" | 0.5 |
| \"T\" | 0.5 |

You'd expect to receive \"H\" 50% of the time and \"T\" 50% of the time.
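To make this concrete without involving a model, here is a toy simulation of multinomial sampling over that two-token distribution, using only the Python standard library:
import random\n\n# Draw 1000 tokens from the distribution above and check the empirical frequency of \"H\"\nrandom.seed(0)\ndraws = random.choices([\"H\", \"T\"], weights=[0.5, 0.5], k=1000)\nprint(draws.count(\"H\") / len(draws))  # roughly 0.5\n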
"},{"location":"reference/samplers/#parameters","title":"Parameters","text":" samples
: Number of samples to generate (default: 1) top_k
: Only consider the top k tokens (optional) top_p
: Only consider the top tokens with cumulative probability >= p (optional) temperature
: Controls randomness of sampling (optional). A sketch combining these parameters follows below.
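The optional arguments can also be combined in a single call; a short sketch, assuming the keyword arguments documented individually in the sections below can be mixed:
from outlines import samplers\n\n# Three samples per prompt, restricted to the 40 most likely tokens,\n# a nucleus of cumulative probability 0.95, and a lowered temperature\nsampler = samplers.multinomial(3, top_k=40, top_p=0.95, temperature=0.7)\n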
"},{"location":"reference/samplers/#default-behavior","title":"Default behavior","text":"Outlines defaults to the multinomial sampler without top-p or top-k sampling, and temperature equal to 1.
Not specifying a sampler is equivalent to:
from outlines import models, generate, samplers\n\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-4k-instruct\")\nsampler = samplers.multinomial()\n\ngenerator = generate.text(model, sampler)\nanswer = generator(\"What is 2+2?\")\n\nprint(answer)\n# 4\n
"},{"location":"reference/samplers/#batching","title":"Batching","text":"You can ask the generator to take multiple samples by passing the number of samples when initializing the sampler:
from outlines import models, generate, samplers\n\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-4k-instruct\")\nsampler = samplers.multinomial(3)\n\ngenerator = generate.text(model, sampler)\nanswer = generator(\"What is 2+2?\")\n\nprint(answer)\n# [4, 4, 4]\n
If you ask for multiple samples for a batch of prompts, the returned array will be of shape (num_samples, num_batches)
:
from outlines import models, generate, samplers\n\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-4k-instruct\")\nsampler = samplers.multinomial(3)\n\ngenerator = generate.text(model, sampler)\nanswer = generator([\"What is 2+2?\", \"What is 3+3?\"])\n\nprint(answer)\n# [[4, 4, 4], [6, 6, 6]]\n
"},{"location":"reference/samplers/#temperature","title":"Temperature","text":"You can control the temperature with
from outlines import models, generate, samplers\n\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-4k-instruct\")\nsampler = samplers.multinomial(3, temperature=0.5)\n\ngenerator = generate.text(model, sampler)\nanswer = generator([\"What is 2+2?\", \"What is 3+3?\"])\n\nprint(answer)\n
If you would like to use temperature=0.0
, please use sampler=samplers.greedy()
instead.
"},{"location":"reference/samplers/#top-k-sampling","title":"Top-k sampling","text":"You can ask Outlines to only consider the top-k logits at each step by specifying the value of the top-k
keyword argument when initializing the sampler.
sampler = samplers.multinomial(3, top_k=10)\n
"},{"location":"reference/samplers/#top-p-sampling","title":"Top-p sampling","text":"You can ask Outlines to only consider the highest probability tokens such that their cumulative probability is greater than a threshold p
. Specify the top_p
keyword argument when initializing the sampler:
sampler = samplers.multinomial(3, top_p=0.95)\n
"},{"location":"reference/samplers/#greedy-sampler","title":"Greedy sampler","text":"Greedy sampling selects the token with the highest probability at each step. It's deterministic and always produces the same output for a given input.
To use the greedy sampler, initialize the generator with the sampler:
from outlines import models, generate, samplers\n\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-4k-instruct\")\nsampler = samplers.greedy()\n\ngenerator = generate.text(model, sampler)\nanswer = generator(\"What is 2+2?\")\n\nprint(answer)\n# 4\n
You cannot ask for multiple samples with the greedy sampler since it is not clear what the result should be. Only the most likely token can be returned.
"},{"location":"reference/samplers/#beam-search","title":"Beam Search","text":"Beam search maintains multiple candidate sequences at each step, potentially finding better overall sequences than greedy or multinomial sampling.
To use Beam Search, initialize the generator with the sampler:
from outlines import models, generate, samplers\n\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-4k-instruct\")\nsampler = samplers.beam_search(beams=5)\n\ngenerator = generate.text(model, sampler)\nanswer = generator(\"What is 2+2?\")\n\nprint(answer)\n# 4\n
Compatibility
Only models from the transformers
and exllamav2
libraries are compatible with Beam Search.
"},{"location":"reference/samplers/#parameters_1","title":"Parameters","text":" beams
: Number of beams to use (default: 1)
"},{"location":"reference/samplers/#sampler-comparison","title":"Sampler Comparison","text":"Here's a table comparing the different samplers:
Sampler Pros Cons Use Cases Greedy Deterministic, fast May produce repetitive text When you need consistent, predictable output Multinomial Balances exploration and exploitation Results may vary between runs General-purpose text generation, creative tasks Beam Search Can find globally better sequences More computationally expensive When sequence quality is critical, e.g., translation For most use cases, we recommend using the default multinomial sampler.
"},{"location":"reference/text/","title":"Text generation","text":"Outlines provides a unified interface to generate text with many language models, API-based and local. The same pattern is used throughout the library:
- Instantiate a generator by calling
outlines.generate.text
with the model to be used. - Call the generator with the prompt and (optionally) some generation parameters.
from outlines import models, generate\n\nmodel = models.openai(\"gpt-4o-mini\")\ngenerator = generate.text(model)\nanswer = generator(\"What is 2+2?\")\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-4k-instruct\")\ngenerator = generate.text(model)\nanswer = generator(\"What is 2+2?\")\n
By default Outlines uses the multinomial sampler with temperature=1
. See this section to learn how to use different samplers.
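For instance, a different sampler can be passed to generate.text (a short sketch mirroring the samplers page, reusing the Phi-3 model from other examples):
from outlines import models, generate, samplers\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-4k-instruct\")\ngenerator = generate.text(model, samplers.greedy())\nanswer = generator(\"What is 2+2?\")\n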
"},{"location":"reference/text/#streaming","title":"Streaming","text":"Outlines allows you to stream the model's response by calling the .stream
method of the generator with the prompt:
from outlines import models, generate\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-4k-instruct\")\ngenerator = generate.text(model)\n\ntokens = generator.stream(\"What is 2+2?\")\nfor token in tokens:\n print(token)\n
"},{"location":"reference/text/#parameters","title":"Parameters","text":""},{"location":"reference/text/#limit-the-number-of-tokens-generated","title":"Limit the number of tokens generated","text":"To limit the number of tokens generated you can pass the max_tokens
positional argument to the generator:
from outlines import models, generate\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-4k-instruct\")\ngenerator = generate.text(model)\n\nanswer = generator(\"What is 2+2?\", 5)\nanswer = generator(\"What is 2+2?\", max_tokens=5)\n
"},{"location":"reference/text/#stop-after-a-given-string-is-generated","title":"Stop after a given string is generated","text":"You can also ask the model to stop generating text after a given string has been generated, for instance a period or a line break. You can pass a string or a line of string for the stop_at
argument:
from outlines import models, generate\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-4k-instruct\")\ngenerator = generate.text(model)\n\nanswer = generator(\"What is 2+2?\", stop_at=\".\")\nanswer = generator(\"What is 2+2?\", stop_at=[\".\", \"\\n\"])\n
The stopping string will be included in the response.
"},{"location":"reference/text/#seed-the-generation","title":"Seed the generation","text":"It can be useful to seed the generation in order to get reproducible results:
from outlines import models, generate\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-4k-instruct\")\ngenerator = generate.text(model)\n\nseed = 789001\n\nanswer = generator(\"What is 2+2?\", seed=seed)\n
"},{"location":"reference/generation/cfg/","title":"Grammar-structured generation","text":"You can pass any context-free grammar in the EBNF format and Outlines will generate an output that is valid to this grammar:
from outlines import models, generate\n\narithmetic_grammar = \"\"\"\n ?start: expression\n\n ?expression: term ((\"+\" | \"-\") term)*\n\n ?term: factor ((\"*\" | \"/\") factor)*\n\n ?factor: NUMBER\n | \"-\" factor\n | \"(\" expression \")\"\n\n %import common.NUMBER\n\"\"\"\n\nmodel = models.transformers(\"WizardLM/WizardMath-7B-V1.1\")\ngenerator = generate.cfg(model, arithmetic_grammar)\nsequence = generator(\n \"Alice had 4 apples and Bob ate 2. \"\n + \"Write an expression for Alice's apples:\"\n)\n\nprint(sequence)\n# (8-2)\n
"},{"location":"reference/generation/cfg/#disclaimer","title":"Disclaimer","text":"Experimental
Outlines' current community-contributed implementation of CFG-structured generation is experimental. This does not reflect the performance of .txt's product, where we have optimized grammar-structured generation to be as fast as regex-structured generation. Additionally, it does not fully align with the approach described in our technical report, aside from its use of incremental/partial parsing. This feature is still a work in progress, requiring performance enhancements and bug fixes for an ideal implementation. For more details, please see our grammar-related open issues on GitHub.
Greedy
To mitigate performance issues, CFG-structured generation will use rejection sampling and iterate over the candidate tokens, highest logit first, completing once a single valid token ID is selected. This is effectively greedy generation.
"},{"location":"reference/generation/cfg/#ready-to-use-grammars","title":"Ready-to-use grammars","text":"Outlines contains a (small) library of grammars that can be imported and use directly. We can rewrite the previous example as:
from outlines import models, generate, grammars\n\narithmetic_grammar = grammars.arithmetic\n\nmodel = models.transformers(\"WizardLM/WizardMath-7B-V1.1\")\ngenerator = generate.cfg(model, arithmetic_grammar)\nsequence = generator(\n    \"Alice had 4 apples and Bob ate 2. \"\n    + \"Write an expression for Alice's apples:\"\n)\n\nprint(sequence)\n# (8-2)\n
The following grammars are currently available:
- Arithmetic grammar via
outlines.grammars.arithmetic
- JSON grammar via
outlines.grammars.json
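For instance, the bundled JSON grammar can be used in the same way (a sketch that reuses the Phi-3 model from other examples; the prompt is illustrative):
from outlines import models, generate, grammars\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-4k-instruct\")\ngenerator = generate.cfg(model, grammars.json)\nsequence = generator(\"Write a small JSON object describing a book: \")\nprint(sequence)\n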
If you would like more grammars to be added to the repository, please open an issue or a pull request.
"},{"location":"reference/generation/cfg/#grammar-guide","title":"Grammar guide","text":"A grammar is a list of rules and terminals that define a language:
- Terminals define the vocabulary of the language; they may be a string, regular expression or combination of these and other terminals.
- Rules define the structure of that language; they are a list of terminals and rules.
Outlines uses the Lark library to make Large Language Models generate text in the language of a grammar. It therefore uses grammars defined in a format that Lark understands, based on the EBNF syntax. Read the Lark documentation for more details on grammars; the following is a small primer that should help get you started.
In the following we will define a LOGO-like toy language for Python's turtle library.
"},{"location":"reference/generation/cfg/#terminals","title":"Terminals","text":"A turtle can take 4 different MOVEMENT
move instructions: forward (f
), backward (b
), turn right (r
) and turn left (l
). It can take NUMBER
number of steps in each direction, and draw lines in a specified COLOR
. These define the vocabulary of our language:
MOVEMENT: \"f\"|\"b\"|\"r\"|\"l\"\nCOLOR: LETTER+\n\n%import common.LETTER\n%import common.INT -> NUMBER\n%import common.WS\n%ignore WS\n
The lines that start with %
are called \"directive\". They allow to import pre-defined terminals and rules, such as LETTER
and NUMBER
. LETTER+
is a regular expression, and indicates that a COLOR
is made of at least one LETTER
. The last two lines specify that we will ignore white spaces (WS
) in the grammar.
"},{"location":"reference/generation/cfg/#rules","title":"Rules","text":"We now need to define our rules, by decomposing instructions we can send to the turtle via our python program. At each line of the program, we can either choose a direction and execute a given number of steps, change the color used to draw the pattern. We can also choose to start filling, make a series of moves, and stop filling. We can also choose to repeat a series of move.
We can easily write the first two rules:
instruction: MOVEMENT NUMBER -> movement\n | \"c\" COLOR [COLOR] -> change_color\n
where movement
and change_color
are aliases for the rules. A whitespace implies concatenating the elements, and |
means choosing one of the elements. The fill
and repeat
rules are slightly more complex, since they apply to a code block, which is made of instructions. We thus define a new code_block
rule that refers to instruction
and finish implementing our rules:
instruction: MOVEMENT NUMBER -> movement\n | \"c\" COLOR [COLOR] -> change_color\n | \"fill\" code_block -> fill\n | \"repeat\" NUMBER code_block -> repeat\n\ncode_block: \"{\" instruction \"}\"\n
We can now write the full grammar:
start: instruction+\n\ninstruction: MOVEMENT NUMBER -> movement\n | \"c\" COLOR [COLOR] -> change_color\n | \"fill\" code_block -> fill\n | \"repeat\" NUMBER code_block -> repeat\n\ncode_block: \"{\" instruction+ \"}\"\n\nMOVEMENT: \"f\"|\"b\"|\"l\"|\"r\"\nCOLOR: LETTER+\n\n%import common.LETTER\n%import common.INT -> NUMBER\n%import common.WS\n%ignore WS\n
Notice the start
rule, which defines the starting point of the grammar, i.e. the rule with which a program must start. This full grammar allows us to parse programs such as:
c red yellow\n fill { repeat 36 {\n f200 l170\n }}\n
The result of the parse, the parse tree, can then easily be translated into a Python program that uses the turtle
library to draw a pattern.
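As an illustrative sketch (using the lark package directly, outside of Outlines), the grammar above can parse the sample program and expose the tree that such a translation would walk:
from lark import Lark\n\n# The LOGO-like grammar defined above\nturtle_grammar = \"\"\"\nstart: instruction+\n\ninstruction: MOVEMENT NUMBER -> movement\n           | \"c\" COLOR [COLOR] -> change_color\n           | \"fill\" code_block -> fill\n           | \"repeat\" NUMBER code_block -> repeat\n\ncode_block: \"{\" instruction+ \"}\"\n\nMOVEMENT: \"f\"|\"b\"|\"l\"|\"r\"\nCOLOR: LETTER+\n\n%import common.LETTER\n%import common.INT -> NUMBER\n%import common.WS\n%ignore WS\n\"\"\"\n\nparser = Lark(turtle_grammar, parser=\"lalr\")  # LALR(1), as used elsewhere in Outlines\ntree = parser.parse(\"c red yellow fill { repeat 36 { f200 l170 }}\")\nprint(tree.pretty())  # walk this tree to emit the corresponding turtle calls\n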
"},{"location":"reference/generation/cfg/#next-steps","title":"Next steps","text":"This section provides a very brief overview of grammars and their possibilities. Check out the Lark documentation for more thorough explanations and more examples.
"},{"location":"reference/generation/choices/","title":"Multiple choices","text":"Oultines allows you to make sure the generated text is chosen between different options:
from outlines import models, generate\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-4k-instruct\")\ngenerator = generate.choice(model, [\"skirt\", \"dress\", \"pen\", \"jacket\"])\nanswer = generator(\"Pick the odd word out: skirt, dress, pen, jacket\")\n
Performance
generate.choice
computes an index that helps Outlines guide generation. This can take some time, but only needs to be done once. If you want to generate from the same list of choices several times, make sure that you only call generate.choice
once.
"},{"location":"reference/generation/creating_grammars/","title":"Overview","text":"Outlines allows the use of Lark grammars to guide generation. These grammars are used to construct parsers that filter out incompatible tokens during the generation process The result is a generation that adheres to the grammar's production rules.
"},{"location":"reference/generation/creating_grammars/#primer-on-creating-grammars","title":"Primer on Creating Grammars","text":"To create grammars for Outlines, a solid understanding of Lark grammars is necessary. Here's how you can get started:
- Read Lark's grammar documentation here.
- Review Outlines' existing grammars here.
"},{"location":"reference/generation/creating_grammars/#compatibility-with-outlines","title":"Compatibility With Outlines","text":"It's important to note that not all Lark grammars work with Outlines. Changes may be necessary to ensure compatability.
"},{"location":"reference/generation/creating_grammars/#lalr1-parser","title":"LALR(1) Parser","text":"Outlines utilizes Larks LALR(1) parser, meaning the grammar must be unambiguous at least up to the next token (one token lookahead). Read Lark's official LALR(1) parser documentation here.
If your grammar is ambiguous, you will receive the following error at runtime:
GrammarError: Reduce/Reduce collision in Terminal('B') between the following rules:\n
"},{"location":"reference/generation/creating_grammars/#regex-terminal-restrictions","title":"Regex Terminal Restrictions","text":"Outlines converts terminals to finite state machines using the Interegular library. Not all regular expressions work with Interegular, mitigation is described in the subsections which follow.
"},{"location":"reference/generation/creating_grammars/#avoid-lookarounds","title":"Avoid Lookarounds","text":"Examples of removing lookaround while maintaining the same functionality
"},{"location":"reference/generation/creating_grammars/#example-escaped-string","title":"Example: Escaped String","text":"From Outlines' modified ESCAPED_STRING
in common.lark.
Before:
_STRING_INNER: /.*?/\n_STRING_ESC_INNER: _STRING_INNER /(?<!\\\\)(\\\\\\\\)*?/\n\nESCAPED_STRING : \"\\\"\" _STRING_ESC_INNER \"\\\"\"\n
After:
_NON_CONTROL_CHAR: /([^\"\\\\\\x00-\\x1F\\x7F-\\x9F])/\n_ESCAPED_CHAR: /\\\\/ (_NON_CONTROL_CHAR | /\\\\/ | /\"/)\nESCAPED_STRING_INNER: _NON_CONTROL_CHAR | _ESCAPED_CHAR\nESCAPED_STRING: /\"/ ESCAPED_STRING_INNER* /\"/\n
"},{"location":"reference/generation/creating_grammars/#avoid-backreferences","title":"Avoid Backreferences","text":"Backreferences, for example ([ab]^*)\\1
, cannot be simulated by a finite state machine, and will result in an error if used.
"},{"location":"reference/generation/creating_grammars/#creating-a-valid-grammar","title":"Creating a Valid Grammar","text":"You can use Outlines' test suite to verify your grammar.
"},{"location":"reference/generation/creating_grammars/#1-create-your-grammar","title":"1) Create Your Grammar","text":"Create your grammar file named your_new_grammar.lark
, adhering to the guidelines provided above. Add it to outlines/grammars/
(ensure attribution is included and license is compatible).
Update outlines/grammars.py
with a line including your grammar.
"},{"location":"reference/generation/creating_grammars/#2-test-your-grammar","title":"2) Test Your Grammar","text":"Test grammar for false negatives, ensure sample grammars can be generated: - Add valid example outputs which are compliant with the grammar to tests/benchmark/cfg_samples/your_new_grammar/
- Run the tests for your grammar via pytest -s tests/fsm/test_cfg_guide.py::test_cfg_grammar_sample -k \"your_new_grammar\"
Test grammar for false positives, ensure invalid outputs aren't generated.
Currently there isn't a builtin false positive testing utility. It is recommended you smoke test via
from outlines import models, generate, grammars\nmodel = models.transformers(\"mistralai/Mistral-7B-v0.1\")\ngenerator = generate.cfg(model, grammars.your_new_grammar)\nresult = generator(<your prompt to generate output for your grammar>)\nprint(result)\n
"},{"location":"reference/generation/creating_grammars/#converting","title":"Converting","text":"There are a few tools available for converting from other grammars to lark. These tools serve as a starting point. However, you will typically need to make additional adjustments to ensure full compatibility and proper functioning within Outlines.
Tools: - Larks built in \"Nearley-to-Lark\" converter https://lark-parser.readthedocs.io/en/latest/tools.html - Convert ANTLR4 to Lark (Note, most antlr4 grammars are not LALR(1) compatible, so will require additional tweaking) https://github.com/kaby76/Domemtech.Trash/blob/main/src/trconvert/readme.md - Extract EBNF from Yacc files https://www.bottlecaps.de/rr/ui
Reference Grammars: - Github Lark Grammars https://github.com/search?q=path%3A.lark&type=code - Github Nearley Grammars https://github.com/search?q=path%3A.ne+%22-%3E%22&type=code - Antlr4 grammars https://github.com/antlr/grammars-v4/ - Grammar zoo https://slebok.github.io/zoo/index.html#html
"},{"location":"reference/generation/custom_fsm_ops/","title":"Custom FSM Operations","text":"Outlines is fast because it compiles regular expressions into an index ahead of inference. To do so we use the equivalence between regular expressions and Finite State Machines (FSMs), and the library interegular to perform the translation.
Alternatively, one can pass an FSM built using interegular
directly to structure the generation.
"},{"location":"reference/generation/custom_fsm_ops/#example","title":"Example","text":""},{"location":"reference/generation/custom_fsm_ops/#using-the-difference-operation","title":"Using the difference
operation","text":"In the following example we build a fsm which recognizes only the strings valid to the first regular expression but not the second. In particular, it will prevent the words \"pink\" and \"elephant\" from being generated:
import interegular\nfrom outlines import models, generate\n\n\nlist_of_strings_pattern = \"\"\"\\[\"[^\"\\s]*\"(?:,\"[^\"\\s]*\")*\\]\"\"\"\npink_elephant_pattern = \"\"\".*(pink|elephant).*\"\"\"\n\nlist_of_strings_fsm = interegular.parse_pattern(list_of_strings_pattern).to_fsm()\npink_elephant_fsm = interegular.parse_pattern(pink_elephant_pattern).to_fsm()\n\ndifference_fsm = list_of_strings_fsm - pink_elephant_fsm\n\ndifference_fsm.accepts('[\"a\",\"pink\",\"elephant\"]')\n# False\ndifference_fsm.accepts('[\"a\",\"blue\",\"donkey\"]')\n# True\n\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-4k-instruct\")\ngenerator = generate.fsm(model, difference_fsm)\nresponse = generator(\"Don't talk about pink elephants\")\n
To see the other operations available, consult interegular's documentation.
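For example, assuming interegular's FSMs expose the usual set operators (as the difference operator above suggests), an intersection can combine two constraints so that only strings matching both are allowed; a hedged sketch:
import interegular\n\ndigits_fsm = interegular.parse_pattern(r\"[0-9]+\").to_fsm()\nshort_fsm = interegular.parse_pattern(r\".{1,4}\").to_fsm()\n\n# Intersection: strings made of 1 to 4 digits\nintersection_fsm = digits_fsm & short_fsm\n\nprint(intersection_fsm.accepts(\"1234\"))   # True\nprint(intersection_fsm.accepts(\"12345\"))  # False\n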
"},{"location":"reference/generation/format/","title":"Type constraints","text":"We can ask completions to be restricted to valid python types:
from outlines import models, generate\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-4k-instruct\")\ngenerator = generate.format(model, int)\nanswer = generator(\"When I was 6 my sister was half my age. Now I\u2019m 70 how old is my sister?\")\nprint(answer)\n# 67\n
The following types are currently available:
- int
- float
- bool
- datetime.date
- datetime.time
- datetime.datetime
- We also provide custom types
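For instance, datetime.date works the same way as the int example above (a sketch reusing the same model):
import datetime\n\nfrom outlines import models, generate\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-4k-instruct\")\ngenerator = generate.format(model, datetime.date)\nanswer = generator(\"When was the storming of the Bastille? Answer with a date: \")\nprint(answer)\n# e.g. 1789-07-14\n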
"},{"location":"reference/generation/generation/","title":"Generation","text":"Once an Outlines model is constructed you can use outlines.generate
to generate text. Standard LLM generation is possible via outlines.generate.text
, along with a variety of structured generation methods described below. (For a detailed technical explanation of how structured generation works, you may review the Structured Generation Explanation page)
Before generating text, you must construct an outlines.model
. Example:
import outlines\n\nmodel = outlines.models.transformers(\"microsoft/Phi-3-mini-4k-instruct\", device=\"cuda\")\n
"},{"location":"reference/generation/generation/#text-generator","title":"Text generator","text":"generator = outlines.generate.text(model)\n\nresult = generator(\"Question: What's 2+2? Answer:\", max_tokens=100)\nprint(result)\n# The answer is 4\n\n# Outlines also supports streaming output\nstream = generator.stream(\"What's 2+2?\", max_tokens=4)\nfor i in range(5):\n token = next(stream)\n print(repr(token))\n# '2'\n# '+'\n# '2'\n# ' equals'\n# '4'\n
"},{"location":"reference/generation/generation/#multi-label-classification","title":"Multi-label classification","text":"Outlines allows you to do multi-label classification by guiding the model so it can only output either of the specified choices:
import outlines\n\nmodel = outlines.models.transformers(\"microsoft/Phi-3-mini-128k-instruct\")\ngenerator = outlines.generate.choice(model, [\"Blue\", \"Red\", \"Yellow\"])\n\ncolor = generator(\"What is the closest color to Indigo? \")\nprint(color)\n# Blue\n
"},{"location":"reference/generation/generation/#json-structured-generation","title":"JSON-structured generation","text":"Outlines can guide models so that they output valid JSON 100% of the time. You can either specify the structure using Pydantic or a string that contains a JSON Schema:
PydanticJSON Schema from enum import Enum\nfrom pydantic import BaseModel, constr, conint\n\nimport outlines\n\nclass Armor(str, Enum):\n leather = \"leather\"\n chainmail = \"chainmail\"\n plate = \"plate\"\n\n\nclass Character(BaseModel):\n name: constr(max_length=10)\n age: conint(gt=18, lt=99)\n armor: Armor\n strength: conint(gt=1, lt=100)\n\nmodel = outlines.models.transformers(\"microsoft/Phi-3-mini-128k-instruct\")\ngenerator = outlines.generate.json(model, Character)\n\ncharacter = generator(\n \"Generate a new character for my awesome game: \"\n + \"name, age (between 1 and 99), armor and strength. \"\n )\nprint(character)\n# name='Orla' age=21 armor=<Armor.plate: 'plate'> strength=8\n
import outlines\n\nschema = \"\"\"{\n \"$defs\": {\n \"Armor\": {\n \"enum\": [\"leather\", \"chainmail\", \"plate\"],\n \"title\": \"Armor\",\n \"type\": \"string\"\n }\n },\n \"properties\": {\n \"name\": {\"maxLength\": 10, \"title\": \"Name\", \"type\": \"string\"},\n \"age\": {\"title\": \"Age\", \"type\": \"integer\"},\n \"armor\": {\"$ref\": \"#/$defs/Armor\"},\n \"strength\": {\"title\": \"Strength\", \"type\": \"integer\"}\\\n },\n \"required\": [\"name\", \"age\", \"armor\", \"strength\"],\n \"title\": \"Character\",\n \"type\": \"object\"\n}\"\"\"\n\nmodel = outlines.models.transformers(\"microsoft/Phi-3-mini-128k-instruct\")\ngenerator = outlines.generate.json(model, schema)\ncharacter = generator(\n \"Generate a new character for my awesome game: \"\n + \"name, age (between 1 and 99), armor and strength. \"\n )\nprint(character)\n# {'name': 'Yuki', 'age': 24, 'armor': 'plate', 'strength': 3}\n
Note
We advise you to constrain the length of the strings fields when first testing your schema, especially with small models.
"},{"location":"reference/generation/generation/#grammar-structured-generation","title":"Grammar-structured generation","text":"Outlines also allows to generate text that is valid to any context-free grammar (CFG) in the EBNF format. Grammars can be intimidating, but they are a very powerful tool! Indeed, they determine the syntax of every programming language, valid chess moves, molecule structure, can help with procedural graphics generation, etc.
Here we show a simple example of a grammar that defines arithmetic operations:
from outlines import models, generate\n\narithmetic_grammar = \"\"\"\n ?start: sum\n\n ?sum: product\n | sum \"+\" product -> add\n | sum \"-\" product -> sub\n\n ?product: atom\n | product \"*\" atom -> mul\n | product \"/\" atom -> div\n\n ?atom: NUMBER -> number\n | \"-\" atom -> neg\n | \"(\" sum \")\"\n\n %import common.NUMBER\n %import common.WS_INLINE\n\n %ignore WS_INLINE\n\"\"\"\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-128k-instruct\")\ngenerator = generate.cfg(model, arithmetic_grammar, max_tokens=100)\n\nresult = generator(\"Question: How can you write 5*5 using addition?\\nAnswer:\")\nprint(result)\n# 5+5+5+5+5\n
EBNF grammars can be cumbersome to write. This is why Outlines provides grammar definitions in the outlines.grammars.
module
from outlines import models, generate, grammars\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-128k-instruct\")\ngenerator = generate.cfg(model, grammars.arithmetic, max_tokens=100)\n\nresult = generator(\"Question: How can you write 5*5 using addition?\\nAnswer:\")\nprint(result)\n# 5+5+5+5+5\n
The available grammars are listed here.
"},{"location":"reference/generation/generation/#regex-structured-generation","title":"Regex-structured generation","text":"Slightly simpler, but no less useful, Outlines can generate text that is in the language of a regular expression. For instance to force the model to generate IP addresses:
from outlines import models, generate\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-128k-instruct\")\n\nregex_str = r\"((25[0-5]|2[0-4]\\d|[01]?\\d\\d?)\\.){3}(25[0-5]|2[0-4]\\d|[01]?\\d\\d?)\"\ngenerator = generate.regex(model, regex_str)\n\nresult = generator(\"What is the IP address of localhost?\\nIP: \")\nprint(result)\n# 127.0.0.100\n
"},{"location":"reference/generation/generation/#generate-a-given-python-type","title":"Generate a given Python type","text":"We provide a shortcut to regex-structured generation for simple use cases. Pass a Python type to the outlines.generate.format
function and the LLM will output text that matches this type:
from outlines import models, generate\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-128k-instruct\")\ngenerator = generate.format(model, int)\n\nresult = generator(\"What is 2+2?\")\nprint(result)\n# 4\n
"},{"location":"reference/generation/json/","title":"JSON structured generation","text":"Outlines can make any open source model return a JSON object that follows a structure that is specified by the user. This is useful whenever we want the output of the model to be processed by code downstream: code does not understand natural language but rather the structured language it has been programmed to understand.
There are mainly two reasons why someone would want to get an output formatted as JSON from an LLM:
- Parse the answer (e.g. with Pydantic), store it somewhere, return it to a user, etc.
- Call a function with the result
Outlines has you covered in both cases! Indeed, to define the structure of the JSON you want the model to follow you can either provide a Pydantic model, or a function. No need to duplicate code!
"},{"location":"reference/generation/json/#using-pydantic","title":"Using Pydantic","text":"Outlines can infer the structure of the output from a Pydantic model. The result is an instance of the model that contains the values returned by the LLM:
from pydantic import BaseModel\n\nfrom outlines import models, generate\n\n\nclass User(BaseModel):\n name: str\n last_name: str\n id: int\n\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-4k-instruct\")\ngenerator = generate.json(model, User)\nresult = generator(\n \"Create a user profile with the fields name, last_name and id\"\n)\nprint(result)\n# User(name=\"John\", last_name=\"Doe\", id=11)\n
JSON and whitespaces
By default Outlines prevents the model from generating JSON with syntactic newlines, tabs, or multiple spaces. The default whitespace_pattern
is r\"[ ]?\"
. Small models tend to enter an infinite repetition loop if the whitespace_pattern
allows infinite spacing. If you would like to allow the model to generate multiple tabs, newlines, and spaces, you can set the whitespace pattern as follows:
generator = generate.json(model, User, whitespace_pattern=r\"[\\n\\t ]*\")\n
Performance
generation.json
computes an index that helps Outlines guide generation. This can take some time, but only needs to be done once. If you want to generate several times with the same schema make sure that you only call generate.json
once.
Custom types
Outlines provides custom Pydantic types so you do not have to write regular expressions for common types, such as phone numbers or zip codes.
"},{"location":"reference/generation/json/#using-a-json-schema","title":"Using a JSON Schema","text":"Instead of a Pydantic model you can pass a string that represents a JSON Schema specification to generate.json
:
from pydantic import BaseModel\n\nfrom outlines import models\nfrom outlines import generate\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-4k-instruct\")\n\nschema = \"\"\"\n{\n \"title\": \"User\",\n \"type\": \"object\",\n \"properties\": {\n \"name\": {\"type\": \"string\"},\n \"last_name\": {\"type\": \"string\"},\n \"id\": {\"type\": \"integer\"}\n },\n \"required\": [\"name\", \"last_name\", \"id\"]\n}\n\"\"\"\n\ngenerator = generate.json(model, schema)\nresult = generator(\n \"Create a user profile with the fields name, last_name and id\"\n)\nprint(result)\n# User(name=\"John\", last_name=\"Doe\", id=11)\n
"},{"location":"reference/generation/json/#from-a-functions-signature","title":"From a function's signature","text":"Outlines can infer the structure of the output from the signature of a function. The result is a dictionary, and can be passed directly to the function using the usual dictionary expansion syntax **
:
from outlines import models\nfrom outlines import generate\n\ndef add(a: int, b: int):\n return a + b\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-4k-instruct\")\ngenerator = generate.json(model, add)\nresult = generator(\"Return two integers named a and b respectively. a is odd and b even.\")\n\nprint(add(**result))\n# 3\n
A great advantage of passing functions directly to specify the structure is that the structure of the LLM's output will change with the function's definition. No need to change the code in several places!
"},{"location":"reference/generation/regex/","title":"Regular expressions","text":"Outlines can guarantee that the text generated by the LLM will be valid to a regular expression:
from outlines import models, generate\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-4k-instruct\")\n\ngenerator = generate.regex(\n model,\n r\"((25[0-5]|2[0-4]\\d|[01]?\\d\\d?)\\.){3}(25[0-5]|2[0-4]\\d|[01]?\\d\\d?)\",\n)\n\nprompt = \"What is the IP address of the Google DNS servers? \"\nanswer = generator(prompt, max_tokens=30)\n\nprint(answer)\n# What is the IP address of the Google DNS servers?\n# 2.2.6.1\n
If you find yourself using generate.regex
to restrict the answers' type you can take a look at type-structured generation instead.
Performance
generate.regex
computes an index that helps Outlines guide generation. This can take some time, but only needs to be done once. If you want to generate several times using the same regular expression make sure that you only call generate.regex
once.
"},{"location":"reference/generation/structured_generation_explanation/","title":"How does Outlines work?","text":"Language models generate text token by token, using the previous token sequence as input and sampled logits as output. This document explains the structured generation process, where only legal tokens are considered for the next step based on a predefined automata, e.g. a regex-defined finite-state machine (FSM) or Lark grammar.`
"},{"location":"reference/generation/structured_generation_explanation/#worked-example","title":"Worked Example","text":"Let's consider a worked example with a pattern for whole and decimal numbers:
^\\d*(\\.\\d+)?$
.
"},{"location":"reference/generation/structured_generation_explanation/#creating-automata","title":"Creating Automata","text":"The pattern is first converted into an automata. Below is a brief explanation of the automata conversion and its representation.
Automata Diagram:
graph LR\n node0(\"1-9\") --> node1(\"1-9\")\n node1 --> node1\n node1 --> nodeEND{{END}}\n node1 --> nodePeriod(\".\")\n nodePeriod --> node2(\"1-9\")\n node2 --> node2\n node2 --> nodeEND{{END}}
"},{"location":"reference/generation/structured_generation_explanation/#generating-a-token","title":"Generating a Token","text":"Let's assume that we're in the middle of generation, and so far \"748\" has been generated. Here is the automata with the current state highlighted in green, with the legal next characters being another number (1-9), a dot (.), or end of sequence.
graph LR\n node0(\"1-9\") --> node1(\"1-9\")\n node1 --> node1\n node1 --> nodeEND{{END}}\n node1 --> nodePeriod(\".\")\n nodePeriod --> node2(\"1-9\")\n node2 --> node2\n node2 --> nodeEND{{END}}\n\n style node1 fill:#090
Generating a token requires the following steps:
- Feed the previous input sequence (\"748\") into the language model.
- Language model runs a forward pass and produces token logits.
- Outlines logits processor sets the probability of illegal tokens to 0%.
- A token is sampled from the set of legal tokens.
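A minimal, illustrative sketch of step 3 (not Outlines' actual implementation): given the ids of the tokens that are legal in the current automaton state, every other logit is masked before sampling:
import torch\n\nvocab_size = 8\nlogits = torch.randn(vocab_size)     # step 2: logits from the forward pass\nlegal_token_ids = [1, 4, 7]          # tokens allowed by the current automaton state\n\nmask = torch.full((vocab_size,), float(\"-inf\"))\nmask[legal_token_ids] = 0.0\nprobabilities = torch.softmax(logits + mask, dim=-1)  # illegal tokens get probability 0\n\nnext_token = torch.multinomial(probabilities, num_samples=1)  # step 4: sample a legal token\n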
"},{"location":"reference/generation/types/","title":"Custom types","text":"Outlines provides custom Pydantic types so you can focus on your use case rather than on writing regular expressions:
Category Type Import Description ISBN 10 & 13 outlines.types.ISBN
There is no guarantee that the check digit will be correct Airport IATA outlines.types.airports.IATA
Valid airport IATA codes Country alpha-2 code outlines.types.airports.Alpha2
Valid country alpha-2 codes alpha-3 code outlines.types.countries.Alpha3
Valid country alpha-3 codes numeric code outlines.types.countries.Numeric
Valid country numeric codes name outlines.types.countries.Name
Valid country names flag outlines.types.countries.Flag
Valid flag emojis email outlines.types.Email
Valid email address Some types require localization. We currently only support US types, but please don't hesitate to create localized versions of the different types and open a Pull Request. Localized types are specified using types.locale
in the following way:
from outlines import types\n\ntypes.locale(\"us\").ZipCode\ntypes.locale(\"us\").PhoneNumber\n
Here are the localized types that are currently available:
Category Locale Import Description Zip code US ZipCode
Generate US Zip(+4) codes Phone number US PhoneNumber
Generate valid US phone numbers You can use these types in Pydantic schemas for JSON-structured generation:
from pydantic import BaseModel\n\nfrom outlines import models, generate, types\n\n# Specify the locale for types\nlocale = types.locale(\"us\")\n\nclass Client(BaseModel):\n name: str\n phone_number: locale.PhoneNumber\n zip_code: locale.ZipCode\n\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-4k-instruct\")\ngenerator = generate.json(model, Client)\nresult = generator(\n \"Create a client profile with the fields name, phone_number and zip_code\"\n)\nprint(result)\n# name='Tommy' phone_number='129-896-5501' zip_code='50766'\n
Or simply with outlines.generate.format
:
from pydantic import BaseModel\n\nfrom outlines import models, generate, types\n\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-4k-instruct\")\ngenerator = generate.format(model, types.locale(\"us\").PhoneNumber)\nresult = generator(\n \"Return a US Phone number: \"\n)\nprint(result)\n# 334-253-2630\n
We plan on adding many more custom types. If you have found yourself writing regular expressions to generate fields of a given type, or if you could benefit from more specific types, don't hesitate to submit a PR or open an issue.
"},{"location":"reference/models/exllamav2/","title":"ExllamaV2","text":"The outlines.models.exllamav2
model requires a Logits Processor component for compatibility with Outlines structured generation. While ExLlamaV2 doesn't natively support this feature, a third-party fork provides the necessary functionality. You can install it with the following command:
pip install git+https://github.com/lapp0/exllamav2@sampler-logits-processor\n
Install other requirements:
pip install transformers torch\n
Coming soon
"},{"location":"reference/models/llamacpp/","title":"Llama.cpp","text":"Outlines provides an integration with Llama.cpp using the llama-cpp-python library. Llamacpp allows to run quantized models on machines with limited compute.
Installation
You need to install the llama-cpp-python
library to use the llama.cpp integration. See the installation section for instructions to install llama-cpp-python
with CUDA, Metal, ROCm and other backends. To get started quickly you can also run:
pip install \"outlines[llamacpp]\"\n
"},{"location":"reference/models/llamacpp/#load-the-model","title":"Load the model","text":"You can initialize the model by passing the name of the repository on the HuggingFace Hub, and the filenames (or glob pattern):
from outlines import models\n\nmodel = models.llamacpp(\"TheBloke/phi-2-GGUF\", \"phi-2.Q4_K_M.gguf\")\n
This will download the model files to the hub cache folder and load the weights in memory.
You can also initialize the model by passing the path to the weights on your machine. Assuming Phi2's weights are in the current directory:
from outlines import models\nfrom llama_cpp import Llama\n\nllm = Llama(\"./phi-2.Q4_K_M.gguf\")\nmodel = models.LlamaCpp(llm)\n
If you need more control, you can pass the same keyword arguments to the model as you would pass to the llama-cpp-python library:
from outlines import models\n\nmodel = models.llamacpp(\n    \"TheBloke/phi-2-GGUF\",\n    \"phi-2.Q4_K_M.gguf\",\n    n_ctx=512, # to set the context length value\n)\n
Main parameters:
Parameters Type Description Default n_gpu_layers
int
Number of layers to offload to GPU. If -1, all layers are offloaded 0
split_mode
int
How to split the model across GPUs. 1
for layer-wise split, 2
for row-wise split 1
main_gpu
int
Main GPU 0
tensor_split
Optional[List[float]]
How split tensors should be distributed across GPUs. If None
the model is not split. None
n_ctx
int
Text context. Inference from the model if set to 0
0
n_threads
Optional[int]
Number of threads to use for generation. All available threads if set to None
. None
verbose
bool
Print verbose outputs to stderr
False
See the llama-cpp-python documentation for the full list of parameters.
"},{"location":"reference/models/llamacpp/#load-the-model-on-gpu","title":"Load the model on GPU","text":"Note
Make sure that you installed llama-cpp-python
with GPU support.
To load the model on GPU, pass n_gpu_layers=-1
:
from outlines import models\n\nmodel = models.llamacpp(\n \"TheBloke/phi-2-GGUF\",\n \"phi-2.Q4_K_M.gguf\",\n n_gpu_layers=-1, # to use GPU acceleration\n)\n
This also works with generators built with generate.regex
, generate.json
, generate.cfg
, generate.format
and generate.choice
.
"},{"location":"reference/models/llamacpp/#load-lora-adapters","title":"Load LoRA adapters","text":"You can load LoRA adapters dynamically:
from outlines import models, generate\n\nmodel = models.llamacpp(\"TheBloke/phi-2-GGUF\", \"phi-2.Q4_K_M.gguf\")\ngenerator = generate.text(model)\nanswer_1 = generator(\"prompt\")\n\nmodel.load_lora(\"./path/to/adapter.gguf\")\nanswer_2 = generator(\"prompt\")\n
To load another adapter you need to re-initialize the model. Otherwise the adapter will be added on top of the previous one:
from outlines import models\n\nmodel = models.llamacpp(\"TheBloke/phi-2-GGUF\", \"phi-2.Q4_K_M.gguf\")\nmodel.load_lora(\"./path/to/adapter1.gguf\") # Load first adapter\n\nmodel = models.llamacpp(\"TheBloke/phi-2-GGUF\", \"phi-2.Q4_K_M.gguf\")\nmodel.load_lora(\"./path/to/adapter2.gguf\") # Load second adapter\n
"},{"location":"reference/models/llamacpp/#generate-text","title":"Generate text","text":"In addition to the parameters described in the text generation section you can pass extra keyword arguments, for instance to set sampling parameters not exposed in Outlines' public API:
from outlines import models, generate\n\n\nmodel = models.llamacpp(\"TheBloke/phi-2-GGUF\", \"phi-2.Q4_K_M.gguf\")\ngenerator = generate.text(model)\n\nanswer = generator(\"A prompt\", presence_penalty=0.8)\n
Extra keyword arguments:
The value of the keyword arguments you pass to the generator suspersede the values set when initializing the sampler or generator. All extra sampling methods and repetition penalties are disabled by default.
Parameters Type Description Default suffix
Optional[str]
A suffix to append to the generated text. If None
no suffix is added. None
echo
bool
Whether to preprend the prompt to the completion. False
seed
int
The random seed to use for sampling. None
max_tokens
Optional[int]
The maximum number of tokens to generate. If None
the maximum number of tokens depends on n_ctx
. 16
frequence_penalty
float
The penalty to apply to tokens based on their frequency in the past 64 tokens. 0.0
presence_penalty
float
The penalty to apply to tokens based on their presence in the past 64 tokens. 0.0
repeat_penalty
float
The penalty to apply to repeated tokens in the past 64 tokens. 1.
stopping_criteria
Optional[StoppingCriteriaList]
A list of stopping criteria to use. None
logits_processor
Optional[LogitsProcessorList]
A list of logits processors to use. The logits processor used for structured generation will be added to this list. None
temperature
float
The temperature to use for sampling 1.0
top_p
float
The top-p value to use for nucleus sampling. 1.
min_p
float
The min-p value to use for minimum-p sampling. 0.
typical_p
float
The p value to use for locally typical sampling. 1.0
stop
Optional[Union[str, List[str]]]
A list of strings that stop generation when encountered. []
top_k
int
The top-k value used for top-k sampling. Negative value to consider all logit values. -1.
tfs_z
float
The tail-free sampling parameter. 1.0
mirostat_mode
int
The mirostat sampling mode. 0
mirostat_tau
float
The target cross-entropy for mirostat sampling. 5.0
mirostat_eta
float
The learning rate used to update mu
in mirostat sampling. 0.1
See the llama-cpp-python documentation for the full and up-to-date list of parameters and the llama.cpp code for the default values of other sampling parameters.
"},{"location":"reference/models/llamacpp/#streaming","title":"Streaming","text":""},{"location":"reference/models/llamacpp/#installation","title":"Installation","text":"You need to install the llama-cpp-python
library to use the llama.cpp integration.
"},{"location":"reference/models/llamacpp/#cpu","title":"CPU","text":"For a CPU-only installation run:
pip install llama-cpp-python\n
Warning
Do not run this command if you want support for BLAS, Metal or CUDA. Follow the instructions below instead.
"},{"location":"reference/models/llamacpp/#cuda","title":"CUDA","text":"CMAKE_ARGS=\"-DLLAMA_CUDA=on\" pip install llama-cpp-python\n
It is also possible to install pre-built wheels with CUDA support (Python 3.10 and above):
pip install llama-cpp-python \\\n --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/<cuda-version>\n
Where <cuda-version>
is one of the following, depending on the version of CUDA installed on your system:
cu121
for CUDA 12.1 cu122
for CUDA 12.2 cu123
CUDA 12.3
"},{"location":"reference/models/llamacpp/#metal","title":"Metal","text":"CMAKE_ARGS=\"-DLLAMA_METAL=on\" pip install llama-cpp-python\n
It is also possible to install pre-build wheels with Metal support (Python 3.10 or above, MacOS 11.0 and above):
pip install llama-cpp-python \\\n --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/metal\n
"},{"location":"reference/models/llamacpp/#openblas","title":"OpenBLAS","text":"CMAKE_ARGS=\"-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS\" pip install llama-cpp-python\n
"},{"location":"reference/models/llamacpp/#other-backend","title":"Other backend","text":"llama.cpp
supports many other backends. Refer to the llama.cpp documentation to use the following backends:
- CLBast (OpenCL)
- hipBLAS (ROCm)
- Vulkan
- Kompute
- SYCL
"},{"location":"reference/models/mlxlm/","title":"mlx-lm","text":"Outlines provides an integration with mlx-lm, allowing models to be run quickly on Apple Silicon via the mlx library.
Installation
You need to install the mlx
and mlx-lm
libraries on a device which supports Metal to use the mlx-lm integration. To get started quickly you can also run:
pip install \"outlines[mlxlm]\"\n
"},{"location":"reference/models/mlxlm/#load-the-model","title":"Load the model","text":"You can initialize the model by passing the name of the repository on the HuggingFace Hub. The official repository for mlx-lm supported models is mlx-community.
from outlines import models\n\nmodel = models.mlxlm(\"mlx-community/Meta-Llama-3.1-8B-Instruct-8bit\")\n
This will download the model files to the hub cache folder and load the weights in memory.
The arguments model_config
and tokenizer_config
are available to modify loading behavior. For example, per the mlx-lm
documentation, you must set an eos_token for qwen/Qwen-7B
. In outlines you may do so via
model = models.mlxlm(\n \"mlx-community/Meta-Llama-3.1-8B-Instruct-8bit\",\n tokenizer_config={\"eos_token\": \"<|endoftext|>\", \"trust_remote_code\": True},\n)\n
Main parameters:
(Subject to change. Table based on mlx-lm.load docstring)
Parameters Type Description Default tokenizer_config
dict
Configuration parameters specifically for the tokenizer. Defaults to an empty dictionary. {}
model_config
dict
Configuration parameters specifically for the model. Defaults to an empty dictionary. {}
adapter_path
str
Path to the LoRA adapters. If provided, applies LoRA layers to the model. None
lazy
bool
If False, evaluate the model parameters to make sure they are loaded in memory before returning. False
"},{"location":"reference/models/mlxlm/#generate-text","title":"Generate text","text":"You may generate text using the parameters described in the text generation documentation.
With the loaded model, you can generate text or perform structured generation, e.g.
from outlines import models, generate\n\nmodel = models.mlxlm(\"mlx-community/Meta-Llama-3.1-8B-Instruct-8bit\")\ngenerator = generate.text(model)\n\nanswer = generator(\"A prompt\", temperature=2.0)\n
"},{"location":"reference/models/mlxlm/#streaming","title":"Streaming","text":"You may creating a streaming iterable with minimal changes
from outlines import models, generate\n\nmodel = models.mlxlm(\"mlx-community/Meta-Llama-3.1-8B-Instruct-8bit\")\ngenerator = generate.text(model)\n\nfor token_str in generator.text(\"A prompt\", temperature=2.0):\n print(token_str)\n
"},{"location":"reference/models/mlxlm/#structured","title":"Structured","text":"You may perform structured generation with mlxlm to guarantee your output will match a regex pattern, json schema, or lark grammar.
Example: Phone number generation with pattern \"\\\\+?[1-9][0-9]{7,14}\"
:
from outlines import models, generate\n\nmodel = models.mlxlm(\"mlx-community/Meta-Llama-3.1-8B-Instruct-8bit\")\n\nphone_number_pattern = \"\\\\+?[1-9][0-9]{7,14}\"\ngenerator = generate.regex(model, phone_number_pattern)\n\nmodel_output = generator(\"What's Jennys Number?\\n\")\nprint(model_output)\n# '8675309'\n
"},{"location":"reference/models/models/","title":"Models","text":"Outlines supports generation using a number of inference engines (outlines.models
). Loading a model using outlines follows a similar interface between inference engines:
import outlines\n\nmodel = outlines.models.transformers(\"microsoft/Phi-3-mini-128k-instruct\")\nmodel = outlines.models.transformers_vision(\"llava-hf/llava-v1.6-mistral-7b-hf\")\nmodel = outlines.models.vllm(\"microsoft/Phi-3-mini-128k-instruct\")\nmodel = outlines.models.llamacpp(\n \"microsoft/Phi-3-mini-4k-instruct-gguf\", \"Phi-3-mini-4k-instruct-q4.gguf\"\n)\nmodel = outlines.models.exllamav2(\"bartowski/Phi-3-mini-128k-instruct-exl2\")\nmodel = outlines.models.mlxlm(\"mlx-community/Phi-3-mini-4k-instruct-4bit\")\n\nmodel = outlines.models.openai(\n \"gpt-4o-mini\",\n api_key=os.environ[\"OPENAI_API_KEY\"]\n)\n
"},{"location":"reference/models/models/#feature-matrix","title":"Feature Matrix","text":"Transformers Transformers Vision vLLM llama.cpp ExLlamaV2 MLXLM OpenAI* Device Cuda \u2705 \u2705 \u2705 \u2705 \u2705 \u274c N/A Apple Silicon \u2705 \u2705 \u274c \u2705 \u2705 \u2705 N/A x86 / AMD64 \u2705 \u2705 \u274c \u2705 \u2705 \u274c N/A Sampling Greedy \u2705 \u2705 \u2705 \u2705* \u2705 \u2705 \u274c Multinomial \u2705 \u2705 \u2705 \u2705 \u2705 \u2705 \u2705 Multiple Samples \u2705 \u2705 \u274c \u274c \u2705 Beam Search \u2705 \u2705 \u2705 \u274c \u2705 \u274c \u274c Generation Batch \u2705 \u2705 \u2705 \u274c ? \u274c \u274c Stream \u2705 \u274c \u274c \u2705 ? \u2705 \u274c outlines.generate
Text \u2705 \u2705 \u2705 \u2705 \u2705 \u2705 \u2705 Structured \u2705 \u2705 \u2705 \u2705 \u2705 \u2705 \u2705 JSON Schema \u2705 \u2705 \u2705 \u2705 \u2705 \u2705 \u2705 Choice \u2705 \u2705 \u2705 \u2705 \u2705 \u2705 \u2705 Regex \u2705 \u2705 \u2705 \u2705 \u2705 \u2705 \u274c Grammar \u2705 \u2705 \u2705 \u2705 \u2705 \u2705 \u274c"},{"location":"reference/models/models/#caveats","title":"Caveats","text":" - OpenAI doesn't support structured generation due to limitations in their API and server implementation.
outlines.generate
\"Structured\" includes methods such as outlines.generate.regex
, outlines.generate.json
, outlines.generate.cfg
, etc. - MLXLM only supports Apple Silicon.
- llama.cpp greedy sampling available via multinomial with
temperature = 0.0
.
"},{"location":"reference/models/openai/","title":"OpenAI and compatible APIs","text":"Installation
You need to install the openai
library to be able to use the OpenAI API in Outlines. Or alternatively:
pip install \"outlines[openai]\"\n
"},{"location":"reference/models/openai/#openai-models","title":"OpenAI models","text":"Outlines supports models available via the OpenAI Chat API, e.g. GPT-4o, ChatGPT and GPT-4. You can initialize the model by passing the model name to outlines.models.openai
:
from outlines import models\n\n\nmodel = models.openai(\"gpt-4o-mini\")\nmodel = models.openai(\"gpt-4o\")\n
Check the OpenAI documentation for an up-to-date list of available models. You can pass any parameter you would pass to openai.AsyncOpenAI
as keyword arguments:
import os\nfrom outlines import models\n\n\nmodel = models.openai(\n \"gpt-4o-mini\",\n api_key=os.environ[\"OPENAI_API_KEY\"]\n)\n
The following table enumerates the possible parameters. Refer to the OpenAI SDK's code for an up-to-date list.
Parameters:
Parameters Type Description Default api_key
str
OpenAI API key. Infered from OPENAI_API_KEY
if not specified None
organization
str
OpenAI organization id. Infered from OPENAI_ORG_ID
if not specified None
project
str
OpenAI project id. Infered from OPENAI_PROJECT_ID
if not specified. None
base_url
str | https.URL
Base URL for the endpoint. Infered from OPENAI_BASE_URL
if no specified. None
timeout
float
Request timeout. NOT_GIVEN
max_retries
int
Maximum number of retries for failing requests 2
default_headers
Mapping[str, str]
Default HTTP headers None
default_query
Mapping[str, str]
Custom parameters added to the HTTP queries None
http_client
https.AsyncClient
User-specified httpx
client None
"},{"location":"reference/models/openai/#azure-openai-models","title":"Azure OpenAI models","text":"Outlines also supports Azure OpenAI models:
from outlines import models\n\n\nmodel = models.azure_openai(\n \"azure-deployment-name\",\n \"gpt-4o-mini\",\n api_version=\"2024-07-18\",\n azure_endpoint=\"https://example-endpoint.openai.azure.com\",\n)\n
Why do I need to specify model and deployment name?
The model name is needed to load the correct tokenizer for the model. The tokenizer is necessary for structured generation.
You can pass any parameter you would pass to openai.AsyncAzureOpenAI
. You can consult the OpenAI SDK's code for an up-to-date list.
Parameters:
Parameters Type Description Default azure_endpoint
str
Azure endpoint, including the resource. Infered from AZURE_OPENAI_ENDPOINT
if not specified None
api_version
str
API version. Infered from AZURE_OPENAI_API_KEY
if not specified None
api_key
str
OpenAI API key. Infered from OPENAI_API_KEY
if not specified None
azure_ad_token
str
Azure active directory token. Inference from AZURE_OPENAI_AD_TOKEN
if not specified None
azure_ad_token_provider
AzureADTokenProvider
A function that returns an Azure Active Directory token None
organization
str
OpenAI organization id. Infered from OPENAI_ORG_ID
if not specified None
project
str
OpenAI project id. Infered from OPENAI_PROJECT_ID
if not specified. None
base_url
str | https.URL
Base URL for the endpoint. Infered from OPENAI_BASE_URL
if not specified. None
timeout
float
Request timeout. NOT_GIVEN
max_retries
int
Maximum number of retries for failing requests 2
default_headers
Mapping[str, str]
Default HTTP headers None
default_query
Mapping[str, str]
Custom parameters added to the HTTP queries None
http_client
https.AsyncClient
User-specified httpx
client None
"},{"location":"reference/models/openai/#models-that-follow-the-openai-standard","title":"Models that follow the OpenAI standard","text":"Outlines supports models that follow the OpenAI standard. You will need to initialize the OpenAI client properly configured and pass it to outlines.models.openai
import os\nfrom openai import AsyncOpenAI\nfrom outlines import models\nfrom outlines.models.openai import OpenAIConfig\n\n\nclient = AsyncOpenAI(\n api_key=os.environ.get(\"PROVIDER_KEY\"),\n base_url=\"http://other.provider.server.com\"\n)\nconfig = OpenAIConfig(\"model_name\")\nmodel = models.openai(client, config)\n
Warning
You need to pass the async client to be able to do batch inference.
"},{"location":"reference/models/openai/#structured-generation-support","title":"Structured Generation Support","text":"Outlines provides support for OpenAI Structured Outputs via outlines.generate.json
, outlines.generate.choice
from pydantic import BaseModel, ConfigDict\nimport outlines.models as models\nfrom outlines import generate\n\nmodel = models.openai(\"gpt-4o-mini\")\n\nclass Person(BaseModel):\n model_config = ConfigDict(extra='forbid') # required for openai\n first_name: str\n last_name: str\n age: int\n\ngenerate.json(model, Person)\ngenerator(\"current indian prime minister on january 1st 2023\")\n# Person(first_name='Narendra', last_name='Modi', age=72)\n\ngenerator = generate.choice(model, [\"Chicken\", \"Egg\"])\nprint(generator(\"Which came first?\"))\n# Chicken\n
Warning
Structured generation support only provided to OpenAI-compatible endpoints which conform to OpenAI's standard. Additionally, generate.regex
and generate.cfg
are not supported.
"},{"location":"reference/models/openai/#advanced-configuration","title":"Advanced configuration","text":"For more advanced configuration option, such as support proxy, please consult the OpenAI SDK's documentation:
from openai import AsyncOpenAI, DefaultHttpxClient\nfrom outlines import models\nfrom outlines.models.openai import OpenAIConfig\n\n\nclient = AsyncOpenAI(\n base_url=\"http://my.test.server.example.com:8083\",\n http_client=DefaultHttpxClient(\n proxies=\"http://my.test.proxy.example.com\",\n transport=httpx.HTTPTransport(local_address=\"0.0.0.0\"),\n ),\n)\nconfig = OpenAIConfig(\"model_name\")\nmodel = models.openai(client, config)\n
It is possible to specify the values for seed
, presence_penalty
, frequence_penalty
, top_p
by passing an instance of OpenAIConfig
when initializing the model:
from outlines.models.openai import OpenAIConfig\nfrom outlines import models\n\n\nconfig = OpenAIConfig(\n presence_penalty=1.,\n frequency_penalty=1.,\n top_p=.95,\n seed=0,\n)\nmodel = models.openai(\"gpt-4o-mini\", config)\n
"},{"location":"reference/models/openai/#monitoring-api-use","title":"Monitoring API use","text":"It is important to be able to track your API usage when working with OpenAI's API. The number of prompt tokens and completion tokens is directly accessible via the model instance:
from openai import AsyncOpenAI\nimport outlines.models\n\n\nmodel = models.openai(\"gpt-4o\")\n\nprint(model.prompt_tokens)\n# 0\n\nprint(model.completion_tokens)\n# 0\n
These numbers are updated every time you call the model.
"},{"location":"reference/models/tgi/","title":"Text-generation-inference (TGI)","text":"TGI uses Outlines to provide structured generation, see their documentation.
"},{"location":"reference/models/transformers/","title":"transformers","text":"Installation
You need to install the transformer
, datasets
and torch
libraries to be able to use these models in Outlines, or alternatively:
pip install \"outlines[transformers]\"\n
Outlines provides an integration with the torch
implementation of causal models in the transformers library. You can initialize the model by passing its name:
from outlines import models\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-4k-instruct\", device=\"cuda\")\n
If you need more fine-grained control you can also initialize the model and tokenizer separately:
from transformers import AutoModelForCausalLM, AutoTokenizer\nfrom outlines import models\n\nllm = AutoModelForCausalLM.from_pretrained(\"gpt2\", output_attentions=True)\ntokenizer = AutoTokenizer.from_pretrained(\"gpt2\")\nmodel = models.Transformers(llm, tokenizer)\n
"},{"location":"reference/models/transformers/#using-logits-processors","title":"Using Logits Processors","text":"There are two ways to use Outlines Structured Generation with HuggingFace Transformers:
- Use the Outlines generation wrapper,
outlines.models.transformers
- Use
OutlinesLogitsProcessor
with transformers.AutoModelForCausalLM
Outlines supports a myriad of logits processors for structured generation. In these examples, we will use the RegexLogitsProcessor
which guarantees generated text matches the specified pattern.
"},{"location":"reference/models/transformers/#using-outlinesmodelstransformers","title":"Using outlines.models.transformers
","text":"import outlines\n\ntime_regex_pattern = r\"(0?[1-9]|1[0-2]):[0-5]\\d\\s?(am|pm)?\"\n\nmodel = outlines.models.transformers(\"microsoft/Phi-3-mini-4k-instruct\", device=\"cuda\")\ngenerator = outlines.generate.regex(model, time_regex_pattern)\n\noutput = generator(\"The the best time to visit a dentist is at \")\nprint(output)\n# 2:30 pm\n
"},{"location":"reference/models/transformers/#using-models-initialized-via-the-transformers-library","title":"Using models initialized via the transformers
library","text":"import outlines\nimport transformers\n\n\nmodel_uri = \"microsoft/Phi-3-mini-4k-instruct\"\n\noutlines_tokenizer = outlines.models.TransformerTokenizer(\n transformers.AutoTokenizer.from_pretrained(model_uri)\n)\nphone_number_logits_processor = outlines.processors.RegexLogitsProcessor(\n \"\\\\+?[1-9][0-9]{7,14}\", # phone number pattern\n outlines_tokenizer,\n)\n\ngenerator = transformers.pipeline('text-generation', model=model_uri)\n\noutput = generator(\n \"Jenny gave me her number it's \",\n logits_processor=transformers.LogitsProcessorList([phone_number_logits_processor])\n)\nprint(output)\n# [{'generated_text': \"Jenny gave me her number it's 2125550182\"}]\n# not quite 8675309 what we expected, but it is a valid phone number\n
"},{"location":"reference/models/transformers/#alternative-model-classes","title":"Alternative Model Classes","text":"outlines.models.transformers
defaults to transformers.AutoModelForCausalLM
, which is the appropriate class for most standard large language models, including Llama 3, Mistral, Phi-3, etc.
However other variants with unique behavior can be used as well by passing the appropriate class.
"},{"location":"reference/models/transformers/#mamba","title":"Mamba","text":"Mamba is a transformers alternative which employs memory efficient, linear-time decoding.
To use Mamba with outlines you must first install the necessary requirements:
pip install causal-conv1d>=1.2.0 mamba-ssm torch transformers\n
Then you can either create a Mamba Outlines model via
import outlines\n\nmodel = outlines.models.mamba(\"state-spaces/mamba-2.8b-hf\")\n
or explicitly with
import outlines\nfrom transformers import MambaForCausalLM\n\nmodel = outlines.models.transformers(\n \"state-spaces/mamba-2.8b-hf\",\n model_class=MambaForCausalLM\n)\n
Read transformers
's documentation for more information.
"},{"location":"reference/models/transformers/#encoder-decoder-models","title":"Encoder-Decoder Models","text":"You can use encoder-decoder (seq2seq) models like T5 and BART with Outlines.
Be cautious with model selection, though: some models such as t5-base
don't include certain characters ({
) and you may get an error when trying to perform structured generation.
T5 Example:
from outlines import models\nfrom transformers import AutoModelForSeq2SeqLM\n\nmodel_pile_t5 = models.transformers(\n    model_name=\"EleutherAI/pile-t5-large\",\n    model_class=AutoModelForSeq2SeqLM,\n)\n
Bart Example:
model_bart = models.transformers(\n model_name=\"facebook/bart-large\",\n model_class=AutoModelForSeq2SeqLM,\n)\n
"},{"location":"reference/models/transformers_vision/","title":"Transformers Vision","text":"Outlines allows seamless use of vision models.
outlines.models.transformers_vision
shares interfaces with, and is based on, outlines.models.transformers.
Tasks supported include
- image + text -> text
- video + text -> text
"},{"location":"reference/models/transformers_vision/#example-using-llava-next-vision-models","title":"Example: Using Llava-Next Vision Models","text":"Install dependencies pip install torchvision pillow flash-attn
Create the model
import outlines\nfrom transformers import LlavaNextForConditionalGeneration\n\nmodel = outlines.models.transformers_vision(\n \"llava-hf/llava-v1.6-mistral-7b-hf\",\n model_class=LlavaNextForConditionalGeneration,\n device=\"cuda\",\n)\n
Create convenience function to load a PIL.Image
from URL
from PIL import Image\nfrom io import BytesIO\nfrom urllib.request import urlopen\n\ndef img_from_url(url):\n img_byte_stream = BytesIO(urlopen(url).read())\n return Image.open(img_byte_stream).convert(\"RGB\")\n
"},{"location":"reference/models/transformers_vision/#describing-an-image","title":"Describing an image","text":"description_generator = outlines.generate.text(model)\ndescription_generator(\n \"<image> detailed description:\",\n [img_from_url(\"https://upload.wikimedia.org/wikipedia/commons/2/25/Siam_lilacpoint.jpg\")]\n)\n
This is a color photograph featuring a Siamese cat with striking blue eyes. The cat has a creamy coat and a light eye color, which is typical for the Siamese breed. Its features include elongated ears, a long, thin tail, and a striking coat pattern. The cat is sitting in an indoor setting, possibly on a cat tower or a similar raised platform, which is covered with a beige fabric, providing a comfortable and soft surface for the cat to rest or perch. The surface of the wall behind the cat appears to be a light-colored stucco or plaster.
"},{"location":"reference/models/transformers_vision/#multiple-images","title":"Multiple Images","text":"To include multiple images in your prompt you simply add more <image>
tokens to the prompt
image_urls = [\n    \"https://cdn1.byjus.com/wp-content/uploads/2020/08/ShapeArtboard-1-copy-3.png\",  # triangle\n    \"https://cdn1.byjus.com/wp-content/uploads/2020/08/ShapeArtboard-1-copy-11.png\",  # hexagon\n]\ndescription_generator = outlines.generate.text(model)\ndescription_generator(\n    \"<image><image>What shapes are present?\",\n    list(map(img_from_url, image_urls)),\n)\n
There are two shapes present. One shape is a hexagon and the other shape is an triangle. '
"},{"location":"reference/models/transformers_vision/#classifying-an-image","title":"Classifying an Image","text":"pattern = \"Mercury|Venus|Earth|Mars|Saturn|Jupiter|Neptune|Uranus|Pluto\"\nplanet_generator = outlines.generate.regex(model, pattern)\n\nplanet_generator(\n \"What planet is this: <image>\",\n [img_from_url(\"https://upload.wikimedia.org/wikipedia/commons/e/e3/Saturn_from_Cassini_Orbiter_%282004-10-06%29.jpg\")]\n)\n
Saturn
"},{"location":"reference/models/transformers_vision/#extracting-structured-image-data","title":"Extracting Structured Image data","text":"from pydantic import BaseModel\nfrom typing import List, Optional\n\nclass ImageData(BaseModel):\n caption: str\n tags_list: List[str]\n object_list: List[str]\n is_photo: bool\n\nimage_data_generator = outlines.generate.json(model, ImageData)\n\nimage_data_generator(\n \"<image> detailed JSON metadata:\",\n [img_from_url(\"https://upload.wikimedia.org/wikipedia/commons/9/98/Aldrin_Apollo_11_original.jpg\")]\n)\n
ImageData(caption='An astronaut on the moon', tags_list=['moon', 'space', 'nasa', 'americanflag'], object_list=['moon', 'moon_surface', 'space_suit', 'americanflag'], is_photo=True)
"},{"location":"reference/models/transformers_vision/#resources","title":"Resources","text":""},{"location":"reference/models/transformers_vision/#chosing-a-model","title":"Chosing a model","text":" - https://mmbench.opencompass.org.cn/leaderboard
- https://huggingface.co/spaces/WildVision/vision-arena
"},{"location":"reference/models/vllm/","title":"vLLM","text":"Installation
You need to install the vllm
library to use the vLLM integration. See the installation section for instructions to install vLLM for CPU or ROCm. To get started you can also run:
pip install \"outlines[vllm]\"\n
"},{"location":"reference/models/vllm/#load-the-model","title":"Load the model","text":"Outlines supports models available via vLLM's offline batched inference interface. You can load a model using:
from outlines import models\n\nmodel = models.vllm(\"microsoft/Phi-3-mini-4k-instruct\")\n
Or alternatively:
import vllm\nfrom outlines import models\n\nllm = vllm.LLM(\"microsoft/Phi-3-mini-4k-instruct\")\nmodel = models.VLLM(llm)\n
Models are loaded from the HuggingFace hub.
Device
The default installation of vLLM only allows loading models on a GPU. See the installation instructions to run models on CPU.
You can pass any parameter that you would normally pass to vllm.LLM
, as keyword arguments:
from outlines import models\n\nmodel = models.vllm(\n \"microsoft/Phi-3-mini-4k-instruct\",\n trust_remote_code=True,\n gpu_memory_utilization=0.7\n)\n
Main parameters:
Parameters Type Description Default tokenizer_mode
str
\"auto\" will use the fast tokenizer if available and \"slow\" will always use the slow tokenizer. auto
trust_remote_code
bool
Trust remote code when downloading the model and tokenizer. False
tensor_parallel_size
int
The number of GPUs to use for distributed execution with tensor parallelism. 1
dtype
str
The data type for the model weights and activations. Currently, we support float32
, float16
, and bfloat16
. If auto
, we use the torch_dtype
attribute specified in the model config file. However, if the torch_dtype
in the config is float32
, we will use float16
instead. auto
quantization
Optional[str]
The method used to quantize the model weights. Currently, we support \"awq\", \"gptq\" and \"squeezellm\". If None, we first check the quantization_config
attribute in the model config file. If that is None, we assume the model weights are not quantized and use dtype
to determine the data type of the weights. None
revision
Optional[str]
The specific model version to use. It can be a branch name, a tag name, or a commit id. None
tokenizer_revision
Optional[str]
The specific tokenizer version to use. It can be a branch name, a tag name, or a commit id. None
gpu_memory_utilization
float
The ratio (between 0 and 1) of GPU memory to reserve for the model weights, activations, and KV cache. Higher values will increase the KV cache size and thus improve the model's throughput. However, if the value is too high, it may cause out-of-memory (OOM) errors. 0.9
swap_space
int
The size (GiB) of CPU memory per GPU to use as swap space. This can be used for temporarily storing the states of the requests when their best_of
sampling parameters are larger than 1. If all requests will have best_of=1
, you can safely set this to 0. Otherwise, too small values may cause out-of-memory (OOM) errors. 4 enforce_eager
bool
Whether to enforce eager execution. If True, we will disable CUDA graph and always execute the model in eager mode. If False, we will use CUDA graph and eager execution in hybrid. False
enable_lora
bool
Whether to enable loading LoRA adapters False
See the vLLM code for a list of all the available parameters.
"},{"location":"reference/models/vllm/#use-quantized-models","title":"Use quantized models","text":"vLLM supports AWQ, GPTQ and SqueezeLLM quantized models:
from outlines import models\n\nmodel = models.vllm(\"TheBloke/Llama-2-7B-Chat-AWQ\", quantization=\"awq\")\nmodel = models.vllm(\"TheBloke/Mistral-7B-Instruct-v0.2-GPTQ\", quantization=\"gptq\")\nmodel = models.vllm(\"https://huggingface.co/squeeze-ai-lab/sq-llama-30b-w4-s5\", quantization=\"squeezellm\")\n
Dependencies
To use AWQ models you need to install the autoawq library: pip install autoawq
.
To use GPTQ models you need to install the AutoGPTQ and optimum libraries: pip install auto-gptq optimum
.
"},{"location":"reference/models/vllm/#multi-gpu-usage","title":"Multi-GPU usage","text":"To run multi-GPU inference with vLLM you need to set the tensor_parallel_size
argument to the number of GPUs available when initializing the model. For instance to run inference on 2 GPUs:
from outlines import models\n\nmodel = models.vllm(\n    \"microsoft/Phi-3-mini-4k-instruct\",\n    tensor_parallel_size=2\n)\n
"},{"location":"reference/models/vllm/#load-lora-adapters","title":"Load LoRA adapters","text":"You can load LoRA adapters and alternate between them dynamically:
from outlines import models\n\nmodel = models.vllm(\"facebook/opt-350m\", enable_lora=True)\nmodel.load_lora(\"ybelkaa/opt-350m-lora\") # Load LoRA adapter\nmodel.load_lora(None) # Unload LoRA adapter\n
"},{"location":"reference/models/vllm/#generate-text","title":"Generate text","text":"In addition to the parameters described in the text generation section you can pass an instance of SamplingParams
directly to any generator via the sampling_params
keyword argument:
from vllm.sampling_params import SamplingParams\nfrom outlines import models, generate\n\n\nmodel = models.vllm(\"microsoft/Phi-3-mini-4k-instruct\")\ngenerator = generate.text(model)\n\nparams = SamplingParams(n=2, frequency_penalty=1., min_tokens=2)\nanswer = generator(\"A prompt\", sampling_params=params)\n
This also works with generators built with generate.regex
, generate.json
, generate.cfg
, generate.format
and generate.choice
.
Note
The values passed via the SamplingParams
instance supersede the other arguments to the generator or the samplers.
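For example, continuing the sketch above (assuming max_tokens is also accepted as a keyword argument, as described in the text generation section):
params = SamplingParams(max_tokens=5)\n# max_tokens=10 below is ignored; the value in SamplingParams (5) takes precedence\nanswer = generator(\"A prompt\", max_tokens=10, sampling_params=params)\n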
SamplingParams
attributes:
Parameters Type Description Default n
int
Number of output sequences to return for the given prompt. 1
best_of
Optional[int]
Number of output sequences that are generated from the prompt. From these best_of
sequences, the top n
sequences are returned. best_of
must be greater than or equal to n
. This is treated as the beam width when use_beam_search
is True. By default, best_of
is set to n
. None
presence_penalty
float
Float that penalizes new tokens based on whether they appear in the generated text so far. Values > 0 encourage the model to use new tokens, while values < 0 encourage the model to repeat tokens. 0.0
frequency_penalty
float
Float that penalizes new tokens based on their frequency in the generated text so far. Values > 0 encourage the model to use new tokens, while values < 0 encourage the model to repeat tokens. 0.0
repetition_penalty
float
Float that penalizes new tokens based on whether they appear in the prompt and the generated text so far. Values > 1 encourage the model to use new tokens, while values < 1 encourage the model to repeat tokens. 1.0
temperature
float
Float that controls the randomness of the sampling. Lower values make the model more deterministic, while higher values make the model more random. Zero means greedy sampling. 1.0
top_p
float
Float that controls the cumulative probability of the top tokens to consider. Must be in (0, 1]. Set to 1 to consider all tokens. 1.0
top_k
int
Integer that controls the number of top tokens to consider. Set to -1 to consider all tokens. -1
min_p
float
Float that represents the minimum probability for a token to be considered, relative to the probability of the most likely token. Must be in [0, 1]. Set to 0 to disable this. 0.0
seed
Optional[int]
Random seed to use for the generation. None
use_beam_search
bool
Whether to use beam search instead of sampling. False
length_penalty
float
Float that penalizes sequences based on their length. Used in beam search. 1.0
early_stopping
Union[bool, str]
Controls the stopping condition for beam search. It accepts the following values: True
, where the generation stops as soon as there are best_of
complete candidates; False
, where a heuristic is applied and the generation stops when it is very unlikely to find better candidates; \"never\"
, where the beam search procedure only stops when there cannot be better candidates (canonical beam search algorithm). False
stop
Optional[Union[str, List[str]]]
List of strings that stop the generation when they are generated. The returned output will not contain the stop strings. None
stop_token_ids
Optional[List[int]]
List of tokens that stop the generation when they are generated. The returned output will contain the stop tokens unless the stop tokens are special tokens. None
include_stop_str_in_output
bool
Whether to include the stop strings in output text. Defaults to False. False
ignore_eos
bool
Whether to ignore the EOS token and continue generating tokens after the EOS token is generated. False
max_tokens
int
Maximum number of tokens to generate per output sequence. 16
min_tokens
int
Minimum number of tokens to generate per output sequence before EOS or stop_token_ids can be generated 0
skip_special_tokens
bool
Whether to skip special tokens in the output. True
spaces_between_special_tokens
bool
Whether to add spaces between special tokens in the output. Defaults to True. True
"},{"location":"reference/models/vllm/#streaming","title":"Streaming","text":"Warning
Streaming is not available for the offline vLLM integration.
"},{"location":"reference/models/vllm/#installation","title":"Installation","text":"By default the vLLM library is installed with pre-commpiled C++ and CUDA binaries and will only run on GPU:
pip install vllm\n
"},{"location":"reference/models/vllm/#cpu","title":"CPU","text":"You need to have the gcc
compiler installed on your system. Then you will need to install vLLM from source. First clone the repository:
git clone https://github.com/vllm-project/vllm.git\ncd vllm\n
Install the Python packages needed for the installation:
pip install --upgrade pip\npip install wheel packaging ninja setuptools>=49.4.0 numpy\npip install -v -r requirements-cpu.txt --extra-index-url https://download.pytorch.org/whl/cpu\n
and finally run:
VLLM_TARGET_DEVICE=cpu python setup.py install\n
See the vLLM documentation for more details, alternative installation methods (Docker) and performance tips.
"},{"location":"reference/models/vllm/#rocm","title":"ROCm","text":"You will need to install vLLM from source. First install Pytorch on ROCm:
pip install torch==2.2.0.dev20231206+rocm5.7 --index-url https://download.pytorch.org/whl/nightly/rocm5.7 # tested version\n
You will then need to install flash attention for ROCm following these instructions. You can then install xformers==0.0.23
and apply the patches needed to adapt Flash Attention for ROCm:
pip install xformers==0.0.23 --no-deps\nbash patch_xformers.rocm.sh\n
And finally build vLLM:
cd vllm\npip install -U -r requirements-rocm.txt\npython setup.py install # This may take 5-10 minutes.\n
See the vLLM documentation for alternative installation methods (Docker).
"},{"location":"reference/serve/lmstudio/","title":"Serve with LM Studio","text":"Would rather not self-host?
If you want to get started quickly with JSON-structured generation you can instead call .json, a .txt API that guarantees valid JSON.
LM Studio is an application that runs local LLMs. It flexibly mixes GPU and CPU compute in hardware-constrained environments.
As of LM Studio 0.3.4, it natively supports Outlines for structured text generation, using an OpenAI-compatible endpoint.
"},{"location":"reference/serve/lmstudio/#setup","title":"Setup","text":" - Install LM Studio by visiting their downloads page.
- Enable the LM Studio server functionality.
- Download a model.
- Install Python dependencies.
pip install pydantic openai\n
"},{"location":"reference/serve/lmstudio/#calling-the-server","title":"Calling the server","text":"By default, LM Studio will serve from http://localhost:1234
. If you are serving on a different port or host, make sure to change the base_url
argument in OpenAI
to the relevant location.
import openai\nfrom pydantic import BaseModel\n\nclass Testing(BaseModel):\n    \"\"\"\n    A class representing a testing schema.\n    \"\"\"\n    name: str\n    age: int\n\nopenai_client = openai.OpenAI(\n    base_url=\"http://0.0.0.0:1234/v1\",\n    api_key=\"dopeness\"\n)\n\n# Make a request to the local LM Studio server\nresponse = openai_client.beta.chat.completions.parse(\n    model=\"hugging-quants/Llama-3.2-1B-Instruct-Q8_0-GGUF\",\n    messages=[\n        {\"role\": \"system\", \"content\": \"You are like so good at whatever you do.\"},\n        {\"role\": \"user\", \"content\": \"My name is Cameron and I am 28 years old. What's my name and age?\"}\n    ],\n    response_format=Testing\n)\n
You should receive a ParsedChatCompletion[Testing]
object back:
ParsedChatCompletion[Testing](\n id='chatcmpl-3hykyf0fxus7jc90k6gwlw',\n choices=[\n ParsedChoice[Testing](\n finish_reason='stop',\n index=0,\n logprobs=None,\n message=ParsedChatCompletionMessage[Testing](\n content='{ \"age\": 28, \"name\": \"Cameron\" }',\n refusal=None,\n role='assistant',\n function_call=None,\n tool_calls=[],\n parsed=Testing(name='Cameron', age=28)\n )\n )\n ],\n created=1728595622,\n model='lmstudio-community/Phi-3.1-mini-128k-instruct-GGUF/Phi-3.1-mini-128k-instruct-Q4_K_M.gguf',\n object='chat.completion',\n service_tier=None,\n system_fingerprint='lmstudio-community/Phi-3.1-mini-128k-instruct-GGUF/Phi-3.1-mini-128k-instruct-\nQ4_K_M.gguf',\n usage=CompletionUsage(\n completion_tokens=17,\n prompt_tokens=47,\n total_tokens=64,\n completion_tokens_details=None,\n prompt_tokens_details=None\n )\n)\n
You can retrieve your Testing
object with
response.choices[0].message.parsed\n
"},{"location":"reference/serve/vllm/","title":"Serve with vLLM","text":"Would rather not self-host?
If you want to get started quickly with JSON-structured generation you can instead call .json, a .txt API that guarantees valid JSON.
Outlines can be deployed as an LLM service using the vLLM inference engine and a FastAPI server. vLLM is not installed by default so you will need to install Outlines with:
pip install outlines[serve]\n
You can then start the server with:
python -m outlines.serve.serve --model=\"microsoft/Phi-3-mini-4k-instruct\"\n
This will by default start a server at http://127.0.0.1:8000
(check what the console says, though). Without the --model
argument set, the OPT-125M model is used. The --model
argument allows you to specify any model of your choosing.
To run inference on multiple GPUs you must pass the --tensor-parallel-size
argument when initializing the server. For instance, to run inference on 2 GPUs:
python -m outlines.serve.serve --model=\"microsoft/Phi-3-mini-4k-instruct\" --tensor-parallel-size 2\n
"},{"location":"reference/serve/vllm/#alternative-method-via-docker","title":"Alternative Method: Via Docker","text":"You can install and run the server with Outlines' official Docker image using the command
docker run -p 8000:8000 outlinesdev/outlines --model=\"microsoft/Phi-3-mini-4k-instruct\"\n
"},{"location":"reference/serve/vllm/#querying-endpoint","title":"Querying Endpoint","text":"You can then query the model in shell by passing a prompt and either
- a JSON Schema specification or
- a Regex pattern
with the schema
or regex
parameters, respectively, to the /generate
endpoint. If both are specified, the schema will be used. If neither is specified, the generated text will be unconstrained.
For example, to generate a string that matches the schema {\"type\": \"string\"}
(any string):
curl http://127.0.0.1:8000/generate \\\n -d '{\n \"prompt\": \"What is the capital of France?\",\n \"schema\": {\"type\": \"string\", \"maxLength\": 5}\n }'\n
To generate a string that matches the regex (-)?(0|[1-9][0-9]*)(\\.[0-9]+)?([eE][+-][0-9]+)?
(a number):
curl http://127.0.0.1:8000/generate \\\n -d '{\n \"prompt\": \"What is Pi? Give me the first 15 digits: \",\n \"regex\": \"(-)?(0|[1-9][0-9]*)(\\\\.[0-9]+)?([eE][+-][0-9]+)?\"\n }'\n
Instead of curl
, you can also use the requests library from another Python program, for example:
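A minimal sketch, assuming the server runs at the default address shown above:
import requests\n\nresponse = requests.post(\n    \"http://127.0.0.1:8000/generate\",\n    json={\n        \"prompt\": \"What is the capital of France?\",\n        \"schema\": {\"type\": \"string\", \"maxLength\": 5},\n    },\n)\nprint(response.json())\n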
Please consult the vLLM documentation for details on additional request parameters. You can also read the code in case you need to customize the solution to your needs.
"},{"location":"blog/archive/2024/","title":"2024","text":""},{"location":"blog/category/roadmap/","title":"Roadmap","text":""}]}
\ No newline at end of file
+{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"installation/","title":"Installation","text":"You can install Outlines with pip
:
pip install outlines\n
Outlines supports OpenAI, transformers, Mamba, llama.cpp and exllama2 but you will need to install them manually:
pip install openai\npip install transformers datasets accelerate torch\npip install llama-cpp-python\npip install exllamav2 transformers torch\npip install mamba_ssm transformers torch\npip install vllm\n
If you encounter any problem using Outlines with these libraries, take a look at their installation instructions. The installation of openai
and transformers
should be straightforward, but other libraries have specific hardware requirements.
"},{"location":"installation/#bleeding-edge","title":"Bleeding edge","text":"You can install the latest version of Outlines on the repository's main
branch:
pip install git+https://github.com/dottxt-ai/outlines.git@main\n
This can be useful, for instance, when a fix has been merged but not yet released.
"},{"location":"installation/#installing-for-development","title":"Installing for development","text":"See the contributing documentation for instructions on how to install Outlines for development.
"},{"location":"licence/","title":"Licence and citations","text":"Outlines is licenced under the Apache 2.0 licence. To comply with the licence you need to add the following notice at the top every file that uses part of Outlines' code:
Copyright 2023- The Outlines developers\n\nLicensed under the Apache License, Version 2.0 (the \"License\");\nyou may not use this file except in compliance with the License.\nYou may obtain a copy of the License at\n\n http://www.apache.org/licenses/LICENSE-2.0\n\nUnless required by applicable law or agreed to in writing, software\ndistributed under the License is distributed on an \"AS IS\" BASIS,\nWITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\nSee the License for the specific language governing permissions and\nlimitations under the License.\n
If you use Outlines in your work you can use the following citation:
@article{willard2023efficient,\n title={Efficient Guided Generation for LLMs},\n author={Willard, Brandon T and Louf, R{\\'e}mi},\n journal={arXiv preprint arXiv:2307.09702},\n year={2023}\n}\n
"},{"location":"quickstart/","title":"Quickstart","text":"After installing Outlines, the fastest way to get to up to speed with the library is to get acquainted with its few core elements. We advise you to take a quick look at this page to see everything Outlines has to offer before diving in the documentation.
"},{"location":"quickstart/#core-elements","title":"Core elements","text":""},{"location":"quickstart/#models","title":"Models","text":"The first step when writing a program with Outlines is to initialize a model. Weights will be loaded on the device at this step:
import outlines\n\nmodel = outlines.models.transformers(\n \"microsoft/Phi-3-mini-4k-instruct\",\n device=\"cuda\" # optional device argument, default is cpu\n)\n
Outlines supports a wide variety of inference engines and model weight types. More details on different models can be found in the Outlines Models documentation page.
"},{"location":"quickstart/#generation","title":"Generation","text":"Once the model is initialized you can build an outlines.generate
generator. This generator can be called with a prompt directly.
(Outlines Structured Generation Full Documentation)
TextStructured generator = outlines.generate.text(model)\n\nresult = generator(\"Question: What's 2+2? Answer:\", max_tokens=100)\nprint(result)\n# The answer is 4\n\n# Outlines also supports streaming output\nstream = generator.stream(\"What's 2+2?\", max_tokens=4)\nfor i in range(5):\n token = next(stream)\n print(repr(token))\n# '2'\n# '+'\n# '2'\n# ' equals'\n# '4'\n
Along with typical language model generation behavior via outlines.generate.text
, Outlines supports structured generation, which guarantees the tokens generated by the model will follow a predefined structure. Structures can be defined by a regex pattern, JSON schema, python object type, or a Lark grammar defining a parsable language such as SQL or Python.
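Example: using a regex pattern to constrain the output format (a minimal sketch reusing the model initialized above; the printed date is illustrative)
generator = outlines.generate.regex(model, r\"[0-9]{4}-[0-9]{2}-[0-9]{2}\")\n\ndate = generator(\"When was the first moon landing? Answer with a date (YYYY-MM-DD): \")\nprint(date)\n# 1969-07-20\n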
Example: using pydantic to enforce a JSON schema
from enum import Enum\nfrom pydantic import BaseModel, constr, conint\n\nclass Character(BaseModel):\n name: constr(max_length=10)\n age: conint(gt=18, lt=99)\n armor: (Enum('Armor', {'leather': 'leather', 'chainmail': 'chainmail', 'plate': 'plate'}))\n strength: conint(gt=1, lt=100)\n\ngenerator = outlines.generate.json(model, Character)\n\ncharacter = generator(\n \"Generate a new character for my awesome game: \"\n + \"name, age (between 1 and 99), armor and strength. \"\n )\nprint(character)\n# Character(name='Zara', age=25, armor=<Armor.leather: 'leather'>, strength=85)\n
"},{"location":"quickstart/#deploy-using-vllm-and-fastapi","title":"Deploy using vLLM and FastAPI","text":"Outlines can be deployed as a LLM service using vLLM and FastAPI. The server supports asynchronous processing of incoming requests, and benefits from the performance of vLLM.
First start the server:
python -m outlines.serve.serve --model=\"microsoft/Phi-3-mini-4k-instruct\"\n
Or you can start the server with Outlines' official Docker image:
docker run -p 8000:8000 outlinesdev/outlines --model=\"microsoft/Phi-3-mini-4k-instruct\"\n
This will by default start a server at http://127.0.0.1:8000
(check what the console says, though). Without the --model
argument set, the OPT-125M model is used.
You can then query the model in shell by passing a prompt and a JSON Schema specification for the structure of the output:
curl http://127.0.0.1:8000/generate \\\n -d '{\n \"prompt\": \"Question: What is a language model? Answer:\",\n \"schema\": {\"type\": \"string\"}\n }'\n
Or use the requests library from another python program. You can read the vLLM documentation for more details.
"},{"location":"quickstart/#utilities","title":"Utilities","text":""},{"location":"quickstart/#prompt-templates","title":"Prompt templates","text":"Prompting can lead to messy code. Outlines' prompt functions are python functions that contain a template for the prompt in their docstring. We use a powerful templating language to allow you to loop over lists, dictionaries, add conditionals, etc. directly from the prompt. When called, a prompt function returns the rendered template:
import outlines\n\n@outlines.prompt\ndef few_shots(instructions, examples, question):\n \"\"\"{{ instructions }}\n\n Examples\n --------\n\n {% for example in examples %}\n Q: {{ example.question }}\n A: {{ example.answer }}\n\n {% endfor %}\n Question\n --------\n\n Q: {{ question }}\n A:\n \"\"\"\n\ninstructions = \"Please answer the following question following the examples\"\nexamples = [\n {\"question\": \"2+2=?\", \"answer\":4},\n {\"question\": \"3+3=?\", \"answer\":6}\n]\nquestion = \"4+4 = ?\"\n\nprompt = few_shots(instructions, examples, question)\nprint(prompt)\n# Please answer the following question following the examples\n\n# Examples\n# --------\n\n# Q: 2+2=?\n# A: 4\n\n# Q: 3+3=?\n# A: 6\n\n# Question\n# --------\n\n# Q: 4+4 = ?\n# A:\n
"},{"location":"quickstart/#outlines-functions","title":"Outlines functions","text":"Once you are done experimenting with a prompt and an output structure, it is useful to be able to encapsulate all of these in a single function that can be called from other parts of the program. This is what outlines.Function
allows you to do:
function.pyCall a functionCall a function stored on GitHub from pydantic import BaseModel\n\nimport outlines\n\n\n@outlines.prompt\ndef tell_a_joke(topic):\n \"\"\"Tell me a joke about {{ topic }}.\"\"\"\n\nclass Joke(BaseModel):\n setup: str\n punchline: str\n\ngenerate_joke = outlines.Function(\n tell_a_joke,\n Joke,\n \"microsoft/Phi-3-mini-4k-instruct\"\n)\n
from .function import generate_joke\n\nresponse = generate_joke(\"baseball\")\n\n# haha\n# Joke(setup='Why was the baseball in a bad mood?', punchline='Because it got hit around a lot.')\n
You can load a function that is stored on a repository on GitHub directly from Outlines. Say Someone
stores a function in joke.py
at the root of the TheirRepo
repository:
import outlines\n\njoke = outlines.Function.from_github(\"Someone/TheirRepo/joke\")\nresponse = joke(\"baseball\")\n
It makes it easier for the community to collaborate on the infinite number of use cases enabled by these models!"},{"location":"quickstart/#going-further","title":"Going further","text":"If you need more inspiration you can take a look at the cookbook or watch Remi Louf's AI Engineer World\u2019s Fair Presentation on Outlines. If you have any questions or requests for documentation, please reach out to us on GitHub, Twitter or Discord.
"},{"location":"welcome/","title":"Welcome to Outlines!","text":"Outlines is a Python library that allows you to use Large Language Model in a simple and robust way (with structured generation). It is built by .txt, and is already used in production by many companies.
"},{"location":"welcome/#what-models-do-you-support","title":"What models do you support?","text":"We support Openai, but the true power of Outlines is unleashed with Open Source models available via the transformers, llama.cpp, exllama2, mlx-lm and vllm models. If you want to build and maintain an integration with another library, get in touch.
"},{"location":"welcome/#what-are-the-main-features","title":"What are the main features?","text":" -
Make LLMs generate valid JSON
No more invalid JSON outputs, 100% guaranteed
Generate JSON
-
JSON mode for vLLM
Deploy a LLM service using Outlines' JSON structured generation and vLLM
Deploy outlines
-
Make LLMs follow a Regex
Generate text that parses correctly 100% of the time
Guide LLMs
-
Powerful Prompt Templating
Better manage your prompts' complexity with prompt templating
Learn more
"},{"location":"welcome/#why-use-outlines","title":"Why use Outlines?","text":"Outlines is built at .txt by engineers with decades of experience in software engineering, machine learning (Bayesian Statistics and NLP), and compilers. .txt is a VC-backed company fully focused on the topic of structured generation and is committed to make the community benefit from its experience.
We are also open-source veterans and have authored/maintained many libraries over the years: the Aesara and Pythological ecosystems, Blackjax and Hy among many others.
Outlines does not use unnecessary abstractions that tend to get in your way. We have a laser focus on reliable text generation with LLMs, a clear roadmap to push the state of the art in this area and a commitment to clean and robust code.
And last but not least, unlike alternatives, Outlines' structured generation introduces no overhead during inference.
"},{"location":"welcome/#who-is-using-outlines","title":"Who is using Outlines?","text":"Hundreds of organisations and the main LLM serving frameworks (vLLM, TGI, LoRAX, xinference, SGLang) are using Outlines. Some of the prominent companies and organizations that are using Outlines include:
Organizations are included either because they use Outlines as a dependency in a public repository, or because of direct communication between members of the Outlines team and employees at these organizations.
Still not convinced, read what people say about us. And make sure to take a look at what the community is building!
"},{"location":"welcome/#philosophy","title":"Philosophy","text":"Outlines is a library for neural text generation. You can think of it as a more flexible replacement for the generate
method in the transformers library.
Outlines helps developers structure text generation to build robust interfaces with external systems. It provides generation methods that guarantee that the output will match a regular expression, or follow a JSON schema.
Outlines provides robust prompting primitives that separate the prompting from the execution logic and lead to simple implementations of few-shot generations, ReAct, meta-prompting, agents, etc.
Outlines is designed as a library that is meant to be compatible with the broader ecosystem, not to replace it. We use as few abstractions as possible, and generation can be interleaved with control flow, conditionals, custom Python functions and calls to other libraries.
Outlines is compatible with every auto-regressive model. It only interfaces with models via the next-token logits distribution.
"},{"location":"welcome/#outlines-people","title":"Outlines people","text":"Outlines would not be what it is today without a community of dedicated developers:
"},{"location":"welcome/#acknowledgements","title":"Acknowledgements","text":"Outlines was originally developed at @NormalComputing by @remilouf and @BrandonTWillard. It is now maintained by .txt.
"},{"location":"api/","title":"API Reference","text":""},{"location":"api/guide/","title":"Guide","text":""},{"location":"api/guide/#outlines.fsm.guide.CFGGuide","title":"CFGGuide
","text":" Bases: Guide
Guide to generate text that is in the language of a context-free Lark grammar.
Source code in outlines/fsm/guide.py
class CFGGuide(Guide):\n \"\"\"Guide to generate text that is in the language of a context-free Lark grammar.\"\"\"\n\n def __init__(self, cfg_string: str, tokenizer):\n \"\"\"\n Construct the PartialLark parser and set the empty initial_state (PartialParserState)\n \"\"\"\n warnings.warn(\n \"Outlines' public *community-contributed* CFG structured generation is experimental. \"\n \"Please review https://dottxt-ai.github.io/outlines/latest/reference/generation/cfg#disclaimer\"\n )\n\n self.cfg_string = cfg_string\n self.tokenizer = tokenizer\n self.eos_token_id = self.tokenizer.eos_token_id\n self.parser = PartialLark(\n cfg_string,\n parser=\"lalr\",\n import_paths=[grammars.GRAMMAR_PATH],\n )\n self.initial_state = CFGState(\n parser_state=self.parser.parse(\"\"), prev_token=None\n )\n\n def get_next_instruction(self, state: CFGState) -> Instruction:\n \"\"\"Return the next instruction for guided generation.\n\n Current lazy approach:\n - For each token in the vocabulary\n - create a copy of the parsers state\n - add the tokens to the parsers input text\n - if valid, add token to returned tokens\n\n Further refinements are necessary for performant text processing.\n\n Parameters\n ----------\n state\n The guides current PartialParserState, or None if complete\n\n Returns\n -------\n A `Generate` instance that contains the model and the allowed token ids.\n\n \"\"\"\n\n if state.parser_state is None:\n return Write(torch.tensor([self.eos_token_id]))\n\n valid_tokens = list(\n self.iter_valid_token_ids(state, self.tokenizer.vocabulary.values())\n )\n if len(valid_tokens) == 1:\n return Write(torch.tensor(valid_tokens))\n return Generate(torch.tensor(valid_tokens))\n\n def iter_valid_token_ids(\n self, state: CFGState, candidate_token_ids: list\n ) -> Generator[int, None, None]:\n \"\"\"\n Iterate over the given token_ids and yield those that are valid for the current parser state.\n\n Parameters\n ----------\n parser_state\n The current state of the parser, or None if complete.\n token_ids\n The list of token ids to check for validity.\n\n Yields\n ------\n int\n Valid token ids.\n \"\"\"\n if state.parser_state is None:\n yield self.eos_token_id\n return\n\n for token_id in candidate_token_ids:\n if token_id == self.eos_token_id:\n if self.can_terminate_state(state):\n yield token_id\n else:\n try:\n self._get_parser_state_token_applied(state, int(token_id))\n yield token_id\n except (\n ValueError,\n EOFError,\n UnexpectedToken,\n UnexpectedCharacters,\n DedentError,\n ):\n pass\n\n def get_next_state(self, state: CFGState, token_id: int) -> CFGState:\n \"\"\"\n Update the state of the guide.\n Decode the token_id, and calculate the new parser_state with the token applied.\n\n Parameters\n ----------\n state\n The guides current PartialParserState, or None if complete\n token_id\n The id of the token that was just generated.\n\n Returns\n -------\n The guides new PartialParserState\n\n \"\"\"\n if state.parser_state is None or token_id == self.eos_token_id:\n parser_state = None\n else:\n parser_state = self._get_parser_state_token_applied(state, int(token_id))\n return CFGState(parser_state=parser_state, prev_token=token_id)\n\n def _get_parser_state_token_applied(\n self, state: CFGState, token_id: int\n ) -> PartialParserState:\n \"\"\"\n Don't mutate `parser_state`, copy to protect\n\n Get the token string\n - if first token in generation: tokenizer.decode (no leading whitespace)\n - else: normalized (with possibly leading whitespace)\n\n Don't allow empty (\"\") tokens, raise 
ValueError\n \"\"\"\n parser_state = copy.copy(state.parser_state) # prevent side effects\n\n # normalize\n if state.prev_token is None:\n new_token_str = self.tokenizer.decode([token_id])[0]\n else:\n prev_token_str = self.tokenizer.decode([[state.prev_token]])[0]\n combined_token_str = self.tokenizer.decode([[state.prev_token, token_id]])[\n 0\n ]\n new_token_str = combined_token_str[len(prev_token_str) :]\n\n if new_token_str == \"\":\n raise ValueError(\"empty next token\")\n\n # update parser with new token\n parser_state.lexer.state.text += new_token_str\n self.parser.parse_from_state(parser_state, is_end=False)\n\n return parser_state\n\n def is_final_state(self, state: CFGState) -> bool:\n # TODO: remove this method, use can_terminate_state and must_terminate_state\n # here and in RegexGuide per https://github.com/dottxt-ai/outlines/issues/885\n return self.can_terminate_state(state)\n\n def can_terminate_state(self, state: CFGState) -> bool:\n \"\"\"Generation is allowed to terminate\"\"\"\n if state.parser_state is not None:\n try:\n copy.copy(state.parser_state).feed_eof()\n except UnexpectedToken:\n return False\n return True\n\n def must_terminate_state(self, state: CFGState) -> bool:\n \"\"\"Generation must terminate, no legal continuations\"\"\"\n return state.parser_state is None or set(state.parser_state.accepts()).issubset(\n {\"$END\"}\n )\n\n def copy(self) -> \"CFGGuide\":\n \"\"\"Create a copy of the Guide.\"\"\"\n return CFGGuide(self.cfg_string, self.tokenizer)\n
"},{"location":"api/guide/#outlines.fsm.guide.CFGGuide.__init__","title":"__init__(cfg_string, tokenizer)
","text":"Construct the PartialLark parser and set the empty initial_state (PartialParserState)
Source code in outlines/fsm/guide.py
def __init__(self, cfg_string: str, tokenizer):\n \"\"\"\n Construct the PartialLark parser and set the empty initial_state (PartialParserState)\n \"\"\"\n warnings.warn(\n \"Outlines' public *community-contributed* CFG structured generation is experimental. \"\n \"Please review https://dottxt-ai.github.io/outlines/latest/reference/generation/cfg#disclaimer\"\n )\n\n self.cfg_string = cfg_string\n self.tokenizer = tokenizer\n self.eos_token_id = self.tokenizer.eos_token_id\n self.parser = PartialLark(\n cfg_string,\n parser=\"lalr\",\n import_paths=[grammars.GRAMMAR_PATH],\n )\n self.initial_state = CFGState(\n parser_state=self.parser.parse(\"\"), prev_token=None\n )\n
"},{"location":"api/guide/#outlines.fsm.guide.CFGGuide.can_terminate_state","title":"can_terminate_state(state)
","text":"Generation is allowed to terminate
Source code in outlines/fsm/guide.py
def can_terminate_state(self, state: CFGState) -> bool:\n \"\"\"Generation is allowed to terminate\"\"\"\n if state.parser_state is not None:\n try:\n copy.copy(state.parser_state).feed_eof()\n except UnexpectedToken:\n return False\n return True\n
"},{"location":"api/guide/#outlines.fsm.guide.CFGGuide.copy","title":"copy()
","text":"Create a copy of the Guide.
Source code in outlines/fsm/guide.py
def copy(self) -> \"CFGGuide\":\n \"\"\"Create a copy of the Guide.\"\"\"\n return CFGGuide(self.cfg_string, self.tokenizer)\n
"},{"location":"api/guide/#outlines.fsm.guide.CFGGuide.get_next_instruction","title":"get_next_instruction(state)
","text":"Return the next instruction for guided generation.
Current lazy approach: - For each token in the vocabulary - create a copy of the parsers state - add the tokens to the parsers input text - if valid, add token to returned tokens
Further refinements are necessary for performant text processing.
"},{"location":"api/guide/#outlines.fsm.guide.CFGGuide.get_next_instruction--parameters","title":"Parameters","text":"state The guides current PartialParserState, or None if complete
"},{"location":"api/guide/#outlines.fsm.guide.CFGGuide.get_next_instruction--returns","title":"Returns","text":"A Generate
instance that contains the model and the allowed token ids.
Source code in outlines/fsm/guide.py
def get_next_instruction(self, state: CFGState) -> Instruction:\n \"\"\"Return the next instruction for guided generation.\n\n Current lazy approach:\n - For each token in the vocabulary\n - create a copy of the parsers state\n - add the tokens to the parsers input text\n - if valid, add token to returned tokens\n\n Further refinements are necessary for performant text processing.\n\n Parameters\n ----------\n state\n The guides current PartialParserState, or None if complete\n\n Returns\n -------\n A `Generate` instance that contains the model and the allowed token ids.\n\n \"\"\"\n\n if state.parser_state is None:\n return Write(torch.tensor([self.eos_token_id]))\n\n valid_tokens = list(\n self.iter_valid_token_ids(state, self.tokenizer.vocabulary.values())\n )\n if len(valid_tokens) == 1:\n return Write(torch.tensor(valid_tokens))\n return Generate(torch.tensor(valid_tokens))\n
"},{"location":"api/guide/#outlines.fsm.guide.CFGGuide.get_next_state","title":"get_next_state(state, token_id)
","text":"Update the state of the guide. Decode the token_id, and calculate the new parser_state with the token applied.
"},{"location":"api/guide/#outlines.fsm.guide.CFGGuide.get_next_state--parameters","title":"Parameters","text":"state The guides current PartialParserState, or None if complete token_id The id of the token that was just generated.
"},{"location":"api/guide/#outlines.fsm.guide.CFGGuide.get_next_state--returns","title":"Returns","text":"The guides new PartialParserState
Source code in outlines/fsm/guide.py
def get_next_state(self, state: CFGState, token_id: int) -> CFGState:\n \"\"\"\n Update the state of the guide.\n Decode the token_id, and calculate the new parser_state with the token applied.\n\n Parameters\n ----------\n state\n The guides current PartialParserState, or None if complete\n token_id\n The id of the token that was just generated.\n\n Returns\n -------\n The guides new PartialParserState\n\n \"\"\"\n if state.parser_state is None or token_id == self.eos_token_id:\n parser_state = None\n else:\n parser_state = self._get_parser_state_token_applied(state, int(token_id))\n return CFGState(parser_state=parser_state, prev_token=token_id)\n
"},{"location":"api/guide/#outlines.fsm.guide.CFGGuide.iter_valid_token_ids","title":"iter_valid_token_ids(state, candidate_token_ids)
","text":"Iterate over the given token_ids and yield those that are valid for the current parser state.
"},{"location":"api/guide/#outlines.fsm.guide.CFGGuide.iter_valid_token_ids--parameters","title":"Parameters","text":"parser_state The current state of the parser, or None if complete. token_ids The list of token ids to check for validity.
"},{"location":"api/guide/#outlines.fsm.guide.CFGGuide.iter_valid_token_ids--yields","title":"Yields","text":"int Valid token ids.
Source code in outlines/fsm/guide.py
def iter_valid_token_ids(\n self, state: CFGState, candidate_token_ids: list\n) -> Generator[int, None, None]:\n \"\"\"\n Iterate over the given token_ids and yield those that are valid for the current parser state.\n\n Parameters\n ----------\n parser_state\n The current state of the parser, or None if complete.\n token_ids\n The list of token ids to check for validity.\n\n Yields\n ------\n int\n Valid token ids.\n \"\"\"\n if state.parser_state is None:\n yield self.eos_token_id\n return\n\n for token_id in candidate_token_ids:\n if token_id == self.eos_token_id:\n if self.can_terminate_state(state):\n yield token_id\n else:\n try:\n self._get_parser_state_token_applied(state, int(token_id))\n yield token_id\n except (\n ValueError,\n EOFError,\n UnexpectedToken,\n UnexpectedCharacters,\n DedentError,\n ):\n pass\n
"},{"location":"api/guide/#outlines.fsm.guide.CFGGuide.must_terminate_state","title":"must_terminate_state(state)
","text":"Generation must terminate, no legal continuations
Source code in outlines/fsm/guide.py
def must_terminate_state(self, state: CFGState) -> bool:\n \"\"\"Generation must terminate, no legal continuations\"\"\"\n return state.parser_state is None or set(state.parser_state.accepts()).issubset(\n {\"$END\"}\n )\n
"},{"location":"api/guide/#outlines.fsm.guide.Guide","title":"Guide
","text":" Bases: Guide
Base definition of a generation guide.
A generation guide defines the behavior of a finite-state machine that guides a text generation procedure. Unlike the DFAs built from regular expressions guides can also emit a Write
instructions which tells the model that it can append a sequence of tokens (or token word) instead of generating it.
Source code in outlines/fsm/guide.py
class Guide(CoreGuide):\n \"\"\"Base definition of a generation guide.\n\n A generation guide defines the behavior of a finite-state machine that guides\n a text generation procedure. Unlike the DFAs built from regular expressions\n guides can also emit a `Write` instructions which tells the model that it can\n append a sequence of tokens (or token word) instead of generating it.\n\n \"\"\"\n\n initial_state: Any\n
"},{"location":"api/guide/#outlines.fsm.guide.RegexGuide","title":"RegexGuide
","text":" Bases: RegexGuide
Guide to generate text in the language of a regular expression. CoreRegexGuide with outlines cache
Source code in outlines/fsm/guide.py
class RegexGuide(CoreRegexGuide):\n \"\"\"\n Guide to generate text in the language of a regular expression.\n CoreRegexGuide with outlines cache\n \"\"\"\n\n @classmethod\n def from_regex(\n cls,\n regex_string: str,\n tokenizer,\n **kwargs,\n ):\n return super().from_regex(\n regex_string,\n tokenizer,\n _create_states_mapping=cached_create_states_mapping,\n **kwargs,\n )\n
"},{"location":"api/guide/#outlines.fsm.guide.StopAtEOSGuide","title":"StopAtEOSGuide
","text":" Bases: Guide
Guide to generate tokens until the EOS token has been generated.
Source code in outlines/fsm/guide.py
class StopAtEOSGuide(Guide):\n \"\"\"Guide to generate tokens until the EOS token has been generated.\"\"\"\n\n final_state = 1\n start_state = 0 # TODO: remove start_state, use only initial_state\n initial_state = 0\n\n def __init__(self, tokenizer: \"Tokenizer\"):\n \"\"\"Initialize the generation guide.\n\n model\n The logit generator used to generate the next token.\n\n \"\"\"\n self.eos_token_id = tokenizer.eos_token_id\n self.vocabulary = tokenizer.vocabulary.values()\n\n def get_next_instruction(self, state: int) -> Instruction:\n if self.is_final_state(state):\n return Write([self.eos_token_id])\n return Generate(None)\n\n def get_next_state(self, state: int, token_id: int) -> int:\n if token_id == self.eos_token_id or state == self.final_state:\n return self.final_state\n\n return self.initial_state\n\n def is_final_state(self, state: int):\n return state == self.final_state\n\n def copy(self):\n return self\n
"},{"location":"api/guide/#outlines.fsm.guide.StopAtEOSGuide.__init__","title":"__init__(tokenizer)
","text":"Initialize the generation guide.
model The logit generator used to generate the next token.
Source code in outlines/fsm/guide.py
def __init__(self, tokenizer: \"Tokenizer\"):\n \"\"\"Initialize the generation guide.\n\n model\n The logit generator used to generate the next token.\n\n \"\"\"\n self.eos_token_id = tokenizer.eos_token_id\n self.vocabulary = tokenizer.vocabulary.values()\n
"},{"location":"api/json_schema/","title":"Json schema","text":""},{"location":"api/json_schema/#outlines.fsm.json_schema.convert_json_schema_to_str","title":"convert_json_schema_to_str(json_schema)
","text":"Convert a JSON schema to a string.
"},{"location":"api/json_schema/#outlines.fsm.json_schema.convert_json_schema_to_str--parameters","title":"Parameters","text":"json_schema The JSON schema.
"},{"location":"api/json_schema/#outlines.fsm.json_schema.convert_json_schema_to_str--returns","title":"Returns","text":"str The JSON schema converted to a string.
"},{"location":"api/json_schema/#outlines.fsm.json_schema.convert_json_schema_to_str--raises","title":"Raises","text":"ValueError If the schema is not a dictionary, a string or a Pydantic class.
Source code in outlines/fsm/json_schema.py
def convert_json_schema_to_str(json_schema: Union[dict, str, Type[BaseModel]]) -> str:\n \"\"\"Convert a JSON schema to a string.\n\n Parameters\n ----------\n json_schema\n The JSON schema.\n\n Returns\n -------\n str\n The JSON schema converted to a string.\n\n Raises\n ------\n ValueError\n If the schema is not a dictionary, a string or a Pydantic class.\n \"\"\"\n if isinstance(json_schema, dict):\n schema_str = json.dumps(json_schema)\n elif isinstance(json_schema, str):\n schema_str = json_schema\n elif issubclass(json_schema, BaseModel):\n schema_str = json.dumps(json_schema.model_json_schema())\n else:\n raise ValueError(\n f\"Cannot parse schema {json_schema}. The schema must be either \"\n + \"a Pydantic class, a dictionary or a string that contains the JSON \"\n + \"schema specification\"\n )\n return schema_str\n
"},{"location":"api/json_schema/#outlines.fsm.json_schema.get_schema_from_signature","title":"get_schema_from_signature(fn)
","text":"Turn a function signature into a JSON schema.
Every JSON object valid to the output JSON Schema can be passed to fn
using the ** unpacking syntax.
Source code in outlines/fsm/json_schema.py
def get_schema_from_signature(fn: Callable) -> dict:\n \"\"\"Turn a function signature into a JSON schema.\n\n Every JSON object valid to the output JSON Schema can be passed\n to `fn` using the ** unpacking syntax.\n\n \"\"\"\n signature = inspect.signature(fn)\n arguments = {}\n for name, arg in signature.parameters.items():\n if arg.annotation == inspect._empty:\n raise ValueError(\"Each argument must have a type annotation\")\n else:\n arguments[name] = (arg.annotation, ...)\n\n try:\n fn_name = fn.__name__\n except Exception as e:\n fn_name = \"Arguments\"\n warnings.warn(\n f\"The function name could not be determined. Using default name 'Arguments' instead. For debugging, here is exact error:\\n{e}\",\n category=UserWarning,\n )\n model = create_model(fn_name, **arguments)\n\n return model.model_json_schema()\n
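A minimal usage sketch (assuming the import path outlines.fsm.json_schema shown in the source location above; the printed schema is illustrative of Pydantic's output):
from outlines.fsm.json_schema import get_schema_from_signature\n\ndef add(a: int, b: int) -> int:\n    return a + b\n\nschema = get_schema_from_signature(add)\nprint(schema[\"properties\"])\n# {'a': {'title': 'A', 'type': 'integer'}, 'b': {'title': 'B', 'type': 'integer'}}\n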
"},{"location":"api/models/","title":"Models","text":"Integration with OpenAI's API.
"},{"location":"api/models/#outlines.models.transformers.TransformerTokenizer","title":"TransformerTokenizer
","text":" Bases: Tokenizer
Represents a tokenizer for models in the transformers
library.
Source code in outlines/models/transformers.py
class TransformerTokenizer(Tokenizer):\n \"\"\"Represents a tokenizer for models in the `transformers` library.\"\"\"\n\n def __init__(self, tokenizer: \"PreTrainedTokenizer\", **kwargs):\n self.tokenizer = tokenizer\n self.eos_token_id = self.tokenizer.eos_token_id\n self.eos_token = self.tokenizer.eos_token\n\n if self.tokenizer.pad_token_id is None:\n self.tokenizer.pad_token_id = self.tokenizer.eos_token_id\n self.pad_token_id = self.eos_token_id\n else:\n self.pad_token_id = self.tokenizer.pad_token_id\n self.pad_token = self.tokenizer.pad_token\n\n self.special_tokens = set(self.tokenizer.all_special_tokens)\n\n self.vocabulary = self.tokenizer.get_vocab()\n self.is_llama = isinstance(self.tokenizer, get_llama_tokenizer_types())\n\n def encode(\n self, prompt: Union[str, List[str]], **kwargs\n ) -> Tuple[\"torch.LongTensor\", \"torch.LongTensor\"]:\n kwargs[\"padding\"] = True\n kwargs[\"return_tensors\"] = \"pt\"\n output = self.tokenizer(prompt, **kwargs)\n return output[\"input_ids\"], output[\"attention_mask\"]\n\n def decode(self, token_ids: \"torch.LongTensor\") -> List[str]:\n text = self.tokenizer.batch_decode(token_ids, skip_special_tokens=True)\n return text\n\n def convert_token_to_string(self, token: str) -> str:\n from transformers.file_utils import SPIECE_UNDERLINE\n\n string = self.tokenizer.convert_tokens_to_string([token])\n\n if self.is_llama:\n # A hack to handle missing spaces to HF's Llama tokenizers\n if token.startswith(SPIECE_UNDERLINE) or token == \"<0x20>\":\n return \" \" + string\n\n return string\n\n def __eq__(self, other):\n if isinstance(other, type(self)):\n if hasattr(self, \"model_name\") and hasattr(self, \"kwargs\"):\n return (\n other.model_name == self.model_name and other.kwargs == self.kwargs\n )\n else:\n return other.tokenizer == self.tokenizer\n return NotImplemented\n\n def __hash__(self):\n from datasets.fingerprint import Hasher\n\n return hash(Hasher.hash(self.tokenizer))\n\n def __getstate__(self):\n state = {\"tokenizer\": self.tokenizer}\n return state\n\n def __setstate__(self, state):\n self.__init__(state[\"tokenizer\"])\n
"},{"location":"api/models/#outlines.models.transformers.Transformers","title":"Transformers
","text":"Represents a transformers
model.
Source code in outlines/models/transformers.py
class Transformers:\n \"\"\"Represents a `transformers` model.\"\"\"\n\n def __init__(\n self,\n model: \"PreTrainedModel\",\n tokenizer: \"PreTrainedTokenizer\",\n ):\n self.model = model\n self.tokenizer = TransformerTokenizer(tokenizer)\n\n def forward(\n self,\n input_ids: \"torch.LongTensor\",\n attention_mask: \"torch.LongTensor\",\n past_key_values: Optional[Tuple] = None,\n ) -> Tuple[\"torch.FloatTensor\", Optional[KVCacheType]]:\n \"\"\"Compute a forward pass through the transformer model.\n\n Parameters\n ----------\n input_ids\n The input token ids. Must be one or two dimensional.\n attention_mask\n The attention mask. Must be one or two dimensional.\n past_key_values\n A tuple of tuples containing the cached key and value tensors for each\n attention head.\n\n Returns\n -------\n The computed logits and the new cached key and value tensors.\n\n \"\"\"\n try:\n import torch\n except ImportError:\n ImportError(\n \"The `torch` library needs to be installed to use `transformers` models.\"\n )\n assert 0 < input_ids.ndim < 3\n\n if past_key_values:\n input_ids = input_ids[..., -1].unsqueeze(-1)\n\n with torch.inference_mode():\n output = self.model(\n input_ids,\n attention_mask=attention_mask,\n return_dict=True,\n output_attentions=False,\n output_hidden_states=False,\n past_key_values=past_key_values,\n )\n\n return output.logits, output.past_key_values\n\n def __call__(\n self,\n input_ids: \"torch.LongTensor\",\n attention_mask: \"torch.LongTensor\",\n past_key_values: Optional[Tuple] = None,\n ) -> \"torch.FloatTensor\":\n logits, kv_cache = self.forward(input_ids, attention_mask, past_key_values)\n next_token_logits = logits[..., -1, :]\n\n return next_token_logits, kv_cache\n\n def generate(\n self,\n prompts: Union[str, List[str]],\n generation_parameters: GenerationParameters,\n logits_processor: Optional[\"OutlinesLogitsProcessor\"],\n sampling_parameters: SamplingParameters,\n ) -> Union[str, List[str], List[List[str]]]:\n \"\"\"Generate text using `transformers`.\n\n Arguments\n ---------\n prompts\n A prompt or list of prompts.\n generation_parameters\n An instance of `GenerationParameters` that contains the prompt,\n the maximum number of tokens, stop sequences and seed. 
All the\n arguments to `SequenceGeneratorAdapter`'s `__cal__` method.\n logits_processor\n The logits processor to use when generating text.\n sampling_parameters\n An instance of `SamplingParameters`, a dataclass that contains\n the name of the sampler to use and related parameters as available\n in Outlines.\n\n Returns\n -------\n The generated text\n \"\"\"\n if isinstance(prompts, str):\n # convert to 2d\n input_ids, attention_mask = self.tokenizer.encode([prompts])\n else:\n input_ids, attention_mask = self.tokenizer.encode(prompts)\n\n inputs = {\n \"input_ids\": input_ids.to(self.model.device),\n \"attention_mask\": attention_mask.to(self.model.device),\n }\n if (\n \"attention_mask\"\n not in inspect.signature(self.model.forward).parameters.keys()\n ):\n del inputs[\"attention_mask\"]\n\n generation_kwargs = self._get_generation_kwargs(\n prompts,\n generation_parameters,\n logits_processor,\n sampling_parameters,\n )\n generated_ids = self._generate_output_seq(prompts, inputs, **generation_kwargs)\n\n # if single str input and single sample per input, convert to a 1D output\n if isinstance(prompts, str):\n generated_ids = generated_ids.squeeze(0)\n\n return self._decode_generation(generated_ids)\n\n def stream(\n self,\n prompts: Union[str, List[str]],\n generation_parameters: GenerationParameters,\n logits_processor: Optional[\"OutlinesLogitsProcessor\"],\n sampling_parameters: SamplingParameters,\n ) -> Iterator[Union[str, List[str]]]:\n \"\"\"\n Temporary stream stand-in which implements stream() signature\n and equivalent behaviour but isn't yielded until generation completes.\n\n TODO: implement following completion of https://github.com/huggingface/transformers/issues/30810\n \"\"\"\n if isinstance(prompts, str):\n # convert to 2d\n input_ids, attention_mask = self.tokenizer.encode([prompts])\n else:\n input_ids, attention_mask = self.tokenizer.encode(prompts)\n inputs = {\n \"input_ids\": input_ids.to(self.model.device),\n \"attention_mask\": attention_mask.to(self.model.device),\n }\n if (\n \"attention_mask\"\n not in inspect.signature(self.model.forward).parameters.keys()\n ):\n del inputs[\"attention_mask\"]\n\n generation_kwargs = self._get_generation_kwargs(\n prompts,\n generation_parameters,\n logits_processor,\n sampling_parameters,\n )\n generated_ids = self._generate_output_seq(prompts, inputs, **generation_kwargs)\n\n # if single str input and single sample per input, convert to a 1D output\n if isinstance(prompts, str):\n generated_ids = generated_ids.squeeze(0)\n\n for i in range(generated_ids.size(-1)):\n output_group_ids = generated_ids.select(-1, i).unsqueeze(-1)\n yield self._decode_generation(output_group_ids)\n\n def _get_generation_kwargs(\n self,\n prompts: Union[str, List[str]],\n generation_parameters: GenerationParameters,\n logits_processor: Optional[\"OutlinesLogitsProcessor\"],\n sampling_parameters: SamplingParameters,\n ) -> dict:\n \"\"\"\n Conert outlines generation parameters into model.generate kwargs\n \"\"\"\n from transformers import GenerationConfig, LogitsProcessorList, set_seed\n\n max_new_tokens, stop_at, seed = dataclasses.astuple(generation_parameters)\n sampler, num_samples, top_p, top_k, temperature = dataclasses.astuple(\n sampling_parameters\n )\n if max_new_tokens is None:\n max_new_tokens = int(2**30)\n\n # global seed, not desirable\n if seed is not None:\n set_seed(seed)\n\n if logits_processor is not None:\n logits_processor_list = LogitsProcessorList([logits_processor])\n else:\n logits_processor_list = None\n\n 
generation_config = GenerationConfig(\n max_new_tokens=max_new_tokens,\n stop_strings=stop_at,\n num_return_sequences=(num_samples or 1),\n top_p=top_p,\n top_k=top_k,\n temperature=temperature,\n do_sample=(sampler == \"multinomial\"),\n num_beams=(num_samples if sampler == \"beam_search\" else 1),\n eos_token_id=self.tokenizer.eos_token_id,\n pad_token_id=self.tokenizer.pad_token_id,\n )\n\n return dict(\n logits_processor=logits_processor_list,\n generation_config=generation_config,\n tokenizer=self.tokenizer.tokenizer,\n )\n\n def _generate_output_seq(\n self, prompts, inputs, generation_config, **generation_kwargs\n ):\n input_ids = inputs[\"input_ids\"]\n output_ids = self.model.generate(\n **inputs, generation_config=generation_config, **generation_kwargs\n )\n\n # encoder-decoder returns output_ids only, decoder-only returns full seq ids\n if self.model.config.is_encoder_decoder:\n generated_ids = output_ids\n else:\n generated_ids = output_ids[:, input_ids.shape[1] :]\n\n # if batch list inputs AND multiple samples per input, convert generated_id to 3D view\n num_samples = generation_config.num_return_sequences or 1\n\n if num_samples > 1 and isinstance(prompts, list):\n batch_size = input_ids.size(0)\n num_return_sequences = generation_config.num_return_sequences or 1\n generated_ids = generated_ids.view(batch_size, num_return_sequences, -1)\n\n return generated_ids\n\n def _decode_generation(self, generated_ids: \"torch.Tensor\"):\n if len(generated_ids.shape) == 1:\n return self.tokenizer.decode([generated_ids])[0]\n elif len(generated_ids.shape) == 2:\n return self.tokenizer.decode(generated_ids)\n elif len(generated_ids.shape) == 3:\n return [\n self.tokenizer.decode(generated_ids[i])\n for i in range(len(generated_ids))\n ]\n else:\n raise TypeError(\n f\"Generated outputs aren't 1D, 2D or 3D, but instead are {generated_ids.shape}\"\n )\n
"},{"location":"api/models/#outlines.models.transformers.Transformers.forward","title":"forward(input_ids, attention_mask, past_key_values=None)
","text":"Compute a forward pass through the transformer model.
"},{"location":"api/models/#outlines.models.transformers.Transformers.forward--parameters","title":"Parameters","text":"input_ids The input token ids. Must be one or two dimensional. attention_mask The attention mask. Must be one or two dimensional. past_key_values A tuple of tuples containing the cached key and value tensors for each attention head.
"},{"location":"api/models/#outlines.models.transformers.Transformers.forward--returns","title":"Returns","text":"The computed logits and the new cached key and value tensors.
Source code in outlines/models/transformers.py
def forward(\n self,\n input_ids: \"torch.LongTensor\",\n attention_mask: \"torch.LongTensor\",\n past_key_values: Optional[Tuple] = None,\n) -> Tuple[\"torch.FloatTensor\", Optional[KVCacheType]]:\n \"\"\"Compute a forward pass through the transformer model.\n\n Parameters\n ----------\n input_ids\n The input token ids. Must be one or two dimensional.\n attention_mask\n The attention mask. Must be one or two dimensional.\n past_key_values\n A tuple of tuples containing the cached key and value tensors for each\n attention head.\n\n Returns\n -------\n The computed logits and the new cached key and value tensors.\n\n \"\"\"\n try:\n import torch\n except ImportError:\n ImportError(\n \"The `torch` library needs to be installed to use `transformers` models.\"\n )\n assert 0 < input_ids.ndim < 3\n\n if past_key_values:\n input_ids = input_ids[..., -1].unsqueeze(-1)\n\n with torch.inference_mode():\n output = self.model(\n input_ids,\n attention_mask=attention_mask,\n return_dict=True,\n output_attentions=False,\n output_hidden_states=False,\n past_key_values=past_key_values,\n )\n\n return output.logits, output.past_key_values\n
"},{"location":"api/models/#outlines.models.transformers.Transformers.generate","title":"generate(prompts, generation_parameters, logits_processor, sampling_parameters)
","text":"Generate text using transformers
.
"},{"location":"api/models/#outlines.models.transformers.Transformers.generate--arguments","title":"Arguments","text":"prompts A prompt or list of prompts. generation_parameters An instance of GenerationParameters
that contains the prompt, the maximum number of tokens, stop sequences and seed. All the arguments to SequenceGeneratorAdapter
's __call__
method. logits_processor The logits processor to use when generating text. sampling_parameters An instance of SamplingParameters
, a dataclass that contains the name of the sampler to use and related parameters as available in Outlines.
"},{"location":"api/models/#outlines.models.transformers.Transformers.generate--returns","title":"Returns","text":"The generated text
Source code in outlines/models/transformers.py
def generate(\n self,\n prompts: Union[str, List[str]],\n generation_parameters: GenerationParameters,\n logits_processor: Optional[\"OutlinesLogitsProcessor\"],\n sampling_parameters: SamplingParameters,\n) -> Union[str, List[str], List[List[str]]]:\n \"\"\"Generate text using `transformers`.\n\n Arguments\n ---------\n prompts\n A prompt or list of prompts.\n generation_parameters\n An instance of `GenerationParameters` that contains the prompt,\n the maximum number of tokens, stop sequences and seed. All the\n arguments to `SequenceGeneratorAdapter`'s `__cal__` method.\n logits_processor\n The logits processor to use when generating text.\n sampling_parameters\n An instance of `SamplingParameters`, a dataclass that contains\n the name of the sampler to use and related parameters as available\n in Outlines.\n\n Returns\n -------\n The generated text\n \"\"\"\n if isinstance(prompts, str):\n # convert to 2d\n input_ids, attention_mask = self.tokenizer.encode([prompts])\n else:\n input_ids, attention_mask = self.tokenizer.encode(prompts)\n\n inputs = {\n \"input_ids\": input_ids.to(self.model.device),\n \"attention_mask\": attention_mask.to(self.model.device),\n }\n if (\n \"attention_mask\"\n not in inspect.signature(self.model.forward).parameters.keys()\n ):\n del inputs[\"attention_mask\"]\n\n generation_kwargs = self._get_generation_kwargs(\n prompts,\n generation_parameters,\n logits_processor,\n sampling_parameters,\n )\n generated_ids = self._generate_output_seq(prompts, inputs, **generation_kwargs)\n\n # if single str input and single sample per input, convert to a 1D output\n if isinstance(prompts, str):\n generated_ids = generated_ids.squeeze(0)\n\n return self._decode_generation(generated_ids)\n
"},{"location":"api/models/#outlines.models.transformers.Transformers.stream","title":"stream(prompts, generation_parameters, logits_processor, sampling_parameters)
","text":"Temporary stream stand-in which implements stream() signature and equivalent behaviour but isn't yielded until generation completes.
TODO: implement following completion of https://github.com/huggingface/transformers/issues/30810
Source code in outlines/models/transformers.py
def stream(\n self,\n prompts: Union[str, List[str]],\n generation_parameters: GenerationParameters,\n logits_processor: Optional[\"OutlinesLogitsProcessor\"],\n sampling_parameters: SamplingParameters,\n) -> Iterator[Union[str, List[str]]]:\n \"\"\"\n Temporary stream stand-in which implements stream() signature\n and equivalent behaviour but isn't yielded until generation completes.\n\n TODO: implement following completion of https://github.com/huggingface/transformers/issues/30810\n \"\"\"\n if isinstance(prompts, str):\n # convert to 2d\n input_ids, attention_mask = self.tokenizer.encode([prompts])\n else:\n input_ids, attention_mask = self.tokenizer.encode(prompts)\n inputs = {\n \"input_ids\": input_ids.to(self.model.device),\n \"attention_mask\": attention_mask.to(self.model.device),\n }\n if (\n \"attention_mask\"\n not in inspect.signature(self.model.forward).parameters.keys()\n ):\n del inputs[\"attention_mask\"]\n\n generation_kwargs = self._get_generation_kwargs(\n prompts,\n generation_parameters,\n logits_processor,\n sampling_parameters,\n )\n generated_ids = self._generate_output_seq(prompts, inputs, **generation_kwargs)\n\n # if single str input and single sample per input, convert to a 1D output\n if isinstance(prompts, str):\n generated_ids = generated_ids.squeeze(0)\n\n for i in range(generated_ids.size(-1)):\n output_group_ids = generated_ids.select(-1, i).unsqueeze(-1)\n yield self._decode_generation(output_group_ids)\n
"},{"location":"api/models/#outlines.models.transformers.get_llama_tokenizer_types","title":"get_llama_tokenizer_types()
","text":"Get all the Llama tokenizer types/classes that need work-arounds.
When they can't be imported, a dummy class is created.
Source code in outlines/models/transformers.py
def get_llama_tokenizer_types():\n \"\"\"Get all the Llama tokenizer types/classes that need work-arounds.\n\n When they can't be imported, a dummy class is created.\n\n \"\"\"\n try:\n from transformers.models.llama import LlamaTokenizer\n except ImportError:\n\n class LlamaTokenizer: # type: ignore\n pass\n\n try:\n from transformers.models.llama import LlamaTokenizerFast\n except ImportError:\n\n class LlamaTokenizerFast: # type: ignore\n pass\n\n try:\n from transformers.models.code_llama import CodeLlamaTokenizer\n except ImportError:\n\n class CodeLlamaTokenizer: # type: ignore\n pass\n\n try:\n from transformers.models.code_llama import CodeLlamaTokenizerFast\n except ImportError:\n\n class CodeLlamaTokenizerFast: # type: ignore\n pass\n\n return (\n LlamaTokenizer,\n LlamaTokenizerFast,\n CodeLlamaTokenizer,\n CodeLlamaTokenizerFast,\n )\n
"},{"location":"api/models/#outlines.models.transformers.transformers","title":"transformers(model_name, device=None, model_kwargs={}, tokenizer_kwargs={}, model_class=None, tokenizer_class=None)
","text":"Instantiate a model from the transformers
library and its tokenizer.
"},{"location":"api/models/#outlines.models.transformers.transformers--parameters","title":"Parameters","text":"model_name The name of the model as listed on Hugging Face's model page. device The device(s) on which the model should be loaded. This overrides the device_map
entry in model_kwargs
when provided. model_kwargs A dictionary that contains the keyword arguments to pass to the from_pretrained
method when loading the model. tokenizer_kwargs A dictionary that contains the keyword arguments to pass to the from_pretrained
method when loading the tokenizer.
"},{"location":"api/models/#outlines.models.transformers.transformers--returns","title":"Returns","text":"A TransformersModel
model instance.
Source code in outlines/models/transformers.py
def transformers(\n model_name: str,\n device: Optional[str] = None,\n model_kwargs: dict = {},\n tokenizer_kwargs: dict = {},\n model_class=None,\n tokenizer_class=None,\n):\n \"\"\"Instantiate a model from the `transformers` library and its tokenizer.\n\n Parameters\n ----------\n model_name\n The name of the model as listed on Hugging Face's model page.\n device\n The device(s) on which the model should be loaded. This overrides\n the `device_map` entry in `model_kwargs` when provided.\n model_kwargs\n A dictionary that contains the keyword arguments to pass to the\n `from_pretrained` method when loading the model.\n tokenizer_kwargs\n A dictionary that contains the keyword arguments to pass to the\n `from_pretrained` method when loading the tokenizer.\n\n Returns\n -------\n A `TransformersModel` model instance.\n\n \"\"\"\n if model_class is None or tokenizer_class is None:\n try:\n from transformers import AutoModelForCausalLM, AutoTokenizer\n except ImportError:\n raise ImportError(\n \"The `transformers` library needs to be installed in order to use `transformers` models.\"\n )\n if model_class is None:\n model_class = AutoModelForCausalLM\n if tokenizer_class is None:\n tokenizer_class = AutoTokenizer\n\n if device is not None:\n model_kwargs[\"device_map\"] = device\n\n model = model_class.from_pretrained(model_name, **model_kwargs)\n\n tokenizer_kwargs.setdefault(\"padding_side\", \"left\")\n tokenizer = tokenizer_class.from_pretrained(model_name, **tokenizer_kwargs)\n\n return Transformers(model, tokenizer)\n
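A hedged usage sketch of this factory; the checkpoint name is a placeholder and the keyword arguments are merely examples of what gets forwarded to from_pretrained:
from outlines import models\n\nmodel = models.transformers(\n    \"HuggingFaceTB/SmolLM2-135M-Instruct\",  # placeholder checkpoint name\n    model_kwargs={\"torch_dtype\": \"auto\"},\n    tokenizer_kwargs={\"use_fast\": True},\n)\n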
"},{"location":"api/models/#outlines.models.openai.OpenAI","title":"OpenAI
","text":"An object that represents the OpenAI API.
Source code in outlines/models/openai.py
class OpenAI:\n \"\"\"An object that represents the OpenAI API.\"\"\"\n\n def __init__(\n self,\n client,\n config,\n system_prompt: Optional[str] = None,\n ):\n \"\"\"Create an `OpenAI` instance.\n\n This class supports the standard OpenAI API, the Azure OpeanAI API as\n well as compatible APIs that rely on the OpenAI client.\n\n Parameters\n ----------\n client\n An instance of the API's async client.\n config\n An instance of `OpenAIConfig`. Can be useful to specify some\n parameters that cannot be set by calling this class' methods.\n \"\"\"\n\n self.client = client\n self.config = config\n\n # We count the total number of prompt and generated tokens as returned\n # by the OpenAI API, summed over all the requests performed with this\n # model instance.\n self.prompt_tokens = 0\n self.completion_tokens = 0\n\n self.format_sequence = lambda seq: seq\n\n def __call__(\n self,\n prompt: Union[str, List[str]],\n max_tokens: Optional[int] = None,\n stop_at: Optional[Union[List[str], str]] = None,\n *,\n system_prompt: Optional[str] = None,\n temperature: Optional[float] = None,\n samples: Optional[int] = None,\n ) -> np.ndarray:\n \"\"\"Call the OpenAI API to generate text.\n\n Parameters\n ----------\n prompt\n A string or list of strings that will be used to prompt the model\n max_tokens\n The maximum number of tokens to generate\n stop_at\n A string or array of strings which, such that the generation stops\n when they are generated.\n system_prompt\n The content of the system message that precedes the user's prompt.\n temperature\n The value of the temperature used to sample tokens\n samples\n The number of completions to generate for each prompt\n stop_at\n Up to 4 words where the API will stop the completion.\n\n \"\"\"\n if max_tokens is None:\n max_tokens = self.config.max_tokens\n if stop_at is None:\n stop_at = self.config.stop\n if temperature is None:\n temperature = self.config.temperature\n if samples is None:\n samples = self.config.n\n\n config = replace(self.config, max_tokens=max_tokens, temperature=temperature, n=samples, stop=stop_at) # type: ignore\n\n response, prompt_tokens, completion_tokens = generate_chat(\n prompt, system_prompt, self.client, config\n )\n self.prompt_tokens += prompt_tokens\n self.completion_tokens += completion_tokens\n\n return self.format_sequence(response)\n\n def stream(self, *args, **kwargs):\n raise NotImplementedError(\n \"Streaming is currently not supported for the OpenAI API\"\n )\n\n def new_with_replacements(self, **kwargs):\n new_instance = copy.copy(self)\n new_instance.config = replace(new_instance.config, **kwargs)\n return new_instance\n\n def __str__(self):\n return self.__class__.__name__ + \" API\"\n\n def __repr__(self):\n return str(self.config)\n
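A minimal construction sketch, assuming the openai package is installed and an API key is available in the environment; the model name is a placeholder:
from openai import AsyncOpenAI\nfrom outlines.models.openai import OpenAI, OpenAIConfig\n\nclient = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment\nconfig = OpenAIConfig(model=\"gpt-4o-mini\")  # placeholder model name\nmodel = OpenAI(client, config)\n\nresult = model(\"Name three parsing algorithms.\", max_tokens=64, temperature=0.7)\nprint(model.prompt_tokens, model.completion_tokens)  # cumulative token usage\n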
"},{"location":"api/models/#outlines.models.openai.OpenAI.__call__","title":"__call__(prompt, max_tokens=None, stop_at=None, *, system_prompt=None, temperature=None, samples=None)
","text":"Call the OpenAI API to generate text.
"},{"location":"api/models/#outlines.models.openai.OpenAI.__call__--parameters","title":"Parameters","text":"prompt A string or list of strings that will be used to prompt the model max_tokens The maximum number of tokens to generate stop_at A string or array of strings which, such that the generation stops when they are generated. system_prompt The content of the system message that precedes the user's prompt. temperature The value of the temperature used to sample tokens samples The number of completions to generate for each prompt stop_at Up to 4 words where the API will stop the completion.
Source code in outlines/models/openai.py
def __call__(\n self,\n prompt: Union[str, List[str]],\n max_tokens: Optional[int] = None,\n stop_at: Optional[Union[List[str], str]] = None,\n *,\n system_prompt: Optional[str] = None,\n temperature: Optional[float] = None,\n samples: Optional[int] = None,\n) -> np.ndarray:\n \"\"\"Call the OpenAI API to generate text.\n\n Parameters\n ----------\n prompt\n A string or list of strings that will be used to prompt the model\n max_tokens\n The maximum number of tokens to generate\n stop_at\n A string or array of strings which, such that the generation stops\n when they are generated.\n system_prompt\n The content of the system message that precedes the user's prompt.\n temperature\n The value of the temperature used to sample tokens\n samples\n The number of completions to generate for each prompt\n stop_at\n Up to 4 words where the API will stop the completion.\n\n \"\"\"\n if max_tokens is None:\n max_tokens = self.config.max_tokens\n if stop_at is None:\n stop_at = self.config.stop\n if temperature is None:\n temperature = self.config.temperature\n if samples is None:\n samples = self.config.n\n\n config = replace(self.config, max_tokens=max_tokens, temperature=temperature, n=samples, stop=stop_at) # type: ignore\n\n response, prompt_tokens, completion_tokens = generate_chat(\n prompt, system_prompt, self.client, config\n )\n self.prompt_tokens += prompt_tokens\n self.completion_tokens += completion_tokens\n\n return self.format_sequence(response)\n
"},{"location":"api/models/#outlines.models.openai.OpenAI.__init__","title":"__init__(client, config, system_prompt=None)
","text":"Create an OpenAI
instance.
This class supports the standard OpenAI API, the Azure OpenAI API, as well as compatible APIs that rely on the OpenAI client.
"},{"location":"api/models/#outlines.models.openai.OpenAI.__init__--parameters","title":"Parameters","text":"client An instance of the API's async client. config An instance of OpenAIConfig
. Can be useful to specify some parameters that cannot be set by calling this class' methods.
Source code in outlines/models/openai.py
def __init__(\n self,\n client,\n config,\n system_prompt: Optional[str] = None,\n):\n \"\"\"Create an `OpenAI` instance.\n\n This class supports the standard OpenAI API, the Azure OpeanAI API as\n well as compatible APIs that rely on the OpenAI client.\n\n Parameters\n ----------\n client\n An instance of the API's async client.\n config\n An instance of `OpenAIConfig`. Can be useful to specify some\n parameters that cannot be set by calling this class' methods.\n \"\"\"\n\n self.client = client\n self.config = config\n\n # We count the total number of prompt and generated tokens as returned\n # by the OpenAI API, summed over all the requests performed with this\n # model instance.\n self.prompt_tokens = 0\n self.completion_tokens = 0\n\n self.format_sequence = lambda seq: seq\n
"},{"location":"api/models/#outlines.models.openai.OpenAIConfig","title":"OpenAIConfig
dataclass
","text":"Represents the parameters of the OpenAI API.
The information was last fetched on 2023/11/20. We document below the properties that are specific to the OpenAI API; not all of these properties are supported by Outlines.
"},{"location":"api/models/#outlines.models.openai.OpenAIConfig--properties","title":"Properties","text":"model The name of the model. Available models can be found on OpenAI's website. frequence_penalty Number between 2.0 and -2.0. Positive values penalize new tokens based on their existing frequency in the text, logit_bias Modifies the likelihood of specified tokens to appear in the completion. Number between -100 (forbid) and +100 (only allows). n The number of completions to return for each prompt. presence_penalty Similar to frequency penalty. response_format Specifies the format the model must output. {\"type\": \"json_object\"}
enables JSON mode. seed Two completions with the same seed
value should return the same completion. This is however not guaranteed. stop Up to 4 words where the API will stop the completion. temperature Number between 0 and 2. Higher values make the output more random, while lower values make it more deterministic. top_p Number between 0 and 1. Parameter for nucleus sampling. user A unique identifier for the end-user.
Source code in outlines/models/openai.py
@dataclass(frozen=True)\nclass OpenAIConfig:\n \"\"\"Represents the parameters of the OpenAI API.\n\n The information was last fetched on 2023/11/20. We document below the\n properties that are specific to the OpenAI API. Not all these properties are\n supported by Outlines.\n\n Properties\n ----------\n model\n The name of the model. Available models can be found on OpenAI's website.\n frequence_penalty\n Number between 2.0 and -2.0. Positive values penalize new tokens based on\n their existing frequency in the text,\n logit_bias\n Modifies the likelihood of specified tokens to appear in the completion.\n Number between -100 (forbid) and +100 (only allows).\n n\n The number of completions to return for each prompt.\n presence_penalty\n Similar to frequency penalty.\n response_format\n Specifies the format the model must output. `{\"type\": \"json_object\"}`\n enables JSON mode.\n seed\n Two completions with the same `seed` value should return the same\n completion. This is however not guaranteed.\n stop\n Up to 4 words where the API will stop the completion.\n temperature\n Number between 0 and 2. Higher values make the output more random, while\n lower values make it more deterministic.\n top_p\n Number between 0 and 1. Parameter for nucleus sampling.\n user\n A unique identifier for the end-user.\n\n \"\"\"\n\n model: str = \"\"\n frequency_penalty: float = 0\n logit_bias: Dict[int, int] = field(default_factory=dict)\n max_tokens: Optional[int] = None\n n: int = 1\n presence_penalty: float = 0\n response_format: Optional[Dict[str, str]] = None\n seed: Optional[int] = None\n stop: Optional[Union[str, List[str]]] = None\n temperature: float = 1.0\n top_p: int = 1\n user: str = field(default_factory=str)\n
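Because the dataclass is frozen, a configuration is built once and variants are derived with dataclasses.replace; a short sketch using the fields documented above (the model name is a placeholder):
from dataclasses import replace\n\nfrom outlines.models.openai import OpenAIConfig\n\nconfig = OpenAIConfig(\n    model=\"gpt-4o-mini\",  # placeholder model name\n    temperature=0.0,\n    seed=42,  # best-effort reproducibility\n    response_format={\"type\": \"json_object\"},  # JSON mode\n)\nconfig_two_samples = replace(config, n=2)  # derive a variant without mutation\n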
"},{"location":"api/models/#outlines.models.openai.error_handler","title":"error_handler(api_call_fn)
","text":"Handle OpenAI API errors and missing API key.
Source code in outlines/models/openai.py
def error_handler(api_call_fn: Callable) -> Callable:\n \"\"\"Handle OpenAI API errors and missing API key.\"\"\"\n\n def call(*args, **kwargs):\n import openai\n\n try:\n return api_call_fn(*args, **kwargs)\n except (\n openai.APITimeoutError,\n openai.InternalServerError,\n openai.RateLimitError,\n ) as e:\n raise OSError(f\"Could not connect to the OpenAI API: {e}\")\n except (\n openai.AuthenticationError,\n openai.BadRequestError,\n openai.ConflictError,\n openai.PermissionDeniedError,\n openai.NotFoundError,\n openai.UnprocessableEntityError,\n ) as e:\n raise e\n\n return call\n
"},{"location":"api/models/#outlines.models.openai.generate_chat","title":"generate_chat(prompt, system_prompt, client, config)
async
","text":"Call OpenAI's Chat Completion API.
"},{"location":"api/models/#outlines.models.openai.generate_chat--parameters","title":"Parameters","text":"prompt The prompt we use to start the generation. Passed to the model with the \"user\" role. system_prompt The system prompt, passed to the model with the \"system\" role before the prompt. client The API client config An OpenAIConfig
instance.
"},{"location":"api/models/#outlines.models.openai.generate_chat--returns","title":"Returns","text":"A tuple that contains the model's response(s) and usage statistics.
Source code in outlines/models/openai.py
@functools.partial(vectorize, signature=\"(),(),(),()->(s),(),()\")\nasync def generate_chat(\n prompt: str,\n system_prompt: Union[str, None],\n client,\n config: OpenAIConfig,\n) -> Tuple[np.ndarray, int, int]:\n \"\"\"Call OpenAI's Chat Completion API.\n\n Parameters\n ----------\n prompt\n The prompt we use to start the generation. Passed to the model\n with the \"user\" role.\n system_prompt\n The system prompt, passed to the model with the \"system\" role\n before the prompt.\n client\n The API client\n config\n An `OpenAIConfig` instance.\n\n Returns\n -------\n A tuple that contains the model's response(s) and usage statistics.\n\n \"\"\"\n\n @error_handler\n @cache()\n async def call_api(prompt, system_prompt, config):\n responses = await client.chat.completions.create(\n messages=system_message + user_message,\n **asdict(config), # type: ignore\n )\n return responses.model_dump()\n\n system_message = (\n [{\"role\": \"system\", \"content\": system_prompt}] if system_prompt else []\n )\n user_message = [{\"role\": \"user\", \"content\": prompt}]\n\n responses = await call_api(prompt, system_prompt, config)\n\n results = np.array(\n [responses[\"choices\"][i][\"message\"][\"content\"] for i in range(config.n)]\n )\n usage = responses[\"usage\"]\n\n return results, usage[\"prompt_tokens\"], usage[\"completion_tokens\"]\n
"},{"location":"api/parsing/","title":"Parsing","text":""},{"location":"api/parsing/#outlines.fsm.parsing.PartialIndenter","title":"PartialIndenter
","text":" Bases: Indenter
An Indenter
that doesn't reset its state every time process
is called.
Source code in outlines/fsm/parsing.py
class PartialIndenter(Indenter):\n \"\"\"An `Indenter` that doesn't reset its state every time `process` is called.\"\"\"\n\n def process(self, stream):\n return self._process(stream)\n\n def _process(self, stream):\n for token in stream:\n # These were previously *after* the `yield`, but that makes the\n # state tracking unnecessarily convoluted.\n if token.type in self.OPEN_PAREN_types:\n self.paren_level += 1\n elif token.type in self.CLOSE_PAREN_types:\n self.paren_level -= 1\n if self.paren_level < 0:\n raise UnexpectedToken(token, [])\n\n if token.type == self.NL_type:\n yield from self.handle_NL(token)\n else:\n yield token\n\n # TODO: What do we want to do here?\n # while len(self.indent_level) > 1:\n # self.indent_level.pop()\n # yield Token(self.DEDENT_type, \"\")\n\n def accepts_token_type(self, token_type):\n if token_type in self.CLOSE_PAREN_types and self.paren_level - 1 < 0:\n return False\n\n # TODO:\n # if token_type == self.NL_type and self.paren_level == 0:\n # ...\n # return False\n\n return True\n\n def __copy__(self):\n res = type(self)()\n res.paren_level = self.paren_level\n res.indent_level = copy(self.indent_level)\n return res\n\n def __repr__(self):\n return f\"{type(self).__name__}(paren_level={self.paren_level!r}, indent_level={self.indent_level!r})\"\n
"},{"location":"api/parsing/#outlines.fsm.parsing.PartialParserState","title":"PartialParserState
","text":" Bases: ParserState
Source code in outlines/fsm/parsing.py
class PartialParserState(ParserState):\n __slots__ = \"use_value_stack\"\n\n def __init__(\n self,\n parse_conf,\n lexer,\n state_stack=None,\n value_stack=None,\n use_value_stack=False,\n ):\n super().__init__(\n parse_conf, lexer, state_stack=state_stack, value_stack=value_stack\n )\n self.use_value_stack = use_value_stack\n\n def feed_token(self, token, is_end=False):\n if token.type == \"partial\":\n # If none of the potential terminals can transition, we need to know now\n current_state = self.state_stack[-1]\n current_lexer = get_contextual_lexer(self.lexer).lexers[current_state]\n\n # We have to feed the token and determine whether or not at least\n # one terminal is consistent with the stack; otherwise, we'll miss\n # invalid REDUCE cases.\n # TODO: We should track separate parses conditional on possible\n # token/symbol types, then we can coherently reuse the following\n # results instead of recomputing it later.\n can_transition = False\n for terminal_info in token.value.terminals_and_info:\n if terminal_info.terminal_name not in current_lexer.ignore_types:\n test_token = Token.new_borrow_pos(\n terminal_info.terminal_name, \"\", token\n )\n\n stack = copy(self.state_stack)\n try:\n self.feed_token_no_stack(test_token, is_end=is_end)\n can_transition = True\n break\n except UnexpectedToken:\n continue\n finally:\n self.state_stack = stack\n else:\n can_transition = True\n\n if not can_transition:\n expected = {\n s\n for s in self.parse_conf.states[current_state].keys()\n if s.isupper()\n }\n raise UnexpectedToken(\n token, expected, state=self, interactive_parser=None\n )\n\n elif self.use_value_stack:\n super().feed_token(token, is_end=is_end)\n else:\n self.feed_token_no_stack(token, is_end=is_end)\n\n def feed_token_no_stack(self, token, is_end=False):\n \"\"\"\n This is a copy of `ParserState.feed_token` with all the value stack\n steps removed. 
Since we're not exactly parsing in order to obtain a\n CST or anything similar, we can avoid the growing expense of tracking\n the parse tree.\n \"\"\"\n state_stack = self.state_stack\n states = self.parse_conf.states\n end_state = self.parse_conf.end_state\n\n while True:\n state = state_stack[-1]\n try:\n action, arg = states[state][token.type]\n except KeyError:\n expected = {s for s in states[state].keys() if s.isupper()}\n raise UnexpectedToken(\n token, expected, state=self, interactive_parser=None\n )\n\n assert arg != end_state\n\n if action is Shift:\n # shift once and return\n assert not is_end\n state_stack.append(arg)\n return\n else:\n # reduce+shift as many times as necessary\n rule = arg\n size = len(rule.expansion)\n if size:\n del state_stack[-size:]\n\n _action, new_state = states[state_stack[-1]][rule.origin.name]\n assert _action is Shift\n state_stack.append(new_state)\n\n if is_end and state_stack[-1] == end_state:\n return\n\n def feed_eof(self):\n last_token = self.lexer.state.last_token\n\n if last_token is None:\n eof_token = self.lexer._Token(\"$END\", \"\", 0, 1, 1)\n else:\n eof_token = Token.new_borrow_pos(\"$END\", \"\", last_token)\n\n new_token_is_legal = (\n last_token is None\n or last_token.type != \"partial\"\n or any(ti.is_final for ti in last_token.value.terminals_and_info)\n )\n if new_token_is_legal:\n self.feed_token(eof_token, is_end=True)\n else:\n raise UnexpectedToken(eof_token, [], state=self, interactive_parser=None)\n\n def choices(self):\n return self.parse_conf.parse_table.states[self.position]\n\n def accepts(self):\n \"\"\"\n Adapted from https://github.com/lark-parser/lark/blob/be542c2ff6d968817df019b8bf03f37b3111c08c/lark/parsers/lalr_interactive_parser.py#L95\n Returns the set of possible tokens that will advance the parser into a new valid state.\n \"\"\"\n accepts = set()\n conf_no_callbacks = copy(self.parse_conf)\n # We don't want to call callbacks here since those might have arbitrary side effects\n # and are unnecessarily slow.\n conf_no_callbacks.callbacks = {}\n for t in self.choices():\n if t.isupper(): # is terminal?\n new_state = copy(self)\n new_state.parse_conf = conf_no_callbacks\n try:\n new_state.feed_token(new_state.lexer._Token(t, \"\"))\n except UnexpectedToken:\n pass\n else:\n accepts.add(t)\n return accepts\n\n def __copy__(self):\n return type(self)(\n self.parse_conf,\n copy(self.lexer),\n copy(self.state_stack),\n deepcopy(self.value_stack),\n use_value_stack=self.use_value_stack,\n )\n\n def __repr__(self):\n return f\"{type(self).__name__}(lexer={self.lexer!r}, state_stack={self.state_stack!r})\"\n
"},{"location":"api/parsing/#outlines.fsm.parsing.PartialParserState.accepts","title":"accepts()
","text":"Adapted from https://github.com/lark-parser/lark/blob/be542c2ff6d968817df019b8bf03f37b3111c08c/lark/parsers/lalr_interactive_parser.py#L95 Returns the set of possible tokens that will advance the parser into a new valid state.
Source code in outlines/fsm/parsing.py
def accepts(self):\n \"\"\"\n Adapted from https://github.com/lark-parser/lark/blob/be542c2ff6d968817df019b8bf03f37b3111c08c/lark/parsers/lalr_interactive_parser.py#L95\n Returns the set of possible tokens that will advance the parser into a new valid state.\n \"\"\"\n accepts = set()\n conf_no_callbacks = copy(self.parse_conf)\n # We don't want to call callbacks here since those might have arbitrary side effects\n # and are unnecessarily slow.\n conf_no_callbacks.callbacks = {}\n for t in self.choices():\n if t.isupper(): # is terminal?\n new_state = copy(self)\n new_state.parse_conf = conf_no_callbacks\n try:\n new_state.feed_token(new_state.lexer._Token(t, \"\"))\n except UnexpectedToken:\n pass\n else:\n accepts.add(t)\n return accepts\n
"},{"location":"api/parsing/#outlines.fsm.parsing.PartialParserState.feed_token_no_stack","title":"feed_token_no_stack(token, is_end=False)
","text":"This is a copy of ParserState.feed_token
with all the value stack steps removed. Since we're not exactly parsing in order to obtain a CST or anything similar, we can avoid the growing expense of tracking the parse tree.
Source code in outlines/fsm/parsing.py
def feed_token_no_stack(self, token, is_end=False):\n \"\"\"\n This is a copy of `ParserState.feed_token` with all the value stack\n steps removed. Since we're not exactly parsing in order to obtain a\n CST or anything similar, we can avoid the growing expense of tracking\n the parse tree.\n \"\"\"\n state_stack = self.state_stack\n states = self.parse_conf.states\n end_state = self.parse_conf.end_state\n\n while True:\n state = state_stack[-1]\n try:\n action, arg = states[state][token.type]\n except KeyError:\n expected = {s for s in states[state].keys() if s.isupper()}\n raise UnexpectedToken(\n token, expected, state=self, interactive_parser=None\n )\n\n assert arg != end_state\n\n if action is Shift:\n # shift once and return\n assert not is_end\n state_stack.append(arg)\n return\n else:\n # reduce+shift as many times as necessary\n rule = arg\n size = len(rule.expansion)\n if size:\n del state_stack[-size:]\n\n _action, new_state = states[state_stack[-1]][rule.origin.name]\n assert _action is Shift\n state_stack.append(new_state)\n\n if is_end and state_stack[-1] == end_state:\n return\n
"},{"location":"api/parsing/#outlines.fsm.parsing.PartialParsingFrontend","title":"PartialParsingFrontend
","text":" Bases: ParsingFrontend
Source code in outlines/fsm/parsing.py
class PartialParsingFrontend(ParsingFrontend):\n def __init__(self, lexer_conf, parser_conf, options, parser=None):\n assert parser_conf.parser_type == \"lalr\"\n\n options._plugins[\"LALR_Parser\"] = PartialLALRParser\n options._plugins[\"BasicLexer\"] = PartialBasicLexer\n options._plugins[\"ContextualLexer\"] = PartialContextualLexer\n options._plugins[\"LexerThread\"] = PartialLexerThread\n\n super().__init__(lexer_conf, parser_conf, options, parser=parser)\n\n if lexer_conf.postlex:\n self.lexer = PartialPostLexConnector(self.lexer.lexer, lexer_conf.postlex)\n\n self._termset_fsm_info = None\n self._symbols_to_states: Optional[\n Dict[str, Set[Tuple[ParseStateType, Action]]]\n ] = None\n self._reverse_shifts: Optional[\n Dict[ParseStateType, Dict[str, Set[ParseStateType]]]\n ] = None\n # self._state_transition_map: Optional[\n # Dict[Tuple[ParseStateType, str], Set[ParseStateType]]\n # ] = None\n\n def _compute_maps(\n self,\n ):\n \"\"\"Compute state transition and symbols-to-states maps.\"\"\"\n self._reverse_shifts = {}\n self._symbols_to_states = {}\n\n parse_table = self.parser.parser.parse_table\n\n for from_state, symbols_to_ops in parse_table.states.items():\n for symbol, op in symbols_to_ops.items():\n if op[0] == Shift:\n symbols_to_from_states = self._reverse_shifts.setdefault(op[1], {})\n symbols_to_from_states.setdefault(symbol, set()).add(from_state)\n self._symbols_to_states.setdefault(symbol, set()).add((from_state, op))\n\n # # TODO: This approach is very wasteful.\n # context_lexer = get_contextual_lexer(self)\n # self._state_transition_map = {}\n #\n # for from_state, transitions in parse_table.states.items():\n # for symbol, action in transitions.items():\n # # TODO: Filter non-terminals\n # if symbol not in context_lexer.root_lexer.terminals_by_name:\n # continue\n #\n # if action[0] is Shift:\n # self._state_transition_map.setdefault(\n # (from_state, symbol), set()\n # ).add(action[1])\n # continue\n #\n # antecedent_state_seqs = parse_to_terminal(self, [(from_state,)], symbol)\n #\n # for antecedent_state_seq in antecedent_state_seqs:\n # antecedent_state = antecedent_state_seq[-1]\n # self._state_transition_map.setdefault(\n # (from_state, symbol), set()\n # ).add(antecedent_state)\n\n def _compute_termset_fsm_info(self):\n \"\"\"Collect and return information about terminal symbol sets and their FSMs.\n\n Terminal symbol sets (or \"termsets\") are ordered sequences of terminal\n symbols that are used by each parser state. 
Associated with each is a\n collection of FSMs for each terminal and a single parse state FSM that is\n the union of each terminal's FSM.\n\n This constructs a list of tuples containing the termset, the set of\n parse states that use the termsets, parse state FSMs, and information\n mapping the components of the parse state FSMs to their terminal symbol\n FSMs.\n\n \"\"\"\n context_lexer = get_contextual_lexer(self)\n termsets_to_fsms = {}\n termsets_to_parse_states: Dict[Tuple[str, ...], Set[ParseStateType]] = {}\n for parse_state, lexer in context_lexer.lexers.items():\n scanner = lexer.scanner\n key = tuple(term.name for term in scanner.terminals)\n termsets_to_fsms[key] = (scanner.fsm, scanner.fsms_to_trans_finals)\n termsets_to_parse_states.setdefault(key, set()).add(parse_state)\n\n self._termset_fsm_info = [\n (\n termset,\n frozenset(termsets_to_parse_states[termset]),\n fsm,\n fsms_to_trans_finals,\n )\n for termset, (fsm, fsms_to_trans_finals) in termsets_to_fsms.items()\n ]\n\n @property\n def termset_fsm_info(self):\n if self._termset_fsm_info is None:\n self._compute_termset_fsm_info()\n return self._termset_fsm_info\n\n @property\n def symbols_to_states(self):\n if self._symbols_to_states is None:\n self._compute_maps()\n return self._symbols_to_states\n\n @property\n def reverse_shifts(self):\n if self._reverse_shifts is None:\n self._compute_maps()\n return self._reverse_shifts\n
"},{"location":"api/parsing/#outlines.fsm.parsing.PartialScanner","title":"PartialScanner
","text":" Bases: Scanner
Source code in outlines/fsm/parsing.py
class PartialScanner(Scanner):\n @classmethod\n @lru_cache\n def construct_terminal_fsm(cls, terminal):\n # TODO: This should really be done at the lexer/parser level so that\n # the lifetime of these objects is tied to the parser itself.\n regex_str = terminal.pattern.to_regexp()\n pattern = interegular.parse_pattern(regex_str)\n fsm, _ = make_deterministic_fsm(pattern.to_fsm().reduce())\n return fsm, pattern.prefix_postfix\n\n def __init__(self, terminals, g_regex_flags, re_, use_bytes, match_whole=False):\n self.terminals = terminals\n self.g_regex_flags = g_regex_flags\n self.use_bytes = use_bytes\n self.match_whole = match_whole\n self.allowed_types = {t.name for t in self.terminals}\n self._mres = None\n\n fsms = []\n for t in self.terminals:\n fsm, prefix_postfix = self.construct_terminal_fsm(t)\n\n # TODO FIXME: We don't support this right now.\n assert prefix_postfix == (0, 0)\n\n fsms.append(fsm)\n\n self.fsm, self.fsms_to_trans_finals = fsm_union(fsms)\n\n def get_terminals_info(\n self, fsm_state_seq\n ) -> Tuple[Tuple[PartialTerminalInfo, ...], Tuple[PartialTerminalInfo, ...]]:\n \"\"\"Get the possible terminal symbols for an FSM state sequence.\"\"\"\n terminals_and_info: Tuple[PartialTerminalInfo, ...] = ()\n final_terminals_and_info: Tuple[PartialTerminalInfo, ...] = ()\n for i, (fsm_id, fsm_reads_more, in_final) in enumerate(\n get_sub_fsms_from_seq(fsm_state_seq, self.fsms_to_trans_finals)\n ):\n terminal_name = self.terminals[fsm_id].name\n info = PartialTerminalInfo(i, terminal_name, fsm_reads_more, in_final)\n terminals_and_info += (info,)\n if in_final:\n final_terminals_and_info += (info,)\n\n return terminals_and_info, final_terminals_and_info\n\n def match(self, text, pos, last_fsm_state_seq: Optional[Tuple[int, ...]] = None):\n \"\"\"Determine an FSM match over `text` starting at `pos` and continuing `last_fsm_state_seq`.\"\"\"\n\n start_pos = pos\n\n if last_fsm_state_seq:\n assert len(last_fsm_state_seq) > 1\n start_pos += len(last_fsm_state_seq) - 1\n start_state = last_fsm_state_seq[-1]\n else:\n start_state = self.fsm.initial\n\n text_part = text[start_pos:]\n\n text_transitions = get_token_transition_keys(\n self.fsm.fsm_info.alphabet_symbol_mapping,\n self.fsm.fsm_info.alphabet_anything_value,\n text_part,\n )\n\n state_seq = walk_fsm(\n self.fsm,\n text_transitions,\n start_state,\n full_match=self.match_whole,\n )\n\n if not state_seq:\n return None\n\n if last_fsm_state_seq:\n res = last_fsm_state_seq + tuple(state_seq)\n else:\n res = (start_state,) + tuple(state_seq)\n\n return res\n
"},{"location":"api/parsing/#outlines.fsm.parsing.PartialScanner.get_terminals_info","title":"get_terminals_info(fsm_state_seq)
","text":"Get the possible terminal symbols for an FSM state sequence.
Source code in outlines/fsm/parsing.py
def get_terminals_info(\n self, fsm_state_seq\n) -> Tuple[Tuple[PartialTerminalInfo, ...], Tuple[PartialTerminalInfo, ...]]:\n \"\"\"Get the possible terminal symbols for an FSM state sequence.\"\"\"\n terminals_and_info: Tuple[PartialTerminalInfo, ...] = ()\n final_terminals_and_info: Tuple[PartialTerminalInfo, ...] = ()\n for i, (fsm_id, fsm_reads_more, in_final) in enumerate(\n get_sub_fsms_from_seq(fsm_state_seq, self.fsms_to_trans_finals)\n ):\n terminal_name = self.terminals[fsm_id].name\n info = PartialTerminalInfo(i, terminal_name, fsm_reads_more, in_final)\n terminals_and_info += (info,)\n if in_final:\n final_terminals_and_info += (info,)\n\n return terminals_and_info, final_terminals_and_info\n
"},{"location":"api/parsing/#outlines.fsm.parsing.PartialScanner.match","title":"match(text, pos, last_fsm_state_seq=None)
","text":"Determine an FSM match over text
starting at pos
and continuing last_fsm_state_seq
.
Source code in outlines/fsm/parsing.py
def match(self, text, pos, last_fsm_state_seq: Optional[Tuple[int, ...]] = None):\n \"\"\"Determine an FSM match over `text` starting at `pos` and continuing `last_fsm_state_seq`.\"\"\"\n\n start_pos = pos\n\n if last_fsm_state_seq:\n assert len(last_fsm_state_seq) > 1\n start_pos += len(last_fsm_state_seq) - 1\n start_state = last_fsm_state_seq[-1]\n else:\n start_state = self.fsm.initial\n\n text_part = text[start_pos:]\n\n text_transitions = get_token_transition_keys(\n self.fsm.fsm_info.alphabet_symbol_mapping,\n self.fsm.fsm_info.alphabet_anything_value,\n text_part,\n )\n\n state_seq = walk_fsm(\n self.fsm,\n text_transitions,\n start_state,\n full_match=self.match_whole,\n )\n\n if not state_seq:\n return None\n\n if last_fsm_state_seq:\n res = last_fsm_state_seq + tuple(state_seq)\n else:\n res = (start_state,) + tuple(state_seq)\n\n return res\n
"},{"location":"api/parsing/#outlines.fsm.parsing.fsm_union","title":"fsm_union(fsms)
","text":"Construct an FSM representing the union of the FSMs in fsms
.
This is an updated version of interegular.fsm.FSM.union
made to return an extra map of component FSMs to the sets of state transitions that correspond to them in the new FSM.
Source code in outlines/fsm/parsing.py
def fsm_union(\n fsms: Sequence[FSM],\n) -> Tuple[FSM, Dict[int, Tuple[Set[Tuple[int, int]], Set[int], Dict[int, Set[int]]]]]:\n \"\"\"Construct an FSM representing the union of the FSMs in `fsms`.\n\n This is an updated version of `interegular.fsm.FSM.union` made to return an\n extra map of component FSMs to the sets of state transitions that\n correspond to them in the new FSM.\n\n \"\"\"\n\n alphabet, new_to_old = Alphabet.union(*[fsm.alphabet for fsm in fsms])\n\n indexed_fsms = tuple(enumerate(fsms))\n\n initial = {i: fsm.initial for (i, fsm) in indexed_fsms}\n\n # Dedicated function accepting a \"superset\" and returning the next\n # \"superset\" obtained by following this transition in the new FSM\n def follow(current_state, new_transition: int):\n next = {}\n for i, f in indexed_fsms:\n old_transition = new_to_old[i][new_transition]\n if (\n i in current_state\n and current_state[i] in f.map\n and old_transition in f.map[current_state[i]]\n ):\n next[i] = f.map[current_state[i]][old_transition]\n if not next:\n raise OblivionError\n return next\n\n states = [initial]\n finals: Set[int] = set()\n map: Dict[int, Dict[int, int]] = {}\n\n # Map component FSMs to their new state-to-state transitions, finals, and a\n # map translating component FSM states to aggregate FSM states\n fsms_to_trans_finals: Dict[\n int, Tuple[Set[Tuple[int, int]], Set[int], Dict[int, Set[int]]]\n ] = {}\n\n i = 0\n while i < len(states):\n state = states[i]\n\n # Add to the finals of the aggregate FSM whenever we hit a final in a\n # component FSM\n if any(state.get(j, -1) in fsm.finals for (j, fsm) in indexed_fsms):\n finals.add(i)\n\n # Compute the map for this state\n map[i] = {}\n for transition in alphabet.by_transition:\n try:\n next = follow(state, transition)\n except OblivionError:\n # Reached an oblivion state; don't list it\n continue\n else:\n try:\n # TODO: Seems like this could--and should--be avoided\n j = states.index(next)\n except ValueError:\n j = len(states)\n states.append(next)\n\n map[i][transition] = j\n\n for fsm_id, fsm_state in next.items():\n (\n fsm_transitions,\n fsm_finals,\n fsm_old_to_new,\n ) = fsms_to_trans_finals.setdefault(fsm_id, (set(), set(), {}))\n old_from = state[fsm_id]\n old_to = fsm_state\n fsm_old_to_new.setdefault(old_from, set()).add(i)\n fsm_old_to_new.setdefault(old_to, set()).add(j)\n fsm_transitions.add((i, j))\n if fsm_state in fsms[fsm_id].finals:\n fsm_finals.add(j)\n\n i += 1\n\n fsm = FSM(\n alphabet=alphabet,\n states=range(len(states)),\n initial=0,\n finals=finals,\n map=map,\n __no_validation__=True,\n )\n\n fsm, old_to_new_states = make_deterministic_fsm(fsm)\n _fsms_to_trans_finals = {\n fsm_id: (\n {(old_to_new_states[s1], old_to_new_states[s2]) for s1, s2 in transitions},\n {old_to_new_states[s] for s in finals},\n {\n old_state: {old_to_new_states[new_state] for new_state in new_states}\n for old_state, new_states in old_to_new.items()\n },\n )\n for fsm_id, (transitions, finals, old_to_new) in sorted(\n fsms_to_trans_finals.items(), key=lambda x: x[0]\n )\n }\n\n return (\n fsm,\n _fsms_to_trans_finals,\n )\n
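An illustrative sketch (assuming interegular is installed; the two regular expressions are arbitrary examples):
import interegular\n\nfrom outlines.fsm.parsing import fsm_union\n\nfsm_a = interegular.parse_pattern(\"a+\").to_fsm().reduce()\nfsm_b = interegular.parse_pattern(\"ab\").to_fsm().reduce()\n\n# The union FSM plus, per component FSM, its transitions and finals in the new FSM\nfsm, fsms_to_trans_finals = fsm_union([fsm_a, fsm_b])\nprint(sorted(fsms_to_trans_finals))  # component FSM indices: [0, 1]\n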
"},{"location":"api/parsing/#outlines.fsm.parsing.get_sub_fsms_from_seq","title":"get_sub_fsms_from_seq(state_seq, fsms_to_trans_finals)
","text":"Get the indices of the sub-FSMs in fsm
that could have matched the state sequence state_seq
.
"},{"location":"api/parsing/#outlines.fsm.parsing.get_sub_fsms_from_seq--parameters","title":"Parameters","text":"state_seq A state sequence. fsms_to_trans_finals A map from FSM indices to tuples containing sets of their state transitions and sets of the final/accept states.
"},{"location":"api/parsing/#outlines.fsm.parsing.get_sub_fsms_from_seq--returns","title":"Returns","text":"A generator returning tuples containing each sub-FSM index (in the order they were union-ed to construct fsm
) and booleans indicating whether or not there is another valid transition from the last state in the sequence for the associated sub-FSM (i.e. if the FSM can continue accepting/matching) and whether or not the sequence ends in a final state of the sub-FSM.
Source code in outlines/fsm/parsing.py
def get_sub_fsms_from_seq(\n state_seq: Sequence[int],\n fsms_to_trans_finals: Dict[\n int, Tuple[Set[Tuple[int, int]], Set[int], Dict[int, Set[int]]]\n ],\n) -> Generator[Tuple[int, bool, bool], None, None]:\n \"\"\"Get the indices of the sub-FSMs in `fsm` that could have matched the state sequence `state_seq`.\n\n Parameters\n ----------\n state_seq\n A state sequence.\n fsms_to_trans_finals\n A map from FSM indices to tuples containing sets of their state transitions\n and sets of the final/accept states.\n\n Returns\n -------\n A generator returning tuples containing each sub-FSM index (in the order\n they were union-ed to construct `fsm`) and booleans indicating whether or\n not there is another valid transition from the last state in the sequence\n for the associated sub-FSM (i.e. if the FSM can continue\n accepting/matching) and whether or not the sequence ends in a final state\n of the sub-FSM.\n \"\"\"\n state_seq_transitions = set(zip(state_seq[:-1], state_seq[1:]))\n last_fsm_state = state_seq[-1]\n yield from (\n (\n # The sub-FMS index\n fsm_idx,\n # Is there another possible transition in this sub-FSM?\n any(last_fsm_state == from_s for (from_s, to_s) in transitions),\n # Is this sub-FSM in a final state?\n state_seq[-1] in finals,\n )\n for fsm_idx, (transitions, finals, _) in fsms_to_trans_finals.items()\n if state_seq_transitions.issubset(transitions)\n )\n
"},{"location":"api/parsing/#outlines.fsm.parsing.terminals_to_fsms","title":"terminals_to_fsms(lp)
","text":"Construct a dict
mapping terminal symbol names to their finite state machines.
Source code in outlines/fsm/parsing.py
def terminals_to_fsms(lp: PartialLark) -> Dict[str, FSM]:\n \"\"\"Construct a ``dict`` mapping terminal symbol names to their finite state machines.\"\"\"\n\n symbol_names_and_fsms = {}\n for terminal in lp.terminals:\n pattern = interegular.parse_pattern(terminal.pattern.to_regexp())\n # TODO: Use `pyparser.terminals[0].pattern.flags`?\n try:\n fsm, _ = make_deterministic_fsm(pattern.to_fsm().reduce())\n except Unsupported:\n fsm = None\n\n symbol_names_and_fsms[terminal.name] = fsm\n\n return symbol_names_and_fsms\n
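A hedged sketch, assuming PartialLark accepts the same constructor arguments as lark.Lark and that an LALR grammar is used; the grammar below is a toy example:
from outlines.fsm.parsing import PartialLark, terminals_to_fsms\n\ngrammar = \"\"\"\nstart: NAME \"=\" NUMBER\nNAME: /[a-z]+/\nNUMBER: /[0-9]+/\n%import common.WS\n%ignore WS\n\"\"\"\n\nlp = PartialLark(grammar, parser=\"lalr\")\n# Map every terminal symbol name to a deterministic FSM (None when unsupported)\nfsms = terminals_to_fsms(lp)\nprint(sorted(fsms))\n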
"},{"location":"api/prompts/","title":"Prompts","text":""},{"location":"api/prompts/#outlines.prompts.Prompt","title":"Prompt
dataclass
","text":"Represents a prompt function.
We return a Prompt
class instead of a simple function so the template defined in prompt functions can be accessed.
Source code in outlines/prompts.py
@dataclass\nclass Prompt:\n \"\"\"Represents a prompt function.\n\n We return a `Prompt` class instead of a simple function so the\n template defined in prompt functions can be accessed.\n\n \"\"\"\n\n template: str\n signature: inspect.Signature\n\n def __post_init__(self):\n self.parameters: List[str] = list(self.signature.parameters.keys())\n self.jinja_environment = create_jinja_template(self.template)\n\n def __call__(self, *args, **kwargs) -> str:\n \"\"\"Render and return the template.\n\n Returns\n -------\n The rendered template as a Python ``str``.\n\n \"\"\"\n bound_arguments = self.signature.bind(*args, **kwargs)\n bound_arguments.apply_defaults()\n return self.jinja_environment.render(**bound_arguments.arguments)\n\n def __str__(self):\n return self.template\n
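For context, Prompt objects are normally produced by the outlines.prompt decorator rather than instantiated by hand; a small sketch:
import outlines\n\n@outlines.prompt\ndef ask(name, question):\n    \"\"\"Hello {{ name }}!\n    {{ question }}\n    \"\"\"\n\nprint(ask(\"Ada\", \"What is a finite-state machine?\"))  # rendered template\nprint(ask.template)  # the raw Jinja template remains accessible\n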
"},{"location":"api/prompts/#outlines.prompts.Prompt.__call__","title":"__call__(*args, **kwargs)
","text":"Render and return the template.
"},{"location":"api/prompts/#outlines.prompts.Prompt.__call__--returns","title":"Returns","text":"The rendered template as a Python str
.
Source code in outlines/prompts.py
def __call__(self, *args, **kwargs) -> str:\n \"\"\"Render and return the template.\n\n Returns\n -------\n The rendered template as a Python ``str``.\n\n \"\"\"\n bound_arguments = self.signature.bind(*args, **kwargs)\n bound_arguments.apply_defaults()\n return self.jinja_environment.render(**bound_arguments.arguments)\n
"},{"location":"api/prompts/#outlines.prompts.get_fn_args","title":"get_fn_args(fn)
","text":"Returns the arguments of a function with annotations and default values if provided.
Source code in outlines/prompts.py
def get_fn_args(fn: Callable):\n \"\"\"Returns the arguments of a function with annotations and default values if provided.\"\"\"\n if not callable(fn):\n raise TypeError(\"The `args` filter only applies to callables.\")\n\n arg_str_list = []\n signature = inspect.signature(fn)\n arg_str_list = [str(param) for param in signature.parameters.values()]\n arg_str = \", \".join(arg_str_list)\n return arg_str\n
"},{"location":"api/prompts/#outlines.prompts.get_fn_description","title":"get_fn_description(fn)
","text":"Returns the first line of a callable's docstring.
Source code in outlines/prompts.py
def get_fn_description(fn: Callable):\n \"\"\"Returns the first line of a callable's docstring.\"\"\"\n if not callable(fn):\n raise TypeError(\"The `description` filter only applies to callables.\")\n\n docstring = inspect.getdoc(fn)\n if docstring is None:\n description = \"\"\n else:\n description = docstring.split(\"\\n\")[0].strip()\n\n return description\n
"},{"location":"api/prompts/#outlines.prompts.get_fn_name","title":"get_fn_name(fn)
","text":"Returns the name of a callable.
Source code in outlines/prompts.py
def get_fn_name(fn: Callable):\n \"\"\"Returns the name of a callable.\"\"\"\n if not callable(fn):\n raise TypeError(\"The `name` filter only applies to callables.\")\n\n if not hasattr(fn, \"__name__\"):\n name = type(fn).__name__\n else:\n name = fn.__name__\n\n return name\n
"},{"location":"api/prompts/#outlines.prompts.get_fn_signature","title":"get_fn_signature(fn)
","text":"Return the signature of a callable.
Source code in outlines/prompts.py
def get_fn_signature(fn: Callable):\n \"\"\"Return the signature of a callable.\"\"\"\n if not callable(fn):\n raise TypeError(\"The `source` filter only applies to callables.\")\n\n source = textwrap.dedent(inspect.getsource(fn))\n re_search = re.search(re.compile(r\"\\(([^)]+)\\)\"), source)\n if re_search is None:\n signature = \"\"\n else:\n signature = re_search.group(1)\n\n return signature\n
"},{"location":"api/prompts/#outlines.prompts.get_fn_source","title":"get_fn_source(fn)
","text":"Return the source code of a callable.
Source code in outlines/prompts.py
def get_fn_source(fn: Callable):\n \"\"\"Return the source code of a callable.\"\"\"\n if not callable(fn):\n raise TypeError(\"The `source` filter only applies to callables.\")\n\n source = textwrap.dedent(inspect.getsource(fn))\n re_search = re.search(re.compile(r\"(\\bdef\\b.*)\", re.DOTALL), source)\n if re_search is not None:\n source = re_search.group(0)\n else:\n raise TypeError(\"Could not read the function's source code\")\n\n return source\n
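These helpers back the Jinja filters (name, description, args, source, ...) available inside prompt templates. A minimal sketch of calling them directly, assuming they are importable from outlines.prompts as shown above; the add function is made up.
from outlines.prompts import get_fn_args, get_fn_description, get_fn_name\n\ndef add(a: int, b: int = 1) -> int:\n    \"\"\"Add two integers.\"\"\"\n    return a + b\n\nprint(get_fn_name(add))         # add\nprint(get_fn_description(add))  # Add two integers.\nprint(get_fn_args(add))         # a: int, b: int = 1\n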
"},{"location":"api/prompts/#outlines.prompts.get_schema_dict","title":"get_schema_dict(model)
","text":"Return a pretty-printed dictionary
Source code in outlines/prompts.py
@get_schema.register(dict)\ndef get_schema_dict(model: Dict):\n \"\"\"Return a pretty-printed dictionary\"\"\"\n return json.dumps(model, indent=2)\n
"},{"location":"api/prompts/#outlines.prompts.get_schema_pydantic","title":"get_schema_pydantic(model)
","text":"Return the schema of a Pydantic model.
Source code in outlines/prompts.py
@get_schema.register(type(BaseModel))\ndef get_schema_pydantic(model: Type[BaseModel]):\n \"\"\"Return the schema of a Pydantic model.\"\"\"\n if not type(model) == type(BaseModel):\n raise TypeError(\"The `schema` filter only applies to Pydantic models.\")\n\n if hasattr(model, \"model_json_schema\"):\n def_key = \"$defs\"\n raw_schema = model.model_json_schema()\n else: # pragma: no cover\n def_key = \"definitions\"\n raw_schema = model.schema()\n\n definitions = raw_schema.get(def_key, None)\n schema = parse_pydantic_schema(raw_schema, definitions)\n\n return json.dumps(schema, indent=2)\n
"},{"location":"api/prompts/#outlines.prompts.parse_pydantic_schema","title":"parse_pydantic_schema(raw_schema, definitions)
","text":"Parse the output of Basemodel.[schema|model_json_schema]()
.
This recursively follows the references to other schemas in case of nested models. Other schemas are stored under the \"definitions\" key in the schema of the top-level model.
Source code in outlines/prompts.py
def parse_pydantic_schema(raw_schema, definitions):\n \"\"\"Parse the output of `Basemodel.[schema|model_json_schema]()`.\n\n This recursively follows the references to other schemas in case\n of nested models. Other schemas are stored under the \"definitions\"\n key in the schema of the top-level model.\n\n \"\"\"\n simple_schema = {}\n for name, value in raw_schema[\"properties\"].items():\n if \"description\" in value:\n simple_schema[name] = value[\"description\"]\n elif \"$ref\" in value:\n refs = value[\"$ref\"].split(\"/\")\n simple_schema[name] = parse_pydantic_schema(\n definitions[refs[2]], definitions\n )\n else:\n simple_schema[name] = f\"<{name}>\"\n\n return simple_schema\n
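For illustration, here is a hypothetical raw schema (the property names and descriptions are made up) and the simplified dictionary this function produces from it.
raw_schema = {\n    \"properties\": {\n        \"name\": {\"description\": \"The customer's name\"},\n        \"age\": {\"type\": \"integer\"},\n    }\n}\n\n# Properties with a description keep it; the others get a placeholder.\nprint(parse_pydantic_schema(raw_schema, definitions=None))\n# {'name': \"The customer's name\", 'age': '<age>'}\n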
"},{"location":"api/prompts/#outlines.prompts.prompt","title":"prompt(fn)
","text":"Decorate a function that contains a prompt template.
This allows you to define prompts in a function's docstring and simplifies their manipulation by providing some degree of encapsulation. It uses the render
function internally to render templates.
import outlines
@outlines.prompt def build_prompt(question): ... \"I have a ${question}\" ... prompt = build_prompt(\"How are you?\")
This API can also be helpful in an \"agent\" context where parts of the prompt are set when the agent is initialized and never modified later. In this situation we can partially apply the prompt function at initialization.
import outlines import functools as ft ... @outlines.prompt ... def solve_task(name: str, objective: str, task: str): ... '''Your name is {{name}}. .. Your overall objective is to {{objective}}. ... Please solve the following task: {{task}} ... ''' ... hal = ft.partial(solve_task, \"HAL\", \"Travel to Jupiter\")
"},{"location":"api/prompts/#outlines.prompts.prompt--returns","title":"Returns","text":"A Prompt
callable class which will render the template when called.
Source code in outlines/prompts.py
def prompt(fn: Callable) -> Prompt:\n \"\"\"Decorate a function that contains a prompt template.\n\n This allows to define prompts in the docstring of a function and simplify their\n manipulation by providing some degree of encapsulation. It uses the `render`\n function internally to render templates.\n\n >>> import outlines\n >>>\n >>> @outlines.prompt\n >>> def build_prompt(question):\n ... \"I have a ${question}\"\n ...\n >>> prompt = build_prompt(\"How are you?\")\n\n This API can also be helpful in an \"agent\" context where parts of the prompt\n are set when the agent is initialized and never modified later. In this situation\n we can partially apply the prompt function at initialization.\n\n >>> import outlines\n >>> import functools as ft\n ...\n >>> @outlines.prompt\n ... def solve_task(name: str, objective: str, task: str):\n ... '''Your name is {{name}}.\n .. Your overall objective is to {{objective}}.\n ... Please solve the following task: {{task}}\n ... '''\n ...\n >>> hal = ft.partial(solve_task, \"HAL\", \"Travel to Jupiter\")\n\n Returns\n -------\n A `Prompt` callable class which will render the template when called.\n\n \"\"\"\n\n signature = inspect.signature(fn)\n\n # The docstring contains the template that will be rendered to be used\n # as a prompt to the language model.\n docstring = fn.__doc__\n if docstring is None:\n raise TypeError(\"Could not find a template in the function's docstring.\")\n\n template = cast(str, docstring)\n\n return Prompt(template, signature)\n
"},{"location":"api/prompts/#outlines.prompts.render","title":"render(template, **values)
","text":"Parse a Jinaj2 template and translate it into an Outlines graph.
This function removes extra whitespaces and linebreaks from templates to allow users to enter prompts more naturally than if they used Python's constructs directly. See the examples for a detailed explanation.
"},{"location":"api/prompts/#outlines.prompts.render--examples","title":"Examples","text":"Outlines follow Jinja2's syntax
import outlines outline = outlines.render(\"I like {{food}} and {{sport}}\", food=\"tomatoes\", sport=\"tennis\") I like tomatoes and tennis
If the first line of the template is empty, render
removes it
from outlines import render
tpl = ''' ... A new string''' tpl ... '\\nA new string' render(tpl) ... 'a new string'
Similarly, render
ignores linebreaks introduced by placing the closing quotes underneath the text:
tpl = ''' ... A new string ... ''' tpl ... '\\nA new string\\n' render(tpl) ... 'A new string'
If you want to insert a linebreak at the end of the rendered template, you will need to leave an empty line at the end of the template:
tpl = ''' ... A new string ... ... ''' tpl ... '\\nA new string\\n\\n' render(tpl) ... 'A new string\\n'
render
removes the indentation in docstrings. This is particularly important when using prompt functions
tpl = ''' ... a string ... and another string''' tpl ... '\\n a string\\n and another string' render(tpl) ... 'a string\\nand another string'
The indentation of the first line is assumed to be the same as the second line's
tpl = '''a string ... and another''' tpl ... 'a string\\n and another' render(tpl) ... 'a string\\nand another'
To get a different indentation for the first and the second line, we can start the prompt on the string's second line:
tpl = ''' ... First line ... Second line''' render(tpl) ... 'First Line\\n Second Line'
"},{"location":"api/prompts/#outlines.prompts.render--parameters","title":"Parameters","text":"template A string that contains a template written with the Jinja2 syntax. **values Map from the variables in the template to their value.
"},{"location":"api/prompts/#outlines.prompts.render--returns","title":"Returns","text":"A string that contains the rendered template.
Source code in outlines/prompts.py
def render(template: str, **values: Optional[Dict[str, Any]]) -> str:\n r\"\"\"Parse a Jinaj2 template and translate it into an Outlines graph.\n\n This function removes extra whitespaces and linebreaks from templates to\n allow users to enter prompts more naturally than if they used Python's\n constructs directly. See the examples for a detailed explanation.\n\n Examples\n --------\n\n Outlines follow Jinja2's syntax\n\n >>> import outlines\n >>> outline = outlines.render(\"I like {{food}} and {{sport}}\", food=\"tomatoes\", sport=\"tennis\")\n I like tomatoes and tennis\n\n If the first line of the template is empty, `render` removes it\n\n >>> from outlines import render\n >>>\n >>> tpl = '''\n ... A new string'''\n >>> tpl\n ... '\\nA new string'\n >>> render(tpl)\n ... 'a new string'\n\n Similarly, `render` ignores linebreaks introduced by placing the closing quotes\n underneath the text:\n\n >>> tpl = '''\n ... A new string\n ... '''\n >>> tpl\n ... '\\nA new string\\n'\n >>> render(tpl)\n ... 'A new string'\n\n If you want to insert a linebreak at the end of the rendered template, you will\n need to leave an empty line at the end of the template:\n\n >>> tpl = '''\n ... A new string\n ...\n ... '''\n >>> tpl\n ... '\\nA new string\\n\\n'\n >>> render(tpl)\n ... 'A new string\\n'\n\n `render` removes the identation in docstrings. This is particularly important\n when using prompt functions\n\n >>> tpl = '''\n ... a string\n ... and another string'''\n >>> tpl\n ... '\\n a string\\n and another string'\n >>> render(tpl)\n ... 'a string\\nand another string'\n\n The indentation of the first line is assumed to be the same as the second line's\n\n >>> tpl = '''a string\n ... and another'''\n >>> tpl\n ... 'a string\\n and another'\n >>> render(tpl)\n ... 'a string\\nand another'\n\n To get a different indentation for the first and the second line, we can start the\n prompt on the string's second line:\n\n >>> tpl = '''\n ... First line\n ... Second line'''\n >>> render(tpl)\n ... 'First Line\\n Second Line'\n\n Parameters\n ----------\n template\n A string that contains a template written with the Jinja2 syntax.\n **values\n Map from the variables in the template to their value.\n\n Returns\n -------\n A string that contains the rendered template.\n\n \"\"\"\n jinja_template = create_jinja_template(template)\n return jinja_template.render(**values)\n
"},{"location":"api/regex/","title":"Regex","text":""},{"location":"api/regex/#outlines.generate.regex.regex","title":"regex(model, regex_str, sampler=multinomial())
","text":"Generate structured text in the language of a regular expression.
"},{"location":"api/regex/#outlines.generate.regex.regex--parameters","title":"Parameters","text":"model: An instance of Transformer
that represents a model from the transformers
library. regex_str: The regular expression that the output must follow. sampler: The sampling algorithm to use to generate token ids from the logits distribution.
"},{"location":"api/regex/#outlines.generate.regex.regex--returns","title":"Returns","text":"A SequenceGeneratorAdapter
instance that generates text constrained by the regular expression.
Source code in outlines/generate/regex.py
@singledispatch\ndef regex(model, regex_str: str, sampler: Sampler = multinomial()):\n \"\"\"Generate structured text in the language of a regular expression.\n\n Parameters\n ----------\n model:\n An instance of `Transformer` that represents a model from the\n `transformers` library.\n regex_str:\n The regular expression that the output must follow.\n sampler:\n The sampling algorithm to use to generate token ids from the logits\n distribution.\n\n Returns\n -------\n A `SequenceGeneratorAdapter` instance that generates text constrained by the\n regular expression.\n\n \"\"\"\n from outlines.processors import RegexLogitsProcessor\n\n logits_processor = RegexLogitsProcessor(regex_str, tokenizer=model.tokenizer)\n return SequenceGeneratorAdapter(model, logits_processor, sampler)\n
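A minimal usage sketch, assuming a transformers model is loaded as elsewhere in these docs; the regular expression and prompt are made up, and any transformers model name works here.
import outlines\n\nmodel = outlines.models.transformers(\"microsoft/Phi-3-mini-4k-instruct\")\n\n# Constrain the output to an ISO 8601 date.\ngenerator = outlines.generate.regex(model, r\"[12][0-9]{3}-[0-1][0-9]-[0-3][0-9]\")\nprint(generator(\"When did the first human walk on the Moon? Answer with a date: \"))\n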
"},{"location":"api/samplers/","title":"Samplers","text":""},{"location":"api/samplers/#outlines.samplers.BeamSearchSampler","title":"BeamSearchSampler
","text":"Beam Search sampling algorithm.
"},{"location":"api/samplers/#outlines.samplers.BeamSearchSampler--attributes","title":"Attributes","text":"samples The number of samples taken for each input sequence. Equivalent to the number of beams.
Source code in outlines/samplers.py
class BeamSearchSampler:\n \"\"\"Beam Search sampling algorithm.\n\n Attributes\n ----------\n samples\n The number of samples taken for each input sequence. Equivalent to the\n number of beams.\n \"\"\"\n\n def __init__(self, beams: int = 1):\n self.samples = beams\n\n def __call__(\n self,\n next_token_logits: \"torch.DoubleTensor\",\n sequence_weights: \"torch.DoubleTensor\",\n _,\n ) -> Tuple[\"torch.DoubleTensor\", \"torch.DoubleTensor\", \"torch.DoubleTensor\"]:\n \"\"\"Call the beam search sampler.\n\n Parameters\n ----------\n next_token_logits\n A tensor of shape ``(n_seqs, vocab_size,)`` that represents the\n probability distribution of the next token over the vocabulary.\n sequence_weights\n A tensor of shape ``(n_seqs,)`` that represents the cumulative\n weight of each sequence.\n rng\n A random number generator.\n\n Returns\n -------\n A tuple with an array that contains the ids of the sampled tokens of\n shape ``(n_seqs, 1)``, an array that contains the ancestors of each\n sampled id of shape ``(n_seqs,)`` and an array that contains the updated\n cumulative weights of each sequence of shape ``(n_seqs,)``.\n\n \"\"\"\n import torch\n\n logprobs = torch.nn.functional.log_softmax(next_token_logits, dim=-1)\n weights = logprobs + sequence_weights.unsqueeze(1).expand_as(next_token_logits)\n\n # Flatten scores to (n_batch, n_samples * vocab_size)\n # and find the top-k weights for each batch.\n batch_size = next_token_logits.shape[0] // self.samples\n vocab_size = next_token_logits.shape[-1]\n weights = weights.view(batch_size, self.samples * vocab_size)\n\n # If the weights are all equal to 0 we are at the beginning of the search\n # and thus only need to sample from one set of token logits for each\n # batch.\n if torch.all(sequence_weights == 0):\n weights = weights[:, :vocab_size]\n\n weights, indices = torch.topk(\n weights, self.samples, dim=1, largest=True, sorted=True\n )\n\n ancestors = torch.div(indices, vocab_size, rounding_mode=\"floor\")\n next_token_ids = indices % vocab_size\n\n # Re-shape the weights, next_token_ids and ancestors to (n_batch * n_samples, 1)\n first_batch_idx = torch.arange(\n 0, batch_size * self.samples, self.samples, device=next_token_logits.device\n ).unsqueeze(1)\n ancestors = ancestors + first_batch_idx\n\n ancestors = ancestors.view(self.samples * batch_size)\n weights = weights.view(self.samples * batch_size)\n next_token_ids = next_token_ids.view(self.samples * batch_size, 1)\n\n return next_token_ids, ancestors, weights\n\n @property\n def sampling_params(self):\n return SamplingParameters(\"beam_search\", self.samples, None, None, 1.0)\n
"},{"location":"api/samplers/#outlines.samplers.BeamSearchSampler.__call__","title":"__call__(next_token_logits, sequence_weights, _)
","text":"Call the beam search sampler.
"},{"location":"api/samplers/#outlines.samplers.BeamSearchSampler.__call__--parameters","title":"Parameters","text":"next_token_logits A tensor of shape (n_seqs, vocab_size,)
that represents the probability distribution of the next token over the vocabulary. sequence_weights A tensor of shape (n_seqs,)
that represents the cumulative weight of each sequence. rng A random number generator.
"},{"location":"api/samplers/#outlines.samplers.BeamSearchSampler.__call__--returns","title":"Returns","text":"A tuple with an array that contains the ids of the sampled tokens of shape (n_seqs, 1)
, an array that contains the ancestors of each sampled id of shape (n_seqs,)
and an array that contains the updated cumulative weights of each sequence of shape (n_seqs,)
.
Source code in outlines/samplers.py
def __call__(\n self,\n next_token_logits: \"torch.DoubleTensor\",\n sequence_weights: \"torch.DoubleTensor\",\n _,\n) -> Tuple[\"torch.DoubleTensor\", \"torch.DoubleTensor\", \"torch.DoubleTensor\"]:\n \"\"\"Call the beam search sampler.\n\n Parameters\n ----------\n next_token_logits\n A tensor of shape ``(n_seqs, vocab_size,)`` that represents the\n probability distribution of the next token over the vocabulary.\n sequence_weights\n A tensor of shape ``(n_seqs,)`` that represents the cumulative\n weight of each sequence.\n rng\n A random number generator.\n\n Returns\n -------\n A tuple with an array that contains the ids of the sampled tokens of\n shape ``(n_seqs, 1)``, an array that contains the ancestors of each\n sampled id of shape ``(n_seqs,)`` and an array that contains the updated\n cumulative weights of each sequence of shape ``(n_seqs,)``.\n\n \"\"\"\n import torch\n\n logprobs = torch.nn.functional.log_softmax(next_token_logits, dim=-1)\n weights = logprobs + sequence_weights.unsqueeze(1).expand_as(next_token_logits)\n\n # Flatten scores to (n_batch, n_samples * vocab_size)\n # and find the top-k weights for each batch.\n batch_size = next_token_logits.shape[0] // self.samples\n vocab_size = next_token_logits.shape[-1]\n weights = weights.view(batch_size, self.samples * vocab_size)\n\n # If the weights are all equal to 0 we are at the beginning of the search\n # and thus only need to sample from one set of token logits for each\n # batch.\n if torch.all(sequence_weights == 0):\n weights = weights[:, :vocab_size]\n\n weights, indices = torch.topk(\n weights, self.samples, dim=1, largest=True, sorted=True\n )\n\n ancestors = torch.div(indices, vocab_size, rounding_mode=\"floor\")\n next_token_ids = indices % vocab_size\n\n # Re-shape the weights, next_token_ids and ancestors to (n_batch * n_samples, 1)\n first_batch_idx = torch.arange(\n 0, batch_size * self.samples, self.samples, device=next_token_logits.device\n ).unsqueeze(1)\n ancestors = ancestors + first_batch_idx\n\n ancestors = ancestors.view(self.samples * batch_size)\n weights = weights.view(self.samples * batch_size)\n next_token_ids = next_token_ids.view(self.samples * batch_size, 1)\n\n return next_token_ids, ancestors, weights\n
"},{"location":"api/samplers/#outlines.samplers.GreedySampler","title":"GreedySampler
","text":"Greedy Sampling algorithm.
Greedy sampling consists in choosing the token with the largest likelihood at every step.
We don't allow more than one sample. We could attribute this a meaning, for instance the k-th sample represents the k-th most likely token. In which case it would be equivalent to beam search without the sequence weights.
"},{"location":"api/samplers/#outlines.samplers.GreedySampler--attributes","title":"Attributes","text":"samples The number of samples taken for each input sequence.
Source code in outlines/samplers.py
class GreedySampler:\n \"\"\"Greedy Sampling algorithm.\n\n Greedy sampling consists in choosing the token with the largest\n likelihood at every step.\n\n We don't allow more than one sample. We could attribute this a meaning, for\n instance the k-th sample represents the k-th most likely token. In which\n case it would be equivalent to beam search without the sequence weights.\n\n Attributes\n ----------\n samples\n The number of samples taken for each input sequence.\n\n \"\"\"\n\n def __init__(self):\n self.samples = 1\n\n def __call__(\n self,\n next_token_logits: \"torch.DoubleTensor\",\n sequence_weights: \"torch.DoubleTensor\",\n _,\n ) -> \"torch.DoubleTensor\":\n \"\"\"Call the greedy sampler.\n\n Parameters\n ----------\n next_token_logits\n A tensor of shape ``(n_seqs, vocab_size,)`` that represents the\n probability distribution of the next token over the vocabulary.\n sequence_weights\n A tensor of shape ``(n_seqs,)`` that represents the cumulative\n weight of each sequence.\n rng\n A random number generator.\n\n Returns\n -------\n A tuple with an array that contains the ids of the sampled tokens of\n shape ``(n_seqs, 1)``, an array that contains the ancestors of each\n sampled id of shape ``(n_seqs,)`` and an array that contains the updated\n cumulative weights of each sequence of shape ``(n_seqs,)``.\n\n \"\"\"\n import torch\n\n logprobs = torch.nn.functional.log_softmax(next_token_logits, dim=-1)\n next_token_ids = torch.argmax(logprobs, dim=-1, keepdim=True)\n\n ancestors = torch.arange(\n next_token_logits.shape[0], device=next_token_logits.device\n )\n weights = sequence_weights + torch.gather(logprobs, 1, next_token_ids).squeeze()\n\n return next_token_ids, ancestors, weights\n\n @property\n def sampling_params(self):\n return SamplingParameters(\"greedy\", self.samples, None, None, 0.0)\n
"},{"location":"api/samplers/#outlines.samplers.GreedySampler.__call__","title":"__call__(next_token_logits, sequence_weights, _)
","text":"Call the greedy sampler.
"},{"location":"api/samplers/#outlines.samplers.GreedySampler.__call__--parameters","title":"Parameters","text":"next_token_logits A tensor of shape (n_seqs, vocab_size,)
that represents the probability distribution of the next token over the vocabulary. sequence_weights A tensor of shape (n_seqs,)
that represents the cumulative weight of each sequence. rng A random number generator.
"},{"location":"api/samplers/#outlines.samplers.GreedySampler.__call__--returns","title":"Returns","text":"A tuple with an array that contains the ids of the sampled tokens of shape (n_seqs, 1)
, an array that contains the ancestors of each sampled id of shape (n_seqs,)
and an array that contains the updated cumulative weights of each sequence of shape (n_seqs,)
.
Source code in outlines/samplers.py
def __call__(\n self,\n next_token_logits: \"torch.DoubleTensor\",\n sequence_weights: \"torch.DoubleTensor\",\n _,\n) -> \"torch.DoubleTensor\":\n \"\"\"Call the greedy sampler.\n\n Parameters\n ----------\n next_token_logits\n A tensor of shape ``(n_seqs, vocab_size,)`` that represents the\n probability distribution of the next token over the vocabulary.\n sequence_weights\n A tensor of shape ``(n_seqs,)`` that represents the cumulative\n weight of each sequence.\n rng\n A random number generator.\n\n Returns\n -------\n A tuple with an array that contains the ids of the sampled tokens of\n shape ``(n_seqs, 1)``, an array that contains the ancestors of each\n sampled id of shape ``(n_seqs,)`` and an array that contains the updated\n cumulative weights of each sequence of shape ``(n_seqs,)``.\n\n \"\"\"\n import torch\n\n logprobs = torch.nn.functional.log_softmax(next_token_logits, dim=-1)\n next_token_ids = torch.argmax(logprobs, dim=-1, keepdim=True)\n\n ancestors = torch.arange(\n next_token_logits.shape[0], device=next_token_logits.device\n )\n weights = sequence_weights + torch.gather(logprobs, 1, next_token_ids).squeeze()\n\n return next_token_ids, ancestors, weights\n
"},{"location":"api/samplers/#outlines.samplers.MultinomialSampler","title":"MultinomialSampler
","text":"Multinomial sampling algorithm.
Multinomial sampling consists of randomly sampling the next token, assuming its distribution is a Categorical distribution parametrized by the next-token logits.
"},{"location":"api/samplers/#outlines.samplers.MultinomialSampler--attributes","title":"Attributes","text":"samples The number of samples taken for each input sequence.
Source code in outlines/samplers.py
class MultinomialSampler:\n \"\"\"Multinomial sampling algorithm.\n\n Multinomial sampling consists in randomly sampling the next token assuming\n its distribution is a Categorical distribution parametrized by the\n next-token logits.\n\n\n Attributes\n ----------\n samples\n The number of samples taken for each input sequence.\n\n \"\"\"\n\n def __init__(\n self,\n samples: int = 1,\n *,\n top_k: Optional[int] = None,\n top_p: Optional[float] = None,\n temperature: Optional[float] = None,\n ):\n self.samples = samples\n self.top_k = top_k\n self.top_p = top_p\n self.temperature = temperature\n\n self.logits_processors = []\n if top_k is not None:\n self.logits_processors.append(keep_top_k_logits(top_k))\n elif top_p is not None:\n self.logits_processors.append(keep_top_p_logits(top_p))\n\n if temperature is not None:\n self.logits_processors.append(rescale_logits(temperature))\n\n def __call__(\n self,\n next_token_logits: \"torch.DoubleTensor\",\n sequence_weights: \"torch.DoubleTensor\",\n rng: \"torch.Generator\",\n ) -> Tuple[\"torch.DoubleTensor\", \"torch.DoubleTensor\", \"torch.DoubleTensor\"]:\n \"\"\"Call the multinomial sampler.\n\n Parameters\n ----------\n next_token_logits\n A tensor of shape ``(n_seqs, vocab_size,)`` that represents the\n probability distribution of the next token over the vocabulary.\n sequence_weights\n A tensor of shape ``(n_seqs,)`` that represents the cumulative\n weight of each sequence.\n rng\n A random number generator.\n\n Returns\n -------\n A tuple with an array that contains the ids of the sampled tokens of\n shape ``(n_seqs, 1)``, an array that contains the ancestors of each\n sampled id of shape ``(n_seqs,)`` and an array that contains the updated\n cumulative weights of each sequence of shape ``(n_seqs,)``.\n\n \"\"\"\n import torch\n\n altered_next_token_logits = next_token_logits\n for logit_processor in self.logits_processors:\n altered_next_token_logits = logit_processor(next_token_logits)\n\n probs = torch.nn.functional.softmax(altered_next_token_logits, dim=-1)\n next_token_ids = torch.multinomial(probs, num_samples=1, generator=rng)\n\n logprobs = torch.nn.functional.log_softmax(altered_next_token_logits, dim=-1)\n ancestors = torch.arange(\n altered_next_token_logits.shape[0], device=next_token_logits.device\n )\n weights = sequence_weights + torch.gather(logprobs, 1, next_token_ids).squeeze()\n\n return next_token_ids, ancestors, weights\n\n @property\n def sampling_params(self):\n return SamplingParameters(\n \"multinomial\",\n self.samples,\n self.top_p,\n self.top_k,\n self.temperature,\n )\n
"},{"location":"api/samplers/#outlines.samplers.MultinomialSampler.__call__","title":"__call__(next_token_logits, sequence_weights, rng)
","text":"Call the multinomial sampler.
"},{"location":"api/samplers/#outlines.samplers.MultinomialSampler.__call__--parameters","title":"Parameters","text":"next_token_logits A tensor of shape (n_seqs, vocab_size,)
that represents the probability distribution of the next token over the vocabulary. sequence_weights A tensor of shape (n_seqs,)
that represents the cumulative weight of each sequence. rng A random number generator.
"},{"location":"api/samplers/#outlines.samplers.MultinomialSampler.__call__--returns","title":"Returns","text":"A tuple with an array that contains the ids of the sampled tokens of shape (n_seqs, 1)
, an array that contains the ancestors of each sampled id of shape (n_seqs,)
and an array that contains the updated cumulative weights of each sequence of shape (n_seqs,)
.
Source code in outlines/samplers.py
def __call__(\n self,\n next_token_logits: \"torch.DoubleTensor\",\n sequence_weights: \"torch.DoubleTensor\",\n rng: \"torch.Generator\",\n) -> Tuple[\"torch.DoubleTensor\", \"torch.DoubleTensor\", \"torch.DoubleTensor\"]:\n \"\"\"Call the multinomial sampler.\n\n Parameters\n ----------\n next_token_logits\n A tensor of shape ``(n_seqs, vocab_size,)`` that represents the\n probability distribution of the next token over the vocabulary.\n sequence_weights\n A tensor of shape ``(n_seqs,)`` that represents the cumulative\n weight of each sequence.\n rng\n A random number generator.\n\n Returns\n -------\n A tuple with an array that contains the ids of the sampled tokens of\n shape ``(n_seqs, 1)``, an array that contains the ancestors of each\n sampled id of shape ``(n_seqs,)`` and an array that contains the updated\n cumulative weights of each sequence of shape ``(n_seqs,)``.\n\n \"\"\"\n import torch\n\n altered_next_token_logits = next_token_logits\n for logit_processor in self.logits_processors:\n altered_next_token_logits = logit_processor(next_token_logits)\n\n probs = torch.nn.functional.softmax(altered_next_token_logits, dim=-1)\n next_token_ids = torch.multinomial(probs, num_samples=1, generator=rng)\n\n logprobs = torch.nn.functional.log_softmax(altered_next_token_logits, dim=-1)\n ancestors = torch.arange(\n altered_next_token_logits.shape[0], device=next_token_logits.device\n )\n weights = sequence_weights + torch.gather(logprobs, 1, next_token_ids).squeeze()\n\n return next_token_ids, ancestors, weights\n
"},{"location":"api/samplers/#outlines.samplers.SamplingParameters","title":"SamplingParameters
dataclass
","text":"Sampling parameters available in Outlines.
Source code in outlines/samplers.py
@dataclass(frozen=True)\nclass SamplingParameters:\n \"\"\"Sampling parameters available in Outlines.\"\"\"\n\n sampler: str\n num_samples: int = 1\n top_p: Optional[float] = None\n top_k: Optional[int] = None\n temperature: Optional[float] = None\n
"},{"location":"api/samplers/#outlines.samplers.keep_top_k_logits","title":"keep_top_k_logits(k)
","text":"Build a function that masks logits values smaller than the top k
ones.
"},{"location":"api/samplers/#outlines.samplers.keep_top_k_logits--parameters","title":"Parameters","text":"k The ranking below which logit values are replaced by -math.inf
.
Source code in outlines/samplers.py
def keep_top_k_logits(k: int) -> Callable[[\"torch.Tensor\"], \"torch.Tensor\"]:\n \"\"\"Build a function that masks logits values smaller than the top `k` ones.\n\n Parameters\n ----------\n k\n The ranking below which logit values are replaced by `-math.inf`.\n\n \"\"\"\n import torch\n\n if not isinstance(k, int) or k < 1:\n raise ValueError(f\"`k` must be a strictly positive integers, got {k} instead.\")\n\n def logits_processor(logits: torch.Tensor) -> torch.Tensor:\n num_to_keep = min(k, logits.size(-1))\n mask_idx = logits < torch.topk(logits, num_to_keep)[0][..., -1, None]\n return logits.masked_fill(mask_idx, -math.inf)\n\n return logits_processor\n
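A small sketch of what the returned processor does to a batch of logits; the values are arbitrary.
import torch\nfrom outlines.samplers import keep_top_k_logits\n\nprocessor = keep_top_k_logits(2)\nlogits = torch.tensor([[1.0, 3.0, 2.0, 0.5]])\n\n# Everything outside the two largest logits is masked to -inf.\nprint(processor(logits))  # tensor([[-inf, 3., 2., -inf]])\n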
"},{"location":"api/samplers/#outlines.samplers.keep_top_p_logits","title":"keep_top_p_logits(p)
","text":"Build a function that masks the lowest probability tokens whose cumulative probability is below a certain threshold.
"},{"location":"api/samplers/#outlines.samplers.keep_top_p_logits--parameters","title":"Parameters","text":"p The value of the threshold. We keep the highest probability tokens whose cumulative distribution is greater than or equal to p
and mask the others. Its value must be between 0 (excluded) and 1 (included).
Source code in outlines/samplers.py
def keep_top_p_logits(p: float) -> Callable[[\"torch.Tensor\"], \"torch.Tensor\"]:\n \"\"\"Build a function that masks the lowest probability tokens whose\n cumulative probability is below a certain threshold.\n\n Parameters\n ----------\n p\n The value of the threshold. We keep the highest probability tokens whose\n cumulative distribution is greater than or equal to `p` and mask the\n others. Its value must be between 0 (excluded) and 1 (included).\n\n \"\"\"\n import torch\n\n if p <= 0.0 or p > 1.0:\n raise ValueError(\n f\"`p` must be a floating point number between 0 (excluded) and 1 (included), got {p} instead.\"\n )\n\n def logits_processor(logits: torch.Tensor) -> torch.Tensor:\n sorted_logits, sorted_idx = torch.sort(logits, descending=False)\n cumulative_probabilties = torch.nn.functional.softmax(\n sorted_logits, dim=-1\n ).cumsum(dim=-1)\n\n sorted_masked_idx = cumulative_probabilties <= (1 - p)\n mask_idx = torch.scatter(sorted_masked_idx, 1, sorted_idx, sorted_masked_idx)\n return logits.masked_fill(mask_idx, -math.inf)\n\n return logits_processor\n
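Analogously, a small sketch for the nucleus (top-p) processor; the values are arbitrary.
import torch\nfrom outlines.samplers import keep_top_p_logits\n\nprocessor = keep_top_p_logits(0.5)\nlogits = torch.tensor([[4.0, 1.0, 0.0, -2.0]])\n\n# The low-probability tail is masked; here only the dominant logit survives,\n# since it alone accounts for well over 50% of the probability mass.\nprint(processor(logits))  # tensor([[4., -inf, -inf, -inf]])\n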
"},{"location":"api/samplers/#outlines.samplers.rescale_logits","title":"rescale_logits(temperature)
","text":"Build a function that rescales the token probabilities exponentially.
"},{"location":"api/samplers/#outlines.samplers.rescale_logits--parameters","title":"Parameters","text":"temperature The value by which we rescale the logits.
Source code in outlines/samplers.py
def rescale_logits(temperature: float) -> Callable[[\"torch.Tensor\"], \"torch.Tensor\"]:\n \"\"\"Build a function that rescales the token probabilities exponentially.\n\n Parameters\n ----------\n temperature\n The value by which we rescale the logits.\n\n \"\"\"\n\n if not isinstance(temperature, float) or temperature < 0.0:\n raise ValueError(\n f\"`temperature` must be a strictly positive floating point number, got {temperature} instead.\"\n )\n elif temperature == 0.0:\n raise ValueError(\n \"Please use the greedy sampler instead of setting the temperature to 0.\"\n )\n\n def logits_processor(logits: \"torch.Tensor\") -> \"torch.Tensor\":\n return logits / temperature\n\n return logits_processor\n
"},{"location":"blog/","title":"Blog","text":""},{"location":"blog/2024/01/10/roadmap-for-2024/","title":"Roadmap for 2024","text":"Outlines is not even one year old and it's already gone a long way! As we just reached 4000 stars, and before laying out the roadmap for the following year, we would like to pause and thank all of you for supporting us, using and contributing to the library!
"},{"location":"blog/2024/01/10/roadmap-for-2024/#thoughts","title":"Thoughts","text":"Before delving into the detailed roadmap, let me share a few thoughts and explain the general direction of the library. These thoughts are informed with my multiple interactions with users, either on Twitter or in our Discord server.
Outlines currently differentiates itself from other libraries with its efficient JSON- and regex- constrained generation. A user-facing interface for grammar-structured generation (it had been hidden in the repository) was also recently added. But there is much more we can do along these lines. In 2024 will we will keep pushing in the direction of more accurate, faster constrained generation.
Outlines also supports many models providers: transformers
, mamba
, llama.cpp
and exllama2
. Those integrations represent a lot of maintenance, and we will need to simplify them. For instance, transformers
now supports quantized models, and we will soon deprecate the support for autoawq
and autogptq
. Thanks to a refactor of the library, it is now possible to use our constrained generation method by using logits processor with all other libraries, except mamba
. We will look for libraries that provide state-space models and allow to pass a logits processor during inference. We will interface with llama.cpp
and exllama2
using logits processors.
We would like expand our work to the whole sampling layer, and add new sampling methods that should make structured generation more accurate. This means we will keep the transformers
integration as it is today and will expand our text generation logic around this library.
Making workflows re-usable and easy to share is difficult today. That is why we are big believers in outlines functions. We will keep improving the interface and adding examples.
Finally, we want to add a CLI tool, outlines serve
. This will allows you to either serve an API that does general constrained generation, or to serve Outlines function.
"},{"location":"blog/2024/01/10/roadmap-for-2024/#detailed-roadmap","title":"Detailed roadmap","text":"Here is a more detailed roadmap for the next 12 months. Outlines is a community effort, and we invite you to pick either topic and contribute to the library. I will progressively add related issues in the repository.
"},{"location":"blog/2024/01/10/roadmap-for-2024/#many-more-examples-and-tutorials","title":"Many more examples and tutorials","text":"Let's be honest, Outlines is lacking clear and thorough examples. We want to change this!
- How does Outlines work? What can you do with it?
- What can you do with Outlines that is harder or impossible to do with other libraries?
- How you can perform standard LLM workflows, for instance Chain of Thoughts, Tree of Thoughts, etc?
- How does Oultines integrates with the larger ecosystem, for instance other libraries like LangChain and LlamaIndex?
"},{"location":"blog/2024/01/10/roadmap-for-2024/#simplify-the-integrations","title":"Simplify the integrations","text":"We want to keep the current integrations but lower the maintenance cost so we can focus on what we bring to the table.
- Deprecate every obsolete integration:
transformers
has recently integrated autoawq
and autogptq
for instance. (PR) - See if we can integrate to a library that provides state-space models via a logit processing function;
- Integrate with llama.cpp via a logits processor;
- Integrate with exllamav2 via a logits processor;
"},{"location":"blog/2024/01/10/roadmap-for-2024/#push-structured-generation-further","title":"Push structured generation further","text":"We're just getting started!
- Improve the performance of existing structured generation algorithms;
- Improve the correctness of structured generation algorithms;
- Add ready-to-use grammars in the grammars repository or in a submodule in Outlines.
"},{"location":"blog/2024/01/10/roadmap-for-2024/#keep-developing-outlines-functions","title":"Keep developing Outlines functions","text":"Functions are awesome, use them!
- Implement a CLI
outlines serve
that allows to serve Outlines functions locally; - Add more functions to the functions repository.
"},{"location":"blog/2024/01/10/roadmap-for-2024/#serve-structured-generation","title":"Serve structured generation","text":"We want to make it easier to serve structured generation and outlines functions.
- Implement the outlines serve CLI
outlines serve
- Serve local APIs that perform structured generation;
- Serve Outlines functions.
"},{"location":"blog/2024/01/10/roadmap-for-2024/#improve-the-generation-layer","title":"Improve the generation layer","text":" - Use
transformers
's private API to prepare inputs for generation inside the Transformers
class; - Support successions of model generation and text infilling for methods like Beam Search and SMC;
- Differentiate by adding new caching methods: attention sink, trie-based caching, etc;
- Differentiate by implementing SMC;
- Implement Beam Search;
- Add token healing.
"},{"location":"blog/2024/01/10/roadmap-for-2024/#a-more-seamless-integration-with-openai","title":"A more seamless integration with OpenAI","text":" - Provide the same user interface for OpenAI and open source models so they are easily interchangeable;
- Integrate the function calling API.
"},{"location":"blog/2024/01/10/roadmap-for-2024/#last-word","title":"Last word","text":"This roadmap was influenced by the expressed interests of the community. If it doesn't reflect your needs please come and share your experience with us.
"},{"location":"community/","title":"Community","text":"Outlines exists for a community of users who believe software doesn't need to be complicated. Who share the same passion for Large Language Models but don't want to compromise on robustness. Together, we are bringing these powerful models back to the world of software.
"},{"location":"community/#connect-on-discord","title":"Connect on Discord","text":"The Outlines community lives on our Discord server. There you can ask questions, share ideas or just chat with people like you. Don't be a stranger and join us.
"},{"location":"community/contribute/","title":"Contribute","text":""},{"location":"community/contribute/#what-contributions","title":"What contributions?","text":" - Documentation contributions are very valuable to us!
- Examples. Show us what you did with Outlines :)
- Bug reports with a minimum working examples in the issue tracker
- Bug fixes are always a pleasure to review.
- New features. Please start a new discussion, or come chat with us beforehand!
Note that the issue tracker is only intended for actionable items. In doubt, open a discussion or come talk to us.
"},{"location":"community/contribute/#how-to-contribute","title":"How to contribute?","text":""},{"location":"community/contribute/#setup","title":"Setup","text":"First, fork the repository on GitHub and clone the fork locally:
git clone git@github.com/YourUserName/outlines.git\ncd outlines\n
Create a new virtual environment. If you are using conda:
conda env create -f environment.yml\n
If you are using venv:
python -m venv .venv\nsource .venv/bin/activate\n
Then install the dependencies in editable mode, and install the pre-commit hooks:
pip install -e \".[test]\"\npre-commit install\n
"},{"location":"community/contribute/#before-pushing-your-code","title":"Before pushing your code","text":"Run the tests:
pytest\n
And run the code style checks:
pre-commit run --all-files\n
"},{"location":"community/contribute/#benchmarking","title":"Benchmarking","text":"Outlines uses asv for automated benchmark testing. Benchmarks are run automatically before pull requests are merged to prevent performance degredation.
You can run the benchmark test suite locally with the following command:
asv run --config benchmarks/asv.conf.json\n
Caveats: - If you're on a device with CUDA, you must add the argument --launch-method spawn
- Uncommitted code will not be benchmarked, you must first commit your changes.
"},{"location":"community/contribute/#run-a-specific-test","title":"Run a specific test:","text":"asv run --config benchmarks/asv.conf.json -b bench_json_schema.JsonSchemaBenchmark.time_json_schema_to_fsm\n
"},{"location":"community/contribute/#profile-a-specific-test","title":"Profile a specific test:","text":"asv run --config benchmarks/asv.conf.json --profile -b bench_json_schema.JsonSchemaBenchmark.time_json_schema_to_fsm\n
"},{"location":"community/contribute/#compare-to-originmain","title":"Compare to origin/main
","text":"get fetch origin\nasv continuous origin/main HEAD --config benchmarks/asv.conf.json\n
"},{"location":"community/contribute/#asv-pr-behavior","title":"ASV PR Behavior","text":" - View ASV Benchmark Results: Open the workflow, view
BENCHMARK RESULTS
section. - Merging is blocked unless benchmarks are run for the latest commit.
- Benchmarks fail if performance degrades by more than 10% for any individual benchmark.
- The \"Benchmark PR\" workflow runs when its manually dispatched, or if the
run_benchmarks
label is added to the PR they run for every commit.
"},{"location":"community/contribute/#contribute-to-the-documentation","title":"Contribute to the documentation","text":"To work on the documentation you will need to install the related dependencies:
pip install -r requirements-doc.txt\n
To build the documentation and serve it locally, run the following command in the repository's root folder:
mkdocs serve\n
By following the instruction you will be able to view the documentation locally. It will be updated every time you make a change.
"},{"location":"community/contribute/#open-a-pull-request","title":"Open a Pull Request","text":"Create a new branch on your fork, commit and push the changes:
git checkout -b new-branch\ngit add .\ngit commit -m \"Changes I made\"\ngit push origin new-branch\n
Then you can open a pull request on GitHub. It should prompt you to do so. Every subsequent change that you make on your branch will update the pull request.
Do not hesitate to open a draft PR before your contribution is ready, especially if you have questions and/or need feedback. If you need help, come tell us on Discord.
"},{"location":"community/examples/","title":"Community projects and articles","text":"Publishing examples and articles about Outlines are a meaningful way to contrinute to the community. Here is a list of projects we are aware of. Drop us a line if we forgot yours!
MMSG is a Python library for generating interleaved text and image content in a structured format you can directly pass to downstream APIs.
Multimodal Structured Generation: CVPR's 2nd MMFM Challenge Technical Report shows that Structured Generation can outperform finetuning, and maybe even multimodality, in document-image understanding tasks as part of CVPR's 2nd MMFM Challenge.
Chess LLM Arena is a HuggingFace Space where you can make LLMs compete in a chess match.
LLM Data Gen is a HuggingFace Space that generates synthetic dataset files in JSONLines format.
Fast, High-Fidelity LLM Decoding with Regex Constraints presents an efficient alternative to Outlines's structured generation.
gigax is an Open-Source library that allows to create real-time LLM-powered NPCs for video games.
Improving Prompt Consistency with Structured Generations shows how structured generation can improve consistency of evaluation runs by reducing sensitivity to changes in prompt format.
AskNews is a news curation service processing 300k news articles per day in a structured way, with Outlines.
"},{"location":"community/feedback/","title":"Feedback","text":"If Outlines has been helpful to you, let us know on Discord or give us a shoutout on Twitter! It's always heartwarming \u2764\ufe0f
I am once again reminding you that structured extraction using LLMs is going to transform every single industry in the next 10 years https://t.co/xQ3tcWnrZ8
\u2014 Sam Hogan (@0xSamHogan) April 17, 2024 outline's growth is insane, using is an understatement! https://t.co/rHCNWhZdCs
\u2014 jason liu (@jxnlco) April 17, 2024 Outlines is an amazing lib and more popular than @remilouf\u2019s modesty will admit. https://t.co/DfHbMPIlX1 https://t.co/mDHIWJrD0C
\u2014 Delip Rao e/\u03c3 (@deliprao) April 18, 2024 Impressive implementation of a true regex / json / grammar guided text generation pic.twitter.com/RX5RVYaVIx
\u2014 Rohan Paul (@rohanpaul_ai) December 30, 2023 Most underrated Github Repo in AI + LLM JSON guided Generation: https://t.co/lSB8KIet1H
\u2014 \ud83c\udf99Jean-Louis Queguiner (@JiliJeanlouis) December 18, 2023 Nice and useful. https://t.co/LX72AE0lgt
\u2014 Dan Roy (@roydanroy) August 15, 2023 HUGE dub for open source AI https://t.co/bYKuiEUZ1j
\u2014 kenneth \ud83d\udd87 (@k3nnethfrancis) August 15, 2023 This is amazing - glad to see more outp guidance modules! Will try this out soon I'm wondering how they translate from regex automatons to token boundariesAlso why Open Source will succeed. Even today I don't see any guided output functionality from the big providers. https://t.co/Ity2H25Klf
\u2014 Hrishi (@hrishioa) August 14, 2023 Outlines - a library to help LLM developers guide text generation in a fast and reliable way.\"Provides generation methods that guarantee that the output will match a regular expressions, or follow a JSON schema.\"Need to check this out. Reliable JSON output is a common use\u2026 pic.twitter.com/Bkbh8vKogN
\u2014 elvis (@omarsar0) August 14, 2023 Woah this is cool! Makes open source models more usable.Give any LLM Function Call capability (and more) with Outlines: https://t.co/PtPykR5ZGR https://t.co/RRQjWHnIxv pic.twitter.com/BwNnH8SMwv
\u2014 Yohei (@yoheinakajima) August 14, 2023 This is awesome! Being able to guarantee the output's structure unblocks so many applications. This is a great milestone and a fundamental building block for more advanced AI apps. https://t.co/WdwMOc7hE8
\u2014 Guilherme Castro (@skastr052) August 15, 2023 Juggling with the unpredictable outputs of ChatGPT API lately while building my product. \ud83d\ude13 Tried prompt engineering to channel its wisdom into a neat JSON, but it's like asking a cat to fetch. \ud83d\udc31Luckily, stumbled upon \"Outlines\" \u2013 looks like a promising way to tame the LLM\u2026 pic.twitter.com/oYQ6q8exAS
\u2014 Charlie (@14435635Sun) August 15, 2023 A complex system of LLM input-outputs interacting with non-LLM agents and models benefits immeasurably from structured outputs. The outlines package saves so much time, https://t.co/NhVQ6NpKDR
\u2014 Amir Sani (@amirsani) November 26, 2023"},{"location":"community/feedback/#let-us-know","title":"Let us know!","text":"We highly value the insights of our users, and we would love to hear from you. If you are using Outlines for your projects and would like to share your experience with us, let's connect:
- What are you building with it?
- What do you like about it?
- What challenges are you facing?
- What do you think could be improved?
To schedule an appointment follow this link. This is exclusively intended to share your experience, please go on Discord or GitHub for support.
"},{"location":"community/versioning/","title":"Versioning Guide","text":"The Outlines project follows a structured versioning scheme designed to provide clarity and minimize risk for downstream dependents.
Each part of the version number (major.minor.patch
) conveys information about the nature and impact of the changes included in the release.
- Major Releases includes compatibility-breaking changes to core interfaces, such as
LogitsProcessor
s and Guides
. - Minor Releases introduce changes of substance to internal or unexposed functionality. These changes are well tested and intended to maintain compatability with existing use of core interfaces.
- Patch Releases address bug fixes and incorporate low-risk changes to improve stability and performance.
"},{"location":"community/versioning/#releases","title":"Releases","text":"Releases along with release notes can be found on the Outlines Releases GitHub Page.
"},{"location":"community/versioning/#version-pinning-recommendations","title":"Version Pinning Recommendations","text":"Here are our recommendations for managing dependencies on the Outlines package:
Small, Risk-Tolerant Projects: Pin to a specific major version.
Large, Conservative Projects: Pin to a specific minor version.
"},{"location":"cookbook/","title":"Examples","text":"This part of the documentation provides a few cookbooks that you can browse to get acquainted with the library and get some inspiration about what you could do with structured generation. Remember that you can easily change the model that is being used!
- Classification: Classify customer requests.
- Named Entity Extraction: Extract information from pizza orders.
- Dating Profile: Build dating profiles from descriptions using prompt templating and JSON-structured generation.
- Chain Of Density: Summarize documents using chain of density prompting and JSON-structured generation.
- Playing Chess: Make Phi-3 Mini play chess against itself using regex-structured generation.
- SimToM: Improve LLMs' Theory of Mind capabilities with perspective-taking prompting and JSON-structured generation.
- Q&A with Citations: Answer questions and provide citations using JSON-structured generation.
- Knowledge Graph Generation: Generate a Knowledge Graph from unstructured text using JSON-structured generation.
- Chain Of Thought (CoT): Generate a series of intermediate reasoning steps using regex-structured generation.
- ReAct Agent: Build an agent with open weights models using regex-structured generation.
- Earnings reports to CSV: Extract data from earnings reports to CSV using regex-structured generation.
- Vision-Language Models: Use Outlines with vision-language models for tasks like image captioning and visual reasoning.
- Receipt Digitization: Extract information from a picture of a receipt using structured generation.
- Structured Generation from PDFs: Use Outlines with vision-language models to read PDFs and produce structured output.
"},{"location":"cookbook/atomic_caption/","title":"Vision-Language Models with Outlines","text":"This guide demonstrates how to use Outlines with vision-language models, leveraging the new transformers_vision module. Vision-language models can process both text and images, allowing for tasks like image captioning, visual question answering, and more.
We will be using the Pixtral-12B model from Mistral to take advantage of some of its visual reasoning capabilities and a workflow to generate a multistage atomic caption.
"},{"location":"cookbook/atomic_caption/#setup","title":"Setup","text":"First, we need to install the necessary dependencies. In addition to Outlines, we'll need to install the transformers library and any specific requirements for the vision-language model we'll be using.
pip install outlines transformers torch\n
"},{"location":"cookbook/atomic_caption/#initializing-the-model","title":"Initializing the Model","text":"We'll use the transformers_vision function to initialize our vision-language model. This function is specifically designed to handle models that can process both text and image inputs. Today we'll be using the Pixtral model with the llama tokenizer. (Currently the mistral tokenizer is pending support).
import torch\nfrom transformers import (\n LlavaForConditionalGeneration,\n)\nmodel_name=\"mistral-community/pixtral-12b\" # original magnet model is able to be loaded without issue\nmodel_class=LlavaForConditionalGeneration\n\ndef get_vision_model(model_name: str, model_class: VisionModel):\n model_kwargs = {\n \"torch_dtype\": torch.bfloat16,\n \"attn_implementation\": \"flash_attention_2\",\n \"device_map\": \"auto\",\n }\n processor_kwargs = {\n \"device\": \"cuda\",\n }\n\n model = outlines.models.transformers_vision(\n model.model_name,\n model_class=model.model_class,\n model_kwargs=model_kwargs,\n processor_kwargs=processor_kwargs,\n )\n return model\nmodel = get_vision_model(model_name, model_class)\n
"},{"location":"cookbook/atomic_caption/#defining-the-schema","title":"Defining the Schema","text":"Next, we'll define a schema for the output we expect from our vision-language model. This schema will help structure the model's responses.
from pydantic import BaseModel, Field, confloat, constr\nfrom pydantic.types import StringConstraints, PositiveFloat\nfrom typing import List\nfrom typing_extensions import Annotated\n\nfrom enum import StrEnum\nclass TagType(StrEnum):\n ENTITY = \"Entity\"\n RELATIONSHIP = \"Relationship\"\n STYLE = \"Style\"\n ATTRIBUTE = \"Attribute\"\n COMPOSITION = \"Composition\"\n CONTEXTUAL = \"Contextual\"\n TECHNICAL = \"Technical\"\n SEMANTIC = \"Semantic\"\n\nclass ImageTag(BaseModel):\n tag: Annotated[\n constr(min_length=1, max_length=30),\n Field(\n description=(\n \"Descriptive keyword or phrase representing the tag.\"\n )\n )\n ]\n category: TagType\n confidence: Annotated[\n confloat(le=1.0),\n Field(\n description=(\n \"Confidence score for the tag, between 0 (exclusive) and 1 (inclusive).\"\n )\n )\n ]\n\nclass ImageData(BaseModel):\n tags_list: List[ImageTag] = Field(..., min_items=8, max_items=20)\n short_caption: Annotated[str, StringConstraints(min_length=10, max_length=150)]\n dense_caption: Annotated[str, StringConstraints(min_length=100, max_length=2048)]\n\nimage_data_generator = outlines.generate.json(model, ImageData)\n
This schema defines the structure for image tags, including categories like Entity, Relationship, Style, etc., as well as short and dense captions.
"},{"location":"cookbook/atomic_caption/#preparing-the-prompt","title":"Preparing the Prompt","text":"We'll create a prompt that instructs the model on how to analyze the image and generate the structured output:
pixtral_instruction = \"\"\"\n<s>[INST]\n<Task>You are a structured image analysis agent. Generate comprehensive tag list, caption, and dense caption for an image classification system.</Task>\n<TagCategories requirement=\"You should generate a minimum of 1 tag for each category.\" confidence=\"Confidence score for the tag, between 0 (exclusive) and 1 (inclusive).\">\n- Entity : The content of the image, including the objects, people, and other elements.\n- Relationship : The relationships between the entities in the image.\n- Style : The style of the image, including the color, lighting, and other stylistic elements.\n- Attribute : The most important attributes of the entities and relationships in the image.\n- Composition : The composition of the image, including the arrangement of elements.\n- Contextual : The contextual elements of the image, including the background, foreground, and other elements.\n- Technical : The technical elements of the image, including the camera angle, lighting, and other technical details.\n- Semantic : The semantic elements of the image, including the meaning of the image, the symbols, and other semantic details.\n<Examples note=\"These show the expected format as an abstraction.\">\n{\n \"tags_list\": [\n {\n \"tag\": \"subject 1\",\n \"category\": \"Entity\",\n \"confidence\": 0.98\n },\n {\n \"tag\": \"subject 2\",\n \"category\": \"Entity\",\n \"confidence\": 0.95\n },\n {\n \"tag\": \"subject 1 runs from subject 2\",\n \"category\": \"Relationship\",\n \"confidence\": 0.90\n },\n }\n</Examples>\n</TagCategories>\n<ShortCaption note=\"The short caption should be a concise single sentence caption of the image content with a maximum length of 100 characters.\">\n<DenseCaption note=\"The dense caption should be a descriptive but grounded narrative paragraph of the image content with high quality narrative prose. It should incorporate elements from each of the tag categories to provide a broad dense caption\">\\n[IMG][/INST]\n\"\"\".strip()\n
This prompt provides detailed instructions to the model on how to generate comprehensive tag lists, captions, and dense captions for image analysis. Because of the ordering of the instructions, the tag generation acts as a form of visual grounding for the captioning task, reducing the amount of manual post-processing required.
"},{"location":"cookbook/atomic_caption/#generating-structured-output","title":"Generating Structured Output","text":"Now we can use our model to generate structured output based on an input image:
from io import BytesIO\nfrom urllib.request import urlopen\n\nfrom PIL import Image\n\ndef img_from_url(url):\n    img_byte_stream = BytesIO(urlopen(url).read())\n    return Image.open(img_byte_stream).convert(\"RGB\")\n\nimage_url = \"https://upload.wikimedia.org/wikipedia/commons/9/98/Aldrin_Apollo_11_original.jpg\"\nimage = img_from_url(image_url)\nresult = image_data_generator(\n    pixtral_instruction,\n    [image]\n)\nprint(result)\n
This code loads an image from a URL, passes it to our vision-language model along with the instruction prompt, and generates a structured output based on the defined schema. We end up with an output like this, ready to be used for the next stage in your pipeline:
{'tags_list': [{'tag': 'astronaut',\n 'category': <TagType.ENTITY: 'Entity'>,\n 'confidence': 0.99},\n {'tag': 'moon', 'category': <TagType.ENTITY: 'Entity'>, 'confidence': 0.98},\n {'tag': 'space suit',\n 'category': <TagType.ATTRIBUTE: 'Attribute'>,\n 'confidence': 0.97},\n {'tag': 'lunar module',\n 'category': <TagType.ENTITY: 'Entity'>,\n 'confidence': 0.95},\n {'tag': 'shadow of astronaut',\n 'category': <TagType.COMPOSITION: 'Composition'>,\n 'confidence': 0.95},\n {'tag': 'footprints in moon dust',\n 'category': <TagType.CONTEXTUAL: 'Contextual'>,\n 'confidence': 0.93},\n {'tag': 'low angle shot',\n 'category': <TagType.TECHNICAL: 'Technical'>,\n 'confidence': 0.92},\n {'tag': 'human first steps on the moon',\n 'category': <TagType.SEMANTIC: 'Semantic'>,\n 'confidence': 0.95}],\n 'short_caption': 'First man on the Moon',\n 'dense_caption': \"The figure clad in a pristine white space suit, emblazoned with the American flag, stands powerfully on the moon's desolate and rocky surface. The lunar module, a workhorse of space engineering, looms in the background, its metallic legs sinking slightly into the dust where footprints and tracks from the mission's journey are clearly visible. The photograph captures the astronaut from a low angle, emphasizing his imposing presence against the desolate lunar backdrop. The stark contrast between the blacks and whiteslicks of lost light and shadow adds dramatic depth to this seminal moment in human achievement.\"}\n
"},{"location":"cookbook/atomic_caption/#conclusion","title":"Conclusion","text":"The transformers_vision module in Outlines provides a powerful way to work with vision-language models. It allows for structured generation of outputs that combine image analysis with natural language processing, opening up possibilities for complex tasks like detailed image captioning, visual question answering, and more.
By leveraging the capabilities of models like Pixtral-12B and the structured output generation of Outlines, you can create sophisticated applications that understand and describe visual content in a highly structured and customizable manner.
"},{"location":"cookbook/chain_of_density/","title":"Summarize documents using Chain of Density prompting","text":"A good summary should be informative, concise and clear. While large language models are generally good at summarizing documents, their summaries tend to be long and contain redundant information; their information density tends to be on the lower end. This is where chain of Density, a new prompting technique, comes in. In this example we will show how one can implement chain of density with a few lines of code using Outlines, leveraging both Outline's prompt templating and its structured generation capabilities.
The article we will try to summarize is the first three paragraphs of the Alan Turing page on Wikipedia:
article = \"\"\"\nAlan Mathison Turing OBE FRS (/\u02c8tj\u028a\u0259r\u026a\u014b/; 23 June 1912 \u2013 7 June 1954) was an English mathematician, computer scientist, logician, cryptanalyst, philosopher and theoretical biologist.[5] Turing was highly influential in the development of theoretical computer science, providing a formalisation of the concepts of algorithm and computation with the Turing machine, which can be considered a model of a general-purpose computer.[6][7][8] He is widely considered to be the father of theoretical computer science and artificial intelligence.[9]\n\nBorn in Maida Vale, London, Turing was raised in southern England. He graduated at King's College, Cambridge, with a degree in mathematics. Whilst he was a fellow at Cambridge, he published a proof demonstrating that some purely mathematical yes\u2013no questions can never be answered by computation. He defined a Turing machine and proved that the halting problem for Turing machines is undecidable. In 1938, he obtained his PhD from the Department of Mathematics at Princeton University. During the Second World War, Turing worked for the Government Code and Cypher School at Bletchley Park, Britain's codebreaking centre that produced Ultra intelligence. For a time he led Hut 8, the section that was responsible for German naval cryptanalysis. Here, he devised a number of techniques for speeding the breaking of German ciphers, including improvements to the pre-war Polish bomba method, an electromechanical machine that could find settings for the Enigma machine. Turing played a crucial role in cracking intercepted coded messages that enabled the Allies to defeat the Axis powers in many crucial engagements, including the Battle of the Atlantic.[10][11]\n\nAfter the war, Turing worked at the National Physical Laboratory, where he designed the Automatic Computing Engine, one of the first designs for a stored-program computer. In 1948, Turing joined Max Newman's Computing Machine Laboratory at the Victoria University of Manchester, where he helped develop the Manchester computers[12] and became interested in mathematical biology. He wrote a paper on the chemical basis of morphogenesis[1] and predicted oscillating chemical reactions such as the Belousov\u2013Zhabotinsky reaction, first observed in the 1960s. Despite these accomplishments, Turing was never fully recognised in Britain during his lifetime because much of his work was covered by the Official Secrets Act.[13]\n\"\"\"\n
"},{"location":"cookbook/chain_of_density/#how-chain-of-density-works","title":"How Chain Of Density works","text":"Chain Of Density starts with asking the model to generate a first long and non-specific summary. Then it asks the model to generate 4 extra summaries by proceeding in the following way:
- Identify 1-3 entities missing in the previous summary;
- Add all entities marked as missing in the previous step, while not dropping entities;
- Make the summary more concise.
The prompt also asks the model to return a list of JSON objects that contain the missing entities and the new summary. This is where structured generation will come in handy :) The paper provides the prompt and an example:
We can now implement the prompt provided in the paper:
import outlines\n\n@outlines.prompt\ndef chain_of_density(article):\n    \"\"\"Article: {{ article }}\n\n    You will generate increasingly concise, entity-dense summaries of the above Article.\n\n    Repeat the following 2 steps 5 times.\n\n    Step 1. Identify 1-3 informative Entities (\"; \" delimited) from the Article which are missing from the previously generated summary.\n    Step 2. Write a new, denser summary of identical length which covers every entity and detail from the previous summary plus the Missing Entities.\n\n    A Missing Entity is:\n    - Relevant: to the main story.\n    - Specific: descriptive yet concise (5 words or fewer).\n    - Novel: not in the previous summary.\n    - Faithful: present in the Article.\n    - Anywhere: located anywhere in the Article.\n\n    Guidelines:\n    - The first summary should be long (4-5 sentences, ~80 words) yet highly non-specific, containing little information beyond the entities marked as missing. Use overly verbose language and fillers (e.g., \"this article discusses\") to reach ~80 words.\n    - Make every word count: rewrite the previous summary to improve flow and make space for additional entities.\n    - Make space with fusion, compression, and removal of uninformative phrases like \"the article discusses\".\n    - The summaries should become highly dense and concise yet self-contained, e.g., easily understood without the Article.\n    - Missing entities can appear anywhere in the new summary.\n    - Never drop entities from the previous summary. If space cannot be made, add fewer new entities.\n\n    Remember, use the exact same number of words for each summary.\n\n    Answer in JSON. The JSON should be a dictionary with key \"summaries\" that contains a list (length 5) of dictionaries whose keys are \"Missing_Entities\" and \"Denser_Summary\".\n    \"\"\"\n
Note: We modified the prompt slightly so it returns a JSON object that contains the summaries, instead of a list of summaries.
"},{"location":"cookbook/chain_of_density/#outlines-implementation","title":"Outlines implementation","text":"We will use Outline's JSON-structured generation to ensure that the model's output is consistent with the format specified in the prompt. We start with defining the JSON objects that the model is asked to return using Pydantic. One JSON object that contains a list of Summary
objects that contain the missing entities and new summary:
from pydantic import BaseModel, conlist\n\nclass Summary(BaseModel):\n missing_entities: str\n denser_summary: str\n\nclass Summaries(BaseModel):\n summaries: conlist(Summary, max_length=5, min_length=5)\n
We now generate the prompt by passing the article we want to summarize to the template. We load a quantized version of Mistral-7B using the AutoAWQ library, and then use JSON-structured generation to generate the summaries:
model = outlines.models.transformers(\"TheBloke/Mistral-7B-OpenOrca-AWQ\")\n\nprompt = chain_of_density(article)\nresult = outlines.generate.json(model, Summaries)(prompt)\n
We can now check the results:
print(result.model_dump())\n# {'summaries': [\n# {\n# 'missing_entities': 'English mathematician, cryptanalyst, philosopher',\n# 'denser_summary': 'Alan Mathison Turing was an English mathematician, cryptanalyst, philosopher.'\n# },\n# {\n# 'missing_entities': '',\n# 'denser_summary': \"Alan Mathison Turing was an English mathematician who was a crucial figure in WW2's Bletchley Park codebreaking centre and designed one of the first computers.\"\n# },\n# {\n# 'missing_entities': 'cryptanalyst, studied, biology, father',\n# 'denser_summary': 'Alan Mathison Turing was an English cryptanalyst, studied theoretical computer science, and contributed to mathematical biology.'\n# },\n# {\n# 'missing_entities': 'biology, morphogenesis, chemical',\n# 'denser_summary': 'Alan Mathison Turing was an English cryptanalyst, studied theoretical computer science, and predicted chemical reactions in morphogenesis.\n# '},\n# {\n# 'missing_entities': '',\n# 'denser_summary': 'Alan Mathison Turing was an English cryptanalyst, developed computer science, and made strides in mathematical biology research.'\n# }\n# ]}\n
Not bad, considering we used a smallish model to generate the summary! Chain of Density seems to be a very effective prompting technique to generate dense summaries, even with small quantized models. Its implementation in Outlines is also very short.
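As an optional sanity check (our own addition, reusing the result object above), you can print the word count of each summary to see how well the model respected the \"identical length\" instruction:
# Optional check: how long is each summary, and which entities were added?\nfor i, summary in enumerate(result.summaries, start=1):\n    n_words = len(summary.denser_summary.split())\n    print(f\"Summary {i}: {n_words} words, new entities: {summary.missing_entities!r}\")\n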
Note that this is the first article I tried and it worked out of the box. Try it out on other articles, and please share the results on Twitter, or by opening a new discussion on the Outlines repository!
"},{"location":"cookbook/chain_of_thought/","title":"Chain of thought","text":"Chain of thought is a prompting technique introduced in the paper \"Chain-of-Thought Prompting Elicits Reasoning in Large Language Models\" where throught prompting the authors generate a series of intermediate reasoning steps which improves the ability of LLMs to perform complex reasoning.
In this guide, we use Outlines to apply chain of thought through structured output.
We use llama.cpp through the llama-cpp-python library. Outlines supports llama-cpp-python, but we need to install it ourselves:
pip install llama-cpp-python\n
We download the model weights by passing the name of the repository on the HuggingFace Hub, and the filenames (or glob pattern):
import llama_cpp\nfrom outlines import generate, models\n\nmodel = models.llamacpp(\"NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF\",\n \"Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf\",\n tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained(\n \"NousResearch/Hermes-2-Pro-Llama-3-8B\"\n ),\n n_gpu_layers=-1,\n flash_attn=True,\n n_ctx=8192,\n verbose=False)\n
(Optional) Store the model weights in a custom folder. By default the model weights are downloaded to the hub cache, but if we want to store the weights in a custom folder, we can pull the quantized GGUF model Hermes-2-Pro-Llama-3-8B by NousResearch from HuggingFace ourselves:
wget https://hf.co/NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF/resolve/main/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf\n
We initialize the model:
import llama_cpp\nfrom llama_cpp import Llama\nfrom outlines import generate, models\n\nllm = Llama(\n \"/path/to/model/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf\",\n tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained(\n \"NousResearch/Hermes-2-Pro-Llama-3-8B\"\n ),\n n_gpu_layers=-1,\n flash_attn=True,\n n_ctx=8192,\n verbose=False\n)\n
"},{"location":"cookbook/chain_of_thought/#chain-of-thought_1","title":"Chain of thought","text":"We first define our Pydantic class for a reasoning step:
from pydantic import BaseModel, Field\n\nclass Reasoning_Step(BaseModel):\n reasoning_step: str = Field(..., description=\"Reasoning step\")\n
We then define the Pydantic class for the reasoning, which consists of a list of reasoning steps and a conclusion, and we get its JSON schema:
from typing import List\n\nclass Reasoning(BaseModel):\n reasoning: List[Reasoning_Step] = Field(..., description=\"List of reasoning steps\")\n conclusion: str = Field(..., description=\"Conclusion\")\n\njson_schema = Reasoning.model_json_schema()\n
We could generate a response directly from the JSON schema, but for a change we will build a regex from it and use that instead:
from outlines.integrations.utils import convert_json_schema_to_str\nfrom outlines.fsm.json_schema import build_regex_from_schema\n\nschema_str = convert_json_schema_to_str(json_schema=json_schema)\nregex_str = build_regex_from_schema(schema_str)\n
We then need to adapt our prompt to the Hermes prompt format for JSON schema:
def generate_hermes_prompt(user_prompt):\n return (\n \"<|im_start|>system\\n\"\n \"You are a world class AI model who answers questions in JSON \"\n f\"Here's the json schema you must adhere to:\\n<schema>\\n{json_schema}\\n</schema><|im_end|>\\n\"\n \"<|im_start|>user\\n\"\n + user_prompt\n + \"<|im_end|>\"\n + \"\\n<|im_start|>assistant\\n\"\n \"<schema>\"\n )\n
For a given user prompt:
user_prompt = \"9.11 and 9.9 -- which is bigger?\"\n
we can use generate.regex
by passing the regex string we built from the Pydantic class's JSON schema, and call the generator with the Hermes prompt:
generator = generate.regex(model, regex_str)\nprompt = generate_hermes_prompt(user_prompt)\nresponse = generator(prompt, max_tokens=1024, temperature=0, seed=42)\n
We obtain a series of intermediate reasoning steps as well as the conclusion:
import json\n\njson_response = json.loads(response)\n\nprint(json_response[\"reasoning\"])\nprint(json_response[\"conclusion\"])\n# [{'reasoning_step': 'Both 9.11 and 9.9 are decimal numbers.'},\n# {'reasoning_step': 'When comparing decimal numbers, we look at the numbers after the decimal point.'},\n# {'reasoning_step': 'In this case, 9.11 has the number 1 after the decimal point, while 9.9 has the number 9.'},\n# {'reasoning_step': 'Since 1 is greater than 9, 9.11 is greater than 9.9.'}]\n# '9.11 is bigger.'\n
We notice that the 4th reasoning step is wrong (\"Since 1 is greater than 9, 9.11 is greater than 9.9.\"), so we should probably give the model some examples for this particular task.
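One lightweight way to do that (a sketch of our own, not part of the original example) is to prepend a worked example to the user prompt before formatting it with the Hermes template:
# Sketch: prepend a worked example so the model sees how to compare decimals.\nfew_shot_example = (\n    \"Example question: 0.8 and 0.11 -- which is bigger?\\n\"\n    \"Example reasoning: compare the tenths digit first; 8 tenths is more than 1 tenth, so 0.8 is bigger.\\n\\n\"\n)\n\nprompt = generate_hermes_prompt(few_shot_example + user_prompt)\nresponse = generator(prompt, max_tokens=1024, temperature=0, seed=42)\n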
This example was originally contributed by Alonso Silva.
"},{"location":"cookbook/classification/","title":"Classification","text":"Classification is a classic problem in NLP and finds many applications: spam detection, sentiment analysis, triaging of incoming requests, etc. We will use the example of a company that wants to sort support requests between those that require immediate attention (URGENT
) and those that can wait a little (STANDARD
). You could easily extend the example by adding new labels.
This tutorial shows how one can implement text classification using Outlines. We will use two functionalities of the library: generate.choice
and generate.json
.
As always, we start by initializing the model. Since we are GPU-poor we will be using a quantized version of Mistral-7B-v0.1:
import outlines\n\nmodel = outlines.models.transformers(\"TheBloke/Mistral-7B-OpenOrca-AWQ\", device=\"cuda\")\n
We will use the following prompt template:
@outlines.prompt\ndef customer_support(request):\n \"\"\"You are an experienced customer success manager.\n\n Given a request from a client, you need to determine when the\n request is urgent using the label \"URGENT\" or when it can wait\n a little with the label \"STANDARD\".\n\n # Examples\n\n Request: \"How are you?\"\n Label: STANDARD\n\n Request: \"I need this fixed immediately!\"\n Label: URGENT\n\n # TASK\n\n Request: {{ request }}\n Label: \"\"\"\n
"},{"location":"cookbook/classification/#choosing-between-multiple-choices","title":"Choosing between multiple choices","text":"Outlines provides a shortcut to do multi-label classification, using the outlines.generate.choice
function to initialize a generator. Outlines uses multinomial sampling by default, here we will use the greedy sampler to get the label with the highest probability:
from outlines.samplers import greedy\n\ngenerator = outlines.generate.choice(model, [\"URGENT\", \"STANDARD\"], sampler=greedy())\n
Outlines supports batched requests, so we will pass two requests to the model: requests = [\n    \"My hair is on fire! Please help me!!!\",\n    \"Just wanted to say hi\"\n]\n\nprompts = [customer_support(request) for request in requests]\n
We can now ask the model to classify the requests:
labels = generator(prompts)\nprint(labels)\n# ['URGENT', 'STANDARD']\n
Now, you might be in a hurry and don't want to wait until the model finishes completion. After all, you only need to see the first letter of the response to know whether the request is urgent or standard. You can instead stream the response:
tokens = generator.stream(prompts)\nlabels = [\"URGENT\" if \"U\" in token else \"STANDARD\" for token in next(tokens)]\nprint(labels)\n# ['URGENT', 'STANDARD']\n
"},{"location":"cookbook/classification/#using-json-structured-generation","title":"Using JSON-structured generation","text":"Another (convoluted) way to do multi-label classification is to JSON-structured generation in Outlines. We first need to define our Pydantic schema that contains the labels:
from enum import Enum\nfrom pydantic import BaseModel\n\n\nclass Label(str, Enum):\n urgent = \"URGENT\"\n standard = \"STANDARD\"\n\n\nclass Classification(BaseModel):\n label: Label\n
and we can use generate.json
by passing this Pydantic model we just defined, and call the generator:
generator = outlines.generate.json(model, Classification, sampler=greedy())\nlabels = generator(prompts)\nprint(labels)\n# [Classification(label=<Label.urgent: 'URGENT'>), Classification(label=<Label.standard: 'STANDARD'>)]\n
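As mentioned at the start of this example, extending the classifier to more labels only requires changing the list of choices (and, ideally, the examples in the prompt). A minimal sketch with a hypothetical third label:
# Hypothetical third label; the prompt's few-shot examples should be updated accordingly.\ngenerator = outlines.generate.choice(\n    model, [\"URGENT\", \"STANDARD\", \"LOW_PRIORITY\"], sampler=greedy()\n)\nlabels = generator(prompts)\nprint(labels)\n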
"},{"location":"cookbook/dating_profiles/","title":"Generate a synthetic dating profile from a description","text":"In this example we will see how we can use Outlines to generate synthetic data for a dating application. This example was originally contributed by Vibhor Kumar.
from dataclasses import dataclass\nfrom enum import Enum\n\nimport torch\nimport transformers\nfrom pydantic import BaseModel, conlist, constr\n\nimport outlines\n
"},{"location":"cookbook/dating_profiles/#defining-the-profile-with-pydantic","title":"Defining the profile with Pydantic","text":"Here a dating profile will consist in a biography, a job, a list of interests and two question-answer pairs. The questions are written in advance by the team, and the users are asked to provide an answer:
class QuestionChoice(str, Enum):\n A = \"The key to my heart is\"\n B = \"The first item on my bucket list is\"\n C = \"Perks of dating me\"\n D = \"Message me if you also love\"\n E = \"People would describe me as\"\n F = \"I can beat you in a game of\"\n\n@dataclass\nclass QuestionAnswer:\n question: QuestionChoice\n answer: str\n
Users need to provide a short biography, with a minimum of 10 and a maximum of 300 characters. The application also limits job descriptions to 50 characters. In addition to the question-answer pairs, the user is required to provide a list of between 1 and 5 interests:
class DatingProfile(BaseModel):\n    bio: constr(min_length=10, max_length=300)\n    job: constr(max_length=50)\n    interests: conlist(str, min_length=1, max_length=5)  # type: ignore\n    qna1: QuestionAnswer\n    qna2: QuestionAnswer\n
"},{"location":"cookbook/dating_profiles/#prompt-template-and-examples","title":"Prompt template and examples","text":"We will ask the model to generate profiles from a high-level description:
@dataclass\nclass Example:\n description: str\n profile: DatingProfile\n
We will use Outlines' prompt templating abilities to generate the prompt for us. This helps clearly separate the general prompting logic from what is specific to an example.
@outlines.prompt\ndef dating_profile_prompt(description: str, examples: list[Example]):\n \"\"\"\n You are a world-renowned matchmaker who understands the modern dating\n market. Your job is to generate dating app profiles for male clients\n interested in women based on a provided description. The profiles should be\n authentic, show off their strengths, and maximize their likelihood of\n getting matches on dating apps. Here are some examples of past clients that\n you have successfully created profiles for:\n\n {% for example in examples %}\n Description:\n {{ example.description }}\n Profile:\n {{ example.profile }}\n {% endfor %}\n\n Here is the new client who you need to create a profile for:\n Description: {{ description }}\n Profile:\n \"\"\"\n
We will provide the model with several few-shot examples:
samples: list[Example] = [\n Example(\n description=\"I'm an author and former professional soccer player living in Seattle who publishes popular fiction books. A typical day for me starts by hanging out with my cat, drinking a coffee, and reading as much as I can in a few hours. Then, I'll prepare a quick smoothie before starting to write for a few hours, take a break with soccer or running a few miles, and finally meet friends for dinner at a new, hip restaurant in the evening. Sometimes we go axe-throwing afterwards, or play poker, or watch a comedy show, or visit a dive bar. On my vacations, I travel extensively to countries South America, Europe, and Asia, with the goal of visiting them all!\",\n profile=DatingProfile(\n bio=\"Adventurer, dreamer, author, and soccer enthusiast. Life\u2019s too short to waste time so I make the most of each day by exploring new places and playing with my friends on the pitch. What\u2019s your favorite way to get out and have fun?\",\n job=\"Famous Soccer Player -> Famous Author\",\n interests=[\"Soccer\", \"Travel\", \"Friends\", \"Books\", \"Fluffy Animals\"],\n qna1=QuestionAnswer(\n question=QuestionChoice.B, answer=\"swim in all seven oceans!\"\n ),\n qna2=QuestionAnswer(\n question=QuestionChoice.E,\n answer=\"fun-loving, adventurous, and a little bit crazy\",\n ),\n ),\n ),\n Example(\n description=\"I run my company and build houses for a living. I'm a big fan of the outdoors and love to go hiking, camping, and fishing. I don't like video games, but do like to watch movies. My love language is home-cooked food, and I'm looking for someone who isn't afraid to get their hands dirty.\",\n profile=DatingProfile(\n bio=\"If you're looking for a Montana man who loves to get outdoors and hunt, and who's in-tune with his masculinity then I'm your guy!\",\n job=\"House Construction Manager / Entrepreneur\",\n interests=[\"Hunting\", \"Hiking\", \"The outdoors\", \"Home-cooked food\"],\n qna1=QuestionAnswer(question=QuestionChoice.A, answer=\"food made at home\"),\n qna2=QuestionAnswer(\n question=QuestionChoice.C,\n answer=\"having a man in your life who can fix anything\",\n ),\n ),\n ),\n Example(\n description=\"I run my own Youtube channel with 10M subscribers. I love working with kids, and my audience skews pretty young too. In my free time, I play Fortnite and Roblox. I'm looking for someone who is also a gamer and likes to have fun. I'm learning Japanese in my free time as well as how to cook.\",\n profile=DatingProfile(\n bio=\"Easy on the eyes (find me on Youtube!) and great with kids. What more do you need?\",\n job=\"Youtuber 10M+ subscribers\",\n interests=[\"Kids\", \"Gaming\", \"Japanese\"],\n qna1=QuestionAnswer(question=QuestionChoice.D, answer=\"anime and gaming!\"),\n qna2=QuestionAnswer(question=QuestionChoice.F, answer=\"Fortnite, gg ez\"),\n ),\n ),\n]\n
"},{"location":"cookbook/dating_profiles/#load-the-model","title":"Load the model","text":"We will use Mosaic's MPT-7B model (requires 13GB of GPU memory) which can fit on a single GPU with a reasonable context window. We initialize it with Outlines:
config = transformers.AutoConfig.from_pretrained(\n \"mosaicml/mpt-7b-8k-instruct\", trust_remote_code=True\n)\nconfig.init_device = \"meta\"\nmodel = outlines.models.transformers(\n model_name=\"mosaicml/mpt-7b-8k-instruct\",\n device=\"cuda\",\n model_kwargs={\n \"config\": config,\n \"trust_remote_code\": True,\n \"torch_dtype\": torch.bfloat16,\n \"device_map\": {\"\": 0},\n },\n)\n
"},{"location":"cookbook/dating_profiles/#json-structured-generation-of-profiles","title":"JSON-structured generation of profiles","text":"We will now generate a dating profile from a textual description of oneself:
new_description = \"\"\"I'm a laid-back lawyer who spends a lot of his free-time\ngaming. I work in a corporate office, but ended up here after the start-up I\ncofounded got acquired, so still play ping pong with my cool coworkers every\nday. I have a bar at home where I make cocktails, which is great for\nentertaining friends. I secretly like to wear suits and get a new one tailored\nevery few months. I also like weddings because I get to wear those suits, and\nit's a good excuse for a date. I watch the latest series because I'm paying,\nwith my hard-earned money, for every streaming service.\"\"\"\n\nprompt = dating_profile_prompt(new_description, samples)\nprofile = outlines.generate.json(model, DatingProfile)(prompt)\nparsed_profile = DatingProfile.model_validate_json(profile)\n
"},{"location":"cookbook/dating_profiles/#results","title":"Results","text":"Here are a couple of results:
{\n \"bio\": \"\"\"I'm an ambitious lawyer with a casual and fashionable style. I love\n games and sports, but my true passion is preparing refreshing cocktails at\n home and dressing to the nines at weddings. I'm currently looking for a woman\n to show a good time to and get a kiss on the opulent suit I just had made.\n Send resume to this inbox.\"\"\",\n \"job\": \"Lawyer\",\n \"interests\":\n [\n \"Stylish guys\",\n \"Gaming\",\n \"Ping pong\",\n \"Cocktails\",\n \"Weddings\"\n ],\n \"qna1\":\n {\n \"question\": \"The first item on my bucket list is\",\n \"answer\": \"be married and have a family.\"\n },\n \"qna2\":\n {\n \"question\": \"People would describe me as\",\n \"answer\": \"charming, stylish, and funny.\"\n }\n}\n
{\n \"bio\": \"\"\"I\u2019m a sexy lawyer with time on my hands. I love to game and\n play ping pong, but the real reason you should swipe to the right\n is because I look great in a suit. Who doesn\u2019t love a man in a\n suit? Just saying. Send me a message if you think it\u2019s time to take\n your dating life to the next level.\"\"\",\n \"job\": \"Lawyer\",\n \"interests\":\n [\n \"Gaming\",\n \"Ping Pong\",\n \"Tailored Suits\",\n \"Weddings\",\n \"Streaming Services\"\n ],\n \"qna1\":\n {\n \"question\": \"The first item on my bucket list is\",\n \"answer\": \"simulate space but stay alive for as long as possible\"\n },\n \"qna2\":\n {\n \"question\": \"People would describe me as\",\n \"answer\": \"easy-going, a little nerdy but with a mature essence\"\n }\n}\n
"},{"location":"cookbook/deploy-using-bentoml/","title":"Run Outlines using BentoML","text":"BentoML is an open-source model serving library for building performant and scalable AI applications with Python. It comes with tools that you need for serving optimization, model packaging, and production deployment.
In this guide, we will show you how to use BentoML to run programs written with Outlines on GPU locally and in BentoCloud, an AI Inference Platform for enterprise AI teams. The example source code in this guide is also available in the examples/bentoml/ directory.
"},{"location":"cookbook/deploy-using-bentoml/#import-a-model","title":"Import a model","text":"First we need to download an LLM (Mistral-7B-v0.1 in this example and you can use any other LLM) and import the model into BentoML's Model Store. Let's install BentoML and other dependencies from PyPi (preferably in a virtual environment):
pip install -r requirements.txt\n
Then save the code snippet below as import_model.py
and run python import_model.py
.
Note: You need to accept related conditions on Hugging Face first to gain access to Mistral-7B-v0.1.
import bentoml\n\nMODEL_ID = \"mistralai/Mistral-7B-v0.1\"\nBENTO_MODEL_TAG = MODEL_ID.lower().replace(\"/\", \"--\")\n\ndef import_model(model_id, bento_model_tag):\n\n import torch\n from transformers import AutoModelForCausalLM, AutoTokenizer\n\n tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)\n model = AutoModelForCausalLM.from_pretrained(\n MODEL_ID,\n torch_dtype=torch.float16,\n low_cpu_mem_usage=True,\n )\n\n with bentoml.models.create(bento_model_tag) as bento_model_ref:\n tokenizer.save_pretrained(bento_model_ref.path)\n model.save_pretrained(bento_model_ref.path)\n\n\nif __name__ == \"__main__\":\n import_model(MODEL_ID, BENTO_MODEL_TAG)\n
You can verify the download is successful by running:
$ bentoml models list\n\nTag Module Size Creation Time\nmistralai--mistral-7b-v0.1:m7lmf5ac2cmubnnz 13.49 GiB 2024-04-25 06:52:39\n
"},{"location":"cookbook/deploy-using-bentoml/#define-a-bentoml-service","title":"Define a BentoML Service","text":"As the model is ready, we can define a BentoML Service to wrap the capabilities of the model.
We will run the JSON-structured generation example in the README, with the following schema:
DEFAULT_SCHEMA = \"\"\"{\n \"title\": \"Character\",\n \"type\": \"object\",\n \"properties\": {\n \"name\": {\n \"title\": \"Name\",\n \"maxLength\": 10,\n \"type\": \"string\"\n },\n \"age\": {\n \"title\": \"Age\",\n \"type\": \"integer\"\n },\n \"armor\": {\"$ref\": \"#/definitions/Armor\"},\n \"weapon\": {\"$ref\": \"#/definitions/Weapon\"},\n \"strength\": {\n \"title\": \"Strength\",\n \"type\": \"integer\"\n }\n },\n \"required\": [\"name\", \"age\", \"armor\", \"weapon\", \"strength\"],\n \"definitions\": {\n \"Armor\": {\n \"title\": \"Armor\",\n \"description\": \"An enumeration.\",\n \"enum\": [\"leather\", \"chainmail\", \"plate\"],\n \"type\": \"string\"\n },\n \"Weapon\": {\n \"title\": \"Weapon\",\n \"description\": \"An enumeration.\",\n \"enum\": [\"sword\", \"axe\", \"mace\", \"spear\", \"bow\", \"crossbow\"],\n \"type\": \"string\"\n }\n }\n}\"\"\"\n
First, we need to define a BentoML service by decorating an ordinary class (Outlines
here) with @bentoml.service
decorator. We pass this decorator some configuration and the GPU on which we want this service to run in BentoCloud (here an L4 with 24GB memory):
import typing as t\nimport bentoml\n\nfrom import_model import BENTO_MODEL_TAG\n\n@bentoml.service(\n traffic={\n \"timeout\": 300,\n },\n resources={\n \"gpu\": 1,\n \"gpu_type\": \"nvidia-l4\",\n },\n)\nclass Outlines:\n\n bento_model_ref = bentoml.models.get(BENTO_MODEL_TAG)\n\n def __init__(self) -> None:\n\n import outlines\n import torch\n self.model = outlines.models.transformers(\n self.bento_model_ref.path,\n device=\"cuda\",\n model_kwargs={\"torch_dtype\": torch.float16},\n )\n\n ...\n
We then need to define an HTTP endpoint using @bentoml.api
to decorate the method generate
of Outlines
class:
...\n\n @bentoml.api\n async def generate(\n self,\n prompt: str = \"Give me a character description.\",\n json_schema: t.Optional[str] = DEFAULT_SCHEMA,\n ) -> t.Dict[str, t.Any]:\n\n import outlines\n\n generator = outlines.generate.json(self.model, json_schema)\n character = generator(prompt)\n\n return character\n
Here the @bentoml.api
decorator defines generate
as an HTTP endpoint that accepts a JSON request body with two fields: prompt
and json_schema
(optional, which allows HTTP clients to provide their own JSON schema). The type hints in the function signature will be used to validate incoming JSON requests. You can define as many HTTP endpoints as you want by using @bentoml.api
to decorate other methods of the Outlines
class.
Now you can save the above code to service.py
(or use this implementation), and run the code using the BentoML CLI.
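If you later want to expose more functionality, a hypothetical extra endpoint could look like the sketch below (our own illustration, not part of the example service); it would be added as another method of the Outlines class:
...\n\n    @bentoml.api\n    async def classify(\n        self,\n        prompt: str = \"Is this request urgent?\",\n        choices: t.Optional[t.List[str]] = None,\n    ) -> str:\n\n        import outlines\n\n        # Default choices are illustrative placeholders.\n        choices = choices or [\"URGENT\", \"STANDARD\"]\n        generator = outlines.generate.choice(self.model, choices)\n        return generator(prompt)\n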
"},{"location":"cookbook/deploy-using-bentoml/#run-locally-for-testing-and-debugging","title":"Run locally for testing and debugging","text":"Then you can run a server locally by:
bentoml serve .\n
The server is now active at http://localhost:3000. You can interact with it using the Swagger UI or in other ways:
CURL curl -X 'POST' \\\n 'http://localhost:3000/generate' \\\n -H 'accept: application/json' \\\n -H 'Content-Type: application/json' \\\n -d '{\n \"prompt\": \"Give me a character description.\"\n}'\n
Python client import bentoml\n\nwith bentoml.SyncHTTPClient(\"http://localhost:3000\") as client:\n response = client.generate(\n prompt=\"Give me a character description\"\n )\n print(response)\n
Expected output:
{\n \"name\": \"Aura\",\n \"age\": 15,\n \"armor\": \"plate\",\n \"weapon\": \"sword\",\n \"strength\": 20\n}\n
"},{"location":"cookbook/deploy-using-bentoml/#deploy-to-bentocloud","title":"Deploy to BentoCloud","text":"After the Service is ready, you can deploy it to BentoCloud for better management and scalability. Sign up if you haven't got a BentoCloud account.
Make sure you have logged in to BentoCloud, then run the following command to deploy it.
bentoml deploy .\n
Once the application is up and running on BentoCloud, you can access it via the exposed URL.
Note: For custom deployment in your own infrastructure, use BentoML to generate an OCI-compliant image.
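As a rough sketch (our own addition; the bento tag below is a placeholder reported by the build step), this is typically a two-step process with the BentoML CLI:
bentoml build\nbentoml containerize <bento_name>:<tag>\n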
"},{"location":"cookbook/deploy-using-cerebrium/","title":"Run Outlines using Cerebrium","text":"Cerebrium is a serverless AI infrastructure platform that makes it easier for companies to build and deploy AI based applications. They offer Serverless GPU's\u00a0with low cold start times with over 12 varieties of GPU chips that auto scale and you only pay for the compute you use.
In this guide we will show you how you can use Cerebrium to run programs written with Outlines on GPUs in the cloud.
"},{"location":"cookbook/deploy-using-cerebrium/#setup-cerebrium","title":"Setup Cerebrium","text":"First, we install Cerebrium and login to get authenticated.
pip install cerebrium\ncerebrium login\n
Then let us create our first project
cerebrium init outlines-project\n
"},{"location":"cookbook/deploy-using-cerebrium/#setup-environment-and-hardware","title":"Setup Environment and Hardware","text":"You set up your environment and hardware in the cerebrium.toml file that was created using the init function above.
[cerebrium.deployment]\ndocker_base_image_url = \"nvidia/cuda:12.1.1-runtime-ubuntu22.04\"\n\n[cerebrium.hardware]\ncpu = 2\nmemory = 14.0\ngpu = \"AMPERE A10\"\ngpu_count = 1\nprovider = \"aws\"\nregion = \"us-east-1\"\n\n[cerebrium.dependencies.pip]\noutlines = \"==0.0.37\"\ntransformers = \"==4.38.2\"\ndatasets = \"==2.18.0\"\naccelerate = \"==0.27.2\"\n
"},{"location":"cookbook/deploy-using-cerebrium/#setup-inference","title":"Setup inference","text":"Running code in Cerebrium is like writing normal python with no special syntax. In a main.py
file specify the following:
import outlines\n\n\nmodel = outlines.models.transformers(\"microsoft/Phi-3-mini-4k-instruct\")\n\nschema = \"\"\"{\n \"title\": \"Character\",\n \"type\": \"object\",\n \"properties\": {\n \"name\": {\n \"title\": \"Name\",\n \"maxLength\": 10,\n \"type\": \"string\"\n },\n \"age\": {\n \"title\": \"Age\",\n \"type\": \"integer\"\n },\n \"armor\": {\"$ref\": \"#/definitions/Armor\"},\n \"weapon\": {\"$ref\": \"#/definitions/Weapon\"},\n \"strength\": {\n \"title\": \"Strength\",\n \"type\": \"integer\"\n }\n },\n \"required\": [\"name\", \"age\", \"armor\", \"weapon\", \"strength\"],\n \"definitions\": {\n \"Armor\": {\n \"title\": \"Armor\",\n \"description\": \"An enumeration.\",\n \"enum\": [\"leather\", \"chainmail\", \"plate\"],\n \"type\": \"string\"\n },\n \"Weapon\": {\n \"title\": \"Weapon\",\n \"description\": \"An enumeration.\",\n \"enum\": [\"sword\", \"axe\", \"mace\", \"spear\", \"bow\", \"crossbow\"],\n \"type\": \"string\"\n }\n }\n}\"\"\"\n\ngenerator = outlines.generate.json(model, schema)\n
On the first deploy, it will download the model and store it on disk; subsequent calls will load the model from disk.
Every function in Cerebrium is callable through an API endpoint. Code at the topmost level (i.e. not in a function) is run only when the container is first spun up, so subsequent calls simply run the code defined in the function you call.
To deploy an API that creates a new character when called with a prompt you can add the following code to main.py
:
def generate(\n prompt: str = \"Amiri, a 53 year old warrior woman with a sword and leather armor.\",\n):\n\n character = generator(\n f\"<s>[INST]Give me a character description. Describe {prompt}.[/INST]\"\n )\n\n return character\n
"},{"location":"cookbook/deploy-using-cerebrium/#run-on-the-cloud","title":"Run on the cloud","text":"cerebrium deploy\n
You will see your application deploy, install pip packages and download the model. Once completed, it will output a curl request you can use to call your endpoint. Just remember to end the URL with the function you would like to call, in this case /generate. You should see your response returned!
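For illustration only, the request is roughly shaped like the sketch below; the actual endpoint URL and API key come from the output of cerebrium deploy, and the placeholders here are not real values:
curl -X POST <YOUR_ENDPOINT_URL>/generate \\\n  -H 'Authorization: Bearer <YOUR_API_KEY>' \\\n  -H 'Content-Type: application/json' \\\n  --data '{\"prompt\": \"A knight in shining armor.\"}'\n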
"},{"location":"cookbook/deploy-using-modal/","title":"Run Outlines using Modal","text":"Modal is a serverless platform that allows you to easily run code on the cloud, including GPUs. It can come very handy for those of us who don't have a monster GPU at home and want to be able to quickly and easily provision, configure and orchestrate cloud infrastructure.
In this guide we will show you how you can use Modal to run programs written with Outlines on GPU in the cloud.
"},{"location":"cookbook/deploy-using-modal/#requirements","title":"Requirements","text":"We recommend installing modal
and outlines
in a virtual environment. You can create one with:
python -m venv venv\nsource venv/bin/activate\n
Then install the required packages:
pip install modal outlines\n
"},{"location":"cookbook/deploy-using-modal/#build-the-image","title":"Build the image","text":"First we need to define our container image. If you need to access a gated model, you will need to provide an access token. See the .env
call below for how to provide a HuggingFace token.
Setting a token is best done by setting an environment variable HF_TOKEN
with your token. If you do not wish to do this, we provide a commented-out line in the code to set the token directly in the code.
from modal import Image, App, gpu\nimport os\n\n# This creates a modal App object. Here we set the name to \"outlines-app\".\n# There are other optional parameters like modal secrets, schedules, etc.\n# See the documentation here: https://modal.com/docs/reference/modal.App\napp = App(name=\"outlines-app\")\n\n# Specify a language model to use.\n# Another good model to use is \"NousResearch/Hermes-2-Pro-Mistral-7B\"\nlanguage_model = \"mistral-community/Mistral-7B-v0.2\"\n\n# Please set an environment variable HF_TOKEN with your Hugging Face API token.\n# The code below (the .env({...}) part) will copy the token from your local\n# environment to the container.\n# More info on Image here: https://modal.com/docs/reference/modal.Image\noutlines_image = Image.debian_slim(python_version=\"3.11\").pip_install(\n \"outlines\",\n \"transformers\",\n \"datasets\",\n \"accelerate\",\n \"sentencepiece\",\n).env({\n # This will pull in your HF_TOKEN environment variable if you have one.\n 'HF_TOKEN':os.environ['HF_TOKEN']\n\n # To set the token directly in the code, uncomment the line below and replace\n # 'YOUR_TOKEN' with the HuggingFace access token.\n # 'HF_TOKEN':'YOUR_TOKEN'\n})\n
"},{"location":"cookbook/deploy-using-modal/#setting-the-container-up","title":"Setting the container up","text":"When running longer Modal apps, it's recommended to download your language model when the container starts, rather than when the function is called. This will cache the model for future runs.
# This function imports the model from Hugging Face. The modal container\n# will call this function when it starts up. This is useful for\n# downloading models, setting up environment variables, etc.\ndef import_model():\n import outlines\n outlines.models.transformers(language_model)\n\n# This line tells the container to run the import_model function when it starts.\noutlines_image = outlines_image.run_function(import_model)\n
"},{"location":"cookbook/deploy-using-modal/#define-a-schema","title":"Define a schema","text":"We will run the JSON-structured generation example in the README, with the following schema:
# Specify a schema for the character description. In this case,\n# we want to generate a character with a name, age, armor, weapon, and strength.\nschema = \"\"\"{\n \"title\": \"Character\",\n \"type\": \"object\",\n \"properties\": {\n \"name\": {\n \"title\": \"Name\",\n \"maxLength\": 10,\n \"type\": \"string\"\n },\n \"age\": {\n \"title\": \"Age\",\n \"type\": \"integer\"\n },\n \"armor\": {\"$ref\": \"#/definitions/Armor\"},\n \"weapon\": {\"$ref\": \"#/definitions/Weapon\"},\n \"strength\": {\n \"title\": \"Strength\",\n \"type\": \"integer\"\n }\n },\n \"required\": [\"name\", \"age\", \"armor\", \"weapon\", \"strength\"],\n \"definitions\": {\n \"Armor\": {\n \"title\": \"Armor\",\n \"description\": \"An enumeration.\",\n \"enum\": [\"leather\", \"chainmail\", \"plate\"],\n \"type\": \"string\"\n },\n \"Weapon\": {\n \"title\": \"Weapon\",\n \"description\": \"An enumeration.\",\n \"enum\": [\"sword\", \"axe\", \"mace\", \"spear\", \"bow\", \"crossbow\"],\n \"type\": \"string\"\n }\n }\n}\"\"\"\n
To make the inference work on Modal we need to wrap the corresponding function in a @app.function
decorator. We pass to this decorator the image and GPU on which we want this function to run.
Let's choose an A100 with 80GB memory. Valid GPUs can be found here.
# Define a function that uses the image we chose, and specify the GPU\n# and memory we want to use.\n@app.function(image=outlines_image, gpu=gpu.A100(size='80GB'))\ndef generate(\n prompt: str = \"Amiri, a 53 year old warrior woman with a sword and leather armor.\",\n):\n # Remember, this function is being executed in the container,\n # so we need to import the necessary libraries here. You should\n # do this with any other libraries you might need.\n import outlines\n\n # Load the model into memory. The import_model function above\n # should have already downloaded the model, so this call\n # only loads the model into GPU memory.\n model = outlines.models.transformers(\n language_model, device=\"cuda\"\n )\n\n # Generate a character description based on the prompt.\n # We use the .json generation method -- we provide the\n # - model: the model we loaded above\n # - schema: the JSON schema we defined above\n generator = outlines.generate.json(model, schema)\n\n # Make sure you wrap your prompt in instruction tags ([INST] and [/INST])\n # to indicate that the prompt is an instruction. Instruction tags can vary\n # by models, so make sure to check the model's documentation.\n character = generator(\n f\"<s>[INST]Give me a character description. Describe {prompt}.[/INST]\"\n )\n\n # Print out the generated character.\n print(character)\n
We then need to define a local_entrypoint
to call our function generate
remotely.
@app.local_entrypoint()\ndef main(\n prompt: str = \"Amiri, a 53 year old warrior woman with a sword and leather armor.\",\n):\n # We use the \"generate\" function defined above -- note too that we are calling\n # .remote() on the function. This tells modal to run the function in our cloud\n # machine. If you want to run the function locally, you can call .local() instead,\n # though this will require additional setup.\n generate.remote(prompt)\n
Here @app.local_entrypoint()
decorator defines main
as the function to start from locally when using the Modal CLI. You can save the above code to example.py
(or use this implementation). Let's now see how to run the code on the cloud using the Modal CLI.
"},{"location":"cookbook/deploy-using-modal/#run-on-the-cloud","title":"Run on the cloud","text":"First install the Modal client from PyPi, if you have not already:
pip install modal\n
You then need to obtain a token from Modal. Run the following command:
modal setup\n
Once that is set you can run inference on the cloud using:
modal run example.py\n
You should see the Modal app initialize, and soon after see the result of the print
function in your terminal. That's it!
"},{"location":"cookbook/earnings-reports/","title":"Extracting financial data from earnings reports","text":"A common task in finance is to extract financial data from earnings reports. Earnings reports are infamously poorly formatted, as the SEC does not have requirements for producing machine-readable documents.
Earnings reports are often provided as HTML documents, which can be difficult to parse. Investors often use complicated parsing systems or manual review to extract data. Entire companies are built around automating this task.
This cookbook is a proof of concept showing how we can use LLMs to extract financial data directly into CSV. Comma-separated values are well structured and can be described by a regular expression, which Outlines can use to guide the LLM's output.
The example is a smaller subset of a full demo found here. The demo contains the full set of pre-processing steps needed to convert raw HTML into a structured CSV file, and tests the results across three companies' 10-K reports.
"},{"location":"cookbook/earnings-reports/#setup","title":"Setup","text":"Install outlines and required dependencies:
# Later versions of torch can have difficulty with certain CUDA drivers.\n# We recommend using 2.4.0 for now, but you may wish to experiment with\n# other versions.\npip install outlines pandas transformers torch==2.4.0 accelerate\n
"},{"location":"cookbook/earnings-reports/#load-the-model","title":"Load the model","text":"Choose your language model. We'll use Phi-3 mini, which is small enough to run on reasonably small machines.
import outlines\nimport torch\n\nmodel_name = 'microsoft/Phi-3-mini-4k-instruct'\nmodel = outlines.models.transformers(\n model_name,\n device='auto',\n model_kwargs={\n # To reduce memory usage, we'll use bfloat16\n \"torch_dtype\": torch.bfloat16,\n },\n)\n
"},{"location":"cookbook/earnings-reports/#set-up-the-data","title":"Set up the data","text":"For brevity, we've attached the markdown version of Nvidia's 10k report. The full demonstration processes the raw HTML version of the report to these markdown tables. Pages are filtered by whether they seem to contain income statements, and then compacted into the string you see below.
income_statement = \"\"\"\nTable of ContentsNVIDIA Corporation and SubsidiariesConsolidated Statements of Income(In millions, except per share data)\n\n| | | | | | | | | | | | | | | | | | |\n| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |\n| | | | Year Ended | | | | | | | | | | | | | | |\n| | | | Jan 28, 2024 | | | | | | Jan 29, 2023 | | | | | | Jan 30, 2022 | | |\n| Revenue | | | $ | 60,922 | | | | | $ | 26,974 | | | | | $ | 26,914 | |\n| Cost of revenue | | | 16,621 | | | | | | 11,618 | | | | | | 9,439 | | |\n| Gross profit | | | 44,301 | | | | | | 15,356 | | | | | | 17,475 | | |\n| Operating expenses | | | | | | | | | | | | | | | | | |\n| Research and development | | | 8,675 | | | | | | 7,339 | | | | | | 5,268 | | |\n| Sales, general and administrative | | | 2,654 | | | | | | 2,440 | | | | | | 2,166 | | |\n| Acquisition termination cost | | | \u0097 | | | | | | 1,353 | | | | | | \u0097 | | |\n| Total operating expenses | | | 11,329 | | | | | | 11,132 | | | | | | 7,434 | | |\n| Operating income | | | 32,972 | | | | | | 4,224 | | | | | | 10,041 | | |\n| Interest income | | | 866 | | | | | | 267 | | | | | | 29 | | |\n| Interest expense | | | (257) | | | | | | (262) | | | | | | (236) | | |\n| Other, net | | | 237 | | | | | | (48) | | | | | | 107 | | |\n| Other income (expense), net | | | 846 | | | | | | (43) | | | | | | (100) | | |\n| Income before income tax | | | 33,818 | | | | | | 4,181 | | | | | | 9,941 | | |\n| Income tax expense (benefit) | | | 4,058 | | | | | | (187) | | | | | | 189 | | |\n| Net income | | | $ | 29,760 | | | | | $ | 4,368 | | | | | $ | 9,752 | |\n| | | | | | | | | | | | | | | | | | |\n| Net income per share: | | | | | | | | | | | | | | | | | |\n| Basic | | | $ | 12\\.05 | | | | | $ | 1\\.76 | | | | | $ | 3\\.91 | |\n| Diluted | | | $ | 11\\.93 | | | | | $ | 1\\.74 | | | | | $ | 3\\.85 | |\n| | | | | | | | | | | | | | | | | | |\n| Weighted average shares used in per share computation: | | | | | | | | | | | | | | | | | |\n| Basic | | | 2,469 | | | | | | 2,487 | | | | | | 2,496 | | |\n| Diluted | | | 2,494 | | | | | | 2,507 | | | | | | 2,535 | | |\n\"\"\"\n
The markdown tables extracted from the earnings reports can vary widely in row names, column counts, data types, etc. The advantage of LLMs here is that we can define the data we want in terms of the data types, and the LLM will output the data in the desired format.
For comparison, here is how the income statement looks in the original HTML:
"},{"location":"cookbook/earnings-reports/#define-the-data-we-want","title":"Define the data we want","text":"Outlines is often used for JSON output, but it can also be used for CSV. We know the columns we want to extract, and we know the data types of the columns. Year for example is always a four-digit number, revenue is a number with commas, and so on.
We can define a regex pattern for each column type:
# Define the column type regex patterns\ncolumn_types = {\n # Year is always a four-digit number\n \"year\": r\"\\d{4}\",\n\n # Revenue, operating income, and net income are always numbers with commas.\n # This regex permits integers that may begin with a minus sign, and may have\n # commas separating the thousands, millions, etc.\n \"integer_comma\": r\"((-?\\d+),?\\d+|(-?\\d+))\",\n # Number is currently not used, but it represents a number with up to two decimal places.\n \"number\": r\"(-?\\d+(?:\\.\\d{1,2})?)\",\n}\n
Next, let's choose the columns we want to extract. We want
- Year, always a four-digit number
- Revenue, a number with commas
- Operating income, a number with commas
- Net income, a number with commas
# Define the columns to extract, and their data types.\ncolumns_to_extract = {\n \"year\": \"year\",\n \"revenue\": \"integer_comma\",\n \"operating_income\": \"integer_comma\",\n \"net_income\": \"integer_comma\",\n}\n
You can modify column_types
to match the data types of the columns you want to extract. Adding a new financial metric to extract is as simple as adding a new key/value pair to columns_to_extract
:
columns_to_extract[\"diluted_earnings_per_share\"] = \"number\"\n
Additional columns are not well tested for accuracy, so use with caution.
"},{"location":"cookbook/earnings-reports/#create-the-regex-describing-the-data-we-want","title":"Create the regex describing the data we want","text":"# Create the header line. This is the requested column names\n# separated by commas, i.e. \"year,revenue,...\"\nheader = \",\".join(columns_to_extract.keys())\n\n# Create the data capture patterns. These are the regex patterns\n# that will be used to capture the data in each column\ndata_patterns = [column_types[dtype] for dtype in columns_to_extract.values()]\ndata_line = \",\".join(data_patterns)\n\n# Our final regex pattern.\nmax_rows = 3 # We expect 3 rows of data, firms usually report 3 years of income statements\ncsv_regex = f\"{header}(\\n{data_line}){{,{max_rows}}}\\n\\n\"\n\nprint(csv_regex)\n
which gives us
year,revenue,operating_income,net_income(\n\\d{4},((-?\\d+),?\\d+|(-?\\d+)),((-?\\d+),?\\d+|(-?\\d+)),((-?\\d+),?\\d+|(-?\\d+))){,3}\n
Pretty hairy, right? Thankfully, we have a simple function to construct this regex for you. The regex defines a header line, followed by a data line that repeats for each row of data we want to extract. Passing the regex to outlines.generate.regex
will give us a generator function that always produces a CSV string consistent with the regex.
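A minimal sketch of such a helper (the function name here is our own, not necessarily the one used in the full demo) could simply wrap the construction above:
def create_csv_regex(columns_to_extract, column_types, max_rows=3):\n    # Header: the requested column names joined by commas.\n    header = \",\".join(columns_to_extract.keys())\n    # One capture pattern per column, joined by commas.\n    data_line = \",\".join(column_types[dtype] for dtype in columns_to_extract.values())\n    # Header line, then up to max_rows data lines, terminated by a blank line.\n    return f\"{header}(\\n{data_line}){{,{max_rows}}}\\n\\n\"\n\ncsv_regex = create_csv_regex(columns_to_extract, column_types)\n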
"},{"location":"cookbook/earnings-reports/#prompting-the-model","title":"Prompting the model","text":"Outlines does not add system or instruction tokens by default, so we need to use transformers.AutoTokenizer
to add them for whatever model we're using.
from transformers import AutoTokenizer\n\ntokenizer = AutoTokenizer.from_pretrained(model_name)\n\ndef add_instruction(prompt):\n return tokenizer.apply_chat_template([{\"role\": \"user\", \"content\": prompt}], tokenize=False, add_generation_prompt=True)\n\nprint(add_instruction(\"Howdy\"))\n
<|user|>\nHowdy<|end|>\n<|assistant|>\n
Our prompt roughly describes the task we want the model to perform, and a few pieces of information it may need to know about income statements.
def extract_financial_data_prompt(columns_to_extract, income_statement):\n user_prompt = f\"\"\"\n Extract annual financial data from this set of pages. Pages\n are from a 10k filing and were chosen because they may contain\n a comprehensive income statement. Note that selected pages may\n be incorrectly extracted, so you should verify that you are extracting\n from the comprehensive income statement and not some other financial\n statement.\n\n Create a row for each year available in the income statement with the\n following columns: {', '.join(columns_to_extract.keys())}. Firms typically report the\n most recent 3 years of data, but this can vary.\n\n Each column has types: {', '.join(columns_to_extract.values())}.\n\n # Relevant pages:\n\n {income_statement}\n\n # Key instructions:\n\n 1. Look ONLY at the \"Consolidated Statements of Income\" table\n 2. For operating income, look for \"Income from operations\" or \"Operating income\"\n 3. For net income, use the TOTAL net income figure, not amounts allocated to specific share classes\n 4. Use NULL for missing values\n 5. Operating income must be less than revenue\n 6. Net income must be less than operating income\n 7. Ignore segment breakdowns, quarterly data, or per-share amounts\n\n # Output format:\n\n - CSV format with headers: {','.join(columns_to_extract.keys())}\n - Use NULL for missing values\n - If no data are found, do not create a row.\n - Enter two newline characters to terminate the CSV when no more data are found.\n\n # Definitions:\n - Revenue: Total sales of goods and services. Usually this is at the top of the\n income statement.\n - Operating income: Revenue minus operating expenses for the entire company. This is revenue\n minus costs. Operating income is also called operating profit, EBIT, or income from\n operations.\n - Net income: Operating income minus taxes. This is the bottom line of the\n income statement.\n \"\"\"\n\n return add_instruction(user_prompt)\n
"},{"location":"cookbook/earnings-reports/#running-the-model","title":"Running the model","text":"Now that we have our prompt and regular expression, we can run the model.
Construct our regex extractor function. We'll use a greedy sampler, which samples the most likely next token at each step. It's a simple sampler that is more reproducible than multinomial sampling.
csv_extractor = outlines.generate.regex(\n model, csv_regex, sampler=outlines.samplers.greedy()\n)\n
Provide the prompt to the model and run it:
csv_data = csv_extractor(\n extract_financial_data_prompt(columns_to_extract, income_statement),\n max_tokens=1024,\n)\n\nprint(csv_data)\n
year,revenue,operating_income,net_income\n2024,60922,32972,29760\n2023,26974,4224,4368\n2022,26914,10041,9752\n
Voila! We've extracted the financial data from the income statement, and it's correct upon inspection.
You can even load this into a pandas
DataFrame for further analysis:
import pandas as pd\nfrom io import StringIO\n\ndf = pd.read_csv(StringIO(csv_data))\nprint(df)\n
year revenue operating_income net_income\n0 2024 60922 32972 29760\n1 2023 26974 4224 4368\n2 2022 26914 10041 9752\n
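Once the data is in a DataFrame, ordinary pandas operations apply. As a rough sketch (assuming the column names above), you could sanity-check the extraction by computing margins:
# Quick sanity check on the extracted figures: margins should fall between 0 and 1,\n# and net income should not exceed operating income.\ndf[\"operating_margin\"] = df[\"operating_income\"] / df[\"revenue\"]\ndf[\"net_margin\"] = df[\"net_income\"] / df[\"revenue\"]\n\nassert ((df[\"operating_margin\"] > 0) & (df[\"operating_margin\"] < 1)).all()\nassert (df[\"net_income\"] <= df[\"operating_income\"]).all()\n\nprint(df[[\"year\", \"operating_margin\", \"net_margin\"]])\n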
"},{"location":"cookbook/extract_event_details/","title":"Extract events details from text","text":"This recipe demonstrates how to use the outlines
library to extract structured event details from a text message. We will extract the title, location, and start date and time from messages like the following:
Hello Kitty, my grandmother will be here, I think it's better to postpone\nour appointment to review math lessons to next Monday at 2pm at the same\nplace, 3 avenue des tanneurs, one hour will be enough see you \ud83d\ude18\n
Let's see how to extract the event details from the message using the MLX library, which is dedicated to Apple Silicon processors (M series).
from datetime import datetime\n\nfrom pydantic import BaseModel, Field\n\nfrom outlines import generate, models\n\n# Load the model\nmodel = models.mlxlm(\"mlx-community/Hermes-3-Llama-3.1-8B-8bit\")\n\n\n# Define the event schema using Pydantic\nclass Event(BaseModel):\n title: str = Field(description=\"title of the event\")\n location: str\n start: datetime = Field(\n default=None, description=\"date of the event if available in iso format\"\n )\n\n\n# Get the current date and time\nnow = datetime.now().strftime(\"%A %d %B %Y and it's %H:%M\")\n\n# Define the prompt\nprompt = f\"\"\"\nToday's date and time are {now}\nGiven a user message, extract information of the event like date and time in iso format, location and title.\nIf the given date is relative, think step by step to find the right date.\nHere is the message:\n\"\"\"\n\n# Sample message\nmessage = \"\"\"Hello Kitty, my grandmother will be here , I think it's better to postpone our\nappointment to review math lessons to next Friday at 2pm at the same place, 3 avenue des tanneurs, I think that one hour will be enough\nsee you \ud83d\ude18 \"\"\"\n\n# Create the generator\ngenerator = generate.json(model, Event)\n\n# Extract the event information\nevent = generator(prompt + message)\n\n# Print the current date and time\nprint(f\"Today: {now}\")\n\n# Print the extracted event information in JSON format\nprint(event.json())\n
The output will be:
Today: Saturday 16 November 2024 and it's 10:55\n
and the extracted event information will be:
{\n \"title\":\"Math Review\",\n \"location\":\"3 avenue des tanneurs\",\n \"start\":\"2024-11-22T14:00:00Z\"\n}\n
To find out more about this use case, we recommend the ICS Generator project developed by Joseph Rudoler.
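If you only need a quick export rather than a full project, the extracted Event can be turned into a minimal iCalendar entry with a few lines of plain Python. This is a rough sketch, not the ICS Generator's code:
# Minimal .ics export of the extracted event (sketch, no external library)\nics_content = (\n \"BEGIN:VCALENDAR\\n\"\n \"VERSION:2.0\\n\"\n \"BEGIN:VEVENT\\n\"\n f\"SUMMARY:{event.title}\\n\"\n f\"LOCATION:{event.location}\\n\"\n f\"DTSTART:{event.start.strftime('%Y%m%dT%H%M%S')}\\n\"\n \"END:VEVENT\\n\"\n \"END:VCALENDAR\\n\"\n)\n\nwith open(\"event.ics\", \"w\") as f:\n f.write(ics_content)\n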
"},{"location":"cookbook/extraction/","title":"Named entity extraction","text":"Named Entity Extraction is a fundamental problem in NLP. It involves identifying and categorizing named entities within a document: people, organization, dates, places, etc. It is usually the first step in a more complex NLP worklow. Here we will use the example of a pizza restaurant that receives orders via their website and need to identify the number and types of pizzas that are being ordered.
Getting LLMs to output the extracted entities in a structured format can be challenging. In this tutorial we will see how we can use Outlines' JSON-structured generation to extract entities from a document and return them in a valid JSON data structure 100% of the time.
As always, we start by initializing the model. We will be using a quantized version of Mistral-7B-v0.1 (we're GPU poor):
import outlines\n\nmodel = outlines.models.transformers(\"TheBloke/Mistral-7B-OpenOrca-AWQ\", device=\"cuda\")\n
And we will be using the following prompt template:
@outlines.prompt\ndef take_order(order):\n \"\"\"You are the owner of a pizza parlor. Customers \\\n send you orders from which you need to extract:\n\n 1. The pizza that is ordered\n 2. The number of pizzas\n\n # EXAMPLE\n\n ORDER: I would like one Margherita pizza\n RESULT: {\"pizza\": \"Margherita\", \"number\": 1}\n\n # OUTPUT INSTRUCTIONS\n\n Answer in valid JSON. Here are the different objects relevant for the output:\n\n Order:\n pizza (str): name of the pizza\n number (int): number of pizzas\n\n Return a valid JSON of type \"Order\"\n\n # OUTPUT\n\n ORDER: {{ order }}\n RESULT: \"\"\"\n
We now define our data model using Pydantic:
from enum import Enum\nfrom pydantic import BaseModel\n\nclass Pizza(str, Enum):\n margherita = \"Margherita\"\n pepperonni = \"Pepperoni\"\n calzone = \"Calzone\"\n\nclass Order(BaseModel):\n pizza: Pizza\n number: int\n
We can now define our generator and call it on several incoming orders:
orders = [\n \"Hi! I would like to order two pepperonni pizzas and would like them in 30mins.\",\n \"Is it possible to get 12 margheritas?\"\n]\nprompts = [take_order(order) for order in orders]\n\ngenerator = outlines.generate.json(model, Order)\n\nresults = generator(prompts)\nprint(results)\n# [Order(pizza=<Pizza.pepperonni: 'Pepperoni'>, number=2),\n# Order(pizza=<Pizza.margherita: 'Margherita'>, number=12)]\n
There are several ways you could improve this example:
- Clients may order several types of pizzas.
- Clients may order drinks as well.
- If the pizza place has a delivery service, we need to extract the client's address and phone number.
- Clients may specify the time for which they want the pizza. We could then check against a queuing system and reply to them with the estimated delivery time.
How would you change the Pydantic model to account for these use cases?
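As a starting point, here is one possible extension of the model that covers multiple pizzas, drinks, and an optional delivery address (a sketch only; the field names are illustrative, and it reuses the Pizza enum defined above):
from typing import List, Optional\nfrom pydantic import BaseModel\n\nclass Drink(BaseModel):\n name: str\n number: int\n\nclass OrderItem(BaseModel):\n pizza: Pizza\n number: int\n\nclass ExtendedOrder(BaseModel):\n items: List[OrderItem] # several types of pizzas per order\n drinks: List[Drink] # may be empty if no drinks are ordered\n delivery_address: Optional[str] = None\n phone_number: Optional[str] = None\n requested_time: Optional[str] = None # e.g. \"in 30 minutes\"\n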
"},{"location":"cookbook/knowledge_graph_extraction/","title":"Knowledge Graph Extraction","text":"In this guide, we use outlines to extract a knowledge graph from unstructured text.
We will use llama.cpp via the llama-cpp-python library. Outlines supports llama-cpp-python, but we need to install it ourselves:
pip install llama-cpp-python\n
We download the model weights by passing the name of the repository on the HuggingFace Hub, and the filenames (or glob pattern):
import llama_cpp\nfrom outlines import generate, models\n\nmodel = models.llamacpp(\"NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF\",\n \"Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf\",\n tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained(\n \"NousResearch/Hermes-2-Pro-Llama-3-8B\"\n ),\n n_gpu_layers=-1,\n flash_attn=True,\n n_ctx=8192,\n verbose=False)\n
(Optional) Store the model weights in a custom folder. By default, the model weights are downloaded to the hub cache, but if we want to store the weights in a custom folder, we can pull the quantized GGUF model Hermes-2-Pro-Llama-3-8B by NousResearch directly from HuggingFace:
wget https://hf.co/NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF/resolve/main/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf\n
We initialize the model:
import llama_cpp\nfrom llama_cpp import Llama\nfrom outlines import generate, models\n\nllm = Llama(\n \"/path/to/model/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf\",\n tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained(\n \"NousResearch/Hermes-2-Pro-Llama-3-8B\"\n ),\n n_gpu_layers=-1,\n flash_attn=True,\n n_ctx=8192,\n verbose=False\n)\n
"},{"location":"cookbook/knowledge_graph_extraction/#knowledge-graph-extraction_1","title":"Knowledge Graph Extraction","text":"We first need to define our Pydantic class for each node and each edge of the knowledge graph:
from pydantic import BaseModel, Field\n\nclass Node(BaseModel):\n \"\"\"Node of the Knowledge Graph\"\"\"\n\n id: int = Field(..., description=\"Unique identifier of the node\")\n label: str = Field(..., description=\"Label of the node\")\n property: str = Field(..., description=\"Property of the node\")\n\n\nclass Edge(BaseModel):\n \"\"\"Edge of the Knowledge Graph\"\"\"\n\n source: int = Field(..., description=\"Unique source of the edge\")\n target: int = Field(..., description=\"Unique target of the edge\")\n label: str = Field(..., description=\"Label of the edge\")\n property: str = Field(..., description=\"Property of the edge\")\n
We then define the Pydantic class for the knowledge graph and get its JSON schema:
from typing import List\n\nclass KnowledgeGraph(BaseModel):\n \"\"\"Generated Knowledge Graph\"\"\"\n\n nodes: List[Node] = Field(..., description=\"List of nodes of the knowledge graph\")\n edges: List[Edge] = Field(..., description=\"List of edges of the knowledge graph\")\n\nschema = KnowledgeGraph.model_json_schema()\n
We then need to adapt our prompt to the Hermes prompt format for JSON schema:
def generate_hermes_prompt(user_prompt):\n return (\n \"<|im_start|>system\\n\"\n \"You are a world class AI model who answers questions in JSON \"\n f\"Here's the json schema you must adhere to:\\n<schema>\\n{schema}\\n</schema><|im_end|>\\n\"\n \"<|im_start|>user\\n\"\n + user_prompt\n + \"<|im_end|>\"\n + \"\\n<|im_start|>assistant\\n\"\n \"<schema>\"\n )\n
For a given user prompt, for example:
user_prompt = \"Alice loves Bob and she hates Charlie.\"\n
We can use generate.json
by passing the Pydantic class we previously defined, and call the generator with the Hermes prompt:
from outlines import generate, models\n\nmodel = models.LlamaCpp(llm)\ngenerator = generate.json(model, KnowledgeGraph)\nprompt = generate_hermes_prompt(user_prompt)\nresponse = generator(prompt, max_tokens=1024, temperature=0, seed=42)\n
We obtain the nodes and edges of the knowledge graph:
print(response.nodes)\nprint(response.edges)\n# [Node(id=1, label='Alice', property='Person'),\n# Node(id=2, label='Bob', property='Person'),\n# Node(id=3, label='Charlie', property='Person')]\n# [Edge(source=1, target=2, label='love', property='Relationship'),\n# Edge(source=1, target=3, label='hate', property='Relationship')]\n
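Because the response is a plain Pydantic object, it can also be handed to a graph library for programmatic analysis. A minimal sketch with networkx (assuming it is installed with pip install networkx):
import networkx as nx\n\n# Build a directed graph from the extracted nodes and edges\nG = nx.DiGraph()\nfor node in response.nodes:\n G.add_node(node.id, label=node.label, property=node.property)\nfor edge in response.edges:\n G.add_edge(edge.source, edge.target, label=edge.label)\n\n# Example query: which nodes does Alice (id=1) point to?\nprint(list(G.successors(1)))\n# [2, 3]\n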
"},{"location":"cookbook/knowledge_graph_extraction/#optional-visualizing-the-knowledge-graph","title":"(Optional) Visualizing the Knowledge Graph","text":"We can use the Graphviz library to visualize the generated knowledge graph. For detailed installation instructions, see here.
from graphviz import Digraph\n\ndot = Digraph()\nfor node in response.nodes:\n dot.node(str(node.id), node.label, shape='circle', width='1', height='1')\nfor edge in response.edges:\n dot.edge(str(edge.source), str(edge.target), label=edge.label)\n\ndot.render('knowledge-graph.gv', view=True)\n
This example was originally contributed by Alonso Silva.
"},{"location":"cookbook/models_playing_chess/","title":"Large language models playing chess","text":"In this example we will make a Phi-2 model play chess against itself. On its own the model easily generates invalid moves, so we will give it a little help. At each step we will generate a regex that only matches valid move, and use it to help the model only generating valid moves.
"},{"location":"cookbook/models_playing_chess/#the-chessboard","title":"The chessboard","text":"The game will be played on a standard checkboard. We will use the chess
library to track the opponents' moves, and check that the moves are valid.
%pip install outlines -q\n%pip install chess -q\n%pip install transformers accelerate einops -q\n\nimport chess\n\nboard = chess.Board(\"rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1\")\n
"},{"location":"cookbook/models_playing_chess/#the-opponents","title":"The opponents","text":"Phi-2 will be playing against itself:
from outlines import models\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-4k-instruct\")\n
"},{"location":"cookbook/models_playing_chess/#a-little-help-for-the-language-model","title":"A little help for the language model","text":"To make sure Phi-2 generates valid chess moves we will use Outline's regex-structured generation. We define a function that takes the current state of the board and returns a regex that matches all possible legal moves:
import re\n\ndef legal_moves_regex(board):\n \"\"\"Build a regex that only matches legal moves on the current board.\"\"\"\n legal_moves = list(board.legal_moves)\n legal_moves_str = [board.san(move) for move in legal_moves]\n # Strip check and checkmate markers so the regex matches the bare SAN notation\n legal_moves_str = [re.sub(r\"[+#]\", \"\", move) for move in legal_moves_str]\n regex_pattern = \"|\".join(re.escape(move) for move in legal_moves_str)\n return regex_pattern\n
"},{"location":"cookbook/models_playing_chess/#prompting-the-language-model","title":"Prompting the language model","text":"The prompt corresponds to the current state of the board, so we start with:
prompt = \"Let's play Chess. Moves: \"\n
We update the prompt at each step so it reflects the state of the board after the previous move.
"},{"location":"cookbook/models_playing_chess/#lets-play","title":"Let's play","text":"from outlines import generate\n\nboard_state = \" \"\nturn_number = 0\nwhile not board.is_game_over():\n regex_pattern = legal_moves_regex(board)\n structured = generate.regex(model, regex_pattern)(prompt + board_state)\n move = board.parse_san(structured)\n\n if turn_number % 2 == 0 : # It's White's turn\n board_state += board.san(move) + \" \"\n else:\n board_state += board.san(move) + \" \" + str(turn_number) + \".\"\n\n turn_number += 1\n\n board.push(move)\n\n print(board_state)\n
Interestingly enough, Phi-3 hates capturing.
e4 e5 1.Nf3 Ne7 3.b4 Nf5 5.Nc3 Ne7 7.Bb5 a6 9.Na4 b6 11.c3 Nec6 13.c4 a5 15.d4 Qg5 17.Nd2 Bb7 19.dxe5\n
This example was originally authored by @903124S in this gist.
"},{"location":"cookbook/qa-with-citations/","title":"Generate Synthetic Data and Q&A with Citations","text":"This tutorial is adapted from the instructor-ollama notebook. We start with a simple example to generate synthetic data and then we approach the problem of question answering by providing citations.
We will use llama.cpp via the llama-cpp-python library. Outlines supports llama-cpp-python, but we need to install it ourselves:
pip install llama-cpp-python\n
We download the model weights by passing the name of the repository on the HuggingFace Hub, and the filenames (or glob pattern):
import llama_cpp\nfrom outlines import generate, models\n\nmodel = models.llamacpp(\"NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF\",\n \"Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf\",\n tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained(\n \"NousResearch/Hermes-2-Pro-Llama-3-8B\"\n ),\n n_gpu_layers=-1,\n flash_attn=True,\n n_ctx=8192,\n verbose=False)\n
(Optional) Store the model weights in a custom folder. By default, the model weights are downloaded to the hub cache, but if we want to store the weights in a custom folder, we can pull the quantized GGUF model Hermes-2-Pro-Llama-3-8B by NousResearch directly from HuggingFace:
wget https://hf.co/NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF/resolve/main/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf\n
We initialize the model:
import llama_cpp\nfrom llama_cpp import Llama\nfrom outlines import generate, models\n\nllm = Llama(\n \"/path/to/model/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf\",\n tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained(\n \"NousResearch/Hermes-2-Pro-Llama-3-8B\"\n ),\n n_gpu_layers=-1,\n flash_attn=True,\n n_ctx=8192,\n verbose=False\n)\n
"},{"location":"cookbook/qa-with-citations/#generate-synthetic-data","title":"Generate Synthetic Data","text":"We first need to define our Pydantic class for a user:
from pydantic import BaseModel, Field\n\nclass UserDetail(BaseModel):\n id: int = Field(..., description=\"Unique identifier\") # so the model keeps track of the number of users\n first_name: str\n last_name: str\n age: int\n
We then define a Pydantic class for a list of users:
from typing import List\n\nclass Users(BaseModel):\n users: List[UserDetail]\n
We can use generate.json
by passing this Pydantic class we just defined, and call the generator:
model = models.LlamaCpp(llm)\ngenerator = generate.json(model, Users)\nresponse = generator(\"Create 5 fake users\", max_tokens=1024, temperature=0, seed=42)\nprint(response.users)\n# [UserDetail(id=1, first_name='John', last_name='Doe', age=25),\n# UserDetail(id=2, first_name='Jane', last_name='Doe', age=30),\n# UserDetail(id=3, first_name='Bob', last_name='Smith', age=40),\n# UserDetail(id=4, first_name='Alice', last_name='Smith', age=35),\n# UserDetail(id=5, first_name='John', last_name='Smith', age=20)]\n
for user in response.users:\n print(user.first_name)\n print(user.last_name)\n print(user.age)\n print(\"#####\")\n# John\n# Doe\n# 25\n# #####\n# Jane\n# Doe\n# 30\n# #####\n# Bob\n# Smith\n# 40\n# #####\n# Alice\n# Smith\n# 35\n# #####\n# John\n# Smith\n# 20\n# #####\n
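Since the generated users are regular Pydantic objects, they are easy to hand off to other tools. For instance, a rough sketch of loading them into a pandas DataFrame (assuming pandas is installed):
import pandas as pd\n\n# Convert the generated users to a DataFrame for downstream use\nusers_df = pd.DataFrame([user.model_dump() for user in response.users])\nprint(users_df)\n# id first_name last_name age\n# 0 1 John Doe 25\n# 1 2 Jane Doe 30\n# ...\n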
"},{"location":"cookbook/qa-with-citations/#qa-with-citations","title":"QA with Citations","text":"We first need to define our Pydantic class for QA with citations:
from typing import List\nfrom pydantic import BaseModel\n\nclass QuestionAnswer(BaseModel):\n question: str\n answer: str\n citations: List[str]\n\nschema = QuestionAnswer.model_json_schema()\n
We then need to adapt our prompt to the Hermes prompt format for JSON schema:
def generate_hermes_prompt(question, context, schema=schema):\n return (\n \"<|im_start|>system\\n\"\n \"You are a world class AI model who answers questions in JSON with correct and exact citations \"\n \"extracted from the `Context`. \"\n f\"Here's the json schema you must adhere to:\\n<schema>\\n{schema}\\n</schema><|im_end|>\\n\"\n \"<|im_start|>user\\n\"\n + \"`Context`: \"\n + context\n + \"\\n`Question`: \"\n + question + \"<|im_end|>\"\n + \"\\n<|im_start|>assistant\\n\"\n \"<schema>\"\n )\n
We can use generate.json
by passing the Pydantic class we previously defined, and call the generator with the Hermes prompt:
question = \"What did the author do during college?\"\ncontext = \"\"\"\nMy name is Jason Liu, and I grew up in Toronto Canada but I was born in China.\nI went to an arts high school but in university I studied Computational Mathematics and physics.\nAs part of coop I worked at many companies including Stitchfix, Facebook.\nI also started the Data Science club at the University of Waterloo and I was the president of the club for 2 years.\n\"\"\"\ngenerator = generate.json(model, QuestionAnswer)\nprompt = generate_hermes_prompt(question, context)\nresponse = generator(prompt, max_tokens=1024, temperature=0, seed=42)\nprint(response)\n# QuestionAnswer(question='What did the author do during college?', answer='The author studied Computational Mathematics and physics in university and was also involved in starting the Data Science club, serving as its president for 2 years.', citations=['I went to an arts high school but in university I studied Computational Mathematics and physics.', 'I also started the Data Science club at the University of Waterloo and I was the president of the club for 2 years.'])\n
We can do the same for a list of question-context pairs:
question1 = \"Where was John born?\"\ncontext1 = \"\"\"\nJohn Doe is a software engineer who was born in New York, USA.\nHe studied Computer Science at the Massachusetts Institute of Technology.\nDuring his studies, he interned at Google and Microsoft.\nHe also founded the Artificial Intelligence club at his university and served as its president for three years.\n\"\"\"\n\nquestion2 = \"What did Emily study in university?\"\ncontext2 = \"\"\"\nEmily Smith is a data scientist from London, England.\nShe attended the University of Cambridge where she studied Statistics and Machine Learning.\nShe interned at IBM and Amazon during her summer breaks.\nEmily was also the head of the Women in Tech society at her university.\n\"\"\"\n\nquestion3 = \"Which companies did Robert intern at?\"\ncontext3 = \"\"\"\nRobert Johnson, originally from Sydney, Australia, is a renowned cybersecurity expert.\nHe studied Information Systems at the University of Melbourne.\nRobert interned at several cybersecurity firms including NortonLifeLock and McAfee.\nHe was also the leader of the Cybersecurity club at his university.\n\"\"\"\n\nquestion4 = \"What club did Alice start at her university?\"\ncontext4 = \"\"\"\nAlice Williams, a native of Dublin, Ireland, is a successful web developer.\nShe studied Software Engineering at Trinity College Dublin.\nAlice interned at several tech companies including Shopify and Squarespace.\nShe started the Web Development club at her university and was its president for two years.\n\"\"\"\n\nquestion5 = \"What did Michael study in high school?\"\ncontext5 = \"\"\"\nMichael Brown is a game developer from Tokyo, Japan.\nHe attended a specialized high school where he studied Game Design.\nHe later attended the University of Tokyo where he studied Computer Science.\nMichael interned at Sony and Nintendo during his university years.\nHe also started the Game Developers club at his university.\n\"\"\"\n\nfor question, context in [\n (question1, context1),\n (question2, context2),\n (question3, context3),\n (question4, context4),\n (question5, context5),\n]:\n final_prompt = my_final_prompt(question, context)\n generator = generate.json(model, QuestionAnswer)\n response = generator(final_prompt, max_tokens=1024, temperature=0, seed=42)\n display(question)\n display(response.answer)\n display(response.citations)\n print(\"\\n\\n\")\n\n# 'Where was John born?'\n# 'John Doe was born in New York, USA.'\n# ['John Doe is a software engineer who was born in New York, USA.']\n#\n#\n# 'What did Emily study in university?'\n# 'Emily studied Statistics and Machine Learning in university.'\n# ['She attended the University of Cambridge where she studied Statistics and Machine Learning.']\n#\n#\n# 'Which companies did Robert intern at?'\n# 'Robert interned at NortonLifeLock and McAfee.'\n# ['Robert Johnson, originally from Sydney, Australia, is a renowned cybersecurity expert. He interned at several cybersecurity firms including NortonLifeLock and McAfee.']\n#\n#\n# 'What club did Alice start at her university?'\n# 'Alice started the Web Development club at her university.'\n# ['Alice Williams, a native of Dublin, Ireland, is a successful web developer. She started the Web Development club at her university and was its president for two years.']\n#\n#\n# 'What did Michael study in high school?'\n# 'Michael studied Game Design in high school.'\n# ['Michael Brown is a game developer from Tokyo, Japan. He attended a specialized high school where he studied Game Design.']\n
This example was originally contributed by Alonso Silva.
"},{"location":"cookbook/react_agent/","title":"ReAct Agent","text":"This example shows how to use outlines to build your own agent with open weights local models and structured outputs. It is inspired by the blog post A simple Python implementation of the ReAct pattern for LLMs by Simon Willison.
The ReAct pattern (for Reason+Act) is described in the paper ReAct: Synergizing Reasoning and Acting in Language Models. It's a pattern where you implement additional actions that an LLM can take - searching Wikipedia or running calculations for example - and then teach it how to request the execution of those actions, and then feed their results back into the LLM.
Additionally, we give the LLM the possibility of using a scratchpad described in the paper Show Your Work: Scratchpads for Intermediate Computation with Language Models which improves the ability of LLMs to perform multi-step computations.
We use llama.cpp via the llama-cpp-python library. Outlines supports llama-cpp-python, but we need to install it ourselves:
pip install llama-cpp-python\n
We download the model weights by passing the name of the repository on the HuggingFace Hub, and the filenames (or glob pattern):
import llama_cpp\nfrom outlines import generate, models\n\nmodel = models.llamacpp(\"NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF\",\n \"Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf\",\n tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained(\n \"NousResearch/Hermes-2-Pro-Llama-3-8B\"\n ),\n n_gpu_layers=-1,\n flash_attn=True,\n n_ctx=8192,\n verbose=False)\n
(Optional) Store the model weights in a custom folder. By default, the model weights are downloaded to the hub cache, but if we want to store the weights in a custom folder, we can pull the quantized GGUF model Hermes-2-Pro-Llama-3-8B by NousResearch directly from HuggingFace:
wget https://hf.co/NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF/resolve/main/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf\n
We initialize the model:
import llama_cpp\nfrom llama_cpp import Llama\nfrom outlines import generate, models\n\nllm = Llama(\n \"/path/to/model/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf\",\n tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained(\n \"NousResearch/Hermes-2-Pro-Llama-3-8B\"\n ),\n n_gpu_layers=-1,\n flash_attn=True,\n n_ctx=8192,\n verbose=False\n)\n
"},{"location":"cookbook/react_agent/#build-a-react-agent","title":"Build a ReAct agent","text":"In this example, we use two tools:
- wikipedia: \<search term> - searches Wikipedia and returns the snippet of the first result
- calculate: \<expression> - evaluates an expression using Python's eval() function
import httpx\n\ndef wikipedia(q):\n return httpx.get(\"https://en.wikipedia.org/w/api.php\", params={\n \"action\": \"query\",\n \"list\": \"search\",\n \"srsearch\": q,\n \"format\": \"json\"\n }).json()[\"query\"][\"search\"][0][\"snippet\"]\n\n\ndef calculate(numexp):\n return eval(numexp)\n
We define the logic of the agent through a Pydantic class. First, we want the LLM to decide only between the two previously defined tools:
from enum import Enum\n\nclass Action(str, Enum):\n wikipedia = \"wikipedia\"\n calculate = \"calculate\"\n
Our agent will loop through Thought and Action. We explicitly give the Action Input field so it doesn't forget to add the arguments of the Action. We also add a scratchpad (optional).
from pydantic import BaseModel, Field\n\nclass Reason_and_Act(BaseModel):\n Scratchpad: str = Field(..., description=\"Information from the Observation useful to answer the question\")\n Thought: str = Field(..., description=\"It describes your thoughts about the question you have been asked\")\n Action: Action\n Action_Input: str = Field(..., description=\"The arguments of the Action.\")\n
Our agent will reach a Final Answer. We also add a scratchpad (optional).
class Final_Answer(BaseModel):\n Scratchpad: str = Field(..., description=\"Information from the Observation useful to answer the question\")\n Final_Answer: str = Field(..., description=\"Answer to the question grounded on the Observation\")\n
Our agent will decide when it has reached a Final Answer and therefore when to stop the loop of Thought and Action.
from typing import Union\n\nclass Decision(BaseModel):\n Decision: Union[Reason_and_Act, Final_Answer]\n
We could generate responses directly from the JSON schema, but here we will build the corresponding regex and check that everything is working as expected:
from outlines.integrations.utils import convert_json_schema_to_str\nfrom outlines.fsm.json_schema import build_regex_from_schema\n\njson_schema = Decision.model_json_schema()\nschema_str = convert_json_schema_to_str(json_schema=json_schema)\nregex_str = build_regex_from_schema(schema_str)\nprint(regex_str)\n# '\\\\{[ ]?\"Decision\"[ ]?:[ ]?(\\\\{[ ]?\"Scratchpad\"[ ]?:[ ]?\"([^\"\\\\\\\\\\\\x00-\\\\x1F\\\\x7F-\\\\x9F]|\\\\\\\\[\"\\\\\\\\])*\"[ ]?,[ ]?\"Thought\"[ ]?:[ ]?\"([^\"\\\\\\\\\\\\x00-\\\\x1F\\\\x7F-\\\\x9F]|\\\\\\\\[\"\\\\\\\\])*\"[ ]?,[ ]?\"Action\"[ ]?:[ ]?(\"wikipedia\"|\"calculate\")[ ]?,[ ]?\"Action_Input\"[ ]?:[ ]?\"([^\"\\\\\\\\\\\\x00-\\\\x1F\\\\x7F-\\\\x9F]|\\\\\\\\[\"\\\\\\\\])*\"[ ]?\\\\}|\\\\{[ ]?\"Scratchpad\"[ ]?:[ ]?\"([^\"\\\\\\\\\\\\x00-\\\\x1F\\\\x7F-\\\\x9F]|\\\\\\\\[\"\\\\\\\\])*\"[ ]?,[ ]?\"Final_Answer\"[ ]?:[ ]?\"([^\"\\\\\\\\\\\\x00-\\\\x1F\\\\x7F-\\\\x9F]|\\\\\\\\[\"\\\\\\\\])*\"[ ]?\\\\})[ ]?\\\\}'\n
We then need to adapt our prompt to the Hermes prompt format for JSON schema and explain the agent logic:
import datetime\n\ndef generate_hermes_prompt(question, schema=\"\"):\n return (\n \"<|im_start|>system\\n\"\n \"You are a world class AI model who answers questions in JSON with correct Pydantic schema. \"\n f\"Here's the json schema you must adhere to:\\n<schema>\\n{schema}\\n</schema>\\n\"\n \"Today is \" + datetime.datetime.today().strftime('%Y-%m-%d') + \".\\n\" +\n \"You run in a loop of Scratchpad, Thought, Action, Action Input, PAUSE, Observation. \"\n \"At the end of the loop you output a Final Answer. \"\n \"Use Scratchpad to store the information from the Observation useful to answer the question \"\n \"Use Thought to describe your thoughts about the question you have been asked \"\n \"and reflect carefully about the Observation if it exists. \"\n \"Use Action to run one of the actions available to you. \"\n \"Use Action Input to input the arguments of the selected action - then return PAUSE. \"\n \"Observation will be the result of running those actions. \"\n \"Your available actions are:\\n\"\n \"calculate:\\n\"\n \"e.g. calculate: 4**2 / 3\\n\"\n \"Runs a calculation and returns the number - uses Python so be sure to use floating point syntax if necessary\\n\"\n \"wikipedia:\\n\"\n \"e.g. wikipedia: Django\\n\"\n \"Returns a summary from searching Wikipedia\\n\"\n \"DO NOT TRY TO GUESS THE ANSWER. Begin! <|im_end|>\"\n \"\\n<|im_start|>user\\n\" + question + \"<|im_end|>\"\n \"\\n<|im_start|>assistant\\n\"\n )\n
We define a ChatBot class:
class ChatBot:\n def __init__(self, prompt=\"\"):\n self.prompt = prompt\n\n def __call__(self, user_prompt):\n self.prompt += user_prompt\n result = self.execute()\n return result\n\n def execute(self):\n generator = generate.regex(model, regex_str)\n result = generator(self.prompt, max_tokens=1024, temperature=0, seed=42)\n return result\n
We define a query function:
import json\n\ndef query(question, max_turns=5):\n i = 0\n next_prompt = (\n \"\\n<|im_start|>user\\n\" + question + \"<|im_end|>\"\n \"\\n<|im_start|>assistant\\n\"\n )\n previous_actions = []\n while i < max_turns:\n i += 1\n prompt = generate_hermes_prompt(question=question, schema=Decision.model_json_schema())\n bot = ChatBot(prompt=prompt)\n result = bot(next_prompt)\n json_result = json.loads(result)['Decision']\n if \"Final_Answer\" not in list(json_result.keys()):\n scratchpad = json_result['Scratchpad'] if i == 0 else \"\"\n thought = json_result['Thought']\n action = json_result['Action']\n action_input = json_result['Action_Input']\n print(f\"\\x1b[34m Scratchpad: {scratchpad} \\x1b[0m\")\n print(f\"\\x1b[34m Thought: {thought} \\x1b[0m\")\n print(f\"\\x1b[36m -- running {action}: {str(action_input)}\\x1b[0m\")\n if action + \": \" + str(action_input) in previous_actions:\n observation = \"You already run that action. **TRY A DIFFERENT ACTION INPUT.**\"\n else:\n if action==\"calculate\":\n try:\n observation = eval(str(action_input))\n except Exception as e:\n observation = f\"{e}\"\n elif action==\"wikipedia\":\n try:\n observation = wikipedia(str(action_input))\n except Exception as e:\n observation = f\"{e}\"\n print()\n print(f\"\\x1b[33m Observation: {observation} \\x1b[0m\")\n print()\n previous_actions.append(action + \": \" + str(action_input))\n next_prompt += (\n \"\\nScratchpad: \" + scratchpad +\n \"\\nThought: \" + thought +\n \"\\nAction: \" + action +\n \"\\nAction Input: \" + action_input +\n \"\\nObservation: \" + str(observation)\n )\n else:\n scratchpad = json_result[\"Scratchpad\"]\n final_answer = json_result[\"Final_Answer\"]\n print(f\"\\x1b[34m Scratchpad: {scratchpad} \\x1b[0m\")\n print(f\"\\x1b[34m Final Answer: {final_answer} \\x1b[0m\")\n return final_answer\n print(f\"\\nFinal Answer: I am sorry, but I am unable to answer your question. Please provide more information or a different question.\")\n return \"No answer found\"\n
We can now test our ReAct agent:
print(query(\"What's 2 to the power of 10?\"))\n# Scratchpad:\n# Thought: I need to perform a mathematical calculation to find the result of 2 to the power of 10.\n# -- running calculate: 2**10\n#\n# Observation: 1024\n#\n# Scratchpad: 2 to the power of 10 is 1024.\n# Final Answer: 2 to the power of 10 is 1024.\n# 2 to the power of 10 is 1024.\n
print(query(\"What does England share borders with?\"))\n# Scratchpad:\n# Thought: To answer this question, I will use the 'wikipedia' action to gather information about England's geographical location and its borders.\n# -- running wikipedia: England borders\n#\n# Observation: Anglo-Scottish <span class=\"searchmatch\">border</span> (Scottish Gaelic: Cr\u00ecochan Anglo-Albannach) is an internal <span class=\"searchmatch\">border</span> of the United Kingdom separating Scotland and <span class=\"searchmatch\">England</span> which runs for\n#\n# Scratchpad: Anglo-Scottish border (Scottish Gaelic: Cr\u00ecochan Anglo-Albannach) is an internal border of the United Kingdom separating Scotland and England which runs for\n# Final Answer: England shares a border with Scotland.\n# England shares a border with Scotland.\n
As mentioned in Simon's blog post, this is not a very robust implementation at all and there's a ton of room for improvement. But it is lovely how simple it is with a few lines of Python to make these extra capabilities available to the LLM. And now you can run it locally with an open weights LLM.
This example was originally contributed by Alonso Silva.
"},{"location":"cookbook/read-pdfs/","title":"PDF to structured output with vision language models","text":"A common task with language models is to ask language models questions about a PDF file.
Typically, the output is unstructured text, i.e. \"talking\" to your PDF.
In some cases, you may wish to extract structured information from the PDF, like tables, lists, citations, etc.
PDFs are difficult to machine read. However, you can simply convert the PDF to images, and then use a vision language model to extract structured information from the images.
This cookbook demonstrates how to
- Convert a PDF to a list of images
- Use a vision language model to extract structured information from the images
"},{"location":"cookbook/read-pdfs/#dependencies","title":"Dependencies","text":"You'll need to install these dependencies:
pip install outlines pillow transformers torch==2.4.0 pdf2image\n\n# Optional, but makes the output look nicer\npip install rich\n
"},{"location":"cookbook/read-pdfs/#import-the-necessary-libraries","title":"Import the necessary libraries","text":"from PIL import Image\nimport outlines\nimport torch\nfrom transformers import AutoProcessor\nfrom pydantic import BaseModel\nfrom typing import List, Optional\nfrom pdf2image import convert_from_path\nimport os\nfrom rich import print\nimport requests\n
"},{"location":"cookbook/read-pdfs/#choose-a-model","title":"Choose a model","text":"We've tested this example with Pixtral 12b and Qwen2-VL-7B-Instruct.
To use Pixtral:
from transformers import LlavaForConditionalGeneration\nmodel_name=\"mistral-community/pixtral-12b\"\nmodel_class=LlavaForConditionalGeneration\n
To use Qwen-2-VL:
from transformers import Qwen2VLForConditionalGeneration\nmodel_name = \"Qwen/Qwen2-VL-7B-Instruct\"\nmodel_class = Qwen2VLForConditionalGeneration\n
You can load your model into memory with:
# This loads the model into memory. On your first run,\n# it will have to download the model, which might take a while.\nmodel = outlines.models.transformers_vision(\n model_name,\n model_class=model_class,\n model_kwargs={\n \"device_map\": \"auto\",\n \"torch_dtype\": torch.bfloat16,\n },\n processor_kwargs={\n \"device\": \"auto\",\n },\n)\n
"},{"location":"cookbook/read-pdfs/#convert-the-pdf-to-images","title":"Convert the PDF to images","text":"We'll use the pdf2image
library to convert each page of the PDF to an image.
convert_pdf_to_images
is a convenience function that converts each page of the PDF to an image, and optionally saves the images to disk when output_dir
is provided.
Note: the dpi
argument is important. It controls the resolution of the images. High DPI images are higher quality and may yield better results, but they are also larger, slower to process, and require more memory.
from pdf2image import convert_from_path\nfrom PIL import Image\nimport os\nfrom typing import List, Optional\n\ndef convert_pdf_to_images(\n pdf_path: str,\n output_dir: Optional[str] = None,\n dpi: int = 120,\n fmt: str = 'PNG'\n) -> List[Image.Image]:\n \"\"\"\n Convert a PDF file to a list of PIL Image objects.\n\n Args:\n pdf_path: Path to the PDF file\n output_dir: Optional directory to save the images\n dpi: Resolution for the conversion. High DPI is high quality, but also slow and memory intensive.\n fmt: Output format (PNG recommended for quality)\n\n Returns:\n List of PIL Image objects\n \"\"\"\n # Convert PDF to list of images\n images = convert_from_path(\n pdf_path,\n dpi=dpi,\n fmt=fmt\n )\n\n # Optionally save images\n if output_dir:\n os.makedirs(output_dir, exist_ok=True)\n for i, image in enumerate(images):\n image.save(os.path.join(output_dir, f'page_{i+1}.{fmt.lower()}'))\n\n return images\n
We're going to use the Louf & Willard paper that described the method that Outlines uses for structured generation.
To download the PDF, run:
# Download the PDF file\npdf_url = \"https://arxiv.org/pdf/2307.09702\"\nresponse = requests.get(pdf_url)\n\n# Save the PDF locally\nwith open(\"louf-willard.pdf\", \"wb\") as f:\n f.write(response.content)\n
Now, we can convert the PDF to a list of images:
# Load the pdf\nimages = convert_pdf_to_images(\n \"louf-willard.pdf\",\n dpi=120,\n output_dir=\"output_images\"\n)\n
"},{"location":"cookbook/read-pdfs/#extract-structured-information-from-the-images","title":"Extract structured information from the images","text":"The structured output you can extract is exactly the same as everywhere else in Outlines -- you can use regular expressions, JSON schemas, selecting from a list of options, etc.
"},{"location":"cookbook/read-pdfs/#extracting-data-into-json","title":"Extracting data into JSON","text":"Suppose you wished to go through each page of the PDF, and extract the page description, key takeaways, and page number.
You can do this by defining a JSON schema, and then using outlines.generate.json
to extract the data.
First, define the structure you want to extract:
class PageSummary(BaseModel):\n description: str\n key_takeaways: List[str]\n page_number: int\n
Second, we need to set up the prompt. Adding special tokens can be tricky, so we use the transformers AutoProcessor
to apply the special tokens for us. To do so, we specify a list of messages, where each message is a dictionary with a role
and content
key.
Images are denoted with type: \"image\"
, and text is denoted with type: \"text\"
.
messages = [\n {\n \"role\": \"user\",\n \"content\": [\n # The text you're passing to the model --\n # this is where you do your standard prompting.\n {\"type\": \"text\", \"text\": f\"\"\"\n Describe the page in a way that is easy for a PhD student to understand.\n\n Return the information in the following JSON schema:\n {PageSummary.model_json_schema()}\n\n Here is the page:\n \"\"\"\n },\n\n # Don't need to pass in an image, since we do this\n # when we call the generator function down below.\n {\"type\": \"image\", \"image\": \"\"},\n ],\n }\n]\n\n# Convert the messages to the final prompt\nprocessor = AutoProcessor.from_pretrained(model_name)\ninstruction = processor.apply_chat_template(\n messages, tokenize=False, add_generation_prompt=True\n)\n
Now we iterate through each image, and extract the structured information:
# Page summarizer function\npage_summary_generator = outlines.generate.json(model, PageSummary)\n\nfor image in images:\n result = page_summary_generator(instruction, [image])\n print(result)\n
"},{"location":"cookbook/read-pdfs/#regular-expressions-to-extract-the-arxiv-paper-identifier","title":"Regular expressions to extract the arxiv paper identifier","text":"The arXiv paper identifier is a unique identifier for each paper. These identifiers have the format arXiv:YYMM.NNNNN
(five end digits) or arXiv:YYMM.NNNN
(four end digits). arXiv identifiers are typically watermarked on papers uploaded to arXiv.
arXiv identifiers are optionally followed by a version number, i.e. arXiv:YYMM.NNNNNvX
.
We can use a regular expression to define this pattern:
paper_regex = r'arXiv:\\d{2}[01]\\d\\.\\d{4,5}(v\\d)?'\n
We can build an extractor function from the regex:
id_extractor = outlines.generate.regex(model, paper_regex)\n
Now, we can extract the arxiv paper identifier from the first image:
arxiv_instruction = processor.apply_chat_template(\n [\n {\n \"role\": \"user\",\n \"content\": [\n {\"type\": \"text\", \"text\": f\"\"\"\n Extract the arxiv paper identifier from the page.\n\n Here is the page:\n \"\"\"},\n {\"type\": \"image\", \"image\": \"\"},\n ],\n }\n ],\n tokenize=False,\n add_generation_prompt=True\n)\n\n# Extract the arxiv paper identifier\npaper_id = id_extractor(arxiv_instruction, [images[0]])\n
As of the time of this writing, the arxiv paper identifier is
arXiv:2307.09702v4\n
Your version number may be different, but the part before vX
should match.
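Because the output was generated under the regex constraint, it is guaranteed to match the pattern. As a quick sketch, you can double-check this yourself with Python's re module and strip the optional version suffix if you only need the bare identifier:
import re\n\n# The structured generator guarantees the output matches the pattern\nassert re.fullmatch(paper_regex, paper_id) is not None\n\n# Strip the optional version suffix (e.g. \"v4\") if present\nbare_id = re.sub(r\"v\\d+$\", \"\", paper_id)\nprint(bare_id)\n# arXiv:2307.09702\n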
"},{"location":"cookbook/read-pdfs/#categorize-the-paper-into-one-of-several-categories","title":"Categorize the paper into one of several categories","text":"outlines.generate.choice
allows the model to select one of several options. Suppose we wanted to categorize the paper as being about \"llms\", \"cell biology\", or \"other\".
Let's define a few categories we might be interested in:
categories = [\n \"llms\",\n \"cell biology\",\n \"other\"\n]\n
Now we can construct the prompt:
categorization_instruction = processor.apply_chat_template(\n [\n {\n \"role\": \"user\",\n \"content\": [\n {\"type\": \"text\", \"text\": f\"\"\"\n Please choose one of the following categories\n that best describes the paper.\n\n {categories}\n\n Here is the paper:\n \"\"\"},\n\n {\"type\": \"image\", \"image\": \"\"},\n ],\n }\n ],\n tokenize=False,\n add_generation_prompt=True\n)\n
Now we can show the model the first page and extract the category:
# Build the choice extractor\ncategorizer = outlines.generate.choice(\n model,\n categories\n)\n\n# Categorize the paper\ncategory = categorizer(categorization_instruction, [images[0]])\nprint(category)\n
Which should return:
llms\n
"},{"location":"cookbook/read-pdfs/#additional-notes","title":"Additional notes","text":"You can provide multiple images to the model by
- Adding additional image messages
- Providing a list of images to the
generate
function
For example, to have two images, you can do:
two_image_prompt = processor.apply_chat_template(\n [\n {\n \"role\": \"user\",\n \"content\": [\n {\"type\": \"text\", \"text\": \"are both of these images of hot dogs?\"},\n\n # Tell the model there are two images\n {\"type\": \"image\", \"image\": \"\"},\n {\"type\": \"image\", \"image\": \"\"},\n ],\n }\n ],\n tokenize=False,\n add_generation_prompt=True\n)\n\n# Pass two images to the model\ngenerator = outlines.generate.choice(\n model,\n [\"hot dog\", \"not hot dog\"]\n)\n\nresult = generator(\n two_image_prompt,\n\n # Pass two images to the model\n [images[0], images[1]]\n)\nprint(result)\n
Using the first two pages of the paper (they are not images of hot dogs), we should get
not hot dog\n
"},{"location":"cookbook/receipt-digitization/","title":"Receipt Data Extraction with VLMs","text":""},{"location":"cookbook/receipt-digitization/#setup","title":"Setup","text":"You'll need to install the dependencies:
pip install outlines torch==2.4.0 transformers accelerate pillow rich\n
"},{"location":"cookbook/receipt-digitization/#import-libraries","title":"Import libraries","text":"Load all the necessary libraries:
# LLM stuff\nimport outlines\nimport torch\nfrom transformers import AutoProcessor\nfrom pydantic import BaseModel, Field\nfrom typing import Literal, Optional, List\n\n# Image stuff\nfrom PIL import Image\nimport requests\n\n# Rich for pretty printing\nfrom rich import print\n
"},{"location":"cookbook/receipt-digitization/#choose-a-model","title":"Choose a model","text":"This example has been tested with mistral-community/pixtral-12b
(HF link) and Qwen/Qwen2-VL-7B-Instruct
(HF link).
We recommend Qwen-2-VL as we have found it to be more accurate than Pixtral.
If you want to use Qwen-2-VL, you can do the following:
# To use Qwen-2-VL:\nfrom transformers import Qwen2VLForConditionalGeneration\nmodel_name = \"Qwen/Qwen2-VL-7B-Instruct\"\nmodel_class = Qwen2VLForConditionalGeneration\n
If you want to use Pixtral, you can do the following:
# To use Pixtral:\nfrom transformers import LlavaForConditionalGeneration\nmodel_name=\"mistral-community/pixtral-12b\"\nmodel_class=LlavaForConditionalGeneration\n
"},{"location":"cookbook/receipt-digitization/#load-the-model","title":"Load the model","text":"Load the model into memory:
model = outlines.models.transformers_vision(\n model_name,\n model_class=model_class,\n model_kwargs={\n \"device_map\": \"auto\",\n \"torch_dtype\": torch.bfloat16,\n },\n processor_kwargs={\n \"device\": \"cuda\", # set to \"cpu\" if you don't have a GPU\n },\n)\n
"},{"location":"cookbook/receipt-digitization/#image-processing","title":"Image processing","text":"Images can be quite large. In GPU-poor environments, you may need to resize the image to a smaller size.
Here's a helper function to do that:
def load_and_resize_image(image_path, max_size=1024):\n \"\"\"\n Load and resize an image while maintaining aspect ratio\n\n Args:\n image_path: Path to the image file\n max_size: Maximum dimension (width or height) of the output image\n\n Returns:\n PIL Image: Resized image\n \"\"\"\n image = Image.open(image_path)\n\n # Get current dimensions\n width, height = image.size\n\n # Calculate scaling factor\n scale = min(max_size / width, max_size / height)\n\n # Only resize if image is larger than max_size\n if scale < 1:\n new_width = int(width * scale)\n new_height = int(height * scale)\n image = image.resize((new_width, new_height), Image.Resampling.LANCZOS)\n\n return image\n
You can change the resolution of the image by changing the max_size
argument. Small max sizes will make the image more blurry, but processing will be faster and require less memory.
"},{"location":"cookbook/receipt-digitization/#load-an-image","title":"Load an image","text":"Load an image and resize it. We've provided a sample image of a Trader Joe's receipt, but you can use any image you'd like.
Here's what the image looks like:
# Path to the image\nimage_path = \"https://raw.githubusercontent.com/dottxt-ai/outlines/refs/heads/main/docs/cookbook/images/trader-joes-receipt.jpg\"\n\n# Download the image\nresponse = requests.get(image_path)\nwith open(\"receipt.png\", \"wb\") as f:\n f.write(response.content)\n\n# Load + resize the image\nimage = load_and_resize_image(\"receipt.png\")\n
"},{"location":"cookbook/receipt-digitization/#define-the-output-structure","title":"Define the output structure","text":"We'll define a Pydantic model to describe the data we want to extract from the image.
In our case, we want to extract the following information:
- The store name
- The store address
- The store number
- A list of items, including the name, quantity, price per unit, and total price
- The tax
- The total
- The date
- The payment method
Most fields are optional, as not all receipts contain all information.
class Item(BaseModel):\n name: str\n quantity: Optional[int]\n price_per_unit: Optional[float]\n total_price: Optional[float]\n\nclass ReceiptSummary(BaseModel):\n store_name: str\n store_address: str\n store_number: Optional[int]\n items: List[Item]\n tax: Optional[float]\n total: Optional[float]\n # Date is in the format YYYY-MM-DD. We can apply a regex pattern to ensure it's formatted correctly.\n date: Optional[str] = Field(pattern=r'\\d{4}-\\d{2}-\\d{2}', description=\"Date in the format YYYY-MM-DD\")\n payment_method: Literal[\"cash\", \"credit\", \"debit\", \"check\", \"other\"]\n
"},{"location":"cookbook/receipt-digitization/#prepare-the-prompt","title":"Prepare the prompt","text":"We'll use the AutoProcessor
to convert the image and the text prompt into a format that the model can understand. Practically, this is the code that adds user, system, assistant, and image tokens to the prompt.
# Set up the content you want to send to the model\nmessages = [\n {\n \"role\": \"user\",\n \"content\": [\n {\n # The image is provided as a PIL Image object\n \"type\": \"image\",\n \"image\": image,\n },\n {\n \"type\": \"text\",\n \"text\": f\"\"\"You are an expert at extracting information from receipts.\n Please extract the information from the receipt. Be as detailed as possible --\n missing or misreporting information is a crime.\n\n Return the information in the following JSON schema:\n {ReceiptSummary.model_json_schema()}\n \"\"\"},\n ],\n }\n]\n\n# Convert the messages to the final prompt\nprocessor = AutoProcessor.from_pretrained(model_name)\nprompt = processor.apply_chat_template(\n messages, tokenize=False, add_generation_prompt=True\n)\n
If you are curious, the final prompt that is sent to the model looks (roughly) like this:
<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n<|vision_start|><|image_pad|><|vision_end|>\nYou are an expert at extracting information from receipts.\nPlease extract the information from the receipt. Be as detailed as\npossible -- missing or misreporting information is a crime.\n\nReturn the information in the following JSON schema:\n\n<JSON SCHEMA OMITTED>\n<|im_end|>\n<|im_start|>assistant\n
"},{"location":"cookbook/receipt-digitization/#run-the-model","title":"Run the model","text":"# Prepare a function to process receipts\nreceipt_summary_generator = outlines.generate.json(\n model,\n ReceiptSummary,\n\n # Greedy sampling is a good idea for numeric\n # data extraction -- no randomness.\n sampler=outlines.samplers.greedy()\n)\n\n# Generate the receipt summary\nresult = receipt_summary_generator(prompt, [image])\nprint(result)\n
"},{"location":"cookbook/receipt-digitization/#output","title":"Output","text":"The output should look like this:
ReceiptSummary(\n store_name=\"Trader Joe's\",\n store_address='401 Bay Street, San Francisco, CA 94133',\n store_number=0,\n items=[\n Item(name='BANANA EACH', quantity=7, price_per_unit=0.23, total_price=1.61),\n Item(name='BAREBELLS CHOCOLATE DOUG', quantity=1, price_per_unit=2.29, total_price=2.29),\n Item(name='BAREBELLS CREAMY CRISP', quantity=1, price_per_unit=2.29, total_price=2.29),\n Item(name='BAREBELLS CHOCOLATE DOUG', quantity=1, price_per_unit=2.29, total_price=2.29),\n Item(name='BAREBELLS CARAMEL CASHEW', quantity=2, price_per_unit=2.29, total_price=4.58),\n Item(name='BAREBELLS CREAMY CRISP', quantity=1, price_per_unit=2.29, total_price=2.29),\n Item(name='SPINDRIFT ORANGE MANGO 8', quantity=1, price_per_unit=7.49, total_price=7.49),\n Item(name='Bottle Deposit', quantity=8, price_per_unit=0.05, total_price=0.4),\n Item(name='MILK ORGANIC GALLON WHOL', quantity=1, price_per_unit=6.79, total_price=6.79),\n Item(name='CLASSIC GREEK SALAD', quantity=1, price_per_unit=3.49, total_price=3.49),\n Item(name='COBB SALAD', quantity=1, price_per_unit=5.99, total_price=5.99),\n Item(name='PEPPER BELL RED XL EACH', quantity=1, price_per_unit=1.29, total_price=1.29),\n Item(name='BAG FEE.', quantity=1, price_per_unit=0.25, total_price=0.25),\n Item(name='BAG FEE.', quantity=1, price_per_unit=0.25, total_price=0.25)\n ],\n tax=0.68,\n total=41.98,\n date='2023-11-04',\n payment_method='debit',\n\n)\n
Voila! You've successfully extracted information from a receipt using an LLM.
"},{"location":"cookbook/receipt-digitization/#bonus-roasting-the-user-for-their-receipt","title":"Bonus: roasting the user for their receipt","text":"You can roast the user for their receipt by adding a roast
field to the end of the ReceiptSummary
model.
class ReceiptSummary(BaseModel):\n ...\n roast: str\n
which gives you a result like
ReceiptSummary(\n ...\n roast=\"You must be a fan of Trader Joe's because you bought enough\n items to fill a small grocery bag and still had to pay for a bag fee.\n Maybe you should start using reusable bags to save some money and the\n environment.\"\n)\n
Qwen is not particularly funny, but worth a shot.
"},{"location":"cookbook/simtom/","title":"Build perspective-taking agents with SimToM","text":"Prompting strategies like Chain-of-Thought (CoT) can improve LLMs' reasoning capabilities. However, they underwhelm in tasks that require keeping track of inconsistent world states. SimToM proposes a simple, two-stage prompting framework for LLMs inspired by Simulation Theory. The authors showed that this approach outperforms zero-shot prompting and CoT on ToMI and BigToM, two benchmarks with Theory of Mind questions.
In this example, we will implement SimToM with a few lines of code using Outlines' prompt templating and structured generation capabilities.
"},{"location":"cookbook/simtom/#how-simtom-works","title":"How SimToM works","text":"SimToM calls an LLM with two consecutive prompts:
- Perspective-taking: The first prompt receives a
story
and a character
. The goal is to understand the situation based on the character's point of view and filter out the rest of the story. - Question-Answering: The second prompt receives the character's point of view from the previous step and tasks the LLM to answer a question using that context.
"},{"location":"cookbook/simtom/#outlines-implementation","title":"Outlines implementation","text":"To implement SimToM with Outlines, we will need to:
- Write the prompts with prompt functions.
- Define the JSON object each prompt will return using Pydantic.
- Generate responses with a Mistral model using the transformers integration.
Let's dive into it!
"},{"location":"cookbook/simtom/#using-prompt-functions","title":"Using Prompt Functions","text":"With Outlines, you can write your prompts as Python functions by adding the @outlines.prompt
decorator. The prompt template is contained in their docstring, and their arguments correspond to variables used in the prompt.
The authors have shared their code, prompts and data in this GitHub repository. Below, we define in Outlines the prompts they used for the ToMI dataset:
import outlines\n\n\n@outlines.prompt\ndef perspective_taking(story: str, character: str) -> None:\n \"\"\"<s>[INST] The following is a sequence of events about some characters, that takes place in multiple locations.\n Your job is to output only the events that the specified character, {{character}}, knows about.\n\n Here are a few rules:\n 1. A character knows about all events that they do.\n 2. If a character is in a certain room/location, that character knows about all other events that happens in the room. This includes other characters leaving or exiting the location, the locations of objects in that location, and whether somebody moves an object to another place.\n 3. If a character leaves a location, and is NOT in that location, they no longer know about any events that happen within that location. However, they can re-enter the location.\n\n Story: {{story}}\n What events does {{character}} know about? Only output the events according to the above rules, do not provide an explanation. [/INST]\"\"\" # noqa\n\n@outlines.prompt\ndef simulation(events: list, name: str, question: str) -> None:\n \"\"\"<s>[INST] {% for event in events %}\n {{event}}\n {% endfor %}\n You are {{name}}.\n Based on the above information, answer the following question:\n {{question}}\n You must choose one of the above choices, do not say there is not enough information. Answer with a single word, do not output anything else. [/INST]\"\"\" # noqa\n
"},{"location":"cookbook/simtom/#json-structured-generation","title":"JSON Structured Generation","text":"Outlines guarantees that the LLM will return a valid JSON object, which we can specify as a Pydantic model.
We will need two Pydantic models for SimToM, one for each prompt:
from pydantic import BaseModel, Field\nfrom typing import List\n\n\nclass PerspectiveTaking(BaseModel):\n \"\"\"This is for the first prompt.\"\"\"\n character: str = Field(description=\"The character we extract the events for.\")\n events: List[str] = Field(description=\"All events that the character knows about.\")\n\n\nclass Simulation(BaseModel):\n \"\"\"This is for the second prompt.\"\"\"\n answer: str\n
"},{"location":"cookbook/simtom/#calling-an-llm","title":"Calling an LLM","text":"Let's try SimToM with an example from the ToMI dataset:
story = \"\"\"\n1 Aria entered the front_yard.\n2 Aiden entered the front_yard.\n3 The grapefruit is in the green_bucket.\n4 Aria moved the grapefruit to the blue_container.\n5 Aiden exited the front_yard.\n6 Noah entered the playroom.\n\"\"\"\nquestion = \"7 Where was the grapefruit at the beginning?\"\ncharacter = \"Aria\"\n
We load Mistral-7B-Instruct-v0.3
, create the prompt using the template we defined earlier, and generate a structured response. As a reminder, the goal of the first call is to get all the events a character, Aria
, knows about.
# Load an LLM from Hugging Face\nMODEL_NAME = \"mistral-community/Mistral-7B-Instruct-v0.3\"\nmodel = outlines.models.transformers(MODEL_NAME, device=\"cuda\")\n\nperspective_prompt = perspective_taking(story=story, character=character)\n\n# Call Mistral 7B with the first prompt\ngenerator = outlines.generate.json(model, PerspectiveTaking)\nperspective = generator(perspective_prompt)\n\nprint(perspective.model_dump())\n# {'character': 'Aria', 'events': ['1 Aria entered the front_yard.', '3 The grapefruit is in the green_bucket.', '4 Aria moved the grapefruit to the blue_container.']}\n
Not bad! We will now generate the second prompt with those events.
sim_prompt = simulation(events=perspective.events, name=character, question=question)\n\n# Call Mistral 7B with the second prompt\ngenerator = outlines.generate.json(model, Simulation)\nresult = generator(sim_prompt)\n\nprint(result.model_dump())\n# {'answer': 'green_bucket'}\n
And this is it! SimToM could be useful in agentic workflows, where agents must act based on what they know, not all available information. One caveat of SimToM is that the perspective-taking step may remove important information, leading to wrong results. As the authors note in their paper, it can feature as a simple and effective baseline for evaluating LLMs on Theory of Mind reasoning tasks.
"},{"location":"cookbook/structured_generation_workflow/","title":"Structured Generation Workflow: Generating Synthetic Phone Numbers","text":"This is a condensed version of Coding for Structured Generation with LLMs.
For this example we're going to build an LLM program to generate synthetic data in the form of realistic-looking phone numbers for Washington State. Using an LLM for this task is a bit overkill, since we could just as easily accomplish this with a tool like Faker, but this example still serves as a useful way to demonstrate a workflow for using structured generation.
"},{"location":"cookbook/structured_generation_workflow/#unstructured-approach","title":"Unstructured approach","text":"Before diving into how to use structure generation for this task let's start with an unstructured example. We begin by loading our model:
import outlines\n\nmodel_name = 'microsoft/Phi-3-mini-4k-instruct'\nmodel = outlines.models.transformers(model_name)\n
Next we need a prompt for this model. Since we're focusing on structured generation, we won't be engaging in any form of \"prompt hacking\" and will be leaving this prompt untouched for the rest of this example.
from transformers import AutoTokenizer\n\ntokenizer = AutoTokenizer.from_pretrained(model_name)\n\nmessages_phone = [\n {\"role\": \"user\", \"content\": \"\"\"\n Please generate a realistic phone number for Washington State in the following format\n\n (555) 555-5555\n\n \"\"\"}\n]\n\n# This allows us to properly format our prompt for\n# Phi-3 Mini's 'Instruct' interface.\nprompt_phone = tokenizer.apply_chat_template(messages_phone, tokenize=False)\n
With our prompt ready, we can now generate 10 example phone numbers:
phone_generator_unstruct = outlines.generate.text(model)\nfor _ in range(10):\n print(phone_generator_unstruct(prompt_phone,max_tokens=12))\n
I'd be happy to help you generate a realistic phone\\ I cannot generate a real phone number as I'm just\\ I'm an AI and don't have the ability\\ Sure! Here is a randomly generated phone number in the format\\ Here's a phone number that fits the format for a\\ In Washington State, phone numbers typically have a three-dig\\ Here are a few examples of phone numbers that could be considered\\ I'd be happy to help generate a realistic phone number\\ I'd be happy to help you generate a random phone\\ Based on the format you provided, a realistic phone number for\\
As we can see, none of these outputs are even phone numbers!
Let's see if we can improve this using structured generation.
"},{"location":"cookbook/structured_generation_workflow/#the-structured-generation-workflow","title":"The Structured Generation Workflow","text":"In order to solve this problem we're going to introduce a Structured Generation Workflow outlined in this image:
Let's step through this:
"},{"location":"cookbook/structured_generation_workflow/#real-example","title":"Real example","text":"We start with a real example phone number, in this case for the Seattle Public Library, that we can use to verify the structure we are creating.
phone_number = \"(206) 386-4636\"\n
For a simple example like this, we'll just be using a single phone number, for more complex examples it can be helpful to have more examples.
"},{"location":"cookbook/structured_generation_workflow/#draft-structure","title":"Draft Structure","text":"The next step in the process is for use to define a simple regex that we feel correctly models our real data.
phone_regex_1 = r'\\([0-9]{3}\\) [0-9]{3}-[0-9]{4}'\n
Next we need to validate this regex against our real data.
"},{"location":"cookbook/structured_generation_workflow/#validate-by-matching-examples","title":"Validate by matching examples","text":"Whenever writing non-trivial code with structured generation it is essential that you first validate the code against your real data example(s).
We'll start with a simple method of validation: just checking that our regex matches the data.
import re\nre.match(phone_regex_1, phone_number)\n\n# <re.Match object; span=(0, 14), match='(206) 386-4636'>\n
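If you collected several reference numbers, as suggested above, a short loop can run the same check against each of them (the extra numbers below are fictional and only serve as an illustration):
real_examples = [phone_number, '(509) 555-0100', '(360) 555-0142'] # the extra numbers are fictional\n\nfor example in real_examples:\n assert re.match(phone_regex_1, example), f'Regex fails to match {example}'\n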
Now that we have a match, we can move on to generating structured output!
"},{"location":"cookbook/structured_generation_workflow/#generate-structure","title":"Generate Structure","text":"We're ready to see if structured generation can make an improvement over our initial unstructured approach:
phone_generator_v1 = outlines.generate.regex(model, phone_regex_1)\nfor _ in range(10):\n print(phone_generator_v1(prompt_phone))\n
(206) 555-1234\\ (206) 555-1234\\ (206) 555-1234\\ (206) 555-1234\\ (206) 555-1234\\ (206) 555-1234\\ (206) 123-4567\\ (206) 555-1234\\ (206) 555-1234\\ (206) 555-1234
At least we have phone numbers! But I think we can do better!
"},{"location":"cookbook/structured_generation_workflow/#inspect-output","title":"Inspect output","text":"In this case the model did create phone numbers and, impressively, got the area code correct. So using structured generation did improve things. However these numbers are pretty boring. Let's improve that structure!
"},{"location":"cookbook/structured_generation_workflow/#iteration","title":"Iteration","text":"We've walked through the loop once, so we can go quickly now through each iteration.
We start by improving our structure:
phone_regex_2 = r'\\([0-9]{3}\\) [2-46-9]{3}-[02-9]{4}'\n
Before rushing to another round of generation, let's validate this new regex. We'll add just a bit more sophistication over our last check:
re.match(phone_regex_2, phone_number)[0] == phone_number\n# True\n
Now that we've validated, let's generate with this new regex! phone_generator_v2 = outlines.generate.regex(model,\n phone_regex_2)\nfor _ in range(10):\n print(phone_generator_v2(prompt_phone))\n
(206) 867-5309\\ (206) 666-7777\\ (206) 444-3333\\ (206) 444-3333\\ (206) 943-2222\\ (206) 323-6789\\ (206) 444-3333\\ (206) 867-5309\\ (206) 466-2255\\ (206) 222-3333
Better, but I don't like those repeated sequences. Like good software developers, let's iterate again!
"},{"location":"cookbook/structured_generation_workflow/#reiteration-with-debugging","title":"Reiteration - with debugging","text":"Here's a fancier regex that should give us more interesting results:
phone_regex_3_error = r'\\([0-9]{3}\\) [2-4][7-9][4-6]-[3-6][2-8][1-4]'\n
This looks good to me, but there's a subtle bug, which is why we always need to validate our structure against real data. This time we'll make our validator do a bit more work to verify that the correct string is matched:
if not re.match(phone_regex_3_error, phone_number):\n print(\"Regex fails match\")\nelse:\n matched_string = re.match(phone_regex_3_error, phone_number)[0]\n if matched_string == phone_number:\n print(\"Successful match\")\n else:\n print(f\"Error {matched_string} != {phone_number}\")\n
This prints out: Error (206) 386-463 != (206) 386-4636
Ah! We were missing the last digit, let's fix that and regenerate:
phone_regex_3_fixed = r'\\([0-9]{3}\\) [2-4][7-9][4-6]-[3-6][2-8][1-4][6-9]'\nphone_generator_v3 = outlines.generate.regex(model,\n phone_regex_3_fixed)\nfor _ in range(10):\n print(phone_generator_v3(prompt_phone))\n
(206) 494-3216\\ (206) 374-6218\\ (206) 494-3337\\ (206) 476-3216\\ (206) 484-3548\\ (206) 495-3218\\ (206) 494-5517\\ (206) 375-4636\\ (206) 384-6216\\ (206) 385-6218
Much better!
Now you've seen a quick example of the structured generation workflow that can be used as the basis for building and iterating on much larger structured generation tasks!
"},{"location":"reference/","title":"Reference","text":""},{"location":"reference/#structured-generation","title":"Structured generation","text":"While LLM capabilities are increasingly impressive, we can make their output more reliable by steering the generation. Outlines thus offers mechanisms to specify high level constraints on text completions by generative language models.
Stopping sequence By default, language models stop generating tokens after an end-of-sequence (EOS) token was generated, or after a set maximum number of tokens. Their output can be verbose, and for practical purposes it is often necessary to stop the generation after a given sequence has been found instead. You can use the stop_at keyword argument when calling the model with a prompt:
import outlines.models as models\n\ncomplete = models.openai(\"gpt-4o-mini\")\nexpert = complete(\"Name an expert in quantum gravity.\", stop_at=[\"\\n\", \".\"])\n
"},{"location":"reference/functions/","title":"Outlines functions","text":""},{"location":"reference/prompting/","title":"Prompt templating","text":"Outlines provides a powerful domain-specific language to write and manage prompts, via what we call prompt functions. Prompt functions are Python functions that contain a template for the prompt in their docstring, and their arguments correspond to the variables used in the prompt. When called, a prompt function returns the template rendered with the values of the arguments.
The aim of prompt functions is to solve several recurrent problems with prompting:
- Building complex prompts quickly leads to messy code. This problem has already been solved in the web development community by using templating, so why not use it here?
- Composing prompts is difficult. Why not just compose functions?
- Separating prompts from code. Encapsulation in functions allows a clean separation between prompts and code. Moreover, like any function, prompt functions can be imported from other modules.
Outlines uses the Jinja templating engine to render prompts, which makes it easy to compose complex prompts.
Prompt rendering
Prompt functions are opinionated when it comes to prompt rendering. These opinions are meant to avoid common prompting errors, but can have unintended consequences if you are doing something unusual. We advise you to always print the prompt before using it. You can also read the reference section if you want to know more.
"},{"location":"reference/prompting/#your-first-prompt","title":"Your first prompt","text":"The following snippet showcases a very simple prompt. The variables between curly brackets {{ }}
are placeholders for the values of the arguments you will pass to the prompt function.
CodeOutput import outlines\n\n@outlines.prompt\ndef greetings(name, question):\n \"\"\"Hello, {{ name }}!\n {{ question }}\n \"\"\"\n\nprompt = greetings(\"user\", \"How are you?\")\nprint(prompt)\n
Hello, user!\nHow are you?\n
If a variable is missing in the function's arguments, Jinja2 will throw an UndefinedError
exception:
CodeOutput import outlines\n\n@outlines.prompt\ndef greetings(name):\n \"\"\"Hello, {{ surname }}!\"\"\"\n\nprompt = greetings(\"user\")\n
Traceback (most recent call last):\n File \"<stdin>\", line 9, in <module>\n File \"/home/remi/projects/normal/outlines/outlines/prompts.py\", line 38, in __call__\n return render(self.template, **bound_arguments.arguments)\n File \"/home/remi/projects/normal/outlines/outlines/prompts.py\", line 213, in render\n return jinja_template.render(**values)\n File \"/home/remi/micromamba/envs/outlines/lib/python3.9/site-packages/jinja2/environment.py\", line 1301, in render\n self.environment.handle_exception()\n File \"/home/remi/micromamba/envs/outlines/lib/python3.9/site-packages/jinja2/environment.py\", line 936, in handle_exception\n raise rewrite_traceback_stack(source=source)\n File \"<template>\", line 1, in top-level template code\n jinja2.exceptions.UndefinedError: 'surname' is undefined\n
"},{"location":"reference/prompting/#importing-prompt-functions","title":"Importing prompt functions","text":"Prompt functions are functions, and thus can be imported from other modules:
prompts.pygenerate.pyOutput import outlines\n\n@outlines.prompt\ndef greetings(name, question):\n \"\"\"Hello, {{ name }}!\n {{ question }}\n \"\"\"\n
from .prompts import greetings\n\nprompt = greetings(\"John Doe\", \"How are you today?\")\n
Hello, John Doe!\nHow are you today?\n
"},{"location":"reference/prompting/#few-shot-prompting","title":"Few-shot prompting","text":"Few-shot prompting can lead to messy code. Prompt functions allow you to loop over lists or dictionaries from the template. In the following example we demonstrate how we can generate a prompt by passing a list of dictionaries with keys question
and answer
to the prompt function:
CodeOutput import outlines\n\n@outlines.prompt\ndef few_shots(instructions, examples, question):\n \"\"\"{{ instructions }}\n\n Examples\n --------\n\n {% for example in examples %}\n Q: {{ example.question }}\n A: {{ example.answer }}\n\n {% endfor %}\n Question\n --------\n\n Q: {{ question }}\n A:\n \"\"\"\n\ninstructions = \"Please answer the following question following the examples\"\nexamples = [\n {\"question\": \"2+2=?\", \"answer\":4},\n {\"question\": \"3+3=?\", \"answer\":6}\n]\nquestion = \"4+4 = ?\"\n\nprompt = few_shots(instructions, examples, question)\nprint(prompt)\n
Please answer the following question following the examples\n\nExamples\n--------\n\nQ: 2+2=?\nA: 4\n\nQ: 3+3=?\nA: 6\n\nQuestion\n--------\n\nQ: 4+4 = ?\nA:\n
"},{"location":"reference/prompting/#conditionals-filters-etc","title":"Conditionals, filters, etc.","text":"Jinja2 has many features beyond looping that are not described here: conditionals, filtering, formatting, etc. Please refer to the Jinja documentation for more information about the syntax of the templating language. The Jinja syntax is powerful, and we recommend you take some time to read their documentation if you are building complex prompts.
"},{"location":"reference/prompting/#tools","title":"Tools","text":"Several projects (e.g.Toolformer, ViperGPT, AutoGPT, etc.) have shown that we can \"teach\" language models to use external functions by describing what these functions do in the prompt. In these projects the same information is often repeated twice: the function implementation, name, docstring, or arguments are copy-pasted in the prompt. This is cumbersome and error prone; you can directly pull this information from within an Outlines prompt function:
CodeOutput import outlines\n\ndef my_tool(arg1: str, arg2: int):\n \"\"\"Tool description.\n\n The rest of the docstring\n \"\"\"\n pass\n\n@outlines.prompt\ndef tool_prompt(question, tool):\n \"\"\"{{ question }}\n\n COMMANDS\n 1. {{ tool | name }}: {{ tool | description }}, args: {{ tool | args }}\n\n {{ tool | source }}\n \"\"\"\n\nprompt = tool_prompt(\"Can you do something?\", my_tool)\nprint(prompt)\n
Can you do something?\n\nCOMMANDS\n1. my_tool: Tool description., args: arg1: str, arg2: int\n\ndef my_tool(arg1: str, arg2: int):\n \"\"\"Tool description.\n\n The rest of the docstring\n \"\"\"\n pass\n
"},{"location":"reference/prompting/#json-response-format","title":"JSON response format","text":"To build reliable chains with language models we often need to instruct them the format in which we would like them to return their response.
Without prompt templating, the information is repeated twice between creating the parsing function (e.g. a Pydantic model), and writing the desired schema in the prompt. This can lead to errors that are hard to debug.
Outlines allows you to directly pull the JSON schema of a Pydantic model, or pretty print a dictionary, from within an Outlines prompt function:
CodeOutput from pydantic import BaseModel, Field\n\nimport outlines\n\nclass MyResponse(BaseModel):\n field1: int = Field(description=\"an int\")\n field2: str\n\n@outlines.prompt\ndef my_prompt(response_model):\n \"\"\"{{ response_model | schema }}\"\"\"\n\nprompt = my_prompt(MyResponse)\nprint(prompt)\n# {\n# \"field1\": \"an int\",\n# \"field2\": \"<field2>\"\n# }\n
response = {\n \"field1\": \"<field1>\",\n \"field2\": \"a string\"\n}\n\nmy_prompt(MyResponse)\n# {\n# \"field1\": \"<field1>\",\n# \"field2\": \"a string\"\n# }\n
"},{"location":"reference/prompting/#formatting-conventions","title":"Formatting conventions","text":"Prompt functions are opinionated when it comes to rendering, and these opinions are meant to avoid prompting mistakes and help with formatting.
"},{"location":"reference/prompting/#whitespaces","title":"Whitespaces","text":"If you have experience working with strings between triple quotes you know that indenting has an influence on the string's formatting. Prompt functions adopt a few conventions so you don't have to think about indents when writing prompt.
First, whether you start the prompt right after the triple quotes or on the line below does not matter for formatting:
CodeOutput import outlines\n\n@outlines.prompt\ndef prompt1():\n \"\"\"My prompt\n \"\"\"\n\n@outlines.prompt\ndef prompt2():\n \"\"\"\n My prompt\n \"\"\"\n\nprint(prompt1())\nprint(prompt2())\n
My prompt\nMy prompt\n
Indentation is relative to the second line of the docstring, and leading spaces are removed:
CodeOutput import outlines\n\n@outlines.prompt\ndef example1():\n \"\"\"First line\n Second line\n \"\"\"\n\n@outlines.prompt\ndef example2():\n \"\"\"\n Second line\n Third line\n \"\"\"\n\n@outlines.prompt\ndef example3():\n \"\"\"\n Second line\n Third line\n \"\"\"\n\nprint(example1())\nprint(example2())\nprint(example3())\n
First line\nSecond line\n\nSecond line\nThird line\n\nSecond line\n Third line\n
Trailing whitespaces are not removed, unless they follow a linebreak symbol \\
(see linebreaks).
"},{"location":"reference/prompting/#linebreaks","title":"Linebreaks","text":"You can use the backslash \\
to break a long line of text. It will render as a single line:
CodeOutput import outlines\n\n@outlines.prompt\ndef example():\n \"\"\"\n Break in \\\n several lines \\\n But respect the indentation\n on line breaks.\n And after everything \\\n Goes back to normal\n \"\"\"\n\nprint(example())\n
Break in several lines But respect the indentation\n on line breaks.\nAnd after everything Goes back to normal\n
"},{"location":"reference/samplers/","title":"Samplers","text":"Outlines offers different sequence sampling algorithms, and we will integrate more in the future. You can read this blog post for an overview of the different sampling algorithm.
Samplers provide control over the sampling process, allowing you to influence the output of the model. This can include controlling randomness (temperature), biasing towards certain tokens (top-k, top-p), or sequence generation (beam search).
"},{"location":"reference/samplers/#multinomial-sampling","title":"Multinomial sampling","text":"Multinomial sampling is the default sampling algorithm in Outlines.
As an example, suppose we have only two possible tokens: \"H\" and \"T\". For a fixed prompt such as \"Flip a coin, did you get heads or tails?\", the language model calculates a probability for each token:
Token Probability \"H\" 0.5 \"T\" 0.5 You'd expect to receive \"H\" 50% of the time and \"T\" 50% of the time.
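To make this concrete, here is a plain PyTorch illustration of multinomial sampling over this two-token vocabulary (an illustration of the sampling rule, not Outlines' internal code):
import torch\n\nprobabilities = torch.tensor([0.5, 0.5]) # P('H'), P('T')\nsamples = torch.multinomial(probabilities, num_samples=10, replacement=True)\nprint([['H', 'T'][int(i)] for i in samples])\n# e.g. ['H', 'T', 'T', 'H', 'H', 'T', 'H', 'H', 'T', 'H']\n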
"},{"location":"reference/samplers/#parameters","title":"Parameters","text":" samples
: Number of samples to generate (default: 1) top_k
: Only consider the top k tokens (optional) top_p
: Only consider the top tokens with cumulative probability >= p (optional) temperature
: Controls randomness of sampling (optional)
"},{"location":"reference/samplers/#default-behavior","title":"Default behavior","text":"Outlines defaults to the multinomial sampler without top-p or top-k sampling, and temperature equal to 1.
Not specifying a sampler is equivalent to:
from outlines import models, generate, samplers\n\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-4k-instruct\")\nsampler = samplers.multinomial()\n\ngenerator = generate.text(model, sampler)\nanswer = generator(\"What is 2+2?\")\n\nprint(answer)\n# 4\n
"},{"location":"reference/samplers/#batching","title":"Batching","text":"You can ask the generator to take multiple samples by passing the number of samples when initializing the sampler:
from outlines import models, generate, samplers\n\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-4k-instruct\")\nsampler = samplers.multinomial(3)\n\ngenerator = generate.text(model, sampler)\nanswer = generator(\"What is 2+2?\")\n\nprint(answer)\n# [4, 4, 4]\n
If you ask for multiple samples for a batch of prompts, the returned array will be of shape (num_samples, num_batches)
:
from outlines import models, generate, samplers\n\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-4k-instruct\")\nsampler = samplers.multinomial(3)\n\ngenerator = generate.text(model, sampler)\nanswer = generator([\"What is 2+2?\", \"What is 3+3?\"])\n\nprint(answer)\n# [[4, 4, 4], [6, 6, 6]]\n
"},{"location":"reference/samplers/#temperature","title":"Temperature","text":"You can control the temperature with
from outlines import models, generate, samplers\n\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-4k-instruct\")\nsampler = samplers.multinomial(3, temperature=0.5)\n\ngenerator = generate.text(model, sampler)\nanswer = generator([\"What is 2+2?\", \"What is 3+3?\"])\n\nprint(answer)\n
If you would like to use temperature=0.0
, please use sampler=samplers.greedy()
instead.
"},{"location":"reference/samplers/#top-k-sampling","title":"Top-k sampling","text":"You can ask Outlines to only consider the top-k logits at each step by specifying the value of the top-k
keyword argument when initializing the sampler.
sampler = samplers.multinomial(3, top_k=10)\n
"},{"location":"reference/samplers/#top-p-sampling","title":"Top-p sampling","text":"You can ask Outlines to only consider the highest probability tokens such that their cumulative probability is greater than a threshold p
. Specify the top_p
keyword argument when initializing the sampler:
sampler = samplers.multinomial(3, top_p=0.95)\n
"},{"location":"reference/samplers/#greedy-sampler","title":"Greedy sampler","text":"Greedy sampling selects the token with the highest probability at each step. It's deterministic and always produces the same output for a given input.
To use the greedy sampler, initialize the generator with the sampler:
from outlines import models, generate, samplers\n\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-4k-instruct\")\nsampler = samplers.greedy()\n\ngenerator = generate.text(model, sampler)\nanswer = generator(\"What is 2+2?\")\n\nprint(answer)\n# 4\n
You cannot ask for multiple samples with the greedy sampler since it is not clear what the result should be. Only the most likely token can be returned.
"},{"location":"reference/samplers/#beam-search","title":"Beam Search","text":"Beam search maintains multiple candidate sequences at each step, potentially finding better overall sequences than greedy or multinomial sampling.
To use Beam Search, initialize the generator with the sampler:
from outlines import models, generate, samplers\n\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-4k-instruct\")\nsampler = samplers.beam_search(beams=5)\n\ngenerator = generate.text(model, sampler)\nanswer = generator(\"What is 2+2?\")\n\nprint(answer)\n# 4\n
Compatibility
Only models from the transformers
and exllamav2
libraries are compatible with Beam Search.
"},{"location":"reference/samplers/#parameters_1","title":"Parameters","text":" beams
: Number of beams to use (default: 1)
"},{"location":"reference/samplers/#sampler-comparison","title":"Sampler Comparison","text":"Here's a table comparing the different samplers:
Sampler Pros Cons Use Cases Greedy Deterministic, fast May produce repetitive text When you need consistent, predictable output Multinomial Balances exploration and exploitation Results may vary between runs General-purpose text generation, creative tasks Beam Search Can find globally better sequences More computationally expensive When sequence quality is critical, e.g., translation For most use cases, we recommend using the default multinomial sampler.
"},{"location":"reference/text/","title":"Text generation","text":"Outlines provides a unified interface to generate text with many language models, API-based and local. The same pattern is used throughout the library:
- Instantiate a generator by calling
outlines.generate.text
with the model to be used. - Call the generator with the prompt and (optionally) some generation parameters.
from outlines import models, generate\n\nmodel = models.openai(\"gpt-4o-mini\")\ngenerator = generate.text(model)\nanswer = generator(\"What is 2+2?\")\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-4k-instruct\")\ngenerator = generate.text(model)\nanswer = generator(\"What is 2+2?\")\n
By default Outlines uses the multinomial sampler with temperature=1
. See this section to learn how to use different samplers.
"},{"location":"reference/text/#streaming","title":"Streaming","text":"Outlines allows you to stream the model's response by calling the .stream
method of the generator with the prompt:
from outlines import models, generate\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-4k-instruct\")\ngenerator = generate.text(model)\n\ntokens = generator.stream(\"What is 2+2?\")\nfor token in tokens:\n print(token)\n
"},{"location":"reference/text/#parameters","title":"Parameters","text":""},{"location":"reference/text/#limit-the-number-of-tokens-generated","title":"Limit the number of tokens generated","text":"To limit the number of tokens generated you can pass the max_tokens
positional argument to the generator:
from outlines import models, generate\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-4k-instruct\")\ngenerator = generate.text(model)\n\nanswer = generator(\"What is 2+2?\", 5)\nanswer = generator(\"What is 2+2?\", max_tokens=5)\n
"},{"location":"reference/text/#stop-after-a-given-string-is-generated","title":"Stop after a given string is generated","text":"You can also ask the model to stop generating text after a given string has been generated, for instance a period or a line break. You can pass a string or a line of string for the stop_at
argument:
from outlines import models, generate\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-4k-instruct\")\ngenerator = generate.text(model)\n\nanswer = generator(\"What is 2+2?\", stop_at=\".\")\nanswer = generator(\"What is 2+2?\", stop_at=[\".\", \"\\n\"])\n
The stopping string will be included in the response.
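If you prefer not to keep the stopping string, you can remove it afterwards with plain Python, for example:
answer = generator(\"What is 2+2?\", stop_at=\".\")\nanswer = answer.removesuffix(\".\")\n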
"},{"location":"reference/text/#seed-the-generation","title":"Seed the generation","text":"It can be useful to seed the generation in order to get reproducible results:
import torch\nfrom outlines import models, generate\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-4k-instruct\")\ngenerator = generate.text(model)\n\nseed = 789001\n\nanswer = generator(\"What is 2+2?\", seed=seed)\n
"},{"location":"reference/generation/cfg/","title":"Grammar-structured generation","text":"You can pass any context-free grammar in the EBNF format and Outlines will generate an output that is valid to this grammar:
from outlines import models, generate\n\narithmetic_grammar = \"\"\"\n ?start: expression\n\n ?expression: term ((\"+\" | \"-\") term)*\n\n ?term: factor ((\"*\" | \"/\") factor)*\n\n ?factor: NUMBER\n | \"-\" factor\n | \"(\" expression \")\"\n\n %import common.NUMBER\n\"\"\"\n\nmodel = models.transformers(\"WizardLM/WizardMath-7B-V1.1\")\ngenerator = generate.cfg(model, arithmetic_grammar)\nsequence = generator(\n \"Alice had 4 apples and Bob ate 2. \"\n + \"Write an expression for Alice's apples:\"\n)\n\nprint(sequence)\n# (8-2)\n
"},{"location":"reference/generation/cfg/#disclaimer","title":"Disclaimer","text":"Experimental
Outlines' current community-contributed implementation of CFG-structured generation is experimental. This does not reflect the performance of .txt's product, where we have optimized grammar-structured generation to be as fast as regex-structured generation. Additionally, it does not fully align with the approach described in our technical report, aside from its use of incremental/partial parsing. This feature is still a work in progress, requiring performance enhancements and bug fixes for an ideal implementation. For more details, please see our grammar-related open issues on GitHub.
Greedy
To mitigate performance issues, CFG-structured generation will use rejection sampling and iterate over the candidate tokens, highest logit first, completing once a single valid token ID is selected. This is effectively greedy generation.
"},{"location":"reference/generation/cfg/#ready-to-use-grammars","title":"Ready-to-use grammars","text":"Outlines contains a (small) library of grammars that can be imported and use directly. We can rewrite the previous example as:
from outlines import models, generate, grammars\n\narithmetic_grammar = grammars.arithmetic\n\nmodel = models.transformers(\"WizardLM/WizardMath-7B-V1.1\")\ngenerator = generate.cfg(model, arithmetic_grammar)\nsequence = generator(\n \"Alice had 4 apples and Bob ate 2. \"\n + \"Write an expression for Alice's apples:\"\n)\n\nprint(sequence)\n# (8-2)\n
The following grammars are currently available:
- Arithmetic grammar via
outlines.grammars.arithmetic
- JSON grammar via
outlines.grammars.json
If you would like more grammars to be added to the repository, please open an issue or a pull request.
"},{"location":"reference/generation/cfg/#grammar-guide","title":"Grammar guide","text":"A grammar is a list of rules and terminals that define a language:
- Terminals define the vocabulary of the language; they may be a string, regular expression or combination of these and other terminals.
- Rules define the structure of that language; they are a list of terminals and rules.
Outlines uses the Lark library to make Large Language Models generate text that belongs to the language defined by a grammar. It thus uses grammars defined in a format that Lark understands, based on the EBNF syntax. Read the Lark documentation for more details on grammars; the following is a small primer that should help get you started.
In the following we will define a LOGO-like toy language for Python's turtle library.
"},{"location":"reference/generation/cfg/#terminals","title":"Terminals","text":"A turtle can take 4 different MOVEMENT
move instructions: forward (f
), backward (b
), turn right (r
) and turn left (l
). It can take NUMBER
number of steps in each direction, and draw lines in a specified COLOR
. These define the vocabulary of our language:
MOVEMENT: \"f\"|\"b\"|\"r\"|\"l\"\nCOLOR: LETTER+\n\n%import common.LETTER\n%import common.INT -> NUMBER\n%import common.WS\n%ignore WS\n
The lines that start with %
are called \"directive\". They allow to import pre-defined terminals and rules, such as LETTER
and NUMBER
. LETTER+
is a regular expression, and indicates that a COLOR
is made of at least one LETTER
. The last two lines specify that we will ignore white spaces (WS
) in the grammar.
"},{"location":"reference/generation/cfg/#rules","title":"Rules","text":"We now need to define our rules, by decomposing instructions we can send to the turtle via our python program. At each line of the program, we can either choose a direction and execute a given number of steps, change the color used to draw the pattern. We can also choose to start filling, make a series of moves, and stop filling. We can also choose to repeat a series of move.
We can easily write the first two rules:
instruction: MOVEMENT NUMBER -> movement\n | \"c\" COLOR [COLOR] -> change_color\n
where movement
and change_color
represent aliases for the rules. A whitespace implied concatenating the elements, and |
choosing either of the elements. The fill
and repeat
rules are slightly more complex, since they apply to a code block, which is made of instructions. We thus define a new code_block
rule that refers to instruction
and finish implementing our rules:
instruction: MOVEMENT NUMBER -> movement\n | \"c\" COLOR [COLOR] -> change_color\n | \"fill\" code_block -> fill\n | \"repeat\" NUMBER code_block -> repeat\n\ncode_block: \"{\" instruction \"}\"\n
We can now write the full grammar:
start: instruction+\n\ninstruction: MOVEMENT NUMBER -> movement\n | \"c\" COLOR [COLOR] -> change_color\n | \"fill\" code_block -> fill\n | \"repeat\" NUMBER code_block -> repeat\n\ncode_block: \"{\" instruction+ \"}\"\n\nMOVEMENT: \"f\"|\"b\"|\"l\"|\"r\"\nCOLOR: LETTER+\n\n%import common.LETTER\n%import common.INT -> NUMBER\n%import common.WS\n%ignore WS\n
Notice the start
rule, which defines the starting point of the grammar, i.e. the rule with which a program must start. This full grammar allows us to parse programs such as:
c red yellow\n fill { repeat 36 {\n f200 l170\n }}\n
The result of the parse, the parse tree, can then easily be translated into a Python program that uses the turtle
library to draw a pattern.
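As a sketch of that last step, you can use Lark directly (outside of Outlines; the variable names below are ours) to parse the example program and inspect the resulting tree before translating it:
from lark import Lark\n\nturtle_grammar = \"\"\"\nstart: instruction+\n\ninstruction: MOVEMENT NUMBER -> movement\n | \"c\" COLOR [COLOR] -> change_color\n | \"fill\" code_block -> fill\n | \"repeat\" NUMBER code_block -> repeat\n\ncode_block: \"{\" instruction+ \"}\"\n\nMOVEMENT: \"f\"|\"b\"|\"l\"|\"r\"\nCOLOR: LETTER+\n\n%import common.LETTER\n%import common.INT -> NUMBER\n%import common.WS\n%ignore WS\n\"\"\"\n\n# Outlines requires LALR(1)-compatible grammars, so we use Lark's LALR parser here as well\nparser = Lark(turtle_grammar, parser=\"lalr\")\ntree = parser.parse(\"c red yellow fill { repeat 36 { f200 l170 }}\")\nprint(tree.pretty())\n
The pretty-printed tree has one node per instruction (change_color, fill, repeat, movement), which a small visitor can then turn into the corresponding turtle calls.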
"},{"location":"reference/generation/cfg/#next-steps","title":"Next steps","text":"This section provides a very brief overview of grammars and their possibilities. Check out the Lark documentation for more thorough explanations and more examples.
"},{"location":"reference/generation/choices/","title":"Multiple choices","text":"Oultines allows you to make sure the generated text is chosen between different options:
from outlines import models, generate\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-4k-instruct\")\ngenerator = generate.choice(model, [\"skirt\", \"dress\", \"pen\", \"jacket\"])\nanswer = generator(\"Pick the odd word out: skirt, dress, pen, jacket\")\n
Performance
generation.choice
computes an index that helps Outlines guide generation. This can take some time, but only needs to be done once. If you want to generate from the same list of choices several times make sure that you only call generate.choice
once.
"},{"location":"reference/generation/creating_grammars/","title":"Overview","text":"Outlines allows the use of Lark grammars to guide generation. These grammars are used to construct parsers that filter out incompatible tokens during the generation process The result is a generation that adheres to the grammar's production rules.
"},{"location":"reference/generation/creating_grammars/#primer-on-creating-grammars","title":"Primer on Creating Grammars","text":"To create grammars for Outlines, a solid understanding of Lark grammars is necessary. Here's how you can get started:
- Read Lark's grammars documentations here.
- Review Outlines' existing grammars here.
"},{"location":"reference/generation/creating_grammars/#compatibility-with-outlines","title":"Compatibility With Outlines","text":"It's important to note that not all Lark grammars work with Outlines. Changes may be necessary to ensure compatability.
"},{"location":"reference/generation/creating_grammars/#lalr1-parser","title":"LALR(1) Parser","text":"Outlines utilizes Larks LALR(1) parser, meaning the grammar must be unambiguous at least up to the next token (one token lookahead). Read Lark's official LALR(1) parser documentation here.
If your grammar is ambiguous, you will receive the following error at runtime:
GrammarError: Reduce/Reduce collision in Terminal('B') between the following rules:\n
"},{"location":"reference/generation/creating_grammars/#regex-terminal-restrictions","title":"Regex Terminal Restrictions","text":"Outlines converts terminals to finite state machines using the Interegular library. Not all regular expressions work with Interegular, mitigation is described in the subsections which follow.
"},{"location":"reference/generation/creating_grammars/#avoid-lookarounds","title":"Avoid Lookarounds","text":"Examples of removing lookaround while maintaining the same functionality
"},{"location":"reference/generation/creating_grammars/#example-escaped-string","title":"Example: Escaped String","text":"From Outlines' modified ESCAPED_STRING
in common.lark.
Before:
_STRING_INNER: /.*?/\n_STRING_ESC_INNER: _STRING_INNER /(?<!\\\\)(\\\\\\\\)*?/\n\nESCAPED_STRING : \"\\\"\" _STRING_ESC_INNER \"\\\"\"\n
After:
_NON_CONTROL_CHAR: /([^\"\\\\\\x00-\\x1F\\x7F-\\x9F])/\n_ESCAPED_CHAR: /\\\\/ (_NON_CONTROL_CHAR | /\\\\/ | /\"/)\nESCAPED_STRING_INNER: _NON_CONTROL_CHAR | _ESCAPED_CHAR\nESCAPED_STRING: /\"/ ESCAPED_STRING_INNER* /\"/\n
"},{"location":"reference/generation/creating_grammars/#avoid-backreferences","title":"Avoid Backreferences","text":"Backreferences, for example ([ab]^*)\\1
, cannot be simulated by a finite state machine, and will result in an error if used.
"},{"location":"reference/generation/creating_grammars/#creating-a-valid-grammar","title":"Creating a Valid Grammar","text":"You can use Outlines' test suite to verify your grammar.
"},{"location":"reference/generation/creating_grammars/#1-create-your-grammar","title":"1) Create Your Grammar","text":"Create your grammar file named your_new_grammar.lark
, adhering to the guidelines provided above. Add it to outlines/grammars/
(ensure attribution is included and license is compatible).
Update outlines/grammars.py
with a line including your grammar.
"},{"location":"reference/generation/creating_grammars/#2-test-your-grammar","title":"2) Test Your Grammar","text":"Test grammar for false negatives, ensure sample grammars can be generated: - Add valid example outputs which are compliant with the grammar to tests/benchmark/cfg_samples/your_new_grammar/
- Run the tests for your grammar via pytest -s tests/fsm/test_cfg_guide.py::test_cfg_grammar_sample -k \"your_new_grammar\"
Test the grammar for false positives, i.e. ensure invalid outputs aren't generated.
Currently there isn't a built-in false-positive testing utility. It is recommended that you smoke test via:
from outlines import models, generate, grammars\nmodel = models.transformers(\"mistralai/Mistral-7B-v0.1\")\ngenerator = generate.cfg(model, grammars.your_new_grammar)\nresult = generator(<your prompt to generate output for your grammar>)\nprint(result)\n
"},{"location":"reference/generation/creating_grammars/#converting","title":"Converting","text":"There are a few tools available for converting from other grammars to lark. These tools serve as a starting point. However, you will typically need to make additional adjustments to ensure full compatibility and proper functioning within Outlines.
Tools: - Lark's built-in \"Nearley-to-Lark\" converter https://lark-parser.readthedocs.io/en/latest/tools.html - Convert ANTLR4 to Lark (note: most ANTLR4 grammars are not LALR(1)-compatible, so they will require additional tweaking) https://github.com/kaby76/Domemtech.Trash/blob/main/src/trconvert/readme.md - Extract EBNF from Yacc files https://www.bottlecaps.de/rr/ui
Reference Grammars: - Github Lark Grammars https://github.com/search?q=path%3A.lark&type=code - Github Nearley Grammars https://github.com/search?q=path%3A.ne+%22-%3E%22&type=code - Antlr4 grammars https://github.com/antlr/grammars-v4/ - Grammar zoo https://slebok.github.io/zoo/index.html#html
"},{"location":"reference/generation/custom_fsm_ops/","title":"Custom FSM Operations","text":"Outlines is fast because it compiles regular expressions into an index ahead of inference. To do so we use the equivalence between regular expressions and Finite State Machines (FSMs), and the library interegular to perform the translation.
Alternatively, one can pass an FSM built using interegular
directly to structure the generation.
"},{"location":"reference/generation/custom_fsm_ops/#example","title":"Example","text":""},{"location":"reference/generation/custom_fsm_ops/#using-the-difference-operation","title":"Using the difference
operation","text":"In the following example we build a fsm which recognizes only the strings valid to the first regular expression but not the second. In particular, it will prevent the words \"pink\" and \"elephant\" from being generated:
import interegular\nfrom outlines import models, generate\n\n\nlist_of_strings_pattern = \"\"\"\\[\"[^\"\\s]*\"(?:,\"[^\"\\s]*\")*\\]\"\"\"\npink_elephant_pattern = \"\"\".*(pink|elephant).*\"\"\"\n\nlist_of_strings_fsm = interegular.parse_pattern(list_of_strings_pattern).to_fsm()\npink_elephant_fsm = interegular.parse_pattern(pink_elephant_pattern).to_fsm()\n\ndifference_fsm = list_of_strings_fsm - pink_elephant_fsm\n\ndifference_fsm.accepts('[\"a\",\"pink\",\"elephant\"]')\n# False\ndifference_fsm.accepts('[\"a\",\"blue\",\"donkey\"]')\n# True\n\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-4k-instruct\")\ngenerator = generate.fsm(model, difference_fsm)\nresponse = generator(\"Don't talk about pink elephants\")\n
To see the other operations available, consult interegular's documentation.
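For instance, assuming interegular's FSM objects also support the & (intersection) operator in the same way they support the - operator above (an untested sketch; check the documentation for the exact API):
import interegular\n\ndigits_fsm = interegular.parse_pattern(r'[0-9]+').to_fsm()\neven_length_fsm = interegular.parse_pattern(r'(..)*').to_fsm()\n\n# Strings made of digits AND of even length (assumes the FSM class overloads &)\nboth_fsm = digits_fsm & even_length_fsm\n\nboth_fsm.accepts('1234')\n# True\nboth_fsm.accepts('123')\n# False\n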
"},{"location":"reference/generation/format/","title":"Type constraints","text":"We can ask completions to be restricted to valid python types:
from outlines import models, generate\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-4k-instruct\")\ngenerator = generate.format(model, int)\nanswer = generator(\"When I was 6 my sister was half my age. Now I\u2019m 70 how old is my sister?\")\nprint(answer)\n# 67\n
The following types are currently available:
- int
- float
- bool
- datetime.date
- datetime.time
- datetime.datetime
- We also provide custom types
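For example, a date can be requested in the same way as the int example above (a minimal sketch; the prompt and the exact output are illustrative):
import datetime\n\nfrom outlines import models, generate\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-4k-instruct\")\ngenerator = generate.format(model, datetime.date)\n\nanswer = generator(\"When was the storming of the Bastille? Answer with a date: \")\nprint(answer)\n# e.g. 1789-07-14\n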
"},{"location":"reference/generation/generation/","title":"Generation","text":"Once an Outlines model is constructed you can use outlines.generate
to generate text. Standard LLM generation is possible via outlines.generate.text
, along with a variety of structured generation methods described below. (For a detailed technical explanation of how structured generation works, you may review the Structured Generation Explanation page)
Before generating text, you must construct an outlines.model
. Example:
import outlines\n\nmodel = outlines.models.transformers(\"microsoft/Phi-3-mini-4k-instruct\", device=\"cuda\")\n
"},{"location":"reference/generation/generation/#text-generator","title":"Text generator","text":"generator = outlines.generate.text(model)\n\nresult = generator(\"Question: What's 2+2? Answer:\", max_tokens=100)\nprint(result)\n# The answer is 4\n\n# Outlines also supports streaming output\nstream = generator.stream(\"What's 2+2?\", max_tokens=4)\nfor i in range(5):\n token = next(stream)\n print(repr(token))\n# '2'\n# '+'\n# '2'\n# ' equals'\n# '4'\n
"},{"location":"reference/generation/generation/#multi-label-classification","title":"Multi-label classification","text":"Outlines allows you to do multi-label classification by guiding the model so it can only output either of the specified choices:
import outlines\n\nmodel = outlines.models.transformers(\"microsoft/Phi-3-mini-128k-instruct\")\ngenerator = outlines.generate.choice(model, [\"Blue\", \"Red\", \"Yellow\"])\n\ncolor = generator(\"What is the closest color to Indigo? \")\nprint(color)\n# Blue\n
"},{"location":"reference/generation/generation/#json-structured-generation","title":"JSON-structured generation","text":"Outlines can guide models so that they output valid JSON 100% of the time. You can either specify the structure using Pydantic or a string that contains a JSON Schema:
PydanticJSON Schema from enum import Enum\nfrom pydantic import BaseModel, constr, conint\n\nimport outlines\n\nclass Armor(str, Enum):\n leather = \"leather\"\n chainmail = \"chainmail\"\n plate = \"plate\"\n\n\nclass Character(BaseModel):\n name: constr(max_length=10)\n age: conint(gt=18, lt=99)\n armor: Armor\n strength: conint(gt=1, lt=100)\n\nmodel = outlines.models.transformers(\"microsoft/Phi-3-mini-128k-instruct\")\ngenerator = outlines.generate.json(model, Character)\n\ncharacter = generator(\n \"Generate a new character for my awesome game: \"\n + \"name, age (between 1 and 99), armor and strength. \"\n )\nprint(character)\n# name='Orla' age=21 armor=<Armor.plate: 'plate'> strength=8\n
import outlines\n\nschema = \"\"\"{\n \"$defs\": {\n \"Armor\": {\n \"enum\": [\"leather\", \"chainmail\", \"plate\"],\n \"title\": \"Armor\",\n \"type\": \"string\"\n }\n },\n \"properties\": {\n \"name\": {\"maxLength\": 10, \"title\": \"Name\", \"type\": \"string\"},\n \"age\": {\"title\": \"Age\", \"type\": \"integer\"},\n \"armor\": {\"$ref\": \"#/$defs/Armor\"},\n \"strength\": {\"title\": \"Strength\", \"type\": \"integer\"}\\\n },\n \"required\": [\"name\", \"age\", \"armor\", \"strength\"],\n \"title\": \"Character\",\n \"type\": \"object\"\n}\"\"\"\n\nmodel = outlines.models.transformers(\"microsoft/Phi-3-mini-128k-instruct\")\ngenerator = outlines.generate.json(model, schema)\ncharacter = generator(\n \"Generate a new character for my awesome game: \"\n + \"name, age (between 1 and 99), armor and strength. \"\n )\nprint(character)\n# {'name': 'Yuki', 'age': 24, 'armor': 'plate', 'strength': 3}\n
Note
We advise you to constrain the length of the string fields when first testing your schema, especially with small models.
"},{"location":"reference/generation/generation/#grammar-structured-generation","title":"Grammar-structured generation","text":"Outlines also allows to generate text that is valid to any context-free grammar (CFG) in the EBNF format. Grammars can be intimidating, but they are a very powerful tool! Indeed, they determine the syntax of every programming language, valid chess moves, molecule structure, can help with procedural graphics generation, etc.
Here we show a simple example of a grammar that defines arithmetic operations:
from outlines import models, generate\n\narithmetic_grammar = \"\"\"\n ?start: sum\n\n ?sum: product\n | sum \"+\" product -> add\n | sum \"-\" product -> sub\n\n ?product: atom\n | product \"*\" atom -> mul\n | product \"/\" atom -> div\n\n ?atom: NUMBER -> number\n | \"-\" atom -> neg\n | \"(\" sum \")\"\n\n %import common.NUMBER\n %import common.WS_INLINE\n\n %ignore WS_INLINE\n\"\"\"\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-128k-instruct\")\ngenerator = generate.cfg(model, arithmetic_grammar, max_tokens=100)\n\nresult = generator(\"Question: How can you write 5*5 using addition?\\nAnswer:\")\nprint(result)\n# 5+5+5+5+5\n
EBNF grammars can be cumbersome to write. This is why Outlines provides grammar definitions in the outlines.grammars
module:
from outlines import models, generate, grammars\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-128k-instruct\")\ngenerator = generate.cfg(model, grammars.arithmetic, max_tokens=100)\n\nresult = generator(\"Question: How can you write 5*5 using addition?\\nAnswer:\")\nprint(result)\n# 5+5+5+5+5\n
The available grammars are listed here.
"},{"location":"reference/generation/generation/#regex-structured-generation","title":"Regex-structured generation","text":"Slightly simpler, but no less useful, Outlines can generate text that is in the language of a regular expression. For instance to force the model to generate IP addresses:
from outlines import models, generate\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-128k-instruct\")\n\nregex_str = r\"((25[0-5]|2[0-4]\\d|[01]?\\d\\d?)\\.){3}(25[0-5]|2[0-4]\\d|[01]?\\d\\d?)\"\ngenerator = generate.regex(model, regex_str)\n\nresult = generator(\"What is the IP address of localhost?\\nIP: \")\nprint(result)\n# 127.0.0.100\n
"},{"location":"reference/generation/generation/#generate-a-given-python-type","title":"Generate a given Python type","text":"We provide a shortcut to regex-structured generation for simple use cases. Pass a Python type to the outlines.generate.format
function and the LLM will output text that matches this type:
from outlines import models, generate\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-128k-instruct\")\ngenerator = generate.format(model, int)\n\nresult = generator(\"What is 2+2?\")\nprint(result)\n# 4\n
"},{"location":"reference/generation/json/","title":"JSON structured generation","text":"Outlines can make any open source model return a JSON object that follows a structure that is specified by the user. This is useful whenever we want the output of the model to be processed by code downstream: code does not understand natural language but rather the structured language it has been programmed to understand.
There are two main reasons why someone would want to get an output formatted as JSON from an LLM:
- Parse the answer (e.g. with Pydantic), store it somewhere, return it to a user, etc.
- Call a function with the result
Outlines has you covered in both cases! Indeed, to define the structure of the JSON you want the model to follow you can either provide a Pydantic model, or a function. No need to duplicate code!
"},{"location":"reference/generation/json/#using-pydantic","title":"Using Pydantic","text":"Outlines can infer the structure of the output from a Pydantic model. The result is an instance of the model that contains the values returned by the LLM:
from pydantic import BaseModel\n\nfrom outlines import models, generate\n\n\nclass User(BaseModel):\n name: str\n last_name: str\n id: int\n\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-4k-instruct\")\ngenerator = generate.json(model, User)\nresult = generator(\n \"Create a user profile with the fields name, last_name and id\"\n)\nprint(result)\n# User(name=\"John\", last_name=\"Doe\", id=11)\n
JSON and whitespaces
By default, Outlines prevents the model from generating JSON with syntactic newlines, tabs, or multiple spaces. The default whitespace_pattern
is r\"[ ]?\"
. Small models tend to enter an infinite repetition loop if the whitespace_pattern
allows infinite spacing. If you would like to allow the model to generate multiple tabs, newlines, and spaces, you can set the whitespace pattern as follows:
generator = generate.json(model, User, whitespace_pattern=r\"[\\n\\t ]*\")\n
Performance
generation.json
computes an index that helps Outlines guide generation. This can take some time, but only needs to be done once. If you want to generate several times with the same schema make sure that you only call generate.json
once.
Custom types
Outlines provides custom Pydantic types so you do not have to write regular expressions for common types, such as phone numbers or zip codes.
"},{"location":"reference/generation/json/#using-a-json-schema","title":"Using a JSON Schema","text":"Instead of a Pydantic model you can pass a string that represents a JSON Schema specification to generate.json
:
from outlines import models\nfrom outlines import generate\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-4k-instruct\")\n\nschema = \"\"\"\n{\n \"title\": \"User\",\n \"type\": \"object\",\n \"properties\": {\n \"name\": {\"type\": \"string\"},\n \"last_name\": {\"type\": \"string\"},\n \"id\": {\"type\": \"integer\"}\n },\n \"required\": [\"name\", \"last_name\", \"id\"]\n}\n\"\"\"\n\ngenerator = generate.json(model, schema)\nresult = generator(\n \"Create a user profile with the fields name, last_name and id\"\n)\nprint(result)\n# {'name': 'John', 'last_name': 'Doe', 'id': 11}\n
"},{"location":"reference/generation/json/#from-a-functions-signature","title":"From a function's signature","text":"Outlines can infer the structure of the output from the signature of a function. The result is a dictionary, and can be passed directly to the function using the usual dictionary expansion syntax **
:
from outlines import models\nfrom outlines import generate\n\ndef add(a: int, b: int):\n return a + b\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-4k-instruct\")\ngenerator = generate.json(model, add)\nresult = generator(\"Return two integers named a and b respectively. a is odd and b even.\")\n\nprint(add(**result))\n# 3\n
A great advantage of passing functions directly to specify the structure is that the structure of the LLM's output will change with the function's definition. No need to change the code in several places!
"},{"location":"reference/generation/regex/","title":"Regular expressions","text":"Outlines can guarantee that the text generated by the LLM will be valid to a regular expression:
from outlines import models, generate\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-4k-instruct\")\n\ngenerator = generate.regex(\n model,\n r\"((25[0-5]|2[0-4]\\d|[01]?\\d\\d?)\\.){3}(25[0-5]|2[0-4]\\d|[01]?\\d\\d?)\",\n)\n\nprompt = \"What is the IP address of the Google DNS servers? \"\nanswer = generator(prompt, max_tokens=30)\n\nprint(answer)\n# What is the IP address of the Google DNS servers?\n# 2.2.6.1\n
If you find yourself using generate.regex
to restrict the answers' type you can take a look at type-structured generation instead.
Performance
generate.regex
computes an index that helps Outlines guide generation. This can take some time, but only needs to be done once. If you want to generate several times using the same regular expression make sure that you only call generate.regex
once.
"},{"location":"reference/generation/structured_generation_explanation/","title":"How does Outlines work?","text":"Language models generate text token by token, using the previous token sequence as input and sampled logits as output. This document explains the structured generation process, where only legal tokens are considered for the next step based on a predefined automata, e.g. a regex-defined finite-state machine (FSM) or Lark grammar.`
"},{"location":"reference/generation/structured_generation_explanation/#worked-example","title":"Worked Example","text":"Let's consider a worked example with a pattern for whole and decimal numbers:
^\\d*(\\.\\d+)?$
.
"},{"location":"reference/generation/structured_generation_explanation/#creating-automata","title":"Creating Automata","text":"The pattern is first converted into an automata. Below is a brief explanation of the automata conversion and its representation.
Automata Diagram:
graph LR\n node0(\"1-9\") --> node1(\"1-9\")\n node1 --> node1\n node1 --> nodeEND{{END}}\n node1 --> nodePeriod(\".\")\n nodePeriod --> node2(\"1-9\")\n node2 --> node2\n node2 --> nodeEND{{END}}
"},{"location":"reference/generation/structured_generation_explanation/#generating-a-token","title":"Generating a Token","text":"Let's assume that we're in the middle of generation, and so far \"748\" has been generated. Here is the automata with the current state highlighted in green, with the legal next characters being another number (1-9), a dot (.), or end of sequence.
graph LR\n node0(\"1-9\") --> node1(\"1-9\")\n node1 --> node1\n node1 --> nodeEND{{END}}\n node1 --> nodePeriod(\".\")\n nodePeriod --> node2(\"1-9\")\n node2 --> node2\n node2 --> nodeEND{{END}}\n\n style node1 fill:#090
Generating a token requires the following steps:
- Feed the previous input sequence (\"748\") into the language model.
- Language model runs a forward pass and produces token logits.
- Outlines logits processor sets the probability of illegal tokens to 0%.
- A token is sampled from the set of legal tokens.
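A toy character-level version of step 3, with the automaton above hard-coded by hand (only an illustration: Outlines works on whole tokens and builds the automaton automatically from the regex):
# Hand-written transitions for the toy number automaton above\ntransitions = {\n 'integer': {**{d: 'integer' for d in '0123456789'}, '.': 'fraction_start'},\n 'fraction_start': {d: 'fraction' for d in '0123456789'},\n 'fraction': {d: 'fraction' for d in '0123456789'},\n}\nfinals = {'integer', 'fraction'} # states where the sequence may end\n\ndef legal_next_symbols(state):\n \"\"\"Symbols that keep the sequence valid; everything else gets masked out.\"\"\"\n symbols = sorted(transitions[state])\n return symbols + ['<EOS>'] if state in finals else symbols\n\n# After generating '748' the automaton is in the 'integer' state\nprint(legal_next_symbols('integer'))\n# ['.', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '<EOS>']\n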
"},{"location":"reference/generation/types/","title":"Custom types","text":"Outlines provides custom Pydantic types so you can focus on your use case rather than on writing regular expressions:
Category Type Import Description ISBN 10 & 13 outlines.types.ISBN
There is no guarantee that the check digit will be correct Airport IATA outlines.types.airports.IATA
Valid airport IATA codes Country alpha-2 code outlines.types.countries.Alpha2
Valid country alpha-2 codes alpha-3 code outlines.types.countries.Alpha3
Valid country alpha-3 codes numeric code outlines.types.countries.Numeric
Valid country numeric codes name outlines.types.countries.Name
Valid country names flag outlines.types.countries.Flag
Valid flag emojis email outlines.types.Email
Valid email address Some types require localization. We currently only support US types, but please don't hesitate to create localized versions of the different types and open a Pull Request. Localized types are specified using types.locale
in the following way:
from outlines import types\n\ntypes.locale(\"us\").ZipCode\ntypes.locale(\"us\").PhoneNumber\n
Here are the localized types that are currently available:
Category Locale Import Description Zip code US ZipCode
Generate US Zip(+4) codes Phone number US PhoneNumber
Generate valid US phone numbers You can use these types in Pydantic schemas for JSON-structured generation:
from pydantic import BaseModel\n\nfrom outlines import models, generate, types\n\n# Specify the locale for types\nlocale = types.locale(\"us\")\n\nclass Client(BaseModel):\n name: str\n phone_number: locale.PhoneNumber\n zip_code: locale.ZipCode\n\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-4k-instruct\")\ngenerator = generate.json(model, Client)\nresult = generator(\n \"Create a client profile with the fields name, phone_number and zip_code\"\n)\nprint(result)\n# name='Tommy' phone_number='129-896-5501' zip_code='50766'\n
Or simply with outlines.generate.format
:
from pydantic import BaseModel\n\nfrom outlines import models, generate, types\n\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-4k-instruct\")\ngenerator = generate.format(model, types.locale(\"us\").PhoneNumber)\nresult = generator(\n \"Return a US Phone number: \"\n)\nprint(result)\n# 334-253-2630\n
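The non-localized types listed above can be used the same way. A minimal sketch, assuming they behave like the localized types, with outlines.types.Email (the printed address is only illustrative):
from outlines import models, generate, types\n\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-4k-instruct\")\ngenerator = generate.format(model, types.Email)\nresult = generator(\"Give me the support email address of a fictional company: \")\nprint(result)\n# e.g. support@example.com\n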
We plan on adding many more custom types. If you have found yourself writing regular expressions to generate fields of a given type, or if you could benefit from more specific types, don't hesitate to submit a PR or open an issue.
"},{"location":"reference/models/exllamav2/","title":"ExllamaV2","text":"The outlines.models.exllamav2
model requires a Logits Processor component for compatibility with Outlines structured generation. While ExLlamaV2 doesn't natively support this feature, a third-party fork provides the necessary functionality. You can install it with the following command:
pip install git+https://github.com/lapp0/exllamav2@sampler-logits-processor\n
Install other requirements:
pip install transformers torch\n
Coming soon
"},{"location":"reference/models/llamacpp/","title":"Llama.cpp","text":"Outlines provides an integration with Llama.cpp using the llama-cpp-python library. Llamacpp allows to run quantized models on machines with limited compute.
Installation
You need to install the llama-cpp-python
library to use the llama.cpp integration. See the installation section for instructions to install llama-cpp-python
with CUDA, Metal, ROCm and other backends. To get started quickly you can also run:
pip install \"outlines[llamacpp]\"\n
"},{"location":"reference/models/llamacpp/#load-the-model","title":"Load the model","text":"You can initialize the model by passing the name of the repository on the HuggingFace Hub, and the filenames (or glob pattern):
from outlines import models\n\nmodel = models.llamacpp(\"TheBloke/phi-2-GGUF\", \"phi-2.Q4_K_M.gguf\")\n
This will download the model files to the hub cache folder and load the weights in memory.
You can also initialize the model by passing the path to the weights on your machine. Assuming Phi2's weights are in the current directory:
from outlines import models\nfrom llama_cpp import Llama\n\nllm = Llama(\"./phi-2.Q4_K_M.gguf\")\nmodel = models.LlamaCpp(llm)\n
If you need more control, you can pass the same keyword arguments to the model as you would pass to the llama-cpp-python library:
from outlines import models\n\nmodel = models.llamacpp(\n    \"TheBloke/phi-2-GGUF\",\n    \"phi-2.Q4_K_M.gguf\",\n    n_ctx=512,  # to set the context length value\n)\n
Main parameters:
Parameters Type Description Default n_gpu_layers
int
Number of layers to offload to GPU. If -1, all layers are offloaded 0
split_mode
int
How to split the model across GPUs. 1
for layer-wise split, 2
for row-wise split 1
main_gpu
int
Main GPU 0
tensor_split
Optional[List[float]]
How split tensors should be distributed across GPUs. If None
the model is not split. None
n_ctx
int
Text context. Inferred from the model if set to 0
0
n_threads
Optional[int]
Number of threads to use for generation. All available threads if set to None
. None
verbose
bool
Print verbose outputs to stderr
False
See the llama-cpp-python documentation for the full list of parameters.
"},{"location":"reference/models/llamacpp/#load-the-model-on-gpu","title":"Load the model on GPU","text":"Note
Make sure that you installed llama-cpp-python
with GPU support.
To load the model on GPU, pass n_gpu_layers=-1
:
from outlines import models\n\nmodel = models.llamacpp(\n \"TheBloke/phi-2-GGUF\",\n \"phi-2.Q4_K_M.gguf\",\n n_gpu_layers=-1, # to use GPU acceleration\n)\n
This also works with generators built with generate.regex
, generate.json
, generate.cfg
, generate.format
and generate.choice
.
"},{"location":"reference/models/llamacpp/#load-lora-adapters","title":"Load LoRA adapters","text":"You can load LoRA adapters dynamically:
from outlines import models, generate\n\nmodel = models.llamacpp(\"TheBloke/phi-2-GGUF\", \"phi-2.Q4_K_M.gguf\")\ngenerator = generate.text(model)\nanswer_1 = generator(\"prompt\")\n\nmodel.load_lora(\"./path/to/adapter.gguf\")\nanswer_2 = generator(\"prompt\")\n
To load another adapter you need to re-initialize the model. Otherwise the adapter will be added on top of the previous one:
from outlines import models\n\nmodel = models.llamacpp(\"TheBloke/phi-2-GGUF\", \"phi-2.Q4_K_M.gguf\")\nmodel.load_lora(\"./path/to/adapter1.gguf\") # Load first adapter\n\nmodel = models.llamacpp(\"TheBloke/phi-2-GGUF\", \"phi-2.Q4_K_M.gguf\")\nmodel.load_lora(\"./path/to/adapter2.gguf\") # Load second adapter\n
"},{"location":"reference/models/llamacpp/#generate-text","title":"Generate text","text":"In addition to the parameters described in the text generation section you can pass extra keyword arguments, for instance to set sampling parameters not exposed in Outlines' public API:
from outlines import models, generate\n\n\nmodel = models.llamacpp(\"TheBloke/phi-2-GGUF\", \"phi-2.Q4_K_M.gguf\")\ngenerator = generate.text(model)\n\nanswer = generator(\"A prompt\", presence_penalty=0.8)\n
Extra keyword arguments:
The values of the keyword arguments you pass to the generator supersede the values set when initializing the sampler or generator. All extra sampling methods and repetition penalties are disabled by default.
Parameters Type Description Default suffix
Optional[str]
A suffix to append to the generated text. If None
no suffix is added. None
echo
bool
Whether to prepend the prompt to the completion. False
seed
int
The random seed to use for sampling. None
max_tokens
Optional[int]
The maximum number of tokens to generate. If None
the maximum number of tokens depends on n_ctx
. 16
frequency_penalty
float
The penalty to apply to tokens based on their frequency in the past 64 tokens. 0.0
presence_penalty
float
The penalty to apply to tokens based on their presence in the past 64 tokens. 0.0
repeat_penalty
float
The penalty to apply to repeated tokens in the past 64 tokens. 1.
stopping_criteria
Optional[StoppingCriteriaList]
A list of stopping criteria to use. None
logits_processor
Optional[LogitsProcessorList]
A list of logits processors to use. The logits processor used for structured generation will be added to this list. None
temperature
float
The temperature to use for sampling 1.0
top_p
float
The top-p value to use for nucleus sampling. 1.
min_p
float
The min-p value to use for minimum-p sampling. 0.
typical_p
float
The p value to use for locally typical sampling. 1.0
stop
Optional[Union[str, List[str]]]
A list of strings that stop generation when encountered. []
top_k
int
The top-k value used for top-k sampling. Negative value to consider all logit values. -1.
tfs_z
float
The tail-free sampling parameter. 1.0
mirostat_mode
int
The mirostat sampling mode. 0
mirostat_tau
float
The target cross-entropy for mirostat sampling. 5.0
mirostat_eta
float
The learning rate used to update mu
in mirostat sampling. 0.1
See the llama-cpp-python documentation for the full and up-to-date list of parameters and the llama.cpp code for the default values of other sampling parameters.
"},{"location":"reference/models/llamacpp/#streaming","title":"Streaming","text":""},{"location":"reference/models/llamacpp/#installation","title":"Installation","text":"You need to install the llama-cpp-python
library to use the llama.cpp integration.
"},{"location":"reference/models/llamacpp/#cpu","title":"CPU","text":"For a CPU-only installation run:
pip install llama-cpp-python\n
Warning
Do not run this command if you want support for BLAS, Metal or CUDA. Follow the instructions below instead.
"},{"location":"reference/models/llamacpp/#cuda","title":"CUDA","text":"CMAKE_ARGS=\"-DLLAMA_CUDA=on\" pip install llama-cpp-python\n
It is also possible to install pre-built wheels with CUDA support (Python 3.10 and above):
pip install llama-cpp-python \\\n --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/<cuda-version>\n
Where <cuda-version>
is one of the following, depending on the version of CUDA installed on your system:
cu121
for CUDA 12.1 cu122
for CUDA 12.2 cu123
for CUDA 12.3
"},{"location":"reference/models/llamacpp/#metal","title":"Metal","text":"CMAKE_ARGS=\"-DLLAMA_METAL=on\" pip install llama-cpp-python\n
It is also possible to install pre-built wheels with Metal support (Python 3.10 or above, MacOS 11.0 and above):
pip install llama-cpp-python \\\n --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/metal\n
"},{"location":"reference/models/llamacpp/#openblas","title":"OpenBLAS","text":"CMAKE_ARGS=\"-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS\" pip install llama-cpp-python\n
"},{"location":"reference/models/llamacpp/#other-backend","title":"Other backend","text":"llama.cpp
supports many other backends. Refer to the llama.cpp documentation to use the following backends:
- CLBlast (OpenCL)
- hipBLAS (ROCm)
- Vulkan
- Kompute
- SYCL
"},{"location":"reference/models/mlxlm/","title":"mlx-lm","text":"Outlines provides an integration with mlx-lm, allowing models to be run quickly on Apple Silicon via the mlx library.
Installation
You need to install the mlx
and mlx-lm
libraries on a device which supports Metal to use the mlx-lm integration. To get started quickly you can also run:
pip install \"outlines[mlxlm]\"\n
"},{"location":"reference/models/mlxlm/#load-the-model","title":"Load the model","text":"You can initialize the model by passing the name of the repository on the HuggingFace Hub. The official repository for mlx-lm supported models is mlx-community.
from outlines import models\n\nmodel = models.mlxlm(\"mlx-community/Meta-Llama-3.1-8B-Instruct-8bit\")\n
This will download the model files to the hub cache folder and load the weights in memory.
The arguments model_config
and tokenizer_config
are available to modify loading behavior. For example, per the mlx-lm
documentation, you must set an eos_token for qwen/Qwen-7B
. In outlines you may do so via
model = models.mlxlm(\n \"mlx-community/Meta-Llama-3.1-8B-Instruct-8bit\",\n tokenizer_config={\"eos_token\": \"<|endoftext|>\", \"trust_remote_code\": True},\n)\n
Main parameters:
(Subject to change. Table based on mlx-lm.load docstring)
Parameters Type Description Default tokenizer_config
dict
Configuration parameters specifically for the tokenizer. Defaults to an empty dictionary. {}
model_config
dict
Configuration parameters specifically for the model. Defaults to an empty dictionary. {}
adapter_path
str
Path to the LoRA adapters. If provided, applies LoRA layers to the model. None
lazy
bool
If False, evaluate the model parameters to make sure they are loaded in memory before returning. False
"},{"location":"reference/models/mlxlm/#generate-text","title":"Generate text","text":"You may generate text using the parameters described in the text generation documentation.
With the loaded model, you can generate text or perform structured generation, e.g.
from outlines import models, generate\n\nmodel = models.mlxlm(\"mlx-community/Meta-Llama-3.1-8B-Instruct-8bit\")\ngenerator = generate.text(model)\n\nanswer = generator(\"A prompt\", temperature=2.0)\n
"},{"location":"reference/models/mlxlm/#streaming","title":"Streaming","text":"You may creating a streaming iterable with minimal changes
from outlines import models, generate\n\nmodel = models.mlxlm(\"mlx-community/Meta-Llama-3.1-8B-Instruct-8bit\")\ngenerator = generate.text(model)\n\nfor token_str in generator.text(\"A prompt\", temperature=2.0):\n print(token_str)\n
"},{"location":"reference/models/mlxlm/#structured","title":"Structured","text":"You may perform structured generation with mlxlm to guarantee your output will match a regex pattern, json schema, or lark grammar.
Example: Phone number generation with pattern \"\\\\+?[1-9][0-9]{7,14}\"
:
from outlines import models, generate\n\nmodel = models.mlxlm(\"mlx-community/Meta-Llama-3.1-8B-Instruct-8bit\")\n\nphone_number_pattern = \"\\\\+?[1-9][0-9]{7,14}\"\ngenerator = generate.regex(model, phone_number_pattern)\n\nmodel_output = generator(\"What's Jennys Number?\\n\")\nprint(model_output)\n# '8675309'\n
"},{"location":"reference/models/models/","title":"Models","text":"Outlines supports generation using a number of inference engines (outlines.models
). Loading a model with Outlines follows a similar interface across inference engines:
import outlines\n\nmodel = outlines.models.transformers(\"microsoft/Phi-3-mini-128k-instruct\")\nmodel = outlines.models.transformers_vision(\"llava-hf/llava-v1.6-mistral-7b-hf\")\nmodel = outlines.models.vllm(\"microsoft/Phi-3-mini-128k-instruct\")\nmodel = outlines.models.llamacpp(\n \"microsoft/Phi-3-mini-4k-instruct-gguf\", \"Phi-3-mini-4k-instruct-q4.gguf\"\n)\nmodel = outlines.models.exllamav2(\"bartowski/Phi-3-mini-128k-instruct-exl2\")\nmodel = outlines.models.mlxlm(\"mlx-community/Phi-3-mini-4k-instruct-4bit\")\n\nmodel = outlines.models.openai(\n \"gpt-4o-mini\",\n api_key=os.environ[\"OPENAI_API_KEY\"]\n)\n
"},{"location":"reference/models/models/#feature-matrix","title":"Feature Matrix","text":"Transformers Transformers Vision vLLM llama.cpp ExLlamaV2 MLXLM OpenAI* Device Cuda \u2705 \u2705 \u2705 \u2705 \u2705 \u274c N/A Apple Silicon \u2705 \u2705 \u274c \u2705 \u2705 \u2705 N/A x86 / AMD64 \u2705 \u2705 \u274c \u2705 \u2705 \u274c N/A Sampling Greedy \u2705 \u2705 \u2705 \u2705* \u2705 \u2705 \u274c Multinomial \u2705 \u2705 \u2705 \u2705 \u2705 \u2705 \u2705 Multiple Samples \u2705 \u2705 \u274c \u274c \u2705 Beam Search \u2705 \u2705 \u2705 \u274c \u2705 \u274c \u274c Generation Batch \u2705 \u2705 \u2705 \u274c ? \u274c \u274c Stream \u2705 \u274c \u274c \u2705 ? \u2705 \u274c outlines.generate
Text \u2705 \u2705 \u2705 \u2705 \u2705 \u2705 \u2705 Structured \u2705 \u2705 \u2705 \u2705 \u2705 \u2705 \u2705 JSON Schema \u2705 \u2705 \u2705 \u2705 \u2705 \u2705 \u2705 Choice \u2705 \u2705 \u2705 \u2705 \u2705 \u2705 \u2705 Regex \u2705 \u2705 \u2705 \u2705 \u2705 \u2705 \u274c Grammar \u2705 \u2705 \u2705 \u2705 \u2705 \u2705 \u274c"},{"location":"reference/models/models/#caveats","title":"Caveats","text":" - OpenAI doesn't support structured generation due to limitations in their API and server implementation.
outlines.generate
\"Structured\" includes methods such as outlines.generate.regex
, outlines.generate.json
, outlines.generate.cfg
, etc. - MLXLM only supports Apple Silicon.
- llama.cpp greedy sampling available via multinomial with
temperature = 0.0
.
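For the last caveat, here is a minimal sketch of greedy-like decoding with llama.cpp, passing temperature=0.0 as a generation keyword argument as documented on the llama.cpp page:
from outlines import models, generate\n\nmodel = models.llamacpp(\"TheBloke/phi-2-GGUF\", \"phi-2.Q4_K_M.gguf\")\ngenerator = generate.text(model)\n\n# temperature=0.0 makes the multinomial sampler behave greedily\nanswer = generator(\"A prompt\", temperature=0.0)\n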
"},{"location":"reference/models/openai/","title":"OpenAI and compatible APIs","text":"Installation
You need to install the openai
library to be able to use the OpenAI API in Outlines. Alternatively, you can run:
pip install \"outlines[openai]\"\n
"},{"location":"reference/models/openai/#openai-models","title":"OpenAI models","text":"Outlines supports models available via the OpenAI Chat API, e.g. GPT-4o, ChatGPT and GPT-4. You can initialize the model by passing the model name to outlines.models.openai
:
from outlines import models\n\n\nmodel = models.openai(\"gpt-4o-mini\")\nmodel = models.openai(\"gpt-4o\")\n
Check the OpenAI documentation for an up-to-date list of available models. You can pass any parameter you would pass to openai.AsyncOpenAI
as keyword arguments:
import os\nfrom outlines import models\n\n\nmodel = models.openai(\n \"gpt-4o-mini\",\n api_key=os.environ[\"OPENAI_API_KEY\"]\n)\n
The following table enumerates the possible parameters. Refer to the OpenAI SDK's code for an up-to-date list.
Parameters:
Parameters Type Description Default api_key
str
OpenAI API key. Inferred from OPENAI_API_KEY
if not specified None
organization
str
OpenAI organization id. Inferred from OPENAI_ORG_ID
if not specified None
project
str
OpenAI project id. Inferred from OPENAI_PROJECT_ID
if not specified. None
base_url
str | httpx.URL
Base URL for the endpoint. Inferred from OPENAI_BASE_URL
if not specified. None
timeout
float
Request timeout. NOT_GIVEN
max_retries
int
Maximum number of retries for failing requests 2
default_headers
Mapping[str, str]
Default HTTP headers None
default_query
Mapping[str, str]
Custom parameters added to the HTTP queries None
http_client
httpx.AsyncClient
User-specified httpx
client None
"},{"location":"reference/models/openai/#azure-openai-models","title":"Azure OpenAI models","text":"Outlines also supports Azure OpenAI models:
from outlines import models\n\n\nmodel = models.azure_openai(\n \"azure-deployment-name\",\n \"gpt-4o-mini\",\n api_version=\"2024-07-18\",\n azure_endpoint=\"https://example-endpoint.openai.azure.com\",\n)\n
Why do I need to specify model and deployment name?
The model name is needed to load the correct tokenizer for the model. The tokenizer is necessary for structured generation.
You can pass any parameter you would pass to openai.AsyncAzureOpenAI
. You can consult the OpenAI SDK's code for an up-to-date list.
Parameters:
Parameters Type Description Default azure_endpoint
str
Azure endpoint, including the resource. Inferred from AZURE_OPENAI_ENDPOINT
if not specified None
api_version
str
API version. Inferred from OPENAI_API_VERSION
if not specified None
api_key
str
OpenAI API key. Inferred from AZURE_OPENAI_API_KEY
if not specified None
azure_ad_token
str
Azure active directory token. Inferred from AZURE_OPENAI_AD_TOKEN
if not specified None
azure_ad_token_provider
AzureADTokenProvider
A function that returns an Azure Active Directory token None
organization
str
OpenAI organization id. Inferred from OPENAI_ORG_ID
if not specified None
project
str
OpenAI project id. Inferred from OPENAI_PROJECT_ID
if not specified. None
base_url
str | httpx.URL
Base URL for the endpoint. Inferred from OPENAI_BASE_URL
if not specified. None
timeout
float
Request timeout. NOT_GIVEN
max_retries
int
Maximum number of retries for failing requests 2
default_headers
Mapping[str, str]
Default HTTP headers None
default_query
Mapping[str, str]
Custom parameters added to the HTTP queries None
http_client
httpx.AsyncClient
User-specified httpx
client None
"},{"location":"reference/models/openai/#models-that-follow-the-openai-standard","title":"Models that follow the OpenAI standard","text":"Outlines supports models that follow the OpenAI standard. You will need to initialize the OpenAI client properly configured and pass it to outlines.models.openai
import os\nfrom openai import AsyncOpenAI\nfrom outlines import models\nfrom outlines.models.openai import OpenAIConfig\n\n\nclient = AsyncOpenAI(\n api_key=os.environ.get(\"PROVIDER_KEY\"),\n base_url=\"http://other.provider.server.com\"\n)\nconfig = OpenAIConfig(\"model_name\")\nmodel = models.openai(client, config)\n
Warning
You need to pass the async client to be able to do batch inference.
"},{"location":"reference/models/openai/#structured-generation-support","title":"Structured Generation Support","text":"Outlines provides support for OpenAI Structured Outputs via outlines.generate.json
, outlines.generate.choice
from pydantic import BaseModel, ConfigDict\nimport outlines.models as models\nfrom outlines import generate\n\nmodel = models.openai(\"gpt-4o-mini\")\n\nclass Person(BaseModel):\n    model_config = ConfigDict(extra='forbid')  # required for openai\n    first_name: str\n    last_name: str\n    age: int\n\ngenerator = generate.json(model, Person)\ngenerator(\"current indian prime minister on january 1st 2023\")\n# Person(first_name='Narendra', last_name='Modi', age=72)\n\ngenerator = generate.choice(model, [\"Chicken\", \"Egg\"])\nprint(generator(\"Which came first?\"))\n# Chicken\n
Warning
Structured generation support only provided to OpenAI-compatible endpoints which conform to OpenAI's standard. Additionally, generate.regex
and generate.cfg
are not supported.
"},{"location":"reference/models/openai/#advanced-configuration","title":"Advanced configuration","text":"For more advanced configuration option, such as support proxy, please consult the OpenAI SDK's documentation:
import httpx\nfrom openai import AsyncOpenAI, DefaultHttpxClient\nfrom outlines import models\nfrom outlines.models.openai import OpenAIConfig\n\n\nclient = AsyncOpenAI(\n    base_url=\"http://my.test.server.example.com:8083\",\n    http_client=DefaultHttpxClient(\n        proxies=\"http://my.test.proxy.example.com\",\n        transport=httpx.HTTPTransport(local_address=\"0.0.0.0\"),\n    ),\n)\nconfig = OpenAIConfig(\"model_name\")\nmodel = models.openai(client, config)\n
It is possible to specify the values for seed
, presence_penalty
, frequency_penalty
, top_p
by passing an instance of OpenAIConfig
when initializing the model:
from outlines.models.openai import OpenAIConfig\nfrom outlines import models\n\n\nconfig = OpenAIConfig(\n presence_penalty=1.,\n frequency_penalty=1.,\n top_p=.95,\n seed=0,\n)\nmodel = models.openai(\"gpt-4o-mini\", config)\n
"},{"location":"reference/models/openai/#monitoring-api-use","title":"Monitoring API use","text":"It is important to be able to track your API usage when working with OpenAI's API. The number of prompt tokens and completion tokens is directly accessible via the model instance:
import outlines.models as models\n\n\nmodel = models.openai(\"gpt-4o\")\n\nprint(model.prompt_tokens)\n# 0\n\nprint(model.completion_tokens)\n# 0\n
These numbers are updated every time you call the model.
"},{"location":"reference/models/tgi/","title":"Text-generation-inference (TGI)","text":"TGI uses Outlines to provide structured generation, see their documentation.
"},{"location":"reference/models/transformers/","title":"transformers","text":"Installation
You need to install the transformers
, datasets
and torch
libraries to be able to use these models in Outlines, or alternatively:
pip install \"outlines[transformers]\"\n
Outlines provides an integration with the torch
implementation of causal models in the transformers library. You can initialize the model by passing its name:
from outlines import models\n\nmodel = models.transformers(\"microsoft/Phi-3-mini-4k-instruct\", device=\"cuda\")\n
If you need more fine-grained control you can also initialize the model and tokenizer separately:
from transformers import AutoModelForCausalLM, AutoTokenizer\nfrom outlines import models\n\nllm = AutoModelForCausalLM.from_pretrained(\"gpt2\", output_attentions=True)\ntokenizer = AutoTokenizer.from_pretrained(\"gpt2\")\nmodel = models.Transformers(llm, tokenizer)\n
"},{"location":"reference/models/transformers/#using-logits-processors","title":"Using Logits Processors","text":"There are two ways to use Outlines Structured Generation with HuggingFace Transformers:
- Use Outlines generation wrapper,
outlines.models.transformers
- Use
OutlinesLogitsProcessor
with transformers.AutoModelForCausalLM
Outlines supports a myriad of logits processors for structured generation. In these examples, we will use the RegexLogitsProcessor
which guarantees generated text matches the specified pattern.
"},{"location":"reference/models/transformers/#using-outlinesmodelstransformers","title":"Using outlines.models.transformers
","text":"import outlines\n\ntime_regex_pattern = r\"(0?[1-9]|1[0-2]):[0-5]\\d\\s?(am|pm)?\"\n\nmodel = outlines.models.transformers(\"microsoft/Phi-3-mini-4k-instruct\", device=\"cuda\")\ngenerator = outlines.generate.regex(model, time_regex_pattern)\n\noutput = generator(\"The the best time to visit a dentist is at \")\nprint(output)\n# 2:30 pm\n
"},{"location":"reference/models/transformers/#using-models-initialized-via-the-transformers-library","title":"Using models initialized via the transformers
library","text":"import outlines\nimport transformers\n\n\nmodel_uri = \"microsoft/Phi-3-mini-4k-instruct\"\n\noutlines_tokenizer = outlines.models.TransformerTokenizer(\n transformers.AutoTokenizer.from_pretrained(model_uri)\n)\nphone_number_logits_processor = outlines.processors.RegexLogitsProcessor(\n \"\\\\+?[1-9][0-9]{7,14}\", # phone number pattern\n outlines_tokenizer,\n)\n\ngenerator = transformers.pipeline('text-generation', model=model_uri)\n\noutput = generator(\n \"Jenny gave me her number it's \",\n logits_processor=transformers.LogitsProcessorList([phone_number_logits_processor])\n)\nprint(output)\n# [{'generated_text': \"Jenny gave me her number it's 2125550182\"}]\n# not quite 8675309 what we expected, but it is a valid phone number\n
"},{"location":"reference/models/transformers/#alternative-model-classes","title":"Alternative Model Classes","text":"outlines.models.transformers
defaults to transformers.AutoModelForCausalLM
, which is the appropriate class for most standard large language models, including Llama 3, Mistral, Phi-3, etc.
However other variants with unique behavior can be used as well by passing the appropriate class.
"},{"location":"reference/models/transformers/#mamba","title":"Mamba","text":"Mamba is a transformers alternative which employs memory efficient, linear-time decoding.
To use Mamba with outlines you must first install the necessary requirements:
pip install causal-conv1d>=1.2.0 mamba-ssm torch transformers\n
Then you can either create a Mamba-2 Outlines model via
import outlines\n\nmodel = outlines.models.mamba(\"state-spaces/mamba-2.8b-hf\")\n
or explicitly with
import outlines\nfrom transformers import MambaForCausalLM\n\nmodel = outlines.models.transformers(\n \"state-spaces/mamba-2.8b-hf\",\n model_class=MambaForCausalLM\n)\n
Read transformers
's documentation for more information.
"},{"location":"reference/models/transformers/#encoder-decoder-models","title":"Encoder-Decoder Models","text":"You can use encoder-decoder (seq2seq) models like T5 and BART with Outlines.
Be cautious with model selection though, some models such as t5-base
don't include certain characters ({
) and you may get an error when trying to perform structured generation.
T5 Example:
import outlines\nfrom transformers import AutoModelForSeq2SeqLM\n\nmodel_pile_t5 = outlines.models.transformers(\n model_name=\"EleutherAI/pile-t5-large\",\n model_class=AutoModelForSeq2SeqLM,\n)\n
Bart Example:
model_bart = outlines.models.transformers(\n model_name=\"facebook/bart-large\",\n model_class=AutoModelForSeq2SeqLM,\n)\n
"},{"location":"reference/models/transformers_vision/","title":"Transformers Vision","text":"Outlines allows seamless use of vision models.
outlines.models.transformers_vision
shares its interface with, and is based on, outlines.models.transformers.
Tasks supported include
- image + text -> text
- video + text -> text
"},{"location":"reference/models/transformers_vision/#example-using-llava-next-vision-models","title":"Example: Using Llava-Next Vision Models","text":"Install dependencies pip install torchvision pillow flash-attn
Create the model
import outlines\nfrom transformers import LlavaNextForConditionalGeneration\n\nmodel = outlines.models.transformers_vision(\n \"llava-hf/llava-v1.6-mistral-7b-hf\",\n model_class=LlavaNextForConditionalGeneration,\n device=\"cuda\",\n)\n
Create a convenience function to load a PIL.Image
from URL
from PIL import Image\nfrom io import BytesIO\nfrom urllib.request import urlopen\n\ndef img_from_url(url):\n img_byte_stream = BytesIO(urlopen(url).read())\n return Image.open(img_byte_stream).convert(\"RGB\")\n
"},{"location":"reference/models/transformers_vision/#describing-an-image","title":"Describing an image","text":"description_generator = outlines.generate.text(model)\ndescription_generator(\n \"<image> detailed description:\",\n [img_from_url(\"https://upload.wikimedia.org/wikipedia/commons/2/25/Siam_lilacpoint.jpg\")]\n)\n
This is a color photograph featuring a Siamese cat with striking blue eyes. The cat has a creamy coat and a light eye color, which is typical for the Siamese breed. Its features include elongated ears, a long, thin tail, and a striking coat pattern. The cat is sitting in an indoor setting, possibly on a cat tower or a similar raised platform, which is covered with a beige fabric, providing a comfortable and soft surface for the cat to rest or perch. The surface of the wall behind the cat appears to be a light-colored stucco or plaster.
"},{"location":"reference/models/transformers_vision/#multiple-images","title":"Multiple Images","text":"To include multiple images in your prompt you simply add more <image>
tokens to the prompt
image_urls = [\n \"https://cdn1.byjus.com/wp-content/uploads/2020/08/ShapeArtboard-1-copy-3.png\", # triangle\n \"https://cdn1.byjus.com/wp-content/uploads/2020/08/ShapeArtboard-1-copy-11.png\", # hexagon\n]\ndescription_generator = outlines.generate.text(model)\ndescription_generator(\n \"<image><image><image>What shapes are present?\",\n list(map(img_from_url, image_urls)),\n)\n
There are two shapes present. One shape is a hexagon and the other shape is an triangle. '
"},{"location":"reference/models/transformers_vision/#classifying-an-image","title":"Classifying an Image","text":"pattern = \"Mercury|Venus|Earth|Mars|Saturn|Jupiter|Neptune|Uranus|Pluto\"\nplanet_generator = outlines.generate.regex(model, pattern)\n\nplanet_generator(\n \"What planet is this: <image>\",\n [img_from_url(\"https://upload.wikimedia.org/wikipedia/commons/e/e3/Saturn_from_Cassini_Orbiter_%282004-10-06%29.jpg\")]\n)\n
Saturn
"},{"location":"reference/models/transformers_vision/#extracting-structured-image-data","title":"Extracting Structured Image data","text":"from pydantic import BaseModel\nfrom typing import List, Optional\n\nclass ImageData(BaseModel):\n caption: str\n tags_list: List[str]\n object_list: List[str]\n is_photo: bool\n\nimage_data_generator = outlines.generate.json(model, ImageData)\n\nimage_data_generator(\n \"<image> detailed JSON metadata:\",\n [img_from_url(\"https://upload.wikimedia.org/wikipedia/commons/9/98/Aldrin_Apollo_11_original.jpg\")]\n)\n
ImageData(caption='An astronaut on the moon', tags_list=['moon', 'space', 'nasa', 'americanflag'], object_list=['moon', 'moon_surface', 'space_suit', 'americanflag'], is_photo=True)
"},{"location":"reference/models/transformers_vision/#resources","title":"Resources","text":""},{"location":"reference/models/transformers_vision/#chosing-a-model","title":"Chosing a model","text":" - https://mmbench.opencompass.org.cn/leaderboard
- https://huggingface.co/spaces/WildVision/vision-arena
"},{"location":"reference/models/vllm/","title":"vLLM","text":"Installation
You need to install the vllm
library to use the vLLM integration. See the installation section for instructions to install vLLM for CPU or ROCm. To get started you can also run:
pip install \"outlines[vllm]\"\n
"},{"location":"reference/models/vllm/#load-the-model","title":"Load the model","text":"Outlines supports models available via vLLM's offline batched inference interface. You can load a model using:
from outlines import models\n\nmodel = models.vllm(\"microsoft/Phi-3-mini-4k-instruct\")\n
Or alternatively:
import vllm\nfrom outlines import models\n\nllm = vllm.LLM(\"microsoft/Phi-3-mini-4k-instruct\")\nmodel = models.VLLM(llm)\n
Models are loaded from the HuggingFace hub.
Device
The default installation of vLLM only allows loading models on GPU. See the installation instructions to run models on CPU.
You can pass any parameter that you would normally pass to vllm.LLM
, as keyword arguments:
from outlines import models\n\nmodel = models.vllm(\n \"microsoft/Phi-3-mini-4k-instruct\",\n trust_remote_code=True,\n gpu_memory_utilization=0.7\n)\n
Main parameters:
Parameters Type Description Default tokenizer_mode
str
\"auto\" will use the fast tokenizer if available and \"slow\" will always use the slow tokenizer. auto
trust_remote_code
bool
Trust remote code when downloading the model and tokenizer. False
tensor_parallel_size
int
The number of GPUs to use for distributed execution with tensor parallelism. 1
dtype
str
The data type for the model weights and activations. Currently, we support float32
, float16
, and bfloat16
. If auto
, we use the torch_dtype
attribute specified in the model config file. However, if the torch_dtype
in the config is float32
, we will use float16
instead. auto
quantization
Optional[str]
The method used to quantize the model weights. Currently, we support \"awq\", \"gptq\" and \"squeezellm\". If None, we first check the quantization_config
attribute in the model config file. If that is None, we assume the model weights are not quantized and use dtype
to determine the data type of the weights. None
revision
Optional[str]
The specific model version to use. It can be a branch name, a tag name, or a commit id. None
tokenizer_revision
Optional[str]
The specific tokenizer version to use. It can be a branch name, a tag name, or a commit id. None
gpu_memory_utilization
float
The ratio (between 0 and 1) of GPU memory to reserve for the model weights, activations, and KV cache. Higher values will increase the KV cache size and thus improve the model's throughput. However, if the value is too high, it may cause out-of-memory (OOM) errors. 0.9
swap_space
int
The size (GiB) of CPU memory per GPU to use as swap space. This can be used for temporarily storing the states of the requests when their best_of
sampling parameters are larger than 1. If all requests will have best_of=1
, you can safely set this to 0. Otherwise, too small values may cause out-of-memory (OOM) errors. 4 enforce_eager
bool
Whether to enforce eager execution. If True, we will disable CUDA graph and always execute the model in eager mode. If False, we will use CUDA graph and eager execution in hybrid. False
enable_lora
bool
Whether to enable loading LoRA adapters False
See the vLLM code for a list of all the available parameters.
"},{"location":"reference/models/vllm/#use-quantized-models","title":"Use quantized models","text":"vLLM supports AWQ, GPTQ and SqueezeLLM quantized models:
from outlines import models\n\nmodel = models.vllm(\"TheBloke/Llama-2-7B-Chat-AWQ\", quantization=\"awq\")\nmodel = models.vllm(\"TheBloke/Mistral-7B-Instruct-v0.2-GPTQ\", quantization=\"gptq\")\nmodel = models.vllm(\"https://huggingface.co/squeeze-ai-lab/sq-llama-30b-w4-s5\", quantization=\"squeezellm\")\n
Dependencies
To use AWQ models you need to install the autoawq library pip install autoawq
.
To use GPTQ models you need to install the autoGPTQ and optimum libraries pip install auto-gptq optimum
.
"},{"location":"reference/models/vllm/#multi-gpu-usage","title":"Multi-GPU usage","text":"To run multi-GPU inference with vLLM you need to set the tensor_parallel_size
argument to the number of GPUs available when initializing the model. For instance to run inference on 2 GPUs:
from outlines import models\n\nmodel = models.vllm(\n    \"microsoft/Phi-3-mini-4k-instruct\",\n    tensor_parallel_size=2\n)\n
"},{"location":"reference/models/vllm/#load-lora-adapters","title":"Load LoRA adapters","text":"You can load LoRA adapters and alternate between them dynamically:
from outlines import models\n\nmodel = models.vllm(\"facebook/opt-350m\", enable_lora=True)\nmodel.load_lora(\"ybelkaa/opt-350m-lora\") # Load LoRA adapter\nmodel.load_lora(None) # Unload LoRA adapter\n
"},{"location":"reference/models/vllm/#generate-text","title":"Generate text","text":"In addition to the parameters described in the text generation section you can pass an instance of SamplingParams
directly to any generator via the sampling_params
keyword argument:
from vllm.sampling_params import SamplingParams\nfrom outlines import models, generate\n\n\nmodel = models.vllm(\"microsoft/Phi-3-mini-4k-instruct\")\ngenerator = generate.text(model)\n\nparams = SamplingParams(n=2, frequency_penalty=1., min_tokens=2)\nanswer = generator(\"A prompt\", sampling_params=params)\n
This also works with generators built with generate.regex
, generate.json
, generate.cfg
, generate.format
and generate.choice
.
Note
The values passed via the SamplingParams
instance supersede the other arguments to the generator or the samplers.
SamplingParams
attributes:
Parameters Type Description Default n
int
Number of output sequences to return for the given prompt. 1
best_of
Optional[int]
Number of output sequences that are generated from the prompt. From these best_of
sequences, the top n
sequences are returned. best_of
must be greater than or equal to n
. This is treated as the beam width when use_beam_search
is True. By default, best_of
is set to n
. None
presence_penalty
float
Float that penalizes new tokens based on whether they appear in the generated text so far. Values > 0 encourage the model to use new tokens, while values < 0 encourage the model to repeat tokens. 0.0
frequency_penalty
float
Float that penalizes new tokens based on their frequency in the generated text so far. Values > 0 encourage the model to use new tokens, while values < 0 encourage the model to repeat tokens. 0.0
repetition_penalty
float
Float that penalizes new tokens based on whether they appear in the prompt and the generated text so far. Values > 1 encourage the model to use new tokens, while values < 1 encourage the model to repeat tokens. 1.0
temperature
float
Float that controls the randomness of the sampling. Lower values make the model more deterministic, while higher values make the model more random. Zero means greedy sampling. 1.0
top_p
float
Float that controls the cumulative probability of the top tokens to consider. Must be in (0, 1]. Set to 1 to consider all tokens. 1.0
top_k
int
Integer that controls the number of top tokens to consider. Set to -1 to consider all tokens. -1
min_p
float
Float that represents the minimum probability for a token to be considered, relative to the probability of the most likely token. Must be in [0, 1]. Set to 0 to disable this. 0.0
seed
Optional[int]
Random seed to use for the generation. None
use_beam_search
bool
Whether to use beam search instead of sampling. False
length_penalty
float
Float that penalizes sequences based on their length. Used in beam search. 1.0
early_stopping
Union[bool, str]
Controls the stopping condition for beam search. It accepts the following values: True
, where the generation stops as soon as there are best_of
complete candidates; False
, where an heuristic is applied and the generation stops when is it very unlikely to find better candidates; \"never\"
, where the beam search procedure only stops when there cannot be better candidates (canonical beam search algorithm). False
stop
Optional[Union[str, List[str]]]
List of strings that stop the generation when they are generated. The returned output will not contain the stop strings. None
stop_token_ids
Optional[List[int]]
List of tokens that stop the generation when they are generated. The returned output will contain the stop tokens unless the stop tokens are special tokens. None
include_stop_str_in_output
bool
Whether to include the stop strings in output text. Defaults to False. False
ignore_eos
bool
Whether to ignore the EOS token and continue generating tokens after the EOS token is generated. False
max_tokens
int
Maximum number of tokens to generate per output sequence. 16
min_tokens
int
Minimum number of tokens to generate per output sequence before EOS or stop_token_ids can be generated 0
skip_special_tokens
bool
Whether to skip special tokens in the output. True
spaces_between_special_tokens
bool
Whether to add spaces between special tokens in the output. Defaults to True. True
"},{"location":"reference/models/vllm/#streaming","title":"Streaming","text":"Warning
Streaming is not available for the offline vLLM integration.
"},{"location":"reference/models/vllm/#installation","title":"Installation","text":"By default the vLLM library is installed with pre-commpiled C++ and CUDA binaries and will only run on GPU:
pip install vllm\n
"},{"location":"reference/models/vllm/#cpu","title":"CPU","text":"You need to have the gcc
compiler installed on your system. Then you will need to install vLLM from source. First clone the repository:
git clone https://github.com/vllm-project/vllm.git\ncd vllm\n
Install the Python packages needed for the installation:
pip install --upgrade pip\npip install wheel packaging ninja setuptools>=49.4.0 numpy\npip install -v -r requirements-cpu.txt --extra-index-url https://download.pytorch.org/whl/cpu\n
and finally run:
VLLM_TARGET_DEVICE=cpu python setup.py install\n
See the vLLM documentation for more details, alternative installation methods (Docker) and performance tips.
"},{"location":"reference/models/vllm/#rocm","title":"ROCm","text":"You will need to install vLLM from source. First install Pytorch on ROCm:
pip install torch==2.2.0.dev20231206+rocm5.7 --index-url https://download.pytorch.org/whl/nightly/rocm5.7 # tested version\n
You will then need to install flash attention for ROCm following these instructions. You can then install xformers==0.0.23
and apply the patches needed to adapt Flash Attention for ROCm:
pip install xformers==0.0.23 --no-deps\nbash patch_xformers.rocm.sh\n
And finally build vLLM:
cd vllm\npip install -U -r requirements-rocm.txt\npython setup.py install # This may take 5-10 minutes.\n
See the vLLM documentation for alternative installation methods (Docker).
"},{"location":"reference/serve/lmstudio/","title":"Serve with LM Studio","text":"Would rather not self-host?
If you want to get started quickly with JSON-structured generation you can instead call .json, a .txt API that guarantees valid JSON.
LM Studio is an application that runs local LLMs. It flexibly mixes GPU and CPU compute in hardware-constrained environments.
As of LM Studio 0.3.4, it natively supports Outlines for structured text generation, using an OpenAI-compatible endpoint.
"},{"location":"reference/serve/lmstudio/#setup","title":"Setup","text":" - Install LM Studio by visiting their downloads page.
- Enable the LM Studio server functionality.
- Download a model.
- Install Python dependencies.
pip install pydantic openai\n
"},{"location":"reference/serve/lmstudio/#calling-the-server","title":"Calling the server","text":"By default, LM Studio will serve from http://localhost:1234
. If you are serving on a different port or host, make sure to change the base_url
argument in OpenAI
to the relevant location.
import openai\nfrom pydantic import BaseModel\n\n\nclass Testing(BaseModel):\n    \"\"\"\n    A class representing a testing schema.\n    \"\"\"\n    name: str\n    age: int\n\nopenai_client = openai.OpenAI(\n    base_url=\"http://0.0.0.0:1234/v1\",\n    api_key=\"dopeness\"\n)\n\n# Make a request to the local LM Studio server\nresponse = openai_client.beta.chat.completions.parse(\n    model=\"hugging-quants/Llama-3.2-1B-Instruct-Q8_0-GGUF\",\n    messages=[\n        {\"role\": \"system\", \"content\": \"You are like so good at whatever you do.\"},\n        {\"role\": \"user\", \"content\": \"My name is Cameron and I am 28 years old. What's my name and age?\"}\n    ],\n    response_format=Testing\n)\n
You should receive a ParsedChatCompletion[Testing]
object back:
ParsedChatCompletion[Testing](\n id='chatcmpl-3hykyf0fxus7jc90k6gwlw',\n choices=[\n ParsedChoice[Testing](\n finish_reason='stop',\n index=0,\n logprobs=None,\n message=ParsedChatCompletionMessage[Testing](\n content='{ \"age\": 28, \"name\": \"Cameron\" }',\n refusal=None,\n role='assistant',\n function_call=None,\n tool_calls=[],\n parsed=Testing(name='Cameron', age=28)\n )\n )\n ],\n created=1728595622,\n model='lmstudio-community/Phi-3.1-mini-128k-instruct-GGUF/Phi-3.1-mini-128k-instruct-Q4_K_M.gguf',\n object='chat.completion',\n service_tier=None,\n system_fingerprint='lmstudio-community/Phi-3.1-mini-128k-instruct-GGUF/Phi-3.1-mini-128k-instruct-\nQ4_K_M.gguf',\n usage=CompletionUsage(\n completion_tokens=17,\n prompt_tokens=47,\n total_tokens=64,\n completion_tokens_details=None,\n prompt_tokens_details=None\n )\n)\n
You can retrieve your Testing
object with
response.choices[0].message.parsed\n
"},{"location":"reference/serve/vllm/","title":"Serve with vLLM","text":"Would rather not self-host?
If you want to get started quickly with JSON-structured generation you can instead call .json, a .txt API that guarantees valid JSON.
Outlines can be deployed as an LLM service using the vLLM inference engine and a FastAPI server. vLLM is not installed by default, so you will need to install Outlines with:
pip install outlines[serve]\n
You can then start the server with:
python -m outlines.serve.serve --model=\"microsoft/Phi-3-mini-4k-instruct\"\n
This will by default start a server at http://127.0.0.1:8000
(check what the console says, though). Without the --model
argument set, the OPT-125M model is used. The --model
argument allows you to specify any model of your choosing.
To run inference on multiple GPUs you must pass the --tensor-parallel-size
argument when initializing the server. For instance, to run inference on 2 GPUs:
python -m outlines.serve.serve --model=\"microsoft/Phi-3-mini-4k-instruct\" --tensor-parallel-size 2\n
"},{"location":"reference/serve/vllm/#alternative-method-via-docker","title":"Alternative Method: Via Docker","text":"You can install and run the server with Outlines' official Docker image using the command
docker run -p 8000:8000 outlinesdev/outlines --model=\"microsoft/Phi-3-mini-4k-instruct\"\n
"},{"location":"reference/serve/vllm/#querying-endpoint","title":"Querying Endpoint","text":"You can then query the model in shell by passing a prompt and either
- a JSON Schema specification or
- a Regex pattern
with the schema
or regex
parameters, respectively, to the /generate
endpoint. If both are specified, the schema will be used. If neither is specified, the generated text will be unconstrained.
For example, to generate a string that matches the schema {\"type\": \"string\"}
(any string):
curl http://127.0.0.1:8000/generate \\\n -d '{\n \"prompt\": \"What is the capital of France?\",\n \"schema\": {\"type\": \"string\", \"maxLength\": 5}\n }'\n
To generate a string that matches the regex (-)?(0|[1-9][0-9]*)(\\.[0-9]+)?([eE][+-][0-9]+)?
(a number):
curl http://127.0.0.1:8000/generate \\\n -d '{\n \"prompt\": \"What is Pi? Give me the first 15 digits: \",\n \"regex\": \"(-)?(0|[1-9][0-9]*)(\\\\.[0-9]+)?([eE][+-][0-9]+)?\"\n }'\n
Instead of curl
, you can also use the requests library from another python program.
Please consult the vLLM documentation for details on additional request parameters. You can also read the code in case you need to customize the solution to your needs.
"},{"location":"blog/archive/2024/","title":"2024","text":""},{"location":"blog/category/roadmap/","title":"Roadmap","text":""}]}
\ No newline at end of file
diff --git a/main/sitemap.xml.gz b/main/sitemap.xml.gz
index 1c4505c641110b0ac8bfed380e5f9fc465bbd6e7..bf8bd8fb9cfb0dc00ea9229dc7e62d45f4959030 100644
GIT binary patch
delta 13
Ucmb=gXP58h;Aq&KGLgLk03DhHX8-^I
delta 13
Ucmb=gXP58h;AqH=pU7ST030X;8~^|S