Merge branch 'main' into fix-invalid-token-id-indexerror
RohitRathore1 authored Dec 12, 2024
2 parents 55745d2 + 05bcb4b commit 87c74c8
Showing 41 changed files with 1,356 additions and 1,586 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/tests.yml
@@ -44,7 +44,7 @@ jobs:
echo "::set-output name=id::$MATRIX_ID"
- name: Run tests
run: |
pytest --cov=outlines
pytest -x --cov=outlines
env:
COVERAGE_FILE: .coverage.${{ steps.matrix-id.outputs.id }}
- name: Upload coverage data
1 change: 1 addition & 0 deletions .gitignore
@@ -7,6 +7,7 @@ docs/build
*.gguf
.venv
benchmarks/results
.python-version

# Remove doc build folders
.cache/
63 changes: 58 additions & 5 deletions README.md
@@ -2,17 +2,18 @@

<img src="./docs/assets/images/logo.png" alt="Outlines Logo" width=500></img>

[![.txt Twitter][dottxt-twitter-badge]][dottxt-twitter]

🗒️ *Make LLMs speak the language of every application.* 🗒️

Made with ❤👷️ by the team at [.txt](https://dottxt.co).

[![Documentation][documentation-badge]][documentation]
[![Contributors][contributors-badge]][contributors]
[![Downloads][downloads-badge]][pypistats]
[![Discord][discord-badge]][discord]

[Youtube channel][youtube-dottxt] | [.txt blog][blog-dottxt] | [Twitter][dottxt-twitter]

*Robust (structured) text generation.*

Made with ❤👷️ by the team at [.txt](https://dottxt.co).

</div>

@@ -83,6 +84,29 @@ generator = outlines.generate.choice(model, ["Positive", "Negative"])
answer = generator(prompt)
```

You can also pass these choices through an enum:

````python
from enum import Enum

import outlines

class Sentiment(str, Enum):
    positive = "Positive"
    negative = "Negative"

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

prompt = """You are a sentiment-labelling assistant.
Is the following review positive or negative?
Review: This restaurant is just awesome!
"""

generator = outlines.generate.choice(model, Sentiment)
answer = generator(prompt)
````
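
The relationship between the enum and the plain list of choices can be sketched without loading a model. This is an illustration only: `raw_answer` is a hard-coded stand-in for generated text, and nothing here calls Outlines. Whether the generator hands back a string or an enum member is not stated above, so the conversion step is an assumption.

```python
from enum import Enum


class Sentiment(str, Enum):
    positive = "Positive"
    negative = "Negative"


# The values a choice-constrained generation is restricted to
choices = [member.value for member in Sentiment]

# Hypothetical stand-in for the text produced by the generator
raw_answer = "Positive"

# Looking a member up by value turns the text back into a typed label
label = Sentiment(raw_answer)
print(label.name, label.value)
```

Because `Sentiment` also subclasses `str`, the member compares equal to its plain string value, which keeps downstream string comparisons working.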

### Type constraint

You can instruct the model to only return integers or floats:
@@ -190,7 +214,7 @@ character = generator("Give me a character description", seed=seed)
print(repr(character))
# Character(name='Anderson', age=28, armor=<Armor.chainmail: 'chainmail'>, weapon=<Weapon.sword: 'sword'>, strength=8)

character = generator("Give me an interesting character description", rng=rng)
character = generator("Give me an interesting character description")

print(repr(character))
# Character(name='Vivian Thr', age=44, armor=<Armor.plate: 'plate'>, weapon=<Weapon.crossbow: 'crossbow'>, strength=125)
@@ -299,6 +323,33 @@ print(add(**result))

A great advantage of passing functions directly to specify the structure is that the structure of the LLM's output will change with the function's definition. No need to change the code in several places!

You can also embed various functions into an enum to generate params:

```python
from enum import Enum
from functools import partial

import outlines


def add(a: int, b: int) -> int:
    return a + b

def mul(c: float, d: float) -> float:
    return c * d

class Operation(Enum):
    add = partial(add)
    mul = partial(mul)

model = outlines.models.transformers("WizardLM/WizardMath-7B-V1.1")
generator = outlines.generate.json(model, Operation)
result = generator("Return json with two float named c and d respectively. c is negative and d greater than 1.0.")

print(result)
# {'c': -3.14, 'd': 1.5}
```
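
Once the parameters have been generated, they still need to be routed to the matching function. A minimal sketch of one way to do that, assuming the result is a plain dict as in the printed output above: `dispatch` is a hypothetical helper (not part of Outlines) that picks the function whose parameter names equal the dict's keys, and `result` is hard-coded rather than generated.

```python
import inspect


def add(a: int, b: int) -> int:
    return a + b


def mul(c: float, d: float) -> float:
    return c * d


def dispatch(result: dict, functions):
    """Call the first function whose parameter names match the keys of `result`."""
    for fn in functions:
        if set(inspect.signature(fn).parameters) == set(result):
            return fn(**result)
    raise ValueError(f"no function accepts arguments {sorted(result)}")


# Hypothetical stand-in for the generated parameters
result = {"c": -3.14, "d": 1.5}
product = dispatch(result, [add, mul])
print(product)
```

Matching on parameter names only works when the functions have distinct signatures, as `add` and `mul` do here.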

## Prompting

Building prompts can get messy. **Outlines** makes it easier to write and manage
@@ -363,3 +414,5 @@ answer = outlines.generate.text(model)(prompt, max_tokens=100)
[downloads-badge]: https://img.shields.io/pypi/dm/outlines?color=89AC6B&logo=python&logoColor=white&style=flat-square
[pypistats]: https://pypistats.org/packages/outlines
[dottxt-twitter-badge]: https://img.shields.io/twitter/follow/dottxtai?style=social
[youtube-dottxt]: https://www.youtube.com/@dottxt-ai
[blog-dottxt]: https://blog.dottxt.co/
7 changes: 2 additions & 5 deletions benchmarks/bench_json_schema.py
@@ -1,6 +1,7 @@
from outlines_core.fsm.json_schema import build_regex_from_schema

from outlines.caching import cache_disabled
from outlines.fsm.guide import RegexGuide
from outlines.fsm.json_schema import build_regex_from_schema

from .common import setup_tokenizer # noqa: E402

@@ -70,10 +71,6 @@ def setup(self, schema_name):
        self.tokenizer = setup_tokenizer()
        self.schema = schemas[schema_name]

    @cache_disabled()
    def time_json_schema_to_regex(self, schema_name):
        build_regex_from_schema(self.schema)

    @cache_disabled()
    def time_json_schema_to_fsm(self, schema_name):
        regex = build_regex_from_schema(self.schema)
23 changes: 23 additions & 0 deletions benchmarks/bench_processors.py
@@ -9,6 +9,12 @@
except ImportError:
    pass

try:
    import jax
    import jax.numpy as jnp
except ImportError:
    pass


def is_mlx_lm_allowed():
    try:
@@ -18,6 +24,14 @@ def is_mlx_lm_allowed():
    return mx.metal.is_available()


def is_jax_allowed():
    try:
        import jax  # noqa: F401
    except ImportError:
        return False
    return True


def get_mock_processor_inputs(array_library, num_tokens=30000):
    """
    logits: (4, 30,000 ) dtype=float
@@ -43,6 +57,13 @@ def get_mock_processor_inputs(array_library, num_tokens=30000):
        input_ids = mx.random.randint(
            low=0, high=num_tokens, shape=(4, 2048), dtype=mx.int32
        )
    elif array_library == "jax":
        # jax.numpy has no `random` submodule; sampling lives in `jax.random`
        logits = jax.random.uniform(
            key=jax.random.PRNGKey(0), shape=(4, num_tokens), dtype=jnp.float32
        )
        input_ids = jax.random.randint(
            key=jax.random.PRNGKey(0), shape=(4, 2048), minval=0, maxval=num_tokens
        )
    else:
        raise ValueError

@@ -67,6 +88,8 @@ class LogitsProcessorPassthroughBenchmark:
        params += ["mlx"]
    if torch.cuda.is_available():
        params += ["torch_cuda"]
    if is_jax_allowed():
        params += ["jax"]

    def setup(self, array_library):
        self.logits_processor = HalvingLogitsProcessor()
34 changes: 34 additions & 0 deletions docs/cookbook/extract_event_details.md
@@ -0,0 +1,34 @@
This recipe demonstrates how to use the `outlines` library to extract structured event details from a text message.
We will extract the title, location, and start date and time from messages like the following:

```plaintext
Hello Kitty, my grandmother will be here, I think it's better to postpone
our appointment to review math lessons to next Friday at 2pm at the same
place, 3 avenue des tanneurs, one hour will be enough see you 😘
```

Let's see how to extract the event details from the message with the MLX
library, which is dedicated to Apple Silicon processors (M series).

```python
--8<-- "docs/cookbook/extract_event_details.py"
```

The output will be:

```plaintext
Today: Saturday 16 November 2024 and it's 10:55
```

and the extracted event information will be:

```json
{
"title":"Math Review",
"location":"3 avenue des tanneurs",
"start":"2024-11-22T14:00:00Z"
}
```
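
As a quick sanity check on output like this, the `start` field can be parsed with the standard library. This is a minimal sketch that works on a hard-coded copy of the sample JSON rather than on real model output, and the variable names are illustrative.

```python
import json
from datetime import datetime

# Hard-coded copy of the sample output shown above
raw = '{"title":"Math Review","location":"3 avenue des tanneurs","start":"2024-11-22T14:00:00Z"}'

event = json.loads(raw)

# Older Python versions do not accept a trailing "Z" in fromisoformat,
# so it is rewritten as an explicit UTC offset first
start = datetime.fromisoformat(event["start"].replace("Z", "+00:00"))

# weekday() counts from Monday = 0, so Friday is 4
print(start.date(), start.hour, start.weekday())
```

Checking the weekday and hour against the original message is a cheap way to catch a model that resolved a relative date incorrectly.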


To find out more about this use case, we recommend the [ICS Generator](https://github.com/jrudoler/ics-generator) project developed by [Joseph Rudoler](https://x.com/JRudoler).
46 changes: 46 additions & 0 deletions docs/cookbook/extract_event_details.py
@@ -0,0 +1,46 @@
from datetime import datetime

from pydantic import BaseModel, Field

from outlines import generate, models

# Load the model
model = models.mlxlm("mlx-community/Hermes-3-Llama-3.1-8B-8bit")


# Define the event schema using Pydantic
class Event(BaseModel):
    title: str = Field(description="title of the event")
    location: str
    start: datetime = Field(
        default=None, description="date of the event if available in iso format"
    )


# Get the current date and time
now = datetime.now().strftime("%A %d %B %Y and it's %H:%M")

# Define the prompt
prompt = f"""
Today's date and time are {now}
Given a user message, extract information of the event like date and time in iso format, location and title.
If the given date is relative, think step by step to find the right date.
Here is the message:
"""

# Sample message
message = """Hello Kitty, my grandmother will be here , I think it's better to postpone our
appointment to review math lessons to next Friday at 2pm at the same place, 3 avenue des tanneurs, I think that one hour will be enough
see you 😘 """

# Create the generator
generator = generate.json(model, Event)

# Extract the event information
event = generator(prompt + message)

# Print the current date and time
print(f"Today: {now}")

# Print the extracted event information in JSON format
print(event.json())
Binary file added docs/cookbook/images/trader-joes-receipt.jpg
3 changes: 3 additions & 0 deletions docs/cookbook/index.md
@@ -12,4 +12,7 @@ This part of the documentation provides a few cookbooks that you can browse to g
- [Knowledge Graph Generation](knowledge_graph_extraction.md): Generate a Knowledge Graph from unstructured text using JSON-structured generation.
- [Chain Of Thought (CoT)](chain_of_thought.md): Generate a series of intermediate reasoning steps using regex-structured generation.
- [ReAct Agent](react_agent.md): Build an agent with open weights models using regex-structured generation.
- [Earnings reports to CSV](earnings-reports.md): Extract data from earnings reports to CSV using regex-structured generation.
- [Vision-Language Models](atomic_caption.md): Use Outlines with vision-language models for tasks like image captioning and visual reasoning.
- [Receipt Digitization](receipt-digitization.md): Extract information from a picture of a receipt using structured generation.
- [Structured Generation from PDFs](read-pdfs.md): Use Outlines with vision-language models to read PDFs and produce structured output.