fastdata

fastdata is a minimal library for generating synthetic data for training deep learning models. The example below generates a dataset for training a language model to translate from English to Spanish.

First, define the structure of the data you want to generate. fastdata uses claudette to generate data, and claudette requires a schema for that data.

from fastcore.utils import *

class Translation():
    "Translation from an English phrase to a Spanish phrase"
    def __init__(self, english: str, spanish: str): store_attr()
    def __repr__(self): return f"{self.english} ➡ *{self.spanish}*"

Translation("Hello, how are you today?", "Hola, ¿cómo estás hoy?")

Hello, how are you today? ➡ *Hola, ¿cómo estás hoy?*
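For readers unfamiliar with fastcore, here is a minimal sketch of what the class above amounts to in plain Python. store_attr() stores each __init__ argument as an attribute of the same name, so the explicit assignments below are equivalent. The use of typing.get_type_hints is only an illustration of the information a schema like this carries (field names and types); it is an assumption for this sketch, not a description of how claudette reads the class.

```python
from typing import get_type_hints

class Translation:
    "Translation from an English phrase to a Spanish phrase"
    def __init__(self, english: str, spanish: str):
        # Plain-Python equivalent of fastcore's store_attr()
        self.english = english
        self.spanish = spanish
    def __repr__(self): return f"{self.english} ➡ *{self.spanish}*"

# The docstring and the annotated parameters describe the schema.
fields = get_type_hints(Translation.__init__)
print(Translation.__doc__)
print(fields)

example = Translation("Hello, how are you today?", "Hola, ¿cómo estás hoy?")
print(example)
```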

Next, you need to define the prompt that will be used to generate the data and any inputs you want to pass to the prompt.

prompt_template = """\
Generate English and Spanish translations on the following topic:
<topic>{topic}</topic>
"""

inputs = [{"topic": "Otters are cute"}, {"topic": "I love programming"}]
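To preview what each request might look like, the sketch below fills the template with every input dict using Python's built-in str.format. This assumes the placeholders are substituted format-style, with one prompt per input; the variable rendered_prompts is introduced here for illustration only.

```python
prompt_template = """\
Generate English and Spanish translations on the following topic:
<topic>{topic}</topic>
"""

inputs = [{"topic": "Otters are cute"}, {"topic": "I love programming"}]

# One rendered prompt per input dict; the dict keys must match the
# placeholder names in the template, otherwise format() raises KeyError.
rendered_prompts = [prompt_template.format(**inp) for inp in inputs]

for prompt in rendered_prompts:
    print(prompt)
```

Rendering the prompts yourself is a cheap way to catch a misspelled placeholder before spending API calls.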

Finally, we can generate some data with fastdata.

Note

We only support Anthropic models at the moment. Make sure you have an API key for the model you want to use, and either set the proper environment variables or pass the key directly to the FastData class: FastData(api_key="sk-ant-api03-...").
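A small sketch of the two options in the note: prefer an explicitly supplied key, otherwise fall back to the environment. The helper resolve_api_key is hypothetical and not part of fastdata. ANTHROPIC_API_KEY is the variable the Anthropic Python SDK reads by default; that fastdata relies on the same variable is an assumption here.

```python
import os

def resolve_api_key(explicit_key=None):
    """Hypothetical helper: use an explicit key if given, else the environment."""
    key = explicit_key or os.environ.get("ANTHROPIC_API_KEY")
    if not key:
        raise RuntimeError(
            "No Anthropic API key found: set ANTHROPIC_API_KEY or pass api_key=..."
        )
    return key
```

The result could then be passed along as FastData(api_key=resolve_api_key()), which fails early with a clear message instead of at the first request.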

from fastdata.core import FastData

fast_data = FastData(model="claude-3-haiku-20240307")
translations = fast_data.generate(
    prompt_template=prompt_template,
    inputs=inputs,
    schema=Translation,
)

100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00,  1.57it/s]

from IPython.display import Markdown

Markdown("\n".join(f'- {t}' for t in translations))

I love programming ➡ Me encanta la programación
Otters are cute ➡ Las nutrias son lindas
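Once generated, the records usually need to be saved in a format a training pipeline can read. The sketch below writes them as JSON Lines. It uses stand-in objects with the same english and spanish attributes as the Translation schema so that it runs without an API call; the helper to_jsonl is hypothetical and not part of fastdata.

```python
import json
from types import SimpleNamespace

# Stand-ins for the objects returned by fast_data.generate(...)
translations = [
    SimpleNamespace(english="I love programming", spanish="Me encanta la programación"),
    SimpleNamespace(english="Otters are cute", spanish="Las nutrias son lindas"),
]

def to_jsonl(items, fields=("english", "spanish")):
    """Serialize each item as one JSON object per line, keeping only `fields`."""
    return "\n".join(
        # ensure_ascii=False keeps accented characters readable in the output
        json.dumps({f: getattr(item, f) for f in fields}, ensure_ascii=False)
        for item in items
    )

jsonl = to_jsonl(translations)
print(jsonl)
```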

Installation

Install latest from the GitHub repository:

$ pip install git+https://github.com/AnswerDotAI/fastdata.git

or from PyPI:

$ pip install python-fastdata

If you’d like to see how best to generate data with fastdata, check out our blog post here and some of the examples in the examples directory.

Developer Guide

If you are new to using nbdev, here are some useful pointers to get you started.

Install fastdata in Development mode

# make sure fastdata package is installed in development mode
$ pip install -e .

# make changes under nbs/ directory
# ...

# compile to have changes apply to fastdata
$ nbdev_prepare
