Storyteller: General-Purpose LLM Orchestration and Batch Processing System

Storyteller is an advanced, modular system designed for orchestrating Large Language Models (LLMs) in batch processes. While it originated as a tool for generating synthetic narrative data, it has evolved into a flexible and powerful framework for managing complex LLM-driven workflows across various domains and use cases.

Note: Storyteller is currently in alpha and may be difficult to get running. Parts of it are non-functional or not fully tested.

Warning: This system can consume a significant number of tokens if not used carefully; monitor your usage closely if you are using a paid LLM. This is especially true if you are using the test stage in the repo. I can't stress this enough.

Features

  • Versatile LLM Orchestration: Coordinate and manage multiple LLM interactions within a single workflow.
  • Batch Processing: Efficiently handle large-scale data processing tasks using LLMs.
  • Modular Architecture: Easily extendable with plugins for various content types and processing needs.
  • Multi-Stage Pipeline: Configurable stages for different aspects of your LLM workflow.
  • LLM Integration: Supports multiple LLM backends, including OpenAI's GPT and Google's Vertex AI Gemini.
  • Efficient Storage Management: Handles ephemeral, batch, and output storage for different content lifecycles.
  • Dynamic Configuration: YAML-based configuration with validation for easy setup and modification.
  • Progress Tracking: Built-in progress tracking and resumability for long-running processes.
  • Flexible Content Processing: Support for JSON, lists, plain text, and custom formats.
  • Error Handling and Repair: LLM-enabled error handling with limited repair capabilities.

Use Cases

Storyteller can be applied to a wide range of LLM-driven tasks, including but not limited to:

  • Large-scale text analysis and processing
  • Automated content generation for various industries
  • Data augmentation and synthetic data generation
  • Complex decision-making systems
  • Multi-step reasoning and problem-solving workflows

System Requirements

  • Python 3.10+
  • Dependencies listed in requirements.txt

Quick Start

  1. Clone the repository:

    git clone https://github.com/tachyon-beep/storyteller.git
    cd storyteller
    
  2. Install dependencies:

    pip install -r requirements.txt
    
  3. Configure the system by editing config/pipeline.development.yaml.

  4. Define your stages, phases, prompts, schemas and guidance as described below.

  5. Run the orchestrator:

    export PYTHONPATH=./src:./plugins    # Linux/macOS
    # On Windows: set PYTHONPATH=./src;./plugins
    python storyteller.py
    

Configuration

The system is configured using YAML files located in the config/ directory. The main configuration file is pipeline.development.yaml (sketched after the list below), which includes settings for:

  • Paths
  • Batch processing
  • LLM settings
  • Plugin configurations
  • Pipeline stages and phases
  • Content processing parameters
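To give a rough sense of the shape of the file, here is a trimmed, hypothetical excerpt. The section and key names below are assumptions based on the list above, not the validated schema; consult the shipped pipeline.development.yaml for the authoritative layout:

    # Illustrative excerpt only -- key names are assumptions.
    paths:
      prompts: prompts/
      schemas: schemas/
      guidance: guidance/
    batch:
      size: 10
    llm:
      backend: openai          # or a Vertex AI Gemini backend
      model: gpt-4o
      max_tokens: 2048
    stages:
      - name: example_stage
        phases:
          - name: generate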

Creating New Stages

  1. Define the new stage in pipeline.development.yaml under the stages section (see the sketch after this list).
  2. Create corresponding prompt files in the prompts/ directory.
  3. If required, create a schema and place it in the schemas/ directory.
  4. Update the general and stage level guidance in the guidance/ directory as required.
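As a concrete (but hypothetical) example, a new stage entry might look like the following. The field names prompt, schema, and guidance are assumptions chosen to mirror the directories above and the four steps just listed, not the validated configuration schema:

    # Hypothetical stage definition -- field names are assumptions.
    stages:
      - name: summarise
        phases:
          - name: draft
            prompt: prompts/summarise_draft.txt     # step 2
            schema: schemas/summarise_draft.json    # step 3
            guidance: guidance/summarise.md         # step 4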

Architecture

Storyteller is built with a modular architecture:

  • storyteller.py: Main entry point
  • storyteller_orchestrator.py: Manages the overall pipeline execution
  • storyteller_stage_manager.py: Handles individual stages and progress tracking
  • storyteller_plugin_manager.py: Manages content processing plugins
  • storyteller_storage_manager.py: Coordinates different storage types
  • storyteller_llm_factory.py: Creates and manages LLM instances
  • storyteller_content_processor.py: Processes generated content
  • storyteller_prompt_manager.py: Manages prompt preparation and handling

Extending Storyteller

Adding New Plugins

  1. Create a new plugin file in the plugins/ directory.
  2. Implement the plugin class, extending StorytellerOutputPlugin (a sketch follows this list).
  3. Add the plugin configuration to pipeline.development.yaml.
  4. Develop appropriate guidance on output formats for inclusion in generated prompts.
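Here is a minimal sketch of such a plugin. The import path and method name below are assumptions, not the actual plugin API; consult StorytellerOutputPlugin in the source for the real interface:

    # plugins/storyteller_csv_plugin.py -- illustrative only; the import
    # path and method names are assumptions, not the actual plugin API.
    from storyteller_output_plugin import StorytellerOutputPlugin

    class StorytellerCsvPlugin(StorytellerOutputPlugin):
        """Hypothetical plugin that flattens generated content into CSV-style rows."""

        def process(self, content: str) -> str:
            # Strip blank lines from the raw LLM output and emit one row per line.
            rows = [line.strip() for line in content.splitlines() if line.strip()]
            return "\n".join(rows)

Once implemented, register the plugin under the plugins section of pipeline.development.yaml (step 3) and pair it with guidance text that tells the LLM to emit the matching output format (step 4).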

Project Roadmap

The roadmap includes:

  1. Enhancing LLM integration (OpenAI, AzureOpenAI plugins, agent mode)
  2. Improving batch processing and parallelization
  3. Architectural improvements for better modularity
  4. Expanding the content plugin system
  5. Implementing testing and quality assurance measures
  6. Enhancing security, monitoring, and logging capabilities
  7. Containerization for easier deployment

For a detailed list of planned features and improvements, please see our TODO list.

Contributing

Contributions are welcome (particularly for plugins), but the development target is quite volatile. Please see our Contributors Policy for more details.

License

This project is licensed under the MIT License.
