Storyteller is an advanced, modular system designed for orchestrating Large Language Models (LLMs) in batch processes. While it originated as a tool for generating synthetic narrative data, it has evolved into a flexible and powerful framework for managing complex LLM-driven workflows across various domains and use cases.
Note: Storyteller is currently in alpha and may be difficult to get running. Parts of it are non-functional or not fully tested.
Warning: This system can consume a significant number of tokens if used carelessly; monitor usage closely if you are using a paid LLM. This is especially true if you're running the test stage in the repo. I can't stress this enough.
- Versatile LLM Orchestration: Coordinate and manage multiple LLM interactions within a single workflow.
- Batch Processing: Efficiently handle large-scale data processing tasks using LLMs.
- Modular Architecture: Easily extendable with plugins for various content types and processing needs.
- Multi-Stage Pipeline: Configurable stages for different aspects of your LLM workflow.
- LLM Integration: Supports multiple LLM backends, including OpenAI's GPT and Google's Vertex AI Gemini.
- Efficient Storage Management: Handles ephemeral, batch, and output storage for different content lifecycles.
- Dynamic Configuration: YAML-based configuration with validation for easy setup and modification.
- Progress Tracking: Built-in progress tracking and resumability for long-running processes.
- Flexible Content Processing: Support for JSON, lists, plain text, and custom formats.
- Error Handling and Repair: LLM-enabled error handling with limited repair capabilities.
Storyteller can be applied to a wide range of LLM-driven tasks, including but not limited to:
- Large-scale text analysis and processing
- Automated content generation for various industries
- Data augmentation and synthetic data generation
- Complex decision-making systems
- Multi-step reasoning and problem-solving workflows
- Python 3.10+
- Dependencies listed in `requirements.txt`
1. Clone the repository:

   ```
   git clone https://github.com/tachyon-beep/storyteller.git
   cd storyteller
   ```

2. Install dependencies:

   ```
   pip install -r requirements.txt
   ```

3. Configure the system by editing `config/pipeline.development.yaml`.

4. Define your stages, phases, prompts, schemas and guidance as described below.

5. Run the orchestrator:

   ```
   set PYTHONPATH=./src:./plugins
   python storyteller.py
   ```
The system is configured using YAML files located in the `config/` directory. The main configuration file is `pipeline.development.yaml`, which includes settings for:
- Paths
- Batch processing
- LLM settings
- Plugin configurations
- Pipeline stages and phases
- Content processing parameters
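As an illustration of the shape such a file might take, here is a minimal sketch. The key names below are assumptions for this example, not the actual schema; consult the shipped `config/pipeline.development.yaml` for the real settings.

```yaml
# Hypothetical pipeline config sketch; key names are illustrative only.
paths:
  prompts: prompts/
  schemas: schemas/
  guidance: guidance/
batch:
  size: 10            # how many items to process per batch
llm:
  backend: openai     # or a Vertex AI Gemini backend
plugins:
  - json
stages:
  - name: outline
    phases: [draft, review]
```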
1. Define the new stage in `pipeline.development.yaml` under the `stages` section.
2. Create corresponding prompt files in the `prompts/` directory.
3. If required, create a schema and place it in the `schemas/` directory.
4. Update the general and stage-level guidance in the `guidance/` directory as required.
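A new stage entry might then look something like the fragment below. The field names are assumptions for illustration; check the existing stages in `pipeline.development.yaml` for the real schema.

```yaml
# Hypothetical stage entry; field names are illustrative, not the real schema.
stages:
  - name: summarise
    phases:
      - name: draft
        prompt: prompts/summarise_draft.txt   # prompt file for this phase
        schema: schemas/summarise.json        # optional output schema
    guidance: guidance/summarise.md           # stage-level guidance
```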
Storyteller is built with a modular architecture:
- `storyteller.py`: Main entry point
- `storyteller_orchestrator.py`: Manages the overall pipeline execution
- `storyteller_stage_manager.py`: Handles individual stages and progress tracking
- `storyteller_plugin_manager.py`: Manages content processing plugins
- `storyteller_storage_manager.py`: Coordinates different storage types
- `storyteller_llm_factory.py`: Creates and manages LLM instances
- `storyteller_content_processor.py`: Processes generated content
- `storyteller_prompt_manager.py`: Manages prompt preparation and handling
1. Create a new plugin file in the `plugins/` directory.
2. Implement the plugin class, extending `StorytellerOutputPlugin`.
3. Add the plugin configuration to `pipeline.development.yaml`.
4. Develop appropriate guidance on output formats for inclusion in generated prompts.
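A minimal plugin might look like the sketch below. The real `StorytellerOutputPlugin` interface lives in the repo; the base class here is a stand-in included only so the example is self-contained, and the `process` method name is an assumption.

```python
import json


class StorytellerOutputPlugin:
    """Stand-in for the real base class, included only so this sketch runs
    standalone; see the repo for the actual interface."""

    def process(self, content: str):
        raise NotImplementedError


class JsonListPlugin(StorytellerOutputPlugin):
    """Hypothetical plugin that parses LLM output expected to be a JSON list."""

    def process(self, content: str) -> list:
        data = json.loads(content)
        if not isinstance(data, list):
            raise ValueError(f"expected a JSON list, got {type(data).__name__}")
        return data
```

The plugin would then be registered in `pipeline.development.yaml` so the plugin manager can route content to it.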
The roadmap includes:
- Enhancing LLM integration (OpenAI, AzureOpenAI plugins, agent mode)
- Improving batch processing and parallelization
- Architectural improvements for better modularity
- Expanding the content plugin system
- Implementing testing and quality assurance measures
- Enhancing security, monitoring, and logging capabilities
- Containerization for easier deployment
For a detailed list of planned features and improvements, please see our TODO list.
Contributions are welcome (particularly for plugins), but the development target is quite volatile. Please see our Contributors Policy for more details.
This project is licensed under the MIT License.