A comprehensive testing suite for analyzing and evaluating the behavior of Large Language Models (LLMs) across various instruction-following tasks and constraints.
This project, developed by CloudCode, aims to provide a robust framework for testing the capabilities of LLMs in following complex instructions and adhering to specific linguistic rules. Our suite of tests covers a wide range of scenarios, from simple constraints to intricate language patterns.
- Diverse test cases covering various linguistic and structural constraints
- Easily extensible framework for adding new tests (see the sketch after this list)
- Automated evaluation of LLM responses
- Comprehensive documentation of test cases and their purposes
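As a rough illustration of the extensibility point above, a new test can be expressed as a prompt plus a programmatic check on the response. The names below (`TestCase`, `no_letter_e`) are hypothetical and only sketch the idea; the actual interfaces in this repository may differ.

```python
# Hypothetical sketch of a custom test; the repository's real API may differ.
from dataclasses import dataclass
from typing import Callable

@dataclass
class TestCase:
    name: str
    prompt: str
    check: Callable[[str], bool]  # returns True when the LLM response passes

def no_letter_e(response: str) -> bool:
    """Basic constraint: the response must avoid the letter 'e'."""
    return "e" not in response.lower()

AVOID_E = TestCase(
    name="avoid_letter_e",
    prompt="Write a short paragraph about autumn without using the letter 'e'.",
    check=no_letter_e,
)
```

Keeping each check as a plain function of the response string is what makes automated evaluation cheap to extend: adding a test is just adding a prompt and a predicate.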
To run the suite you will need:

- Python 3.7+
- An API key for your chosen LLM provider (e.g., an OpenAI API key)
1. Clone the repository:

   ```bash
   git clone https://github.com/Cloud-Code-AI/llm-behavior-lab.git
   cd llm-behavior-lab
   ```

2. Install required packages:

   ```bash
   pip install -r requirements.txt
   ```

3. Set up your API key:
   - Create a `.env` file in the root directory
   - Add your API key:

     ```
     OPENAI_API_KEY=your_api_key_here
     ```
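If you are wiring the key up yourself, a minimal way to load it looks like the sketch below. This assumes the suite relies on the `python-dotenv` package (as the `.env` convention suggests); the project's actual loading code may differ.

```python
# Minimal sketch, assuming python-dotenv is available; the suite's
# actual configuration loading may differ.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory
api_key = os.getenv("OPENAI_API_KEY")
if api_key is None:
    raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file.")
```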
Run the main test suite:

```bash
python main.py
```

To run specific tests:

```bash
python main.py --test alliteration palindrome
```
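For context, a `--test` flag that accepts several test names is the standard `argparse` pattern with `nargs="+"`. This is a hypothetical sketch of that pattern, not necessarily how `main.py` parses its arguments.

```python
# Hypothetical CLI parsing sketch; main.py may be implemented differently.
import argparse

parser = argparse.ArgumentParser(description="Run LLM behavior tests.")
parser.add_argument(
    "--test",
    nargs="+",     # accept one or more test names after the flag
    default=None,  # None means "run the full suite"
    help="names of specific tests to run, e.g. alliteration palindrome",
)
args = parser.parse_args()
tests_to_run = args.test if args.test else "all"
```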
- Basic Constraints (e.g., avoiding specific letters)
- Structural Patterns (e.g., increasing word length, specific line counts)
- Linguistic Creativity (e.g., alliteration, rhyme schemes)
- Complex Rules (e.g., progressive letter exclusion, Fibonacci word lengths; a sketch of one such checker follows this list)
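To make the last category concrete: under the Fibonacci word-length rule, the n-th word of the response must contain the n-th Fibonacci number of letters (1, 1, 2, 3, 5, 8, ...). A hedged sketch of such a checker, not the suite's actual implementation, could look like this:

```python
# Hypothetical checker for the Fibonacci word-length rule; the suite's
# real evaluation code may differ.
def follows_fibonacci_lengths(text: str) -> bool:
    a, b = 1, 1  # Fibonacci sequence starts 1, 1, 2, 3, 5, 8, ...
    for word in text.split():
        # Count only letters so trailing punctuation doesn't fail the word.
        if sum(ch.isalpha() for ch in word) != a:
            return False
        a, b = b, a + b
    return True

# "I"(1) "a"(1) "am"(2) "the"(3) "night"(5) "mountain"(8)
assert follows_fibonacci_lengths("I a am the night mountain")
```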
This project is licensed under the MIT License - see the LICENSE.md file for details.
- OpenAI for their GPT models and API
- All contributors and testers who help improve this suite
For any queries, please open an issue or contact us at [email protected].