Currently the database is PostgreSQL, with separate tables for datasets, models, evalresults, and evalsettings. The specific schemas can be found in `models.py` and are detailed below.
We support automatically logging evaluation results to a unified PostgreSQL database. To enable this, pass the `--use_database` flag (which defaults to `False`):
```bash
python -m eval.eval \
    --model hf \
    --tasks MTBench,alpaca_eval \
    --model_args 'pretrained=meta-llama/Meta-Llama-3-8B-Instruct' \
    --batch_size 2 \
    --output_path logs \
    --use_database
```
To add more details to the database entry, you can also supply these optional flags:
```bash
--model_name "My Model Name" \
--creation_location "Lab Name" \
--created_by "Researcher Name"
```
This requires the user to set up a PostgreSQL database with the following tables (a sketch of the corresponding `models.py` classes appears after the field lists):

**Models table**
- id: UUID primary key
- name: Model name
- base_model_id: Reference to parent model
- created_by: Creator of the model
- creation_location: Where model was created
- creation_time: When model was created
- training_start: Start time of training
- training_end: End time of training
- training_parameters: JSON of training configuration
- training_status: Current status of training
- dataset_id: Reference to training dataset
- is_external: Whether model is external
- weights_location: Where model weights are stored
- wandb_link: Link to Weights & Biases dashboard
- git_commit_hash: Model version in HuggingFace
- last_modified: Last modification timestamp
**EvalResults table**

- id: UUID primary key
- model_id: Reference to evaluated model
- eval_setting_id: Reference to evaluation configuration
- score: Evaluation metric result
- dataset_id: Reference to evaluation dataset
- created_by: Who ran the evaluation
- creation_time: When evaluation was run
- creation_location: Where evaluation was run
- completions_location: Where outputs are stored
**EvalSettings table**

- id: UUID primary key
- name: Setting name
- parameters: JSON of evaluation parameters
- eval_version_hash: Version hash of evaluation code
- display_order: Order in leaderboard display
**Datasets table**

- id: UUID primary key
- name: Dataset name
- created_by: Creator of dataset
- creation_time: When dataset was created
- creation_location: Where dataset was created
- data_location: Storage location (S3/GCS/HuggingFace)
- generation_parameters: YAML pipeline configuration
- dataset_type: Type of dataset (SFT/RLHF)
- external_link: Original dataset source URL
- data_generation_hash: Fingerprint of dataset
- hf_fingerprint: HuggingFace fingerprint
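For concreteness, here is a minimal sketch of how these four tables might be declared as SQLAlchemy models. The column names follow the field lists above, but the exact types, defaults, and relationships are defined in `models.py` and may differ:

```python
# Hypothetical sketch of the schemas described above; the authoritative
# definitions live in models.py. Table names follow the four tables
# named earlier (datasets, models, evalresults, evalsettings).
import uuid

from sqlalchemy import JSON, Boolean, Column, DateTime, Float, ForeignKey, Integer, String
from sqlalchemy.dialects.postgresql import UUID
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class Model(Base):
    __tablename__ = "models"
    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    name = Column(String)
    base_model_id = Column(UUID(as_uuid=True), ForeignKey("models.id"))  # parent model
    created_by = Column(String)
    creation_location = Column(String)
    creation_time = Column(DateTime)
    training_start = Column(DateTime)
    training_end = Column(DateTime)
    training_parameters = Column(JSON)
    training_status = Column(String)
    dataset_id = Column(UUID(as_uuid=True), ForeignKey("datasets.id"))  # training dataset
    is_external = Column(Boolean)
    weights_location = Column(String)
    wandb_link = Column(String)
    git_commit_hash = Column(String)
    last_modified = Column(DateTime)


class EvalResult(Base):
    __tablename__ = "evalresults"
    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    model_id = Column(UUID(as_uuid=True), ForeignKey("models.id"))
    eval_setting_id = Column(UUID(as_uuid=True), ForeignKey("evalsettings.id"))
    score = Column(Float)
    dataset_id = Column(UUID(as_uuid=True), ForeignKey("datasets.id"))  # evaluation dataset
    created_by = Column(String)
    creation_time = Column(DateTime)
    creation_location = Column(String)
    completions_location = Column(String)


class EvalSetting(Base):
    __tablename__ = "evalsettings"
    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    name = Column(String)
    parameters = Column(JSON)
    eval_version_hash = Column(String)
    display_order = Column(Integer)


class Dataset(Base):
    __tablename__ = "datasets"
    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    name = Column(String)
    created_by = Column(String)
    creation_time = Column(DateTime)
    creation_location = Column(String)
    data_location = Column(String)  # S3/GCS/HuggingFace
    generation_parameters = Column(JSON)  # YAML pipeline configuration
    dataset_type = Column(String)  # e.g. SFT / RLHF
    external_link = Column(String)
    data_generation_hash = Column(String)
    hf_fingerprint = Column(String)
```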
To set up the database:

- Install PostgreSQL on your system
- Create a new database for Evalchemy
- Create a user with appropriate permissions
- Initialize the database schema using our models (see the sketch after this list)
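As a sketch of the last step, assuming `models.py` exposes a declarative `Base` as in the schema sketch above (the actual initialization entry point in the repo may differ):

```python
# Hypothetical one-time schema initialization; assumes models.py exposes
# a SQLAlchemy declarative Base as sketched above. Connection details
# are placeholders; substitute your own database, user, and password.
from sqlalchemy import create_engine

from models import Base  # hypothetical import path

engine = create_engine("postgresql://evalchemy_user:secret@localhost:5432/evalchemy")
Base.metadata.create_all(engine)  # creates models, evalresults, evalsettings, datasets
```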
To point Evalchemy at your own PostgreSQL database, set the following environment variables to enable database logging:

```bash
export DB_PASSWORD=<DB_PASSWORD>
export DB_HOST=<DB_HOST>
export DB_PORT=<DB_PORT>
export DB_NAME=<DB_NAME>
export DB_USER=<DB_USER>
```
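Illustratively, these variables might be assembled into a standard PostgreSQL connection URL along these lines (the actual wiring lives in the Evalchemy code and may differ):

```python
# Illustrative only: building a PostgreSQL connection URL from the
# environment variables above. Variable names match the exports; the
# real code path in Evalchemy may construct the connection differently.
import os

db_url = (
    f"postgresql://{os.environ['DB_USER']}:{os.environ['DB_PASSWORD']}"
    f"@{os.environ['DB_HOST']}:{os.environ['DB_PORT']}/{os.environ['DB_NAME']}"
)
```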
By default, running the evals will create a new entry in the database. If you instead wish to update an existing entry with new results, you can do so by supplying either:

- Model ID: `--model_id <YOUR_MODEL_ID>`
- Model name: `--model_name <MODEL_NAME_IN_DB>`

Note: If both are provided, `model_id` takes precedence.
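The precedence rule can be pictured with a small, purely hypothetical lookup helper (not the actual Evalchemy code), assuming a SQLAlchemy session and the `Model` class from the schema sketch above:

```python
# Purely hypothetical helper illustrating the precedence rule above:
# an explicit model_id wins; otherwise fall back to a name lookup.
# Assumes a SQLAlchemy session and the Model class sketched earlier.
def find_model(session, model_id=None, model_name=None):
    if model_id is not None:
        return session.get(Model, model_id)  # model_id takes precedence
    if model_name is not None:
        return session.query(Model).filter_by(name=model_name).first()
    return None  # neither supplied: a new entry will be created
```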
If the model ID and metric are already found in the database, the default behavior is to not run the benchmark again. If you wish to overwrite the existing database entry, simply pass `--overwrite-database`.