The score on the BIRD dev dataset is 58.15.
Below are the steps to run the BIRD evaluation:
Log in to TiDB Cloud and create a Chat2Query Data App.
Save the Base URL, the public key, and the private key. You will need them later for the settings API call and to configure runbird.sh.
$ git clone https://github.com/tidbcloud/chat2query_bench
$ cd chat2query_bench/benchmark_bird
$ pip install -r requirements.txt
Download the BIRD dev dataset from https://bird-bench.oss-cn-beijing.aliyuncs.com/dev.zip and unzip it into the benchmark_bird/data folder. Make sure the folder is named data, not dev, and rename dev.sql to dev_gold.sql.
The file structure should look like this:
$ tree -L 1 data/
data/
├── dev_databases
├── dev_gold.sql
└── dev.json
2 directories, 2 files
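The renaming and layout check can be scripted. Below is a minimal Python sketch. Because the project is installed with pip, Python is assumed to be available. `prepare_bird_data` is a hypothetical helper and is not part of chat2query_bench. It assumes dev.zip has already been unzipped into benchmark_bird/data and that the folder is named data.

```python
"""Prepare and check the BIRD dev data layout (hypothetical helper, not part of the repo)."""
from pathlib import Path


def prepare_bird_data(data_dir: Path) -> list[str]:
    """Rename dev.sql to dev_gold.sql if needed, then return any missing entries.

    Assumes the dataset archive has already been unzipped into `data_dir`.
    """
    legacy_sql = data_dir / "dev.sql"
    gold_sql = data_dir / "dev_gold.sql"
    # The evaluation expects the gold SQL file to be named dev_gold.sql.
    if legacy_sql.is_file() and not gold_sql.exists():
        legacy_sql.rename(gold_sql)

    # Report anything from the expected layout that is still absent.
    missing = []
    if not (data_dir / "dev_databases").is_dir():
        missing.append("dev_databases/")
    for name in ("dev_gold.sql", "dev.json"):
        if not (data_dir / name).is_file():
            missing.append(name)
    return missing
```

Run from benchmark_bird, `prepare_bird_data(Path("data"))` returns an empty list when the layout matches the tree above. Otherwise it lists the entries that are still missing.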
NOTE: By default, the BIRD evaluation runs with the GPT-3.5 model. To use the GPT-4 model, provide your org_id to us and we will enable the settings API for you. You can do this by sending an email to [email protected].
You can customize the BIRD evaluation parameters by calling the settings API, for example:
export PUBLIC_KEY="<Your Public Key>"
export PRIVATE_KEY="<Your Private Key>"
export BASE_URL="<Your data app endpoint url>"
# Send the settings as JSON, authenticating with HTTP digest auth
curl --digest --user "${PUBLIC_KEY}:${PRIVATE_KEY}" --request PUT "${BASE_URL}" \
  --header 'content-type: application/json' \
  --data-raw '{
    "openai_api_key": "<Your Secret OpenAI API Key>",
    "language": "English",
    "ai_model": "gpt-4"
  }'
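If you prefer Python to curl, the same request can be sketched with the standard library's urllib. urllib supports HTTP digest authentication through `HTTPDigestAuthHandler`. Note the following assumptions:

- The helper names below are hypothetical.
- The BASE_URL, PUBLIC_KEY and PRIVATE_KEY environment variables are the ones exported in the shell example.
- The payload keys simply mirror the curl body.

This is not an official client for the settings API.

```python
"""Call the settings API with HTTP digest auth (stdlib-only sketch; helper names are hypothetical)."""
import json
import os
import urllib.request


def build_settings_request(base_url: str, settings: dict) -> urllib.request.Request:
    """Return a PUT request whose JSON body mirrors the curl --data-raw payload."""
    body = json.dumps(settings).encode("utf-8")
    return urllib.request.Request(
        base_url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def build_digest_opener(base_url: str, public_key: str, private_key: str) -> urllib.request.OpenerDirector:
    """Return an opener that answers digest challenges, like curl --digest --user."""
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # realm=None: use these credentials for any realm under base_url.
    passwords.add_password(None, base_url, public_key, private_key)
    return urllib.request.build_opener(urllib.request.HTTPDigestAuthHandler(passwords))


def put_settings(settings: dict) -> str:
    """Send `settings` to the Data App endpoint and return the response body."""
    base_url = os.environ["BASE_URL"]
    opener = build_digest_opener(base_url, os.environ["PUBLIC_KEY"], os.environ["PRIVATE_KEY"])
    with opener.open(build_settings_request(base_url, settings), timeout=30) as resp:
        return resp.read().decode("utf-8")
```

For example, the call below sends the same body as the curl command:

`put_settings({"openai_api_key": "<Your Secret OpenAI API Key>", "language": "English", "ai_model": "gpt-4"})`

Request construction is kept separate from sending, so the payload can be inspected without touching the network.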
Set the BASE_URL, PUBLIC_KEY, and PRIVATE_KEY variables in runbird.sh to the values you saved earlier.
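How runbird.sh defines these variables is not shown here. The following Python sketch therefore assumes each variable is assigned on its own line, for example BASE_URL="..." or export BASE_URL="...". `set_shell_vars` is a hypothetical helper. Check the script by eye after running it.

```python
"""Fill in variable assignments in a shell script (hypothetical helper; assumes NAME=... lines)."""
import re
import shlex


def set_shell_vars(script: str, values: dict[str, str]) -> str:
    """Rewrite the NAME=... assignment line for each NAME in `values`.

    Assumes one assignment per line, optionally prefixed with `export`.
    Lines for other variables are left untouched.
    """
    for name, value in values.items():
        pattern = re.compile(
            rf"^([ \t]*(?:export[ \t]+)?){re.escape(name)}=.*$", re.MULTILINE
        )
        # shlex.quote protects values containing spaces or shell metacharacters.
        replacement = f"{name}={shlex.quote(value)}"
        # Keep any indentation and `export` prefix; replace only the assignment.
        script = pattern.sub(lambda m: m.group(1) + replacement, script)
    return script
```

For example, you could update the script in place with:

`path = Path("runbird.sh"); path.write_text(set_shell_vars(path.read_text(), {"BASE_URL": ..., "PUBLIC_KEY": ..., "PRIVATE_KEY": ...}))`

A regex over whole lines is used instead of a shell parser, which keeps the sketch small but only works for simple one-line assignments.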
If you want to run the evaluation with the GPT-4 model, make sure you have sent the OpenAI API Key and Public Key to us. Once we have enabled the GPT-4 model for you, you can run the script.
$ ./runbird.sh