
Laqcce-cao/RSL-SQL


Please give us a star ⭐ for the latest update.

arXiv

Overview

Main Results

Execution Accuracy on BIRD Dev Set

Ablation Study

Project directory structure

RSL-SQL/
├── README.md
├── requirements.txt
│
├── data/
│   ├── column_meaning.json
│   ├── dev.json
│   └── dev_tables.json
│
├── database/
│   └── dev_databases/
│
├── few_shot/
│   ├── sentence_transformers/
│   └── train-00000-of-00001-fe8894d41b7815be.parquet
│
└── src/
    └── configs/
        └── config.py
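
Before running anything, it can help to confirm that the files in the tree above are in place. The following is a minimal sketch, not part of the repository: `missing_paths` and `EXPECTED_PATHS` are hypothetical names, and the list simply copies the paths shown in the tree.

```python
from pathlib import Path

# Paths copied from the directory tree above, relative to the repository root.
EXPECTED_PATHS = [
    "requirements.txt",
    "data/column_meaning.json",
    "data/dev.json",
    "data/dev_tables.json",
    "database/dev_databases",
    "few_shot/sentence_transformers",
    "few_shot/train-00000-of-00001-fe8894d41b7815be.parquet",
    "src/configs/config.py",
]


def missing_paths(root, expected=EXPECTED_PATHS):
    """Return the expected paths that do not exist under `root`."""
    root = Path(root)
    return [p for p in expected if not (root / p).exists()]


if __name__ == "__main__":
    # Run from the repository root; prints nothing when everything is present.
    for path in missing_paths("."):
        print(f"missing: {path}")
```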

Environment

conda create -n rsl_sql python=3.10
conda activate rsl_sql
pip install -r requirements.txt

Set the following parameters in src/configs/config.py:

dev_databases_path = 'database/dev_databases'
dev_json_path = 'data/dev.json'
api = '..'
base_url = 'http://'
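
The values `'..'` and `'http://'` above are placeholders that need to be replaced. A small sanity check along the following lines can catch a forgotten placeholder before a long run starts. This is only a sketch: `config_problems` is a hypothetical helper, it takes the settings as a plain dict rather than importing `config.py`, and it assumes `api` holds the API key for the LLM endpoint at `base_url`.

```python
def config_problems(cfg):
    """Return human-readable problems found in a dict of config values."""
    problems = []
    # Assumption: `api` is an API key; '..' or an empty string means "not set".
    if cfg.get("api", "").strip(". ") == "":
        problems.append("api: still the placeholder value")
    # A bare scheme with no host cannot be a usable endpoint.
    if cfg.get("base_url", "").strip() in ("", "http://", "https://"):
        problems.append("base_url: no host after the scheme")
    # The two dataset paths must at least be non-empty strings.
    for key in ("dev_databases_path", "dev_json_path"):
        if not cfg.get(key):
            problems.append(f"{key}: not set")
    return problems
```

Calling it with the four values from `config.py` returns an empty list when nothing looks like a placeholder.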

Run

1. Data Preprocessing

# Construct `ppl_dev.json`. 
python src/data_construct.py 

# Construct few-shot example pairs
python few_shot/construct_QA.py 

# Generate few-shot examples
python few_shot/slg_main.py --dataset src/information/ppl_dev.json --out_file src/information/example.json --kshot 3

# add few-shot examples to ppl_dev.json
python src/information/add_example.py

2. Preliminary SQL generation and bidirectional schema linking

# Step 1: preliminary SQL
# This step writes two output files: `src/sql_log/preliminary_sql.txt` and `src/schema_linking/LLM.json`.
# If an error occurs, save copies of both files right away, then continue the run and save the subsequent results.
python src/step_1_preliminary_sql.py --ppl_file src/information/ppl_dev.json --sql_out_file src/sql_log/preliminary_sql.txt --Schema_linking_LLM src/schema_linking/LLM.json --start_index 0

# schema linking
python src/bid_schema_linking.py --pre_sql_file src/sql_log/preliminary_sql.txt --sql_sl_output src/schema_linking/sql.json --hint_sl_output src/schema_linking/hint.json --LLM_sl_output src/schema_linking/LLM.json --Schema_linking_output src/schema_linking/schema.json
cp src/schema_linking/schema.json src/information

# add schema linking to ppl_dev.json
python src/information/add_sl.py
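
When step 1 is interrupted, the value to pass to `--start_index` on the next run presumably depends on how many examples were already processed. The sketch below estimates that number from the saved SQL log. It rests on an assumption the README does not state, namely that `preliminary_sql.txt` holds exactly one SQL statement per line, in dataset order; `next_start_index` is a hypothetical helper, not a script in the repository.

```python
from pathlib import Path


def next_start_index(sql_log_path):
    """Count the non-empty lines already written to a SQL log file.

    Assumption: one SQL statement per line, in dataset order, so the
    count equals the index of the first unprocessed example.
    """
    path = Path(sql_log_path)
    if not path.exists():
        return 0  # nothing written yet, start from the beginning
    with path.open(encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())


if __name__ == "__main__":
    print(next_start_index("src/sql_log/preliminary_sql.txt"))
```

If the one-statement-per-line assumption does not hold for your outputs, check the saved files by hand instead, and make sure `LLM.json` covers the same examples.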

3. SQL generation based on simplified schema and information augmentation

# Step 2: SQL generation
# This step writes two output files: `src/sql_log/step_2_information_augmentation.txt` and `src/information/augmentation.json`.
# If an error occurs, save copies of both files right away, then continue the run and save the subsequent results.
python src/step_2_information_augmentation.py --ppl_file src/information/ppl_dev.json --sql_2_output src/sql_log/step_2_information_augmentation.txt --information_output src/information/augmentation.json --start_index 0

# add augmentation to ppl_dev.json
python src/information/add_augmentation.py

4. SQL selection

# Step 3: SQL selection
# This step writes one output file: `src/sql_log/step_3_binary.txt`.
# If an error occurs, save a copy of this file right away, then continue the run and save the subsequent results.
python src/step_3_binary_selection.py --ppl_file src/information/ppl_dev.json --sql_3_output src/sql_log/step_3_binary.txt --sql_1 src/sql_log/preliminary_sql.txt --sql_2 src/sql_log/step_2_information_augmentation.txt --start_index 0

5. SQL refinement

# Step 4: SQL refinement
# This step writes one output file: `src/sql_log/final_sql.txt`.
python src/step_4_self_correction.py --ppl_file src/information/ppl_dev.json --sql_4_output src/sql_log/final_sql.txt --sql_refinement src/sql_log/step_3_binary.txt --start_index 0

Evaluation

Execution (EX) Evaluation:

Use the official BIRD evaluation script: https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/bird
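
For intuition only, execution accuracy can be sketched as follows: a predicted query counts as correct when it returns the same set of rows as the gold query on the same SQLite database. This is a simplified illustration with hypothetical function names; the official script remains the reference for reported numbers and handles details (timeouts, parallelism, per-difficulty breakdowns) that are omitted here.

```python
import sqlite3


def same_execution_result(conn, predicted_sql, gold_sql):
    """True if both queries return the same set of rows on `conn`."""
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False  # a prediction that fails to execute counts as wrong
    gold_rows = conn.execute(gold_sql).fetchall()
    # Compare as sets: row order and duplicate rows are ignored.
    return set(predicted_rows) == set(gold_rows)


def execution_accuracy(conn, pairs):
    """Fraction of (predicted_sql, gold_sql) pairs with matching results."""
    results = [same_execution_result(conn, p, g) for p, g in pairs]
    return sum(results) / len(results) if results else 0.0
```

In practice each example in `dev.json` runs against its own database under `database/dev_databases`, so a real evaluation opens one connection per database rather than sharing a single `conn`.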

Strict Recall Rate Evaluation:

The script is evaluation/evaluation_SL.py. To use it, organize the database elements in the output in the following format:

{
    "tables": [
        "frpm"
    ],
    "columns": [
        "frpm.`Free Meal Count (K-12)`",
        "frpm.`Enrollment (K-12)`",
        "frpm.`School Name`",
        "frpm.`County Name`"
    ]
}
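
To make the metric concrete, here is a minimal sketch of a strict recall computation over entries in the format above. It assumes that an example counts as strictly recalled only when every gold table and column appears in the predicted schema, and that the rate is the fraction of such examples. The function names are hypothetical, and `evaluation/evaluation_SL.py` may differ in details such as case handling.

```python
def is_strictly_recalled(predicted, gold):
    """True if every gold table and column appears in the prediction."""
    return (set(gold["tables"]) <= set(predicted["tables"])
            and set(gold["columns"]) <= set(predicted["columns"]))


def strict_recall_rate(predictions, golds):
    """Fraction of examples whose gold schema elements are all recalled."""
    if len(predictions) != len(golds):
        raise ValueError("predictions and golds must have the same length")
    if not golds:
        return 0.0
    hits = sum(is_strictly_recalled(p, g) for p, g in zip(predictions, golds))
    return hits / len(golds)
```

Extra predicted elements do not hurt this score, whereas a single missing gold column makes the whole example count as a miss.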

Citation

@article{cao2024rsl,
  title={RSL-SQL: Robust Schema Linking in Text-to-SQL Generation},
  author={Cao, Zhenbiao and Zheng, Yuanlei and Fan, Zhihao and Zhang, Xiaojin and Chen, Wei},
  journal={arXiv preprint arXiv:2411.00073},
  year={2024}
}

About

RSL-SQL: Robust Schema Linking in Text-to-SQL Generation
