"target" doesn't have the key "pre_skeleton"; how to fix it? #35

Open
riverhell opened this issue May 22, 2024 · 2 comments

@riverhell

The error comes from here:
if jaccard_similarity(train_json[index]["pre_skeleton"], target["pre_skeleton"]) < self.threshold:

The details are as follows:
File "/home/jiangshan/code/DAIL-SQL/prompt/ExampleSelectorTemplate.py", line 353, in get_examples
if jaccard_similarity(train_json[index]["pre_skeleton"], target["pre_skeleton"]) < self.threshold:
File "/home/jiangshan/code/DAIL-SQL/prompt/PromptICLTemplate.py", line 51, in format
examples = self.get_examples(target, self.NUM_EXAMPLE * scope_factor, cross_domain=cross_domain)
File "/home/jiangshan/code/DAIL-SQL/generate_question.py", line 87, in <module>
question_format = prompt.format(target=question_json,
KeyError: 'pre_skeleton'
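
The failing line indexes target["pre_skeleton"] directly, so any target without that key raises a bare KeyError. Below is a minimal sketch of the same comparison with a more actionable failure message. The token-set definition of Jaccard similarity and the helper name skeleton_similarity are assumptions for illustration; they are not taken from the repository's implementation.

```python
def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard similarity over whitespace-separated tokens (assumed definition)."""
    set_a, set_b = set(a.split()), set(b.split())
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def skeleton_similarity(candidate: dict, target: dict) -> float:
    """Hypothetical helper: explain the missing key instead of raising KeyError."""
    if "pre_skeleton" not in target:
        raise ValueError(
            "target has no 'pre_skeleton'; the selector needs pre-predicted SQL "
            "for the target questions"
        )
    return jaccard_similarity(candidate["pre_skeleton"], target["pre_skeleton"])


# Reduced versions of the two records shown in this issue.
train_example = {"pre_skeleton": "select count ( _ ) from _"}
target_ok = {"pre_skeleton": "select count ( _ ) from _"}
target_missing = {"question_pattern": "how many _ do we have ?"}

print(skeleton_similarity(train_example, target_ok))  # identical skeletons -> 1.0
try:
    skeleton_similarity(train_example, target_missing)
except ValueError as exc:
    print(exc)
```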

index = 2039

train_json[index]:
{'db_id': 'party_people', 'query': 'SELECT count(*) FROM region', 'query_toks': ['SELECT', 'count', '(', '*', ')', 'FROM', 'region'], 'query_toks_no_value': ['select', 'count', '(', '*', ')', 'from', 'region'], 'question': 'How many regions do we have?', 'question_toks': ['How', 'many', 'regions', 'do', 'we', 'have', '?'], 'sql': {'from': {...}, 'select': [...], 'where': [...], 'groupBy': [...], 'having': [...], 'orderBy': [...], 'limit': None, 'intersect': None, 'union': None, 'except': None}, 'tables': [{...}, {...}, {...}, {...}], 'query_skeleton': 'select count ( _ ) from _', 'path_db': '/home/jiangshan/data/datasets/llm/spider/database/party_people/party_people.sqlite', 'sc_link': {'q_col_match': {...}, 'q_tab_match': {...}}, 'cv_link': {'num_date_match': {}, 'cell_match': {}}, 'question_for_copying': ['how', 'many', 'regions', 'do', 'we', 'have', '?'], 'column_to_table': {'0': None, '1': 0, '2': 0, '3': 0, '4': 0, '5': 0, '6': 0, '7': 1, '8': 1, '9': 1, '10': 1, '11': 1, '12': 1, '13': 2, '14': 2, '15': 2, '16': 2, '17': 3, '18': 3, ...}, 'table_names_original': ['region', 'party', 'member', 'party_events'], 'question_pattern': 'how many _ do we have ?', 'pre_skeleton': 'select count ( _ ) from _'}

The target is:
{'db_id': 'concert_singer', 'query': 'SELECT count(*) FROM singer', 'query_toks': ['SELECT', 'count', '(', '*', ')', 'FROM', 'singer'], 'query_toks_no_value': ['select', 'count', '(', '*', ')', 'from', 'singer'], 'question': 'How many singers do we have?', 'question_toks': ['How', 'many', 'singers', 'do', 'we', 'have', '?'], 'sql': {'from': {...}, 'select': [...], 'where': [...], 'groupBy': [...], 'having': [...], 'orderBy': [...], 'limit': None, 'intersect': None, 'union': None, 'except': None}, 'tables': [{...}, {...}, {...}, {...}], 'query_skeleton': 'select count ( _ ) from _', 'path_db': '/home/jiangshan/data/datasets/llm/spider/database/concert_singer/concert_singer.sqlite', 'sc_link': {'q_col_match': {...}, 'q_tab_match': {...}}, 'cv_link': {'num_date_match': {}, 'cell_match': {}}, 'question_for_copying': ['how', 'many', 'singers', 'do', 'we', 'have', '?'], 'column_to_table': {'0': None, '1': 0, '2': 0, '3': 0, '4': 0, '5': 0, '6': 0, '7': 0, '8': 1, '9': 1, '10': 1, '11': 1, '12': 1, '13': 1, '14': 1, '15': 2, '16': 2, '17': 2, '18': 2, ...}, 'table_names_original': ['stadium', 'singer', 'concert', 'singer_in_concert'], 'question_pattern': 'how many _ do we have ?'}
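
Comparing the two dumps by eye is error-prone because the records are long. A quick way to confirm which keys the target lacks is a set difference over the dictionary keys. The sketch below uses reduced copies of the two records above, and the helper name missing_keys is hypothetical.

```python
from typing import List


def missing_keys(reference: dict, candidate: dict) -> List[str]:
    """Return the keys present in `reference` but absent from `candidate`."""
    return sorted(set(reference) - set(candidate))


# Reduced copies of the records above; only a few of the fields are kept.
train_entry = {
    "db_id": "party_people",
    "query_skeleton": "select count ( _ ) from _",
    "question_pattern": "how many _ do we have ?",
    "pre_skeleton": "select count ( _ ) from _",
}
target = {
    "db_id": "concert_singer",
    "query_skeleton": "select count ( _ ) from _",
    "question_pattern": "how many _ do we have ?",
}

print(missing_keys(train_entry, target))  # ['pre_skeleton']
```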

The debug args are:
"args": [
"--data_type", "spider",
"--split", "test",
"--tokenizer", "/home/schinta/data/model/llm/pre_train/THUDM/chatglm3-6b",
"--max_seq_len", "4096",
"--selector_type", "EUCDISMASKPRESKLSIMTHR",
"--prompt_repr", "SQL",
"--k_shot", "9",
"--example_type", "QA"
]

The pip list output is shown below:
Package Version


accelerate 0.28.0
aiofiles 23.2.1
aiohttp 3.8.4
aiosignal 1.3.1
altair 5.2.0
annotated-types 0.6.0
annoy 1.17.1
anyio 3.7.0
async-timeout 4.0.2
attrs 23.1.0
bpemb 0.3.5
certifi 2024.2.2
charset-normalizer 3.1.0
click 8.1.7
cmake 3.26.3
contourpy 1.1.1
corenlp-protobuf 3.8.0
cpm-kernels 1.0.11
cycler 0.12.1
dataclasses-json 0.5.7
distro 1.9.0
exceptiongroup 1.1.1
ffmpy 0.3.2
filelock 3.12.0
fonttools 4.50.0
frozenlist 1.3.3
fsspec 2023.5.0
gensim 4.3.2
greenlet 2.0.2
h11 0.14.0
httpcore 1.0.4
httpx 0.27.0
huggingface-hub 0.23.0
idna 3.7
importlib_resources 6.4.0
Jinja2 3.1.2
joblib 1.2.0
jsonschema 4.21.1
jsonschema-specifications 2023.12.1
kiwisolver 1.4.5
latex2mathml 3.77.0
lit 16.0.6
Markdown 3.6
markdown-it-py 3.0.0
MarkupSafe 2.1.2
marshmallow 3.19.0
marshmallow-enum 1.5.1
matplotlib 3.7.5
mdtex2html 1.3.0
mdurl 0.1.2
mpmath 1.3.0
multidict 6.0.4
mypy-extensions 1.0.0
nemoguardrails 0.3.0
networkx 3.1
nltk 3.8.1
numexpr 2.8.4
numpy 1.24.4
nvidia-cublas-cu11 11.10.3.66
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu11 11.7.101
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu11 8.5.0.96
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu11 10.9.0.58
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu11 10.2.10.91
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu11 11.4.0.1
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu11 11.7.4.91
nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu11 2.14.3
nvidia-nccl-cu12 2.20.5
nvidia-nvjitlink-cu12 12.4.127
nvidia-nvtx-cu11 11.7.91
nvidia-nvtx-cu12 12.1.105
openai 1.30.1
openapi-schema-pydantic 1.2.4
orjson 3.9.15
packaging 24.0
pandas 2.0.3
pillow 10.2.0
pip 24.0
pkgutil_resolve_name 1.3.10
protobuf 3.20.3
psutil 5.9.8
pydantic 2.7.1
pydantic_core 2.18.2
pydub 0.25.1
Pygments 2.17.2
pyparsing 3.1.2
python-multipart 0.0.9
pytz 2024.1
PyYAML 6.0
referencing 0.34.0
regex 2023.5.5
requests 2.31.0
rfc3986 1.5.0
rich 13.7.1
rpds-py 0.18.0
ruff 0.3.4
safetensors 0.3.1
scikit-learn 1.2.2
scipy 1.10.1
semantic-version 2.10.0
sentence-transformers 2.2.2
sentencepiece 0.1.99
setuptools 65.5.1
shellingham 1.5.4
simpleeval 0.9.13
six 1.16.0
smart-open 7.0.4
sniffio 1.3.0
sql_metadata 2.11.0
SQLAlchemy 2.0.17
sqlparse 0.5.0
stanford-corenlp 3.9.2
sympy 1.12
threadpoolctl 3.1.0
tokenizers 0.13.3
tomlkit 0.12.0
toolz 0.12.1
torch 2.3.0
torchtext 0.18.0
torchvision 0.18.0
tqdm 4.65.0
transformers 4.27.1
triton 2.3.0
typer 0.10.0
typing_extensions 4.11.0
typing-inspect 0.9.0
tzdata 2024.1
urllib3 2.2.1
uvicorn 0.22.0
websockets 11.0.3
wheel 0.43.0
wrapt 1.16.0
yarl 1.9.2
zipp 3.18.1

@BeachWang
Owner

Hi, sorry for the inconvenience. Please refer to the run_dail_sql_mini.sh script. If you use EUCDISMASKPRESKLSIMTHR as your --selector_type, you also need to pass the path to a file of previously predicted SQL via --pre_test_result. This tells the script where to find the preliminary SQL predictions.
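
To catch this earlier than the KeyError, a pre-flight check on the arguments can help. The following is only a sketch under stated assumptions: the parser is a reduced, hypothetical stand-in and not the actual one in generate_question.py, the function name parse_and_check is invented, and treating EUCDISMASKPRESKLSIMTHR as the only selector that needs pre-predictions is an assumption. The example path is the one that the mini script uses according to this thread.

```python
import argparse

# Selector name taken from this thread; the set itself is an illustrative assumption.
SELECTORS_NEEDING_PRE_RESULT = {"EUCDISMASKPRESKLSIMTHR"}


def parse_and_check(argv):
    """Parse a reduced, hypothetical argument set and validate it."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--selector_type", required=True)
    parser.add_argument("--pre_test_result", default=None)
    args = parser.parse_args(argv)
    if args.selector_type in SELECTORS_NEEDING_PRE_RESULT and not args.pre_test_result:
        raise ValueError(
            f"--selector_type {args.selector_type} requires --pre_test_result "
            "(path to the previously predicted SQL file)"
        )
    return args


args = parse_and_check([
    "--selector_type", "EUCDISMASKPRESKLSIMTHR",
    "--pre_test_result", "./results/DAIL-SQL+GPT-4.txt",
])
print(args.pre_test_result)
```

With the debug args from the original report, which omit --pre_test_result, this check would raise ValueError before any examples are selected.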

@mattjdraper

Hi there,

The DAIL-SQL paper states that Graphix is the model used to produce the preliminary predictions. However, in the run_dail_sql_mini.sh script, the pre-test result is set to ./results/DAIL-SQL+GPT-4.txt.

Shouldn't this be set to graphix_result.txt? Also, is the code that generates these Graphix predictions available? Thank you.
