Skip to content

Latest commit

 

History

History
132 lines (90 loc) · 4.85 KB

README.md

File metadata and controls

132 lines (90 loc) · 4.85 KB

Living in the Moment: Can Large Language Models Grasp Co-Temporal Reasoning?

image1

📃Paper | 💾Dataset

😎: This is the official implementation repository of our study on co-temporal reasoning capabilities in Large Language Models (LLMs), accepted at ACL 2024 Main Conference.


We propose the CotempQA, the first comprehensive co-temporal Question Answering (QA) benchmark containing four co-temporal scenarios (Equal, Overlap, During, Mix) with 4,748 samples for evaluating the co-temporal comprehension and reasoning abilities of LLMs.

📊 Leaderboard

Model Overall Equal Overlap During Mix
Human 92.8 97.7 92.3 84.5 92.1
GPT-4 54.7 92.7 59.4 50.1 45.0
GPT-3.5-Turbo 38.9 62.8 44.3 37.2 23.4
WizardMath-70B 30.1 41.8 28.6 31.3 16.6
LLaMA2-70B 22.2 26.8 21.2 21.4 23.8
CodeLLaMA-34B 20.0 31.3 18.4 18.3 22.4
WizardCoder-34B 19.2 22.9 18.8 19.9 13.4
WizardMath-13B 14.4 26.4 13.0 14.4 6.6
CodeLLaMA-13B 12.4 18.0 10.6 11.3 15.9
LLaMA2-13B 13.8 21.2 13.7 12.8 14.0
WizardCoder-13B 13.9 12.4 12.4 14.6 12.6
WizardMath-7B 14.8 14.4 12.2 16.0 11.2
LLaMA2-7B 12.0 11.5 12.1 12.0 12.0
WizardCoder-7B 11.2 15.1 9.8 11.1 10.5
CodeLLaMA-7B 10.5 17.0 8.8 9.5 13.0

⚙️ Installation

To get started, clone this repository and install the required packages:

git clone https://github.com/zhaochen0110/Cotempqa.git
cd Cotempqa
pip install -r requirements.txt

🚧 Data Loading

Download CotempQA from this link or load it using the code below:

from datasets import load_dataset
dataset = load_dataset("Warrieryes/CotempQA")

💎 Quick Evaluation

Replicate our experimental results by running:

python inference.py --model_name "$model_name" \
--data_mode "$data_mode" \
--mode "$mode" \
--output_path "$output_path" \
--result_path "$result_path" 

🏗️ Datasets Construction

Our framework can be generalized to other structured temporal databases. To facilitate further research, we provide a detailed pipeline to generate our dataset, from extracting data from Wikidata to creating CotempQA data.

Structuring Temporal Facts

We based our work on TempLAMA. First, install SLING and download the Wikidata KB. Use the following commands to structure facts:

bash install.sh <path_to_store_wikipedia_dumps>
bash prepare_data.sh <path_to_store_wikipedia_dumps> <path_to_store_events>

Extracting Co-temporal Facts

We categorize fact pairs into five scenarios based on the consistency or variation of $(\mathcal{S}, \mathcal{R}, \mathcal{O})$. Use the following script to extract co-temporal facts:

python construct_comtempqa.py --rawdata_path <rawdata_path>  \
                  --qid_path <qid_path> \
                  --subject_path <subject_path> \
                  --object_path <object_path> \
                  --output_path <output_path>

QA Pairs Construction

Construct QA pairs using the following command:

python add_cotemporal_expression.py

📬 Contact

For any questions or inquiries, please feel free to open an issue or contact us at [[email protected]].

🤝 Contributing

We welcome contributions to CotempQA! If you have any suggestions or improvements, please open a pull request or contact us directly.

📜 License

This project is licensed under the Apache 2.0 license - see the LICENSE file for details.

Acknowledgements

This project is based on the work done in TempLAMA. Special thanks to their authors for valuable contributions.

Citation

If you find our work useful, please consider citing our paper:

@article{su2024living,
  title={Living in the Moment: Can Large Language Models Grasp Co-Temporal Reasoning?},
  author={Su, Zhaochen and Li, Juntao and Zhang, Jun and Zhu, Tong and Qu, Xiaoye and Zhou, Pan and Bowen, Yan and Cheng, Yu and others},
  journal={arXiv preprint arXiv:2406.09072},
  year={2024}
}