KindsOfReasoning is a collection of various datasets from various sources, testing different kinds of reasoning.
This repository contains the original dataset and the results of various models on it, together with code to run LLMs on the dataset through the OpenAI evals
library.
In particular, the root folder of the repository contains the following files:
KindsOfReasoning.<csv|json>
: the collection of datasets in a single dataframeKindsOfReasoning_with_llm_results.<csv|json>
: the collection of datasets, with results of the tested LLMs, in a single dataframe
To use the final datasets, you can load one of those files with pandas
or another library. If instead you want to re-create the dataset from scratch, or run it on another LLM by adapting the code used to obtain the original results, look into the full_processing_steps
folder.
'text-ada-001',
'text-babbage-001',
'text-curie-001',
'text-davinci-001',
'text-davinci-002',
'text-davinci-003',
'gpt-3.5-turbo-0301',
'gpt-3.5-turbo-0613',
'gpt-3.5-turbo-1106',
'gpt-3.5-turbo-0125',
'gpt-4-0314',
'gpt-4-0613',
'gpt-4-1106-preview',
'gpt-4-0125-preview',
'gpt-4-turbo-2024-04-09',
'gpt-4o-mini-2024-07-18',
'gpt-4o-2024-05-13',
'gpt-4o-2024-08-06',
Notice that some of those models are now deprecated. You can obtain the results on other models by using our code in full_processing_steps
or writing your own code starting from the final files (KindsOfReasoning.<csv|json>
), which contain all prompts and system prompts.
If you use KindsOfReasoning
, please cite the following paper:
@misc{pacchiardi2024100instancesneedpredicting,
title={100 instances is all you need: predicting the success of a new LLM on unseen data by testing on a few instances},
author={Lorenzo Pacchiardi and Lucy G. Cheke and José Hernández-Orallo},
year={2024},
eprint={2409.03563},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2409.03563},
}
The collection of datasets and accompanying code is released under the CC-BY-NC-SA 4.0 license. This requires attribution, prohibits commercial use, and mandates that any derivatives be shared under the same terms.
Part of the code in full_processing_steps/src/set_up_datasets.py
is adapted from https://github.com/stanford-crfm/helm, which is released under Apache License 2.0.
The individual datasets included in the KindsOfReasoning
collection are obtained from the following sources:
- Formal Fallacies Syllogisms Negation
- License: Apache License 2.0
- Attribution: Derived from BIG-Bench.
- Logical_Args
- License: Apache License 2.0
- Attribution: Derived from BIG-Bench.
- Babi_Task_16
- License: Apache License 2.0
- Attribution: Derived from BIG-Bench.
- LogiQA 2.0
- License: CC-BY-NC-SA 4.0
- Attribution: https://github.com/csitfun/LogiQA2.0
- WANLI
- License: We thank the author for having personally granted permission for the use of this dataset in this benchmark.
- Attribution: https://github.com/alisawuffles/wanli
- Alpha_NLI
- License: Apache License 2.0
- Attribution: © The Allen Institute for Artificial Intelligence https://leaderboard.allenai.org/anli/submissions/get-started
- ReClor
- License: We thank the author for having personally granted permission for the use of this dataset in this benchmark. ReClor is for non-commercial usage only
- Attribution: https://github.com/yuweihao/reclor
- Crass_AI
- License: Apache License 2.0
- Attribution: Derived from BIG-Bench.
- Cause and Effect
- License: Apache License 2.0
- Attribution: Derived from BIG-Bench.
- Fantasy Reasoning
- License: Apache License 2.0
- Attribution: Derived from BIG-Bench.
- Goal Step Inference
- License: Apache License 2.0
- Attribution: Derived from BIG-Bench.
- Copa
- License: BSD 2-Clause License
- Attribution: © M Roemmele, CA Bejan, AS Gordon. https://people.ict.usc.edu/~gordon/copa.html
- Cosmos_QA
- License: We thank the author for having personally granted permission for the use of this dataset in this benchmark.
- Attribution: https://github.com/wilburOne/cosmosqa
- Ropes
- License: CC-BY 4.0
- Attribution: © The Allen Institute for Artificial Intelligence Derived from https://huggingface.co/datasets/ropes
- ANLI
- License: CC-BY-NC 4.0
- Attribution: © Facebook, Inc. https://github.com/facebookresearch/anli
- Emoji_Movie
- License: Apache License 2.0
- Attribution: Derived from BIG-Bench.
- Abstract Narrative Understanding
- License: Apache License 2.0
- Attribution: Derived from BIG-Bench.
- Odd One Out
- License: Apache License 2.0
- Attribution: Derived from BIG-Bench.
- Metaphor Understanding
- License: Apache License 2.0
- Attribution: Derived from BIG-Bench.
- Geometric Shapes
- License: Apache License 2.0
- Attribution: Derived from BIG-Bench.
- Space_NLI
- License: MIT License
- Attribution: Copyright (c) 2023 Fibo Kowalsky https://github.com/kovvalsky/SpaceNLI
- Arithmetic
- License: Apache License 2.0
- Attribution: Derived from BIG-Bench.