AutoMathIC: Automatic Mathematic In-Context Example Generation for LLM Using Multi-Modal Consistency
This repository contains the implementation source code and experimental results for automatic in-context example generation for advancing the math-solving capability of LLMs, as described in the following paper:
Paper: Automatic Mathematic In-Context Example Generation for LLM Using Multi-Modal Consistency
AutoMathIC is a framework that automatically generates high-quality in-context examples to enhance LLMs' mathematical reasoning. In this implementation, AutoMathIC mutates an arithmetic question and selects a subset of the mutated questions to use as in-context examples based on consistency across multiple modalities, evaluated over 4 math problem datasets (ASDiv, SVAMP, GSM8k and MultiArith). In this work, we use the Chain-of-Thought, code and mathematical equation modalities. Results of AutoMathIC are here. Supplemental artifacts for the results can be downloaded from here
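The selection criterion can be summarized with a short sketch. The snippet below is illustrative only; the function and field names (e.g. is_consistent, answers) are hypothetical and do not reflect AutoMathIC's actual API. It shows the core idea of multi-modal consistency: a mutated question is treated as a good in-context example when its Chain-of-Thought, code and equation answers all agree.

# Illustrative sketch only: names are hypothetical, not AutoMathIC's actual API.
from collections import Counter

def is_consistent(answers_by_modality: dict) -> bool:
    """Return True when all modalities (e.g. 'cot', 'code', 'eqn') agree on one answer."""
    counts = Counter(answers_by_modality.values())
    return len(counts) == 1  # every modality produced the same answer

def select_in_context_examples(mutations: list, k: int = 5) -> list:
    """Keep up to k mutated questions whose modality answers are mutually consistent."""
    consistent = [m for m in mutations if is_consistent(m["answers"])]
    return consistent[:k]

# Example usage with made-up data:
mutations = [
    {"question": "Tom has 3 apples and buys 2 more ...",
     "answers": {"cot": "5", "code": "5", "eqn": "5"}},
    {"question": "Tom has 3 apples and buys 4 more ...",
     "answers": {"cot": "7", "code": "7", "eqn": "6"}},
]
print(select_in_context_examples(mutations))  # only the first (consistent) mutation is kept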
This application is written for Python 3.9.17. All requirements are listed in requirements.txt and can be installed with pip using the following command.
pip install -r requirements.txt
This artifact repository consists of the following files and folders:
./src/python/*
: Directory for the Python source code
./_results/*
: Directory for the results of running the source code
./_downloads/*
: Directory for the datasets used to run the source code
This step generates mutated math problems with the following commands:
cd AutoMathIC
# llm_model is either gpt3.5 (for GPT-3.5) or gpt4omini (for GPT-4o-mini)
# SVAMP
python -m src.python.main \
--run mutate_nl \
--llm_model_name "${llm_model}" \
--dataset_name 'svamp'
# ASDiv
python -m src.python.main \
--run mutate_nl \
--llm_model_name "${llm_model}" \
--dataset_name 'asdiv'
# MultiArith
python -m src.python.main \
--run mutate_nl \
--llm_model_name "${llm_model}" \
--dataset_name 'multiarith'
# GSM8k
python -m src.python.main \
--run mutate_nl \
--llm_model_name "${llm_model}" \
--dataset_name 'gsm8k'
Outputs after running the command are in the result directory {PROJ_DIR}/_results/nl2nl/{DATASET}/mutation/,
where {DATASET}
is the name of the math problem dataset (ASDiv, SVAMP, GSM8k or MultiArith).
For this task, the following files are generated in its result directory:
_results/
|- nl2nl/
| |- {DATASET}/
| | |- mutation/
| | | |- mut-nl-{CKSUM}.json
where {CKSUM}
represents the checksum value of each unique math problem.
Each mut-nl-{CKSUM}.json
file contains an original math problem and its mutated math problems.
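For a quick sanity check of this step's output, a short Python snippet such as the one below can be used to count the mutations generated per problem. The 'mutations' key is an assumption about the JSON schema and may differ in the actual files.

import json
from pathlib import Path

# Path follows the result layout described above (SVAMP used as an example).
mutation_dir = Path("_results/nl2nl/svamp/mutation")
for f in sorted(mutation_dir.glob("mut-nl-*.json")):
    with f.open() as fp:
        data = json.load(fp)
    # Report how many mutated questions were generated for each original problem.
    # NOTE: the 'mutations' key is an assumed schema, not documented by the repo.
    print(f.name, "->", len(data.get("mutations", [])), "mutations")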
This step obtains the LLM responses over multiple modalities. You can run it by executing the following commands:
cd AutoMathIC
# llm_model is either gpt3.5 (for GPT-3.5) or gpt4omini (for GPT-4o-mini)
# SVAMP
python -m src.python.main \
--run evaluate_mm_llm \
--llm_model_name "${llm_model}" \
--dataset_name 'svamp'
# ASDiv
python -m src.python.main \
--run evaluate_mm_llm \
--llm_model_name "${llm_model}" \
--dataset_name 'asdiv'
# MultiArith
python -m src.python.main \
--run evaluate_mm_llm \
--llm_model_name "${llm_model}" \
--dataset_name 'multiarith'
# GSM8k
python -m src.python.main \
--run evaluate_mm_llm \
--llm_model_name "${llm_model}" \
--dataset_name 'gsm8k'
Outputs after running the command are in the result directory {PROJ_DIR}/_results/nl2nl/{DATASET}/evaluate_consistency/{LLM_MODEL},
where {DATASET}
is the name of the math problem dataset (ASDiv, SVAMP, GSM8k or MultiArith) and {LLM_MODEL} is the name of the LLM.
For this task, the following files are generated in its result directory:
_results/
|- nl2nl/
| |- {DATASET}/
| | |- evaluate_consistency/
| | | |- {LLM_MODEL}/
| | | | |- fg-eval-{CKSUM}.json
| | | | |- eval-{CKSUM}.json
where {CKSUM}
represents the checksum value of each unique math problem.
The eval-{CKSUM}.json
and fg-eval-{CKSUM}.json
files contain the LLM responses for the original math problem and its mutated math problems over the different modalities.
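As with the previous step, the output can be inspected with a small script. The snippet below assumes the evaluation files are JSON objects keyed by modality and that the {LLM_MODEL} directory is named after the llm_model value (e.g., gpt3.5); both are assumptions rather than a documented schema.

import json
from pathlib import Path

# SVAMP and gpt3.5 used as an example; adjust to your dataset and LLM model.
eval_dir = Path("_results/nl2nl/svamp/evaluate_consistency/gpt3.5")
for f in sorted(eval_dir.glob("eval-*.json")):
    with f.open() as fp:
        responses = json.load(fp)
    # List which top-level keys (assumed to be modalities) have recorded responses.
    print(f.name, "->", list(responses.keys()))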
This step selects in-context examples from among the mutated questions and generates the LLM responses using them. You can run it by executing the following commands:
cd AutoMathIC
# llm_model is either gpt3.5 (for GPT-3.5) or gpt4omini (for GPT-4o-mini)
# SVAMP
python -m src.python.main \
--run genetic_fg_alg \
--llm_model_name "${llm_model}" \
--dataset_name 'svamp'
# ASDiv
python -m src.python.main \
--run genetic_fg_alg \
--llm_model_name "${llm_model}" \
--dataset_name 'asdiv'
# MultiArith
python -m src.python.main \
--run genetic_fg_alg \
--llm_model_name "${llm_model}" \
--dataset_name 'multiarith'
# GSM8k
python -m src.python.main \
--run genetic_fg_alg \
--llm_model_name "${llm_model}" \
--dataset_name 'gsm8k'
Outputs after running the command are in the result directory
{PROJ_DIR}/_results/genetic_fg/{DATASET}/evaluate_consistency/{LLM_MODEL},
where {DATASET}
is the name of the math problem dataset (ASDiv, SVAMP, GSM8k or MultiArith) and {LLM_MODEL} is the name of the LLM.
For this task, the following files are generated in its result directory:
_results/
|- genetic_fg/
| |- {DATASET}/
| | |- evaluate_consistency/
| | | |- {LLM_MODEL}/
| | | | |- final_answers.json
The final_answers.json
file contains the final LLM responses for the original math problems, generated using the selected mutations as in-context examples.
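A minimal sketch for inspecting this output is shown below. It assumes final_answers.json is a JSON mapping from problem identifiers to final answers, which is an assumption rather than a documented schema.

import json
from pathlib import Path

# SVAMP and gpt3.5 used as an example; the file structure is assumed, not documented.
answers_path = Path("_results/genetic_fg/svamp/evaluate_consistency/gpt3.5/final_answers.json")
with answers_path.open() as fp:
    final_answers = json.load(fp)

print("Number of answered problems:", len(final_answers))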
Supplemental artifacts for the results can be downloaded from here