Since the original paper does not provide code for the model, I have replicated the process (without pretraining on WebMath), with changes for training and evaluation on a single consumer GPU (NVIDIA RTX 2080 Ti), and made the finetuned model publicly available.
The setup is similar to that of the publicly available code.
Make sure to install elan: https://github.com/kha/elan
git clone [email protected]:sontungtran/pact-lean-low-resource.git $PACT_DIR
cd $PACT_DIR
leanpkg configure
leanproject get-mathlib-cache
bash ./_target/deps/mathlib/scripts/mk_all.sh # make a file that imports all of mathlib
leanpkg build
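The mk_all.sh step above generates a single Lean file that imports every module in mathlib. A minimal Python sketch of the same idea (the function name and directory layout here are assumptions for illustration, not the script's actual internals):

```python
from pathlib import Path

def make_all_imports(src_root: str) -> str:
    """Build the body of a Lean file that imports every .lean module under src_root."""
    root = Path(src_root)
    imports = []
    for f in sorted(root.rglob("*.lean")):
        # Turn a path like <src_root>/algebra/group.lean into `import algebra.group`
        module = ".".join(f.relative_to(root).with_suffix("").parts)
        imports.append(f"import {module}")
    return "\n".join(imports)
```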
Download the dataset from https://drive.google.com/file/d/1VJ4kl1VaC3DfjqdP3KLR4zIoPRfv-K1B/view?usp=sharing
Move the file to the directory $PACT_DIR/notebooks/data
Download the finetuned model from https://drive.google.com/file/d/1OVvHv-WPY7Oz_uum7ofV1yeto0sY-DrY/view?usp=sharing
Move the file to the directory $PACT_DIR/notebooks/models
Simply run the GPT2_ProofStep_only.ipynb notebook; the code will try to find the finetuned model and train it further.
The Lean scripts will dump JSON messages to a specified file, with a new message on each line. Each message encodes the history of a single proof search.
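Because each line of the log is a standalone JSON message, results can also be inspected programmatically instead of with grep. A minimal sketch, assuming each message carries a top-level "success" field as suggested by the grep filters below:

```python
import json

def summarize_log(lines):
    """Count proof searches and successes in a JSON-lines proof-search log."""
    total, successes = 0, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        msg = json.loads(line)  # one proof-search history per line
        total += 1
        if msg.get("success"):
            successes += 1
    return total, successes

# Usage:
# with open("./test/dummy_out_bfs_baseline.log") as f:
#     total, successes = summarize_log(f)
```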
WARNING: Make sure the arguments are passed in the correct order.
# bfs strategy only for now
lean --run ./src/evaluation/bfs/baseline.lean ./test/dummy_decls2.names ./test/dummy_out_bfs_baseline.log 50 25 50 5 300 > ./test/dummy_out_bfs_baseline.out 2>&1

# this should be 31
cat ./test/dummy_out_bfs_baseline.log | wc -l

# this should be 0 (TODO(jesse): allow optional environment-setting)
cat ./test/dummy_out_bfs_baseline.log | grep '"success":true' | wc -l
# bfs strategy only for now
# args: decl_names logfile max_tokens temp top_p n best_of fuel max_width max_depth engine_id api_key
lean --run ./src/evaluation/bfs/gptf.lean ./test/dummy_decls2.names ./test/dummy_out_bfs_gptf.log 256 0.7 1.0 10 10 50 10 25 formal-lean-wm-to-tt-m1-m2-v4-c4 $OPENAI_API_KEY 5 300 > ./test/dummy_out_bfs_gptf.out 2>&1

# should be 31
cat ./test/dummy_out_bfs_gptf.log | wc -l

# should be ~24
cat ./test/dummy_out_bfs_gptf.log | grep '"success":true' | wc -l
The Python script wraps the Lean scripts and runs them in parallel, sharding a list of theorems.
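The sharding step can be pictured as splitting the declaration list into fixed-size chunks, each of which becomes one Lean invocation handled by a worker. A rough sketch (the function and argument names are illustrative, not the script's actual internals):

```python
def shard_decls(decls, decls_per_shard):
    """Split a list of declaration names into shards of at most decls_per_shard each.

    Each shard corresponds to one Lean process; a pool of n_workers
    processes would consume these shards in parallel.
    """
    return [decls[i:i + decls_per_shard]
            for i in range(0, len(decls), decls_per_shard)]
```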
usage: parallel_eval.py [-h] [--max_tokens MAX_TOKENS]
[--temperature TEMPERATURE] [--top_p TOP_P] [--n N]
[--best_of BEST_OF] [--fuel FUEL]
[--engine_id ENGINE_ID] [--api_key API_KEY]
[--nbest NBEST] [--beam BEAM]
[--model_path MODEL_PATH] [--data_path DATA_PATH]
[--entry_pt ENTRY_PT] [--max_width MAX_WIDTH]
[--max_depth MAX_DEPTH] [--lean_threads LEAN_THREADS]
[--lean_timeout LEAN_TIMEOUT] [--api API]
[--search_strategy SEARCH_STRATEGY]
[--tac_timeout TAC_TIMEOUT]
[--global_timeout GLOBAL_TIMEOUT]
n_workers decls_per_shard decls_file dest_dir
positional arguments:
n_workers
decls_per_shard
decls_file
dest_dir
optional arguments:
-h, --help show this help message and exit
--max_tokens MAX_TOKENS
--temperature TEMPERATURE
--top_p TOP_P
--n N
--best_of BEST_OF
--fuel FUEL
--engine_id ENGINE_ID
--api_key API_KEY
--nbest NBEST
--beam BEAM
--model_path MODEL_PATH
--data_path DATA_PATH
--entry_pt ENTRY_PT
--max_width MAX_WIDTH
maximum size of search queue for BFS
--max_depth MAX_DEPTH
maximum distance of search node from root before the
search queue rejects it
--lean_threads LEAN_THREADS
number of threads per Lean process
--lean_timeout LEAN_TIMEOUT
deterministic timeout for Lean process in millions of
allocations. Interactive default is one. Default is
unbounded (none).
--api API gptf|baseline|fairseq
--search_strategy SEARCH_STRATEGY
greedy|bfs
--tac_timeout TAC_TIMEOUT
tactic execution timeout (s)
--global_timeout GLOBAL_TIMEOUT
proof search timeout (s)
Check that the baseline version works. Inspect some of the output files.
python ./scripts/parallel_eval.py 4 8 ./test/dummy_decls2.names ./test_parallel/baseline/ --fuel 50 --api baseline --search_strategy bfs --tac_timeout 5 --global_timeout 300
Check that the BFS GPT-f version works. Inspect some of the output files.
python ./scripts/parallel_eval_resumeable_json.py 1 1 ./test/dummy_decls_single.names ./test_parallel/gptf_neo_dummy/ --max_tokens 256 --temperature 0.7 --top_p 1.0 --n 40 --best_of 40 --fuel 200 --max_width 32 --max_depth 128 --engine_id formal-lean-wm-to-tt-m1-m2-v4-c4 --api_key asdfasd --api gptf_neo_8epoch_modified --search_strategy bfs --tac_timeout 5 --global_timeout 300000
lean --run ./src/tools/filter_defs.lean $ORIGINAL_NAMES_FILE $NEW_NAMES_FILE
python ./scripts/shuffle_lines.py $NAMES_FILE $SHUFFLED_NAMES_FILE # optional seed -- seed 12387
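The shuffle step presumably amounts to a seeded shuffle of the file's lines, so the same seed reproduces the same order. A minimal sketch (the real script's behavior may differ):

```python
import random

def shuffle_lines(lines, seed=None):
    """Return a shuffled copy of lines; a fixed seed makes the order reproducible."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    return shuffled
```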
This code is based on the Lean Theorem Proving Environment repo and the LeanStep dataset repo, and adapts a GPT-2 implementation written from scratch.
@article{han2021proof,
  title={Proof artifact co-training for theorem proving with language models},
  author={Han, Jesse Michael and Rute, Jason and Wu, Yuhuai and Ayers, Edward W and Polu, Stanislas},
  journal={arXiv preprint arXiv:2102.06203},
  year={2021}
}