GitHub - amazon-science/incremental-parsing: Incremental Python parser for constrained generation of code by LLMs.

Incremental Parser

This repository contains code for incremental parsing of Python, intended for constrained generation of code by LLMs. It is a reference implementation of the following paper; please cite the paper if you find the repository useful.

"Constrained Decoding for Fill-in-the-Middle Code Language Models via Efficient Left and Right Quotienting of Context-Sensitive Grammars," Daniel Melcer, Nathan Fulton, Sanjay Krishna Gouda and Haifeng Qian, arXiv:2402.17988, 2024.

Installation

For exact reproducibility: we use CUDA version 12.0 with driver version 525.85.12 on an A100 GPU, Python version 3.9.19, and Rust toolchain version 1.79.0. We use the version of Santacoder that was released on December 20, 2022.
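To compare a local environment against the versions listed above, a small check along the following lines may help. This is a minimal sketch and not part of the repository: the function names are hypothetical, and it only looks at the Python interpreter and the rustc found on PATH (it does not inspect CUDA, the driver, or the GPU).

```python
import shutil
import subprocess
import sys

# Versions quoted in the reproducibility note above.
EXPECTED_PYTHON = (3, 9, 19)
EXPECTED_RUST = "1.79.0"


def python_version_matches(expected=EXPECTED_PYTHON):
    """Return True if the running interpreter has exactly the expected version."""
    return tuple(sys.version_info[:3]) == tuple(expected)


def rustc_version():
    """Return the text printed by `rustc --version`, or None if unavailable."""
    if shutil.which("rustc") is None:
        return None
    result = subprocess.run(
        ["rustc", "--version"], capture_output=True, text=True, check=False
    )
    return result.stdout.strip() or None


if __name__ == "__main__":
    print("Python matches:", python_version_matches())
    found = rustc_version()
    # A substring test is enough for a quick sanity check.
    print("Rust matches:", found is not None and EXPECTED_RUST in found)
```

A mismatch does not necessarily mean the code will fail; it only means results may differ from an exact reproduction.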

1. Install everything in requirements.txt.
2. Run maturin develop --release in your virtual environment.
   - Note that IDEs sometimes have trouble with automatic code completion for this. As long as you get the message Installed incremental_parsing_rust-0.1.0, the library is installed in the virtual environment.
3. Edit scripts/constrained_generation_{random/nice}_cuts.sh so that results_path is an absolute path. The program will create a (large) folder at this path. Also edit the loop max, device name, and min/max data indices so that they fit your hardware and the runs eventually cover data indices 0 through 9999.
4. Run PYTHONPATH=. scripts/constrained_generation_random_cuts.sh
   - Read the documentation for hapless (hap --help) for information about process management.
5. When done, edit the source path at the bottom of incremental_parsing/evaluation/evaluate_stack.py to match the results path. Edit the destination path to point to where you want the csv file to be created.
6. Import the csv file into a sqlite table named stack, then use incremental_parsing/evaluation/gen_tables.sql to obtain the numbers from the paper.
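The csv import in the last step can be done with any SQLite client. One possible way to do it from Python is sketched below, using only the standard library. The function name is hypothetical. The sketch assumes the csv file has a header row, takes the column names from that row, and stores every column as TEXT, so queries that depend on numeric column types may need a typed schema or casts instead.

```python
import csv
import sqlite3


def import_csv_to_table(csv_path, db_path, table="stack"):
    """Load a csv file with a header row into a SQLite table.

    Returns the number of rows in the table after the import.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        # Quote identifiers so unusual column names are accepted.
        columns = ", ".join(
            '"{}" TEXT'.format(name.replace('"', '""')) for name in header
        )
        placeholders = ", ".join("?" for _ in header)
        # Skip blank lines, which csv.reader yields as empty lists.
        rows = (row for row in reader if row)
        conn = sqlite3.connect(db_path)
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})'
                )
                conn.executemany(
                    f'INSERT INTO "{table}" VALUES ({placeholders})', rows
                )
            count = conn.execute(
                f'SELECT COUNT(*) FROM "{table}"'
            ).fetchone()[0]
        finally:
            conn.close()
    return count
```

For example, import_csv_to_table("/path/to/results.csv", "/path/to/results.db") would fill the stack table (both paths are placeholders), after which gen_tables.sql can be run against the resulting database.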

You can also use the following interactive scripts in the notebooks directory:

- create_parse_hierarchy_viz_python.ipynb creates a parse hierarchy from left and right contexts and outputs a visualization of it. Note that there might be multiple active branches with different parse hierarchies; the visualizer requires you to select one branch.
  - create_parse_hierarchy_viz_calc_lang.ipynb does the same for a much simpler language, a calculator language with tuples. Its output is significantly easier to inspect and understand.
- interactive_constrained_generation.ipynb generates code and shows all the left contexts that are considered members of the quotient language, along with their scores from the LLM.
- interactive_recognition.py lets you type text and see whether it is in the quotient language, is incrementally parsable, or cannot be a prefix of a member of the quotient language.
- paper_examples.ipynb reproduces code generation examples.
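To give a feel for the three outcomes that interactive_recognition.py reports, here is a toy sketch over a far simpler language: strings of balanced parentheses. It does not use this repository's parser or API, and all names in it are hypothetical. Given a left and right context, the quotient language is taken to be the set of middles for which left + middle + right is balanced. Reading "incrementally parsable" as "can still be extended to a member" is an assumption made for this illustration.

```python
from enum import Enum


class Status(Enum):
    MEMBER = "in the quotient language"
    PREFIX = "prefix of a member (can still be extended)"
    DEAD = "cannot be a prefix of any member"


def _scan(text, depth=0):
    """Return the final nesting depth after reading text from `depth`.

    Returns None if the depth ever drops below zero or if the text
    contains a character other than '(' or ')'.
    """
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return None
        else:
            return None
    return depth


def classify(left, middle, right):
    """Classify `middle` relative to the quotient by `left` and `right`."""
    depth = _scan(left + middle)
    if depth is None:
        return Status.DEAD
    # Depth the right context needs at its start in order to end at zero.
    needed = right.count(")") - right.count("(")
    if needed < 0 or _scan(right, needed) != 0:
        return Status.DEAD  # no middle can make this right context work
    # Any non-negative depth can be moved to `needed` by appending
    # more '(' or ')' characters, so the text is at least a viable prefix.
    return Status.MEMBER if depth == needed else Status.PREFIX


if __name__ == "__main__":
    for middle in ["()", "(", "))"]:
        print(repr(middle), "->", classify("(", middle, ")").value)
```

With left context "(" and right context ")", the middle "()" is a member, "(" is a viable prefix, and "))" is dead. Python's grammar is much harder to handle in this way than balanced parentheses, which is the problem the paper addresses.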

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.
