Proofs written in Lean4 for the core katydid validation algorithm
The goal is to formalize the core katydid validation algorithm. This algorithm allows us to validate millions of serialized data structures per second on a single core. The algorithm is based on derivatives for regular expressions and extends this to Visibly Pushdown Automata (VPA) by splitting the derivative function into two functions. It also includes several basic optimizations, such as memoization, simplification, laziness, zipping of multiple states, short-circuiting, evaluation at compilation, symbolic derivatives and a pull-based parser for serialized data structures that allows us to skip over some of the parsing. You can play around with the validation language on its playground.
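The derivative-based matching that the algorithm builds on can be sketched for plain (non-symbolic) regular expressions as follows. This is a minimal illustration and not the code in this repo: the names `Regex`, `nullable`, `derive` and `validate` are assumptions made for the example, and the sketch leaves out the VPA extension and all of the optimizations listed above.

```lean
-- Hypothetical minimal regex type over characters (names are illustrative).
inductive Regex where
  | emptyset : Regex                 -- matches nothing
  | emptystr : Regex                 -- matches only the empty string
  | char : Char → Regex              -- matches exactly one given character
  | or : Regex → Regex → Regex
  | concat : Regex → Regex → Regex
  | star : Regex → Regex

-- `nullable r` is true when `r` matches the empty string.
def Regex.nullable : Regex → Bool
  | .emptyset => false
  | .emptystr => true
  | .char _ => false
  | .or r s => Regex.nullable r || Regex.nullable s
  | .concat r s => Regex.nullable r && Regex.nullable s
  | .star _ => true

-- `derive c r` is the regex that matches what is left of `r` after reading `c`.
def Regex.derive (c : Char) : Regex → Regex
  | .emptyset => Regex.emptyset
  | .emptystr => Regex.emptyset
  | .char d => if c == d then Regex.emptystr else Regex.emptyset
  | .or r s => Regex.or (Regex.derive c r) (Regex.derive c s)
  | .concat r s =>
    if Regex.nullable r then
      Regex.or (Regex.concat (Regex.derive c r) s) (Regex.derive c s)
    else
      Regex.concat (Regex.derive c r) s
  | .star r => Regex.concat (Regex.derive c r) (Regex.star r)

-- Validate by taking the derivative for every character, then checking nullability.
def Regex.validate (r : Regex) (input : List Char) : Bool :=
  Regex.nullable (input.foldl (fun acc c => Regex.derive c acc) r)

#eval Regex.validate (Regex.star (Regex.char 'a')) "aaa".toList  -- true
#eval Regex.validate (Regex.star (Regex.char 'a')) "aab".toList  -- false
```

Without simplification the derived expressions keep growing with every character, which is one reason the simplification rules and smart constructors appear in the steps below.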
This is just a quick overview of the steps towards our goal.
Prove theorems about Symbolic regular expressions as a foundation to build upon.
- Prove correctness of derivative algorithm via a commuting diagram.
- Prove correctness of derivative algorithm via a Regex type indexed with Language.
- Prove decidability of derivative algorithm.
- Prove correctness of simplification rules.
- Prove correctness of smart constructors.
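As a rough illustration of what a correctness proof for a smart constructor can look like, the standalone sketch below gives a tiny regex type a denotation as a predicate on strings and shows that a simplifying constructor preserves it. The names `denote`, `smartOr` and `smartOr_correct` are hypothetical and are not taken from this repo, the rule shown (dropping an `emptyset` on the left of an `or`) is only one example, and the exact tactic script may need adjusting for your Lean toolchain.

```lean
-- Hypothetical three-constructor regex, kept small for the example.
inductive Regex where
  | emptyset : Regex
  | char : Char → Regex
  | or : Regex → Regex → Regex

-- The language of a regex, expressed as a proposition about the input string.
def Regex.denote : Regex → List Char → Prop
  | .emptyset, _ => False
  | .char c, xs => xs = [c]
  | .or r s, xs => Regex.denote r xs ∨ Regex.denote s xs

-- Smart constructor: simplify `or emptyset s` to `s`, otherwise build a plain `or`.
def Regex.smartOr (r s : Regex) : Regex :=
  match r with
  | .emptyset => s
  | _ => Regex.or r s

-- Correctness: the smart constructor denotes the same language as `or`.
theorem Regex.smartOr_correct (r s : Regex) (xs : List Char) :
    Regex.denote (Regex.smartOr r s) xs ↔ Regex.denote (Regex.or r s) xs := by
  cases r <;> simp [Regex.smartOr, Regex.denote]
```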
Reuse as much as we can from our previous work in Coq and our attempt at Reproving Agda in Lean.
- Create expression language as described in the post: Derivatives of Symbolic Automata explained
- Prove correctness of simplification rules for `or`, `and`, `false`, etc.
- Prove that non-reader functions can be pre-computed before evaluation time
- Prove that the optimized comparison method using a hash is comparable (transitive, associative, etc.)
- Create Language definition for the symbolic tree expressions.
- Code Pull-based Parser class in Lean and implement JSON as an example.
- Code Katydid algorithm in Lean.
- Prove correctness of derivative tree algorithm.
- Prove decidability of derivative tree algorithm.
- Prove that the simple tree function and the VPA functions are equivalent and equivalent to the inductive predicate.
- Prove correctness of new simplification rules
- Prove all optimizations of the katydid algorithm
Please check the prerequisites and read the contributing guidelines. The contributing guidelines are short and shouldn't be surprising.
Contributing to this repo requires an understanding of the underlying algorithm that is the subject of the proofs in this repo:
- Derivatives of Regular Expressions explained
- Derivatives of Context-Free Grammars explained (only the simplification rules, smart constructors and memoization are important)
- Derivatives of Symbolic Automata explained
This repo also requires an understanding of proof assistants, since all the proofs in this repo are done using LeanProver:
- Knowledge of dependent types, induction and understanding the difference between a property `True` and a boolean `true`. We recommend reading The Little Typer to gain an understanding of the basics.
- Experience with an Interactive Theorem Prover, like Coq or Lean, including using tactics and Inductive Predicates. If you are unfamiliar with interactive theorem provers you can watch our talk for a taste. We recommend reading Coq in a Hurry as a quick overview and Coq Art up to `Chapter 8: Inductive Predicates` for a proper understanding.
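The difference between the property `True` and the boolean `true` mentioned above can be seen in a few lines of Lean. This is only a small illustration using core Lean, not material from this repo:

```lean
#check (True : Prop)   -- a proposition; it is established by giving a proof
#check (true : Bool)   -- a value; it is obtained by running a computation

-- A proposition is proved with a proof term.
example : True := True.intro

-- A boolean is computed by evaluating an expression.
#eval (2 + 2 == 4)

-- `decide` bridges the two for decidable propositions:
-- as a function it turns the proposition into a Bool,
-- as a tactic it proves the proposition by evaluating that Bool.
#eval decide (2 + 2 = 4)
example : 2 + 2 = 4 := by decide
```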
Optionally the following will also be helpful, but this is not required:
- Experience with Lean4, since this project is written in Lean4. We recommend reading:
- Theorem Proving in Lean4 to close the gap between Coq and Lean.
- Lean Manual for programming in Lean and Monads.
- Lean Tactics File
- Coq Lean Tactic Cheat Sheet
- Lean Standard Library Documentation
- Lean4 Meta Programming Book
- Tactic List
Questions about Lean4 can be asked on proofassistants.stackexchange by tagging questions with `lean` and `lean4` or in the Zulip Chat.
- Lean4 has exceptional instructions for installing Lean4 in VS Code.
- Remember to also add `lake` (the build system for Lean) to your `PATH`. You can do this on Mac by adding `export PATH=~/.elan/bin/:${PATH}` to your `~/.zshrc` file.
- Use mathlib's cache to speed up building time by running:
$ lake exe cache get