Merge branch 'main' into feat/batched-opening-proofs
moodlezoup committed Sep 17, 2024
2 parents 519ac74 + 33c737e commit 5b9c10a
Showing 67 changed files with 2,242 additions and 484 deletions.
5 changes: 4 additions & 1 deletion book/src/SUMMARY.md
@@ -19,6 +19,7 @@
- [Bytecode](./how/bytecode.md)
- [R1CS constraints](./how/r1cs_constraints.md)
- [M extension](./how/m-extension.md)
- [Sparse constraint systems](./how/sparse-constraint-systems.md)
- [Background](./background.md)
- [Sumcheck](./background/sumcheck.md)
- [Multilinear Extensions](./background/multilinear-extensions.md)
@@ -32,8 +33,10 @@
- [Install](./dev/install.md)
- [Tools](./dev/tools.md)
- [Roadmap](./tasks.md)
- [Improvements to Jolt since initial release](./future/improvements-since-release.md)
- [Optimizations](./future/opts.md)
- [Zero Knowledge](./future/zk.md)
- [On-chain verifier (no recursion/composition)](./future/on-chain-verifier.md)
- [Groth16 Recursion](./future/groth-16.md)
- [Precompiles](./future/precompiles.md)
- [Continuations](./future/continuations.md)
- [Prover space control](./future/continuations.md)
2 changes: 1 addition & 1 deletion book/src/background/binius.md
@@ -5,7 +5,7 @@ The Binius paper also gives a number of sum-check-based polynomial IOPs to be used

Here is a brief summary of changes that will occur in Jolt in order to incorporate the Binius commitment scheme:

* Addition and multiplication instructions will be handled via gadgets/constraint-systems instead of lookups. We may want to handle XOR and LT via gadgets too for performance reasons.
* Addition and multiplication instructions will be handled via [gadgets/constraint-systems](https://www.irreducible.com/posts/integer-multiplication-in-binius) instead of lookups. We may want to handle XOR and LT via gadgets too for performance reasons.

* Booleanity constraints in the R1CS can be omitted as that is guaranteed by the binding properties of the Binius commitment scheme.

49 changes: 45 additions & 4 deletions book/src/future/continuations.md
@@ -1,10 +1,51 @@
# Continuations via Chunking
Today Jolt is a monolithic SNARK. RISC-V traces cannot be broken up, they must be proven monolithically or not at all. As a result, Jolt has a fixed maximum trace length that can be proved which is a function of the available RAM on the prover machine. Long term we'd like to solve this by implementing a streaming version of Jolt's prover such that prover RAM usage is tunable with minimal performance loss. *TODO(sragss): Streaming prover link*
# Prover space control

Short term we're going to solve this via monolithic chunking. The plan: Take a trace of length $N$ split it into $M$ chunks of size $N/M$ and prove each independently. $N/M$ is a function of the max RAM available to the prover.
There is a major impediment to Jolt’s usefulness today: it consumes large amounts of memory, which limits the length of RISC-V executions that it can prove. Proving about 16 million RISC-V cycles consumes hundreds of GBs of space.

For the direct on-chain verifier there will be a cost linear in $M$ to verify. If we wrap this in Groth16 our cost will become constant but the Groth16 prover time will be linear in $M$. We believe this short-term trade-off is worthwhile for usability until we can implement the (more complicated) streaming algorithms.
There are two known techniques for reducing the prover space in zkVMs. One is called “continuations” and inherently involves proof recursion. This is the approach taken by almost all zkVM projects today. The other technique is non-recursive: it is well-known that sum-check-based SNARKs in particular do not require recursion to keep the prover's space under control. Nonetheless, very few projects are pursuing this non-recursive approach (the only one we are aware of is [Ligetron](https://www.computer.org/csdl/proceedings-article/sp/2024/313000a086/1RjEaU3iZEY), which is not sum-check-based). We call these non-recursive techniques "the streaming prover".

Jolt will pursue both approaches to prover space control. Below, we provide more details on the two approaches.

# More detail on continuations

Continuations work by breaking a large computation into “chunks”, proving each chunk (almost) independently, and recursively aggregating the proofs (i.e., proving one knows the proofs for each chunk).

Continuations come in two flavors: "brute-force recursion" and folding. In brute-force recursion, the proofs for different chunks are aggregated by having the prover prove it knows each proof. Roughly speaking, the verifier of each proof is represented as a circuit, and a SNARK is used to prove that the prover knows a satisfying assignment for each circuit.

In folding schemes, the "proof" for each chunk is actually only a "partial proof", in particular omitting any evaluation proofs for any committed polynomials. The partial proofs for each chunk are not explicitly checked by anyone. Rather, they are "aggregated" into a single partial proof, and that partial proof is then "completed" into a full SNARK proof. In folding schemes, the prover winds up recursively proving that it correctly aggregated the partial proofs. This has major performance benefits relative to "brute force recursion", because aggregating proofs is much simpler than checking them. Hence, proving aggregation was done correctly is much cheaper than proving full-fledged proof verification was done correctly.

Folding schemes only work with elliptic-curve-based commitment schemes (there have been some efforts to extend folding techniques to hashing-based commitment schemes but they introduce large overheads). So most zkVMs do not pursue folding today as they use hashing-based commitment schemes. Fortunately, Jolt today does use elliptic curves and hence folding as a means of prover space control is available to it.
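The aggregation step at the heart of folding relies on the commitment scheme being additively homomorphic: a random linear combination of two commitments is a commitment to the same combination of the underlying witnesses. Here is a minimal sketch of that property, with toy modular arithmetic standing in for an elliptic-curve group (all names, the generators, and the tiny modulus are illustrative, not Jolt's actual code):

```rust
// Toy homomorphic commitment: Commit(w) = sum_i w[i] * g[i] (mod P).
// Two committed instances fold into one via a random challenge r,
// without anyone checking either instance's proof.

const P: u64 = 2_147_483_647; // 2^31 - 1, a stand-in field modulus

fn commit(w: &[u64], g: &[u64]) -> u64 {
    w.iter().zip(g).fold(0u64, |acc, (wi, gi)| (acc + wi * gi) % P)
}

fn fold(w1: &[u64], w2: &[u64], r: u64) -> Vec<u64> {
    // Folded witness: w1 + r * w2, componentwise.
    w1.iter().zip(w2).map(|(a, b)| (a + r * b) % P).collect()
}

fn main() {
    let g = vec![3, 7, 11, 19]; // public "generators"
    let (w1, w2) = (vec![1, 2, 3, 4], vec![5, 6, 7, 8]);
    let (c1, c2) = (commit(&w1, &g), commit(&w2, &g));
    let r = 12_345; // verifier's random challenge
    // The folded commitment is computable from c1, c2 alone...
    let c_folded = (c1 + r * c2) % P;
    // ...and matches the commitment to the folded witness.
    assert_eq!(c_folded, commit(&fold(&w1, &w2, r), &g));
    println!("folded commitment consistent: {c_folded}");
}
```

Hashing-based commitments lack this homomorphism, which is why folding pairs naturally with the curve-based commitments Jolt already uses.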

# More detail on non-recursive space control

Brute-force recursion has the following downsides.

First, recursive proof aggregation imposes a lower bound of several GBs on prover space, because this is how much space is required to represent a SNARK verifier as a circuit and store that circuit. Second, the chunks are not completely independent, because one has to make sure that the state of the VM’s memory at the end of chunk $i$ equals the state of the VM’s memory at the start of chunk $i+1$. Ensuring this is an engineering headache and adds some performance overhead. Third, recursion actually changes the statement being verified. No longer is the statement “I correctly ran the computer program on a witness and it produced a claimed output $y$”. Rather it is: “I know several proofs, one per chunk, each attesting that the relevant chunk was run correctly, and with the final chunk outputting $y$.” Consequently, once proof recursion is used, it can actually be quite challenging for the public to tell that the right statement is being verified.

Accordingly, we expect the non-recursive approach to space control to lead to a prover with smaller space than the continuations approach, and to a much simpler system overall.

The non-recursive approach will also place the security of the SNARK on firmer footing: security of recursive SNARKs is heuristic, while non-recursive SNARKs like Jolt itself have unconditional security in idealized models such as the Random Oracle model (perhaps with additional assumptions such as hardness of discrete logarithms, depending on the choice of commitment scheme to use in Jolt).

Crucial to non-recursive space control is that Lasso, the lookup argument used in Jolt, does not require any sorting.

# Planned partial steps toward continuations
As discussed above, today Jolt is a monolithic SNARK. RISC-V traces cannot be broken up, they must be proven monolithically or not at all. As a result, Jolt has a fixed maximum trace length that can be proved which is a function of the available RAM on the prover machine. Long term we'd like to solve this by implementing a streaming version of Jolt's prover such that prover RAM usage is tunable with minimal performance loss.

Short term we're going to solve this via a very primitive form of continuations we call monolithic chunking. The plan: Take a trace of length $N$ split it into $M$ chunks of size $N/M$ and prove each independently. $N/M$ is a function of the max RAM available to the prover.

For the direct on-chain verifier there will be a cost linear in $M$ to verify. We believe this short-term trade-off is worthwhile for usability until we can implement the (more complicated) streaming algorithms.

## Specifics
A generic config parameter will be added to the `Jolt` struct called `ContinuationConfig`. At the highest level, before calling `Jolt::prove` the trace will be split into `M` chunks. `Jolt::prove` will be called on each and return `RAM_final` which can be fed into `RAM_init` during the next iteration of `Jolt::prove`. The [output zerocheck](https://jolt.a16zcrypto.com/how/read_write_memory.html#ouputs-and-panic) will only be run for the final chunk.
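A minimal sketch of the chunking loop just described, with toy types standing in for the real `Jolt::prove` API (the trace representation, the RAM model, and every name below are illustrative assumptions, not Jolt's actual interfaces):

```rust
// Split a length-N trace into M chunks and prove each in turn, threading
// RAM_final of chunk i into RAM_init of chunk i+1.

#[derive(Clone)]
struct ChunkProof {
    ram_final: Vec<u64>, // a real prover would also carry a SNARK proof
}

// Stand-in for Jolt::prove on one chunk: each trace step (addr, value)
// just writes to RAM.
fn prove_chunk(trace: &[(usize, u64)], ram_init: Vec<u64>) -> ChunkProof {
    let mut ram = ram_init;
    for &(addr, value) in trace {
        ram[addr] = value;
    }
    ChunkProof { ram_final: ram }
}

fn prove_chunked(trace: &[(usize, u64)], m: usize, ram_size: usize) -> Vec<ChunkProof> {
    let chunk_len = (trace.len() + m - 1) / m; // N/M, rounded up
    let mut ram = vec![0u64; ram_size]; // RAM_init for the first chunk
    let mut proofs = Vec::with_capacity(m);
    for chunk in trace.chunks(chunk_len) {
        let proof = prove_chunk(chunk, ram.clone());
        ram = proof.ram_final.clone(); // feeds the next chunk's RAM_init
        proofs.push(proof);
    }
    proofs // the output zerocheck would run only on the last chunk
}

fn main() {
    let trace = vec![(0, 1), (1, 2), (0, 3), (2, 4)];
    let proofs = prove_chunked(&trace, 2, 4);
    // The final chunk's RAM reflects the whole trace.
    assert_eq!(proofs.last().unwrap().ram_final, vec![3, 2, 4, 0]);
}
```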

# References on non-recursive prover space control

* Verifiable computation using multiple provers. Andrew J. Blumberg, Justin Thaler, Victor Vu, and Michael Walfish. https://eprint.iacr.org/2014/846.

* Public-coin zero-knowledge arguments with (almost) minimal time and space overheads. Alexander R. Block, Justin Holmgren, Alon Rosen, Ron D. Rothblum, and Pratik Soni. Theory of Cryptography Conference (TCC) 2020.

* Gemini: Elastic SNARKs for diverse environments. Jonathan Bootle, Alessandro Chiesa, Yuncong Hu, and Michele Orrù. EUROCRYPT 2022.

* On black-box constructions of time- and space-efficient sublinear arguments from symmetric-key primitives. Laasya Bangalore, Rishabh Bhadauria, Carmit Hazay, and Muthuramakrishnan Venkitasubramaniam. TCC 2022.

* Ligetron: Lightweight scalable end-to-end zero-knowledge proofs. Post-quantum ZK-SNARKs on a browser. Ruihan Wang, Carmit Hazay, and Muthuramakrishnan Venkitasubramaniam. 2024 IEEE Symposium on Security and Privacy (SP).
64 changes: 60 additions & 4 deletions book/src/future/groth-16.md
@@ -1,9 +1,65 @@
# Groth16 Recursion
Jolt's verifier today is expensive. We estimate 1-5 million gas to verify within the EVM. Better batching techniques on opening proofs can bring this down 5-10x, but it will remain expensive. Further, the short-term [continuations plan](https://jolt.a16zcrypto.com/future/continuations.html) causes a linear blowup depending on the count of monolithic trace chunks proven.
Jolt's verifier with no proof composition is expensive. We [estimate](on-chain-verifier.md) about 2 million gas to verify within the EVM. Further, the short-term [continuations plan](https://jolt.a16zcrypto.com/future/continuations.html) causes a linear blowup depending on the count of monolithic trace chunks proven.

To solve these two issues we're aiming to add a configuration option to the Jolt prover with a post processing step, which creates a Groth16 proof of the Jolt verifier for constant proof size / cost (~280k gas on EVM) regardless of continuation chunk count or opening proof cost. This technique is industry standard.
To solve these two issues we're aiming to add a configuration option to the Jolt prover with a post processing step, which creates a Groth16 proof of the Jolt verifier for constant proof size / cost (~280k gas on EVM) regardless of continuation chunk count or opening proof cost. This technique is industry standard. (An added benefit of this proof composition is that it will render
Jolt proofs [zero-knowledge](zk.md)).

## Strategy
We call directly representing the Jolt verifier (with the HyperKZG polynomial commitment scheme) as constraints and then feeding those constraints into Groth16 "naive composition". Unfortunately, this naive procedure will result in over a hundred million constraints. Applying Groth16 to such a large constraint system would introduce far more latency than we'd like (and may even be impossible over the BN254 scalar field, because that field only supports FFTs of length at most $2^{27}$). Below, we describe alternate ways forward.

# Cost accounting for naive composition

In naive composition, the bulk of the constraints are devoted
to implementing the [~150 scalar multiplications done by the Jolt verifier](on-chain-verifier.md)
(or more accurately, ~150 scalar multiplications will be the dominant verifier cost once we finish all optimizations that
are now in progress).

Each scalar multiplication costs about 400 group operations, each of which costs about 10 field multiplications,
so that's about $150 \cdot 400 \cdot 10 = 600{,}000$ field multiplications.
The real killer is that these field multiplications must be done non-natively in constraints, because
BN254 does not have a pairing-friendly "sister curve" (i.e., a curve whose scalar field matches the BN254 base field).
This means that each of the ~600,000 field multiplications costs thousands of constraints.

On top of the above, the two pairings done by the HyperKZG verifier, implemented non-natively in constraints,
should cost about 6 million constraints.

We do expect advances in representing non-native field arithmetic in constraints to bring the above numbers
down some, but not nearly enough to make naive composition sensible.
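The accounting above can be sanity-checked directly. In this sketch, the 1,000-constraints-per-non-native-multiplication figure is an assumed midpoint of the "thousands" cited in the text, not a measured number:

```rust
// Back-of-the-envelope check of the naive-composition cost accounting.
fn main() {
    let scalar_muls: u64 = 150; // Jolt verifier scalar multiplications
    let group_ops_per_scalar_mul: u64 = 400; // double-and-add over ~254-bit scalars
    let field_muls_per_group_op: u64 = 10;
    let field_muls = scalar_muls * group_ops_per_scalar_mul * field_muls_per_group_op;
    assert_eq!(field_muls, 600_000);

    // Assumed: ~1,000 constraints per non-native field multiplication,
    // plus ~6M constraints for the two non-native HyperKZG pairings.
    let constraints_per_nonnative_mul: u64 = 1_000;
    let pairing_constraints: u64 = 6_000_000;
    let total = field_muls * constraints_per_nonnative_mul + pairing_constraints;

    assert!(total > 100_000_000); // "over a hundred million constraints"
    assert!(total > 1u64 << 27); // far past BN254's max FFT length of 2^27
    println!("estimated naive-composition constraints: {total}");
}
```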

# A first step to reducing gas costs

A simple first approach to reducing gas costs is to leave all scalar multiplications and HyperKZG-evaluation-proof-checking done by the Jolt verifier out of the composition, doing those directly on-chain.

This would reduce gas costs to slightly over 1 million, and ensure that Groth16 is run only on a few million constraints
(perhaps even less, given upcoming advances in representing non-native field arithmetic in constraints).

# A potential way forward

Here is a first cut at a plan to bring on-chain Jolt verification down to a couple hundred thousand gas.

First, represent the Jolt verifier (with HyperKZG-over-BN254 as the polynomial commitment scheme) as an R1CS instance over the Grumpkin scalar field, and apply Spartan (with Hyrax-over-Grumpkin as the polynomial commitment) to this R1CS. This R1CS should only have a few million constraints. This is because all scalar multiplications and pairings done by the Jolt verifier are native over Grumpkin's scalar field. (The constraint system is also highly uniform, which can benefit both the Spartan prover and verifier.)

The field operations
done by the Jolt verifier in the various invocations of the sum-check protocol need to be represented non-natively in these R1CS constraints, but since there are only about 2,000 such field operations they still only cost perhaps 5 million constraints in total. See the [Testudo](https://eprint.iacr.org/2023/961) paper for a similar approach and associated calculations.

We expect upcoming advances in
methods for addressing non-native field arithmetic (and/or more careful optimizations of the Jolt verifier) to bring this down to under 2 million constraints.

But the Spartan proof is still too big to post on-chain. So, second, represent the Spartan verifier as an R1CS instance over the BN254 scalar field, and apply Groth16 to this R1CS. This is the proof
posted and verified on-chain.

The Spartan verifier only does at most a couple hundred field operations (since there are only two sum-check invocations in Spartan) and $2 \cdot \sqrt{n}$ scalar multiplications, where $n$ is the number of columns (i.e., witness variables) in the R1CS instance.
$2 \sqrt{n}$ here will be on the order of 3,000. Each scalar multiplication (done natively in this R1CS) yields
about 4,000 constraints. So that's 12 million constraints being fed to Groth16.

The calculation above is quite delicate, because running Groth16 on 12 million constraints would be okay for many settings, but 50 million constraints would not be. We do think 12 million is about the right estimate for this approach, especially given that 2,000 field operations for the Jolt verifier is a conservative estimate.
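The arithmetic behind the 12-million figure can be laid out explicitly. Here the column count $n$ is an assumed value chosen so that $2\sqrt{n} = 3{,}000$, consistent with "a few million constraints" for the Spartan-verified R1CS:

```rust
// Arithmetic behind the "12 million constraints fed to Groth16" estimate.
fn main() {
    let n: u64 = 2_250_000; // assumed R1CS columns, so that sqrt(n) = 1,500
    let sqrt_n = (n as f64).sqrt() as u64;
    let scalar_muls = 2 * sqrt_n; // Hyrax opening: 2 * sqrt(n) scalar mults
    assert_eq!(scalar_muls, 3_000);

    // Native scalar multiplications this time, ~4,000 constraints each.
    let constraints_per_scalar_mul: u64 = 4_000;
    let groth16_constraints = scalar_muls * constraints_per_scalar_mul;
    assert_eq!(groth16_constraints, 12_000_000);
    println!("Groth16 input: ~{groth16_constraints} constraints");
}
```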

## Details of the Jolt-with-HyperKZG verifier
The easiest way to understand the workload of the verifier circuit is to jump through the codebase starting at `vm/mod.rs Jolt::verify(...)`. Verification can be split into 4 logical modules: [instruction lookups](https://jolt.a16zcrypto.com/how/instruction_lookups.html), [read-write memory](https://jolt.a16zcrypto.com/how/read_write_memory.html), [bytecode](https://jolt.a16zcrypto.com/how/bytecode.html), [r1cs](https://jolt.a16zcrypto.com/how/r1cs_constraints.html).

Each of the modules do some combination of the following:
@@ -57,4 +113,4 @@ Suggested plan of attack:
2. Circuit for batched HyperKZG verification
3. Circuit for sumcheck verification
4. Circuit for instruction/subtable MLE evaluations
5. Circuit for Spartan verification
40 changes: 40 additions & 0 deletions book/src/future/improvements-since-release.md
@@ -0,0 +1,40 @@
# Improvements complete or in progress since the initial release of Jolt in April 2024

There are many tasks described in the various files in the "roadmap/future" section of the wiki.
Let's use this file as the primary means of tracking what is done or in progress. Anything
not mentioned in this file is presumably not yet started (or barely started).

## Functionality improvements

* Support for stdlib

* Support for M-extension.

* In progress: on-chain verifier (Solidity).

## Verifier cost improvements

* Add support for HyperKZG commitment

* Add support for Quarks/Spartan grand product argument and hybrid grand product
(which achieves most of the verifier benefits of Quarks without a significant hit to prover time).

* Still to do here: [Optimize](https://github.com/a16z/jolt/issues/444) how the prover commits to partial products.

* In progress: change how we are batching grand products, treating them all laid side-by-side as one giant circuit. This
will reduce proof size by up to 200KB.

* In progress: reduce the number of polynomial evaluation proofs from 7-10 down to 1.

## Prover cost improvements (all in progress)

* Eliminate cost of pre-computed tables of eq evaluations for each sum-check,
as per [Dao-Thaler](https://eprint.iacr.org/2024/1210).

* (Nearly) [eliminate](https://github.com/a16z/jolt/issues/347) second sum-check instance in Spartan.

* Implement the sum-check prover optimization from Section 3 of Angus Gruen's [paper](https://eprint.iacr.org/2024/108).

* AVX-512 speedups.

* GPU integration.
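To illustrate the first prover-cost item above: the standard sum-check prover precomputes a table of $\mathrm{eq}(r, x)$ over all $2^n$ Boolean points $x$, which is the cost the Dao-Thaler approach avoids. A toy-prime-field sketch of that table construction (illustrative only, not Jolt's field types):

```rust
// Build eq(r, x) for all 2^n Boolean x by doubling the table one variable
// at a time, where eq(r, x) = prod_i (r_i * x_i + (1 - r_i) * (1 - x_i)).

const P: u64 = 2_147_483_647; // 2^31 - 1, a stand-in field modulus

fn eq_table(r: &[u64]) -> Vec<u64> {
    let mut table = vec![1u64];
    for &ri in r {
        let mut next = Vec::with_capacity(table.len() * 2);
        for &t in &table {
            next.push(t * ((P + 1 - ri) % P) % P); // x_i = 0: factor (1 - r_i)
            next.push(t * ri % P);                 // x_i = 1: factor r_i
        }
        table = next;
    }
    table
}

fn main() {
    let table = eq_table(&[3, 5]);
    // Entry for x = (1, 1) is r_1 * r_2 = 15.
    assert_eq!(table[3], 15);
    // Sanity check: sum over all x of eq(r, x) is 1.
    assert_eq!(table.iter().fold(0, |a, &b| (a + b) % P), 1);
    println!("eq table: {table:?}");
}
```

The table costs $O(2^n)$ field multiplications and, more importantly for the space discussion, $O(2^n)$ memory, which is why eliminating it matters.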
