Move documentation to Github Pages (#557)

* Migrate to github pages * workflow * hyperlinks * Move * Tweak * Remove Rustdoc * Update doc * Remove TODO in workflow
0xPolygonZero · Aug 28, 2024 · 4ff1194 · 4ff1194
1 parent 00c759a
commit 4ff1194
Show file tree

Hide file tree

Showing 50 changed files with 1,953 additions and 1,185 deletions.
diff --git a/.github/workflows/book.yml b/.github/workflows/book.yml
@@ -0,0 +1,36 @@
+name: zkEVM mdbook
+
+on:
+  push:
+    branches: [develop, main]
+
+jobs:
+  deploy:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v3
+      - uses: actions-rs/toolchain@v1
+        with:
+          toolchain: nightly
+          override: true
+
+      - name: Install mdbook
+        uses: actions-rs/cargo@v1
+        with:
+          command: install
+          args: mdbook
+
+      - name: Install mdbook-katex and mdbook-bib
+        uses: actions-rs/cargo@v1
+        with:
+          command: install
+          args: mdbook-katex mdbook-bib
+
+      - name: Build book
+        run: mdbook build book
+
+      - name: Deploy to GitHub Pages
+        uses: peaceiris/actions-gh-pages@v3
+        with:
+          github_token: ${{ secrets.GITHUB_TOKEN }}
+          publish_dir: ./book/book
diff --git a/README.md b/README.md
@@ -60,11 +60,10 @@ flowchart LR
 
 ## Documentation
 
-Documentation is still incomplete and will be improved over time, a lot of useful material can
-be found in the [docs](./docs/) section, including:
-
-* [sequence diagrams](./docs/usage_seq_diagrams.md) for the proof generation flow
-* [zkEVM specifications](./docs/arithmetization/zkevm.pdf), detailing the underlying EVM proving statement
+Documentation is still incomplete and will be improved over time.
+You can look at the [sequence diagrams](./docs/usage_seq_diagrams.md) for the proof generation flow,
+or go through the [zkEVM book](https://0xpolygonzero.github.io/zk_evm/)
+for explanations on the zkEVM design and architecture.
 
 ## Branches
 The default branch for the repo is the `develop` branch which is not stable but under active development. Most PRs should target `develop`. If you need a stable branch then a tagged version of `main` is what you're after.

diff --git a/book/.gitignore b/book/.gitignore
@@ -0,0 +1 @@
+book
diff --git a/book/book.toml b/book/book.toml
@@ -0,0 +1,30 @@
+[book]
+authors = ["Polygon Zero"]
+language = "en"
+multilingual = false
+src = "src"
+title = "The Polygon Zero zkEVM Book"
+
+[rust]
+edition = "2021"
+
+[build]
+create-missing = true
+
+[preprocessor.index]
+
+[preprocessor.links]
+
+[preprocessor.katex]
+
+[preprocessor.bib]
+bibliography = "bibliography.bib"
+
+[output.html]
+
+[output.html.print]
+# Disable page break
+page-break = false
+
+[output.html.search]
+limit-results = 15
diff --git a/book/src/SUMMARY.md b/book/src/SUMMARY.md
@@ -0,0 +1,30 @@
+# Summary
+
+[Introduction](intro.md)
+
+- [STARK framework](framework/intro.md)
+    - [Cost model](framework/cost_model.md)
+    - [Field](framework/field.md)
+    - [Cross-Table Lookups](framework/ctls.md)
+    - [Range-Checks](framework/range_check.md)
+- [Tables](tables/intro.md)
+    - [CPU](tables/cpu.md)
+    - [Arithmetic](tables/arithmetic.md)
+    - [BytePacking](tables/byte_packing.md)
+    - [Keccak](tables/keccak.md)
+    - [KeccakSponge](tables/keccak_sponge.md)
+    - [Logic](tables/logic.md)
+    - [Memory](tables/memory.md)
+    - [MemBefore & MemAfter](tables/mem_continuations.md)
+- [Merkle Patricia Tries](mpt/intro.md)
+    - [Memory format](mpt/memory_format.md)
+    - [Prover input format](mpt/prover_input_format.md)
+    - [Encoding and hashing](mpt/encoding_hashing.md)
+    - [Linked lists](mpt/linked_lists.md)
+- [CPU Execution](cpu_execution/intro.md)
+    - [Kernel](cpu_execution/kernel.md)
+    - [Opcodes & Syscalls](cpu_execution/opcodes_syscalls.md)
+    - [Privileged Instructions](cpu_execution/privileged_instructions.md)
+    - [Stack handling](cpu_execution/stack_handling.md)
+    - [Gas handling](cpu_execution/gas_handling.md)
+    - [Exceptions](cpu_execution/exceptions.md)
diff --git a/docs/arithmetization/bibliography.bib → book/src/bibliography.bib b/docs/arithmetization/bibliography.bib → book/src/bibliography.bib
@@ -6,7 +6,7 @@ @misc{stark
     title        = {Scalable, transparent, and post-quantum secure computational integrity},
     howpublished = {Cryptology ePrint Archive, Report 2018/046},
     year         = {2018},
-    note         = {\url{https://ia.cr/2018/046}},
+    url         = {https://ia.cr/2018/046},
 }
 
 @misc{plonk,
@@ -16,7 +16,7 @@ @misc{plonk
     title        = {PLONK: Permutations over Lagrange-bases for Oecumenical Noninteractive arguments of Knowledge},
     howpublished = {Cryptology ePrint Archive, Report 2019/953},
     year         = {2019},
-    note         = {\url{https://ia.cr/2019/953}},
+    url         = {https://ia.cr/2019/953},
 }
 
 @article{yellowpaper,
@@ -27,4 +27,4 @@ @article{yellowpaper
 	number={2014},
 	pages={1--32},
 	year={2014}
-}
+}
diff --git a/book/src/cpu_execution/exceptions.md b/book/src/cpu_execution/exceptions.md
@@ -0,0 +1,71 @@
+## Exceptions
+
+Sometimes, when executing user code (i.e. contract or transaction code),
+the EVM halts exceptionally (i.e. outside of a STOP, a RETURN or a
+REVERT). When this happens, the CPU table invokes a special instruction
+with a dedicated operation flag `exception`. Exceptions can only happen
+in user mode; triggering an exception in kernel mode would make the
+proof unverifiable. No matter the exception, the handling is the same:
+
+-- The opcode which would trigger the exception is not executed. The
+operation flag set is `exception` instead of the opcode's flag.
+
+-- We push a value to the stack which contains: the current program
+counter (to retrieve the faulty opcode), and the current value of
+`gas_used`. The program counter is then set to the corresponding
+exception handler in the kernel (e.g. `exc_out_of_gas`).
+
+-- The exception handler verifies that the given exception would indeed
+be triggered by the faulty opcode. If this is not the case (if the
+exception has already happened or if it doesn't happen after executing
+the faulty opcode), then the kernel panics: there was an issue during
+witness generation.
+
+-- The kernel consumes the remaining gas and returns from the current
+context with `success` set to 0 to indicate an execution failure.
+
+Here is the list of the possible exceptions:
+
+1.  Raised when a native instruction (i.e. not a syscall) in user mode
+    pushes the amount of gas used over the current gas limit. When this
+    happens, the EVM jumps to `exc_out_of_gas`. The kernel then checks
+    that the consumed gas is currently below the gas limit, and that
+    adding the gas cost of the faulty instruction pushes it over it. If
+    the exception is not raised, the prover will panic when returning
+    from the execution: the remaining gas is checked to be positive
+    after STOP, RETURN or REVERT.
+
+2.  Raised when the read opcode is invalid. It means either that it
+    doesn't exist, or that it's a privileged instruction and thus not
+    available in user mode. When this happens, the EVM jumps to
+    `exc_invalid_opcode`. The kernel then checks that the given opcode
+    is indeed invalid. If the exception is not raised, decoding
+    constraints ensure no operation flag is set to 1, which would make
+    it a padding row. Halting constraints would then make the proof
+    unverifiable.
+
+3.  Raised when an instruction which pops from the stack is called when
+    the stack doesn't have enough elements. When this happens, the EVM
+    jumps to `exc_stack_overflow`. The kernel then checks that the
+    current stack length is smaller than the minimum stack length
+    required by the faulty opcode. If the exception is not raised, the
+    popping memory operation's address offset would underflow, and the
+    Memory range check would require the Memory trace to be too large
+    ($>2^{32}$).
+
+4.  Raised when the program counter jumps to an invalid location (i.e.
+    not a JUMPDEST). When this happens, the EVM jumps to
+    `exc_invalid_jump_destination`. The kernel then checks that the
+    opcode is a JUMP, and that the destination is not a JUMPDEST by
+    checking the JUMPDEST segment. If the exception is not raised,
+    jumping constraints will fail the proof.
+
+5.  Same as the above, for JUMPI.
+
+6.  Raised when a pushing instruction in user mode pushes the stack
+    over 1024. When this happens, the EVM jumps to `exc_stack_overflow`.
+    The kernel then checks that the current stack length is exactly
+    equal to 1024 (since an instruction can only push once at most), and
+    that the faulty instruction is pushing. If the exception is not
+    raised, stack constraints ensure that a stack length of 1025 in user
+    mode will fail the proof.
diff --git a/book/src/cpu_execution/gas_handling.md b/book/src/cpu_execution/gas_handling.md
@@ -0,0 +1,72 @@
+## Gas handling
+
+### Out of gas errors
+
+The CPU table has a "gas" register that keeps track of the gas used by
+the transaction so far.
+
+The crucial invariant in our out-of-gas checking method is that at any
+point in the program's execution, we have not used more gas than we have
+available; that is "gas" is at most the gas allocation for the
+transaction (which is stored separately by the kernel). We assume that
+the gas allocation will never be $2^{32}$ or more, so if "gas" does not
+fit in one limb, then we've run out of gas.
+
+When a native instruction (one that is not a syscall) is executed, a
+constraint ensures that the "gas" register is increased by the correct
+amount. This is not automatic for syscalls; the syscall handler itself
+must calculate and charge the appropriate amount.
+
+If everything goes smoothly and we have not run out of gas, "gas" should
+be no more than the gas allowance at the point that we STOP, REVERT,
+stack overflow, or whatever. Indeed, because we assume that the gas
+overflow handler is invoked *as soon as* we've run out of gas, all these
+termination methods verify that $\texttt{gas} \leq \texttt{allowance}$,
+and jump to `exc_out_of_gas` if this is not the case. This is also true
+for the out-of-gas handler, which checks that:
+
+1.  we have not yet run out of gas
+
+2.  we are about to run out of gas
+
+and "PANIC" if either of those statements does not hold.
+
+When we do run out of gas, however, this event must be handled. Syscalls
+are responsible for checking that their execution would not cause the
+transaction to run out of gas. If the syscall detects that it would need
+to charge more gas than available, it aborts the transaction (or the
+current code) by jumping to `fault_exception`. In fact,
+`fault_exception` is in charge of handling all exceptional halts in the
+kernel.
+
+Native instructions do this differently. If the prover notices that
+execution of the instruction would cause an out-of-gas error, it must
+jump to the appropriate handler instead of executing the instruction.
+(The handler contains special code that PANICs if the prover invoked it
+incorrectly.)
+
+### Overflow
+
+We must be careful to ensure that "gas" does not overflow to prevent
+denial of service attacks.
+
+Note that a syscall cannot be the instruction that causes an overflow.
+This is because every syscall is required to verify that its execution
+does not cause us to exceed the gas limit. Upon entry into a syscall, a
+constraint verifies that $\texttt{gas} < 2^{32}$. Some syscalls may have
+to be careful to ensure that the gas check is performed correctly (for
+example, that overflow modulo $2^{256}$ does not occur). So we can
+assume that upon entry and exit out of a syscall,
+$\texttt{gas} < 2^{32}$.
+
+Similarly, native instructions alone cannot cause wraparound. The most
+expensive instruction, JUMPI, costs 10 gas. Even if we were to execute
+$2^{32}$ consecutive JUMPI instructions, the maximum length of a trace,
+we are nowhere close to consuming $2^{64} - 2^{32} + 1$ (= Goldilocks
+prime) gas.
+
+The final scenario we must tackle is an expensive syscall followed by
+many expensive native instructions. Upon exit from a syscall,
+$\texttt{gas} < 2^{32}$. Again, even if that syscall is followed by
+$2^{32}$ native instructions of cost 10, we do not see wraparound modulo
+Goldilocks.
diff --git a/book/src/cpu_execution/intro.md b/book/src/cpu_execution/intro.md
@@ -0,0 +1,11 @@
+# CPU Execution {#cpulogic}
+
+The CPU is in charge of coordinating the different STARKs, proving the
+correct execution of the instructions it reads and guaranteeing that the
+final state of the EVM corresponds to the starting state after executing
+the input transactions. All design choices were made to make sure these
+properties can be adequately translated into constraints of degree at
+most 3 while minimizing the size of the different table traces (number
+of columns and number of rows).
+
+In this section, we will detail some of these choices.