first commit

PolyhedraZK · Oct 7, 2024 · 41d342f
commit 41d342f
Showing 38 changed files with 28,114 additions and 0 deletions.
diff --git a/.github/workflows/docs.yml b/.github/workflows/docs.yml
@@ -0,0 +1,45 @@
+name: Build and deploy
+
+on: [push, pull_request]
+
+jobs:
+  build:
+    name: Build Docusaurus
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+      - uses: actions/setup-node@v4
+        with:
+          node-version: 18
+
+      - name: Build
+        run: |
+          cd docs
+          npm ci
+          npm run build
+
+      - name: Upload Build Artifact
+        uses: actions/upload-pages-artifact@v3
+        with:
+          path: docs/build
+
+  deploy:
+    name: Deploy to GitHub Pages
+    needs: build
+    if: github.ref == 'refs/heads/main'
+
+    permissions:
+      pages: write
+      id-token: write
+
+    environment:
+      name: github-pages
+      url: ${{ steps.deployment.outputs.page_url }}
+
+    runs-on: ubuntu-latest
+    steps:
+      - name: Deploy to GitHub Pages
+        id: deployment
+        uses: actions/deploy-pages@v4
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,20 @@
+# Dependencies
+/node_modules
+
+# Production
+/build
+
+# Generated files
+.docusaurus
+.cache-loader
+
+# Misc
+.DS_Store
+.env.local
+.env.development.local
+.env.test.local
+.env.production.local
+
+npm-debug.log*
+yarn-debug.log*
+yarn-error.log*
diff --git a/README.md b/README.md
@@ -0,0 +1,41 @@
+# Website
+
+This website is built using [Docusaurus](https://docusaurus.io/), a modern static website generator.
+
+### Installation
+
+```
+$ yarn
+```
+
+### Local Development
+
+```
+$ yarn start
+```
+
+This command starts a local development server and opens up a browser window. Most changes are reflected live without having to restart the server.
+
+### Build
+
+```
+$ yarn build
+```
+
+This command generates static content into the `build` directory and can be served using any static contents hosting service.
+
+### Deployment
+
+Using SSH:
+
+```
+$ USE_SSH=true yarn deploy
+```
+
+Not using SSH:
+
+```
+$ GIT_USER=<Your GitHub username> yarn deploy
+```
+
+If you are using GitHub pages for hosting, this command is a convenient way to build the website and push to the `gh-pages` branch.
diff --git a/babel.config.js b/babel.config.js
@@ -0,0 +1,3 @@
+module.exports = {
+  presets: [require.resolve('@docusaurus/core/lib/babel/preset')],
+};
diff --git a/docs/cuda/_category_.json b/docs/cuda/_category_.json
@@ -0,0 +1,4 @@
+{
+  "label": "zkCuda",
+  "position": 5
+}
diff --git a/docs/cuda/cuda_like_frontend.md b/docs/cuda/cuda_like_frontend.md
@@ -0,0 +1,227 @@
+---
+sidebar_position: 1
+---
+
+# Develop, Optimize, and Deploy your Expander-Accelerated zkApp
+
+Polyhedra's zkCUDA provides a development environment for creating high-performance GKR-native circuits. With zkCUDA, you can leverage the Expander GKR prover and hardware parallelism to accelerate proof generation without losing the expressiveness of GKR circuits. The language should be familiar to CUDA users, with similar syntax and semantics, and it is written in Rust.
+
+With zkCUDA, you can:
+
+1. Easily develop high-performance GKR native circuits.
+2. Easily leverage distributed and highly parallel hardware such as GPUs or MPI-enabled clusters.
+
+## Programming Model
+
+Due to the nature of GKR circuits, the programming model is very similar to CUDA's. We define the following concepts:
+
+1. zk Streaming Multiprocessors (zkSMs): Each zkSM is a parallel instance of the same circuit and can run up to 16 zk threads in parallel. One or more zkSMs can be executed in parallel.
+2. zk Threads: A zk thread is the smallest unit of execution. It runs in a zkSM, and its execution is independent of the other threads in the same zkSM.
+3. Device Memory: The memory that can be directly accessed by a zkSM.
+4. Host Memory: The memory that can be directly accessed by your own CPU.
+5. Kernel: A function that describes the circuit computation.
+
+## Basic Concepts Mapping from CUDA to Cryptography
+
+1. Memory Copy:
+
+    a. From host to device: this operation copies data from host memory to device memory. It is implemented with a polynomial commitment; each memory copy generates a commitment to the data array.
+
+    b. From device to host: this operation copies data from device memory to host memory. It is implemented by opening the polynomial commitment (either single or batched) and outputting the polynomial evaluation to the host.
+
+2. zkSM: This concept describes the log-space uniformity of the circuit. Data parallelism is achieved by running multiple instances of the same circuit in parallel.
+
+3. Kernel function: A circuit described by a variant of the gnark frontend in Rust.
+
+4. Lookup tables: A lookup table is a powerful tool for optimizing the circuit. It works like global random-access memory in CUDA: it is slow, but it allows random access to any memory address. We will provide a built-in dict type so that users can use it flexibly. You fill the dict with data on the host, then perform a host-to-device memory copy to send the data to the device.
+
+## Programming Guides [WIP]
+
+This guide is still a work in progress: the frontend is not usable yet, and the APIs may change in the future. Please raise an issue on GitHub if you have any questions or suggestions.
+
+## 1. Kernel Function Definition
+
+The kernel function in this CUDA-like circuit frontend is similar to a CUDA kernel. Its signature aligns with the current memorized call functions. The function runs under Zero-Knowledge (ZK) conditions, and the proof is maintained automatically by the context.
+
+```rust
+fn example_kernel<C: Config>(api: &mut API<C>, inputs: &[&[Variable]]) -> Vec<Variable> {
+    vec![api.mul(inputs[0][0], inputs[0][1])]
+}
+```
+
+Functions with more complex parameters might look like the following:
+
+```rust
+#[kernel]
+fn complex_kernel<C: Config>(
+    api: &mut API<C>,
+    inputs: &[&[Variable]]
+) -> Vec<Variable> {
+    // Implementation
+}
+```
+
+## 2. Context
+
+The context automatically maintains the existing proof and commits to the input variables. It provides the following functions:
+
+```rust
+fn init_ctx<C: Config>() -> Context<C> {
+    // Implementation
+}
+
+impl<C: Config> Context<C> {
+    fn copy_from_host<T: IntoFlattenedFieldAndShape<C>>(&mut self, vars: T) -> DeviceMemory<C> {
+        // Implementation
+    }
+
+    fn copy_to_host<T: FromFlattenedFieldAndShape<C>>(&self, dev_mem: &DeviceMemory<C>) -> T {
+        // Implementation
+    }
+
+    fn call_kernel<F>(&mut self, f: F, inputs: &[DeviceMemory<C>]) -> Result<DeviceMemory<C>, KernelError>
+    where
+        F: Fn(&mut API<C>, &[&[Variable]]) -> Vec<Variable>,
+    {
+        // Implementation
+    }
+
+    fn get_proof(&mut self) -> Proof {
+        // Implementation
+    }
+}
+```
+
+## 3. DeviceMemory
+
+The `DeviceMemory<C>` struct represents memory on the device (in this case, the ZK circuit). It has the following internal structure:
+
+```rust
+struct DeviceMemory<C: Config> {
+    values: Vec<C::CircuitField>,
+    shape: Vec<usize>,
+    // Additional internal fields for device-specific usage
+}
+
+impl<C: Config> DeviceMemory<C> {
+    fn reshape(&mut self, new_shape: Vec<usize>) -> Result<(), ReshapeError> {
+        // Implementation
+    }
+
+    fn flatten(&mut self) {
+        // Implementation
+    }
+
+    fn dim(&self) -> &[usize] {
+        &self.shape
+    }
+
+    fn parallelize(&self) -> Vec<DeviceMemory<C>> {
+        // Split DeviceMemory based on the first dimension
+        // Return a Vec<DeviceMemory<C>>, where each DeviceMemory represents a parallel instance
+    }
+}
+```
+
+Note that `DeviceMemory` does not expose a public `new` method, as instances should only be created through the `Context`.
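+
+The split that `parallelize` performs along the first dimension can be illustrated with plain integers. The following is a minimal sketch, assuming values are stored in row-major order and using `u64` in place of `C::CircuitField`; `split_first_dim` is a hypothetical helper, not part of the zkCUDA API.
+
+```rust
+/// Hypothetical helper: split a row-major buffer along its first dimension.
+/// Returns one (values, shape) pair per parallel instance.
+fn split_first_dim(values: &[u64], shape: &[usize]) -> Vec<(Vec<u64>, Vec<usize>)> {
+    // An empty shape has no first dimension to split on.
+    let Some((&count, rest)) = shape.split_first() else {
+        return Vec::new();
+    };
+    // Elements per instance: the product of the remaining dimensions
+    // (1 when there are no remaining dimensions).
+    let chunk: usize = rest.iter().product();
+    assert_eq!(values.len(), count * chunk, "shape does not match buffer length");
+    (0..count)
+        .map(|i| (values[i * chunk..(i + 1) * chunk].to_vec(), rest.to_vec()))
+        .collect()
+}
+```
+
+With shape `[3, 2]`, a six-element buffer is split into three instances of shape `[2]`.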
+
+## 4. IntoFlattenedFieldAndShape and FromFlattenedFieldAndShape Traits
+
+To support copying nested arrays and vectors of arbitrary dimensions to and from the device, we define the following traits:
+
+```rust
+trait IntoFlattenedFieldAndShape<C: Config> {
+    fn into_flattened_field_and_shape(self) -> (Vec<C::CircuitField>, Vec<usize>);
+}
+
+trait FromFlattenedFieldAndShape<C: Config>: Sized {
+    fn from_flattened_field_and_shape(values: Vec<C::CircuitField>, shape: Vec<usize>) -> Self;
+}
+```
+
+These traits can be implemented for various nested structures of `Vec` and arrays.
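+
+As an illustration of how such implementations might look for nested `Vec`s, here is a minimal sketch under simplifying assumptions: the traits `IntoFlat` and `FromFlat` are hypothetical stand-ins that drop the `Config` parameter, and `u64` replaces `C::CircuitField`.
+
+```rust
+/// Simplified stand-ins for the traits above (assumption: `u64` replaces the field type).
+trait IntoFlat {
+    fn into_flat(self) -> (Vec<u64>, Vec<usize>);
+}
+
+trait FromFlat: Sized {
+    fn from_flat(values: Vec<u64>, shape: Vec<usize>) -> Self;
+}
+
+// Base case: a scalar flattens to one value with an empty shape.
+impl IntoFlat for u32 {
+    fn into_flat(self) -> (Vec<u64>, Vec<usize>) {
+        (vec![u64::from(self)], Vec::new())
+    }
+}
+
+// Recursive case: a Vec of flattenable items adds one leading dimension.
+impl<T: IntoFlat> IntoFlat for Vec<T> {
+    fn into_flat(self) -> (Vec<u64>, Vec<usize>) {
+        let len = self.len();
+        let mut values = Vec::new();
+        let mut inner_shape: Option<Vec<usize>> = None;
+        for item in self {
+            let (v, s) = item.into_flat();
+            // Every item must have the same shape; ragged nesting is rejected.
+            let expected = inner_shape.get_or_insert_with(|| s.clone());
+            assert_eq!(*expected, s, "ragged nested Vec");
+            values.extend(v);
+        }
+        let mut shape = vec![len];
+        shape.extend(inner_shape.unwrap_or_default());
+        (values, shape)
+    }
+}
+
+// Rebuild a two-dimensional Vec from a flat buffer and its shape.
+impl FromFlat for Vec<Vec<u64>> {
+    fn from_flat(values: Vec<u64>, shape: Vec<usize>) -> Self {
+        assert_eq!(shape.len(), 2, "expected a two-dimensional shape");
+        let (rows, cols) = (shape[0], shape[1]);
+        assert_eq!(values.len(), rows * cols, "shape does not match buffer length");
+        (0..rows)
+            .map(|i| values[i * cols..(i + 1) * cols].to_vec())
+            .collect()
+    }
+}
+```
+
+Because the `Vec<T>` implementation is recursive, the same code covers `Vec<u32>`, `Vec<Vec<u32>>`, and deeper nestings.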
+
+## 5. Kernel API (ExpanderCompilerCollection)
+
+The Kernel API, also known as ExpanderCompilerCollection (ECC), provides a builder API similar to gnark. It includes the following operations:
+
+```rust
+pub trait BasicAPI<C: Config> {
+    fn add(&mut self, x: impl ToVariableOrValue<C::CircuitField>, y: impl ToVariableOrValue<C::CircuitField>) -> Variable;
+    fn sub(&mut self, x: impl ToVariableOrValue<C::CircuitField>, y: impl ToVariableOrValue<C::CircuitField>) -> Variable;
+    fn mul(&mut self, x: impl ToVariableOrValue<C::CircuitField>, y: impl ToVariableOrValue<C::CircuitField>) -> Variable;
+    fn div(&mut self, x: impl ToVariableOrValue<C::CircuitField>, y: impl ToVariableOrValue<C::CircuitField>, checked: bool) -> Variable;
+    fn neg(&mut self, x: impl ToVariableOrValue<C::CircuitField>) -> Variable;
+    fn inverse(&mut self, x: impl ToVariableOrValue<C::CircuitField>) -> Variable;
+    fn is_zero(&mut self, x: impl ToVariableOrValue<C::CircuitField>) -> Variable;
+    fn xor(&mut self, x: impl ToVariableOrValue<C::CircuitField>, y: impl ToVariableOrValue<C::CircuitField>) -> Variable;
+    fn or(&mut self, x: impl ToVariableOrValue<C::CircuitField>, y: impl ToVariableOrValue<C::CircuitField>) -> Variable;
+    fn and(&mut self, x: impl ToVariableOrValue<C::CircuitField>, y: impl ToVariableOrValue<C::CircuitField>) -> Variable;
+    fn assert_is_zero(&mut self, x: impl ToVariableOrValue<C::CircuitField>);
+    fn assert_is_non_zero(&mut self, x: impl ToVariableOrValue<C::CircuitField>);
+    fn assert_is_bool(&mut self, x: impl ToVariableOrValue<C::CircuitField>);
+    fn assert_is_equal(&mut self, x: impl ToVariableOrValue<C::CircuitField>, y: impl ToVariableOrValue<C::CircuitField>);
+    fn assert_is_different(&mut self, x: impl ToVariableOrValue<C::CircuitField>, y: impl ToVariableOrValue<C::CircuitField>);
+}
+```
+
+## 6. Kernel Execution
+
+The `call_kernel` method handles the parallelization of kernel execution:
+
+```rust
+fn call_kernel<F>(&mut self, f: F, inputs: &[DeviceMemory<C>]) -> Result<DeviceMemory<C>, KernelError>
+where
+    F: Fn(&mut API<C>, &[&[Variable]]) -> Vec<Variable>,
+{
+    // Check if the first dimension is consistent across all inputs
+    let parallel_count = inputs[0].shape[0];
+    if !inputs.iter().all(|dm| dm.shape[0] == parallel_count) {
+        return Err(KernelError::InconsistentParallelCount);
+    }
+
+    // Parallelize all inputs
+    let parallelized_inputs: Vec<Vec<DeviceMemory<C>>> = inputs
+        .iter()
+        .map(|dm| dm.parallelize())
+        .collect();
+
+    // Execute parallel kernel calls
+    let mut results = Vec::with_capacity(parallel_count);
+    for i in 0..parallel_count {
+        // Implementation
+    }
+
+    // Merge results
+    let merged_results = self.merge_results(results);
+    Ok(merged_results)
+}
+```
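+
+The execute-and-merge steps left as `// Implementation` above can be sketched without any proving machinery. This is a minimal illustration only, assuming inputs are already split per instance and using `u64` in place of circuit variables; `call_kernel_sketch`, `SketchError`, and `add_pair` are hypothetical names, and no proof is generated.
+
+```rust
+#[derive(Debug, PartialEq)]
+enum SketchError {
+    NoInputs,
+    InconsistentParallelCount,
+}
+
+/// Run `f` once per parallel instance and merge the outputs.
+/// `inputs[k][i]` holds the data of input `k` for instance `i`.
+/// Returns the merged values and the shape `[parallel_count, outputs_per_instance]`.
+fn call_kernel_sketch<F>(
+    f: F,
+    inputs: &[Vec<Vec<u64>>],
+) -> Result<(Vec<u64>, Vec<usize>), SketchError>
+where
+    F: Fn(&[&[u64]]) -> Vec<u64>,
+{
+    let parallel_count = inputs.first().ok_or(SketchError::NoInputs)?.len();
+    if !inputs.iter().all(|input| input.len() == parallel_count) {
+        return Err(SketchError::InconsistentParallelCount);
+    }
+
+    let mut merged = Vec::new();
+    let mut out_len = 0;
+    for i in 0..parallel_count {
+        // Gather the i-th slice of every input for this instance.
+        let slices: Vec<&[u64]> = inputs.iter().map(|input| input[i].as_slice()).collect();
+        let out = f(&slices);
+        // Assumption: every instance returns the same number of outputs.
+        out_len = out.len();
+        merged.extend(out);
+    }
+    Ok((merged, vec![parallel_count, out_len]))
+}
+
+/// Example kernel: add the two elements of the first input.
+fn add_pair(inputs: &[&[u64]]) -> Vec<u64> {
+    vec![inputs[0][0] + inputs[0][1]]
+}
+```
+
+Calling `call_kernel_sketch(add_pair, ...)` with three instances produces a result of shape `[3, 1]`, one output per instance.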
+
+## 7. Example Usage
+
+Here's an example of how to use this CUDA-like circuit frontend:
+
+```rust
+fn kernel_func<C: Config>(api: &mut API<C>, inputs: &[&[Variable]]) -> Vec<Variable> {
+    let a = inputs[0];
+    let _b = inputs[1]; // second input, unused in this example
+    let sum = api.add(a[0], a[1]);
+    vec![sum]
+}
+
+fn main() {
+    let mut ctx = init_ctx();
+    // Create 3 parallel instances, each with 2 elements
+    let a = ctx.copy_from_host(vec![vec![1u32, 2u32], vec![3u32, 4u32], vec![5u32, 6u32]]);
+    // Create 3 parallel instances, each with 1 element
+    let b = ctx.copy_from_host(vec![3u32, 7u32, 11u32]);
+
+    let result = ctx.call_kernel(kernel_func, &[a, b]).unwrap();
+
+    // result's shape will be [3, 1], representing 3 parallel instances, each outputting 1 result
+    assert_eq!(result.shape, vec![3, 1]);
+
+    let proof = ctx.get_proof();
+}
+```
diff --git a/docs/faq/_category_.json b/docs/faq/_category_.json
@@ -0,0 +1,4 @@
+{
+  "label": "Troubleshooting",
+  "position": 7
+}
diff --git a/docs/faq/avx512.md b/docs/faq/avx512.md
@@ -0,0 +1,23 @@
+---
+sidebar_position: 1
+---
+
+# AVX2 and AVX512
+
+Our compiler has an embedded Expander, which uses AVX2 by default to handle the critical proving steps.
+
+Expander supports AVX512, and if you need the additional performance provided by AVX512, in most cases, you can compile a version of Expander using AVX512 and then invoke it via the command line.
+
+If you need to use AVX512 within the compiler, pass the following flags to `rustc`:
+
+```
+-C target-cpu=native -C target-feature=+avx512f
+```
+
+For example, you can test the crate using:
+
+```
+RUSTFLAGS="-C target-cpu=native -C target-feature=+avx512f" cargo test
+```
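+
+Before enabling these flags, it can help to confirm that the machine actually supports AVX512. The following is a minimal sketch using the standard library's runtime feature detection; `simd_level` is a hypothetical helper and is not part of Expander or the compiler.
+
+```rust
+/// Report the widest SIMD extension the current CPU advertises,
+/// out of the two discussed here ("avx512f" or "avx2"), or "none".
+fn simd_level() -> &'static str {
+    // Runtime detection is only available on x86 and x86_64 targets.
+    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
+    {
+        if std::arch::is_x86_feature_detected!("avx512f") {
+            return "avx512f";
+        }
+        if std::arch::is_x86_feature_detected!("avx2") {
+            return "avx2";
+        }
+    }
+    "none"
+}
+```
+
+Note that this only reports what the CPU supports at run time; it does not change how the crate was compiled.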
+
+If you want to use AVX512 for proving in Go, there are a few more steps involved. You need to clone this repo and then run `build-rust-avx512.sh` to compile the AVX512 library. After that, you need to use this local repo in your Go code.
diff --git a/docs/go/_category_.json b/docs/go/_category_.json
@@ -0,0 +1,4 @@
+{
+  "label": "Go Walkthrough",
+  "position": 2
+}