Table of Contents
Implementation of a privacy centric system for cross publisher, cross media ads measurement through secure multiparty computations.
At a high level the system requires at least three independent deployments, one controller and two secure multiparty computation nodes, each operating its own microservices and storage instances. In order to make precise statements about the system architecture we introduce the following terms.
The Kingdom is a single deployment that allows advertisers to configure reports, requests the data and computations required to generate those reports, and makes the completed reports accessible to advertisers.
The Duchies are at least two separate deployments each operated by an independent organization. The Duchies store the encrypted data and perform the computations required to generate the reports. Each Duchy holds part of the private key required to decrypt the data. Therefore all Duchies must participate in order to perform a computation and decrypt the result. For additional security the Duchies should be multi-cloud with at least one Duchy deploying to a different cloud provider than the others.
The system operates as follows. For more detail see the references linked in the Documentation section.
- Advertisers configure reports that may span a variety of campaigns, publishers, and forms of media.
- The Kingdom determines which data are required to generate these reports and compiles a list of requisitions for the various publishers. The Kingdom tracks which of the requisitions are open and which are fulfilled.
- Publishers invoke an API on the Duchy to obtain a list of open requisitions. The Duchy proxies to the Kingdom to retrieve the list. The requisitions specify which data are required from the publisher in order to generate the reports.
- To fulfill the requisitions publishers compute sketches similar to those used in the HyperLogLog algorithm for cardinality and frequency estimation. In practice we do not use HyperLogLog itself due to issues preserving user privacy that are beyond the scope of this discussion. Publishers encrypt these sketches using the combined public key of all the Duchies. Publishers send the encrypted sketches to a Duchy which stores them and informs the Kingdom that the requisition for that data is fulfilled. The encrypted sketches required for a particular report may be distributed across multiple Duchies.
- The Kingdom determines which pending computations have all necessary requisitions fulfilled and are therefore ready to run. The Duchies poll the Kingdom at regular intervals to claim this work. Each computation has a Primary Duchy assigned and a deterministic order of computation.
- For each computation all Duchies fetch the required encrypted sketches, interleave noise into them, and send them to the Primary Duchy. The Primary Duchy writes these sketches to its storage instance as they arrive.
- Once the Primary Duchy receives all required encrypted noised sketches it combines them. Computation then follows the predetermined order making two rounds through the Duchies. Each round ends with the Primary Duchy.
- During the first round each Duchy shuffles the sketches to destroy information that could be reconstructed from knowing the register indices.
- During the second round the Duchies each use their piece of the private key to decrypt the results. The Primary Duchy sends the final results back to the Kingdom.
.
├── build ## Stuff specific to the build system
├── imports ## Build aliases for external dependencies
│ ├── java
│ └── kotlin
├── src ## All source code
│ ├── main ## Source code for production deployments
│ │ ├── cc ## Crypto library
│ │ ├── kotlin ## Business Logic, Services, Daemons, DB accessors
│ │ └── proto ## Service and config definitions
│ └── test ## Source code for testing code in //src/main/
└── tools
Source code packages are grouped by language under the //src
directory, where
//src/main
is the code for a production deployment and //src/test
is code
for unit tests and integration tests of the code under //src/main
.
The majority of the code is written in Kotlin in the org.wfanet.measurement
package. The largest exception is the code to do cryptographic operations, which
is written in C++ with a Java JNI wrapper.
The //imports
directory contains Bazel build aliases for external
dependencies. No source code, third party or otherwise, is contained in the
imports directory. The alias for a dependency will be in a directory like its
package name. Roughly speaking, the directory structure of the import mirrors
the directory structure of the imported package.
For example, the alias for Java Protobuf
(@com_google_protobuf//:protobuf_java
) is in
//imports/java/com/google/protobuf/BUILD.bazel
because the Java package is
com.google.protobuf
.
The docker
and k8s
directories contain schemas and configuration.
Servers and daemons are both long running jobs deployed to Kubernetes. A key difference is whether or not they accept RPC communication from other binaries. In short, a server is a gRPC endpoint, where a daemon is not.
- Services are defined in proto3 in the
//src/main/proto
directory. - Public APIs are defined in
wfa-measurement-proto
, another GitHub repository. TODO: Add link to other GitHub repository.
//src/main/kotlin/org/wfanet/measurement/
├── duchy ## Duchy specific code
│ ├── daemon ## Base implementations of daemons run in a Duchy
│ ├── service ## Base implementations of services run by a Duchy
│ │ └── system ## Services used within the measurement system
│ └── deploy ## Deployable artifacts
│ ├── common ## Servers and daemons that deploy on any cloud
│ ├── gcloud ## Artifacts specific to Google Cloud deployment
│ └── testing ## Servers and daemons for testing
└── kingdom ## Kingdom specific code
├── daemon ## Base implementations of daemons run by the Kingdom
├── service ## Base implementations of services run by the Kingdom
│ ├── api ## Public facing services
│ ├── internal ## Services used within the Kingdom
│ └── system ## Services used within the measurement system
└── deploy ## Deployable artifacts
├── common ## Servers and daemons that deploy on any cloud
└── gcloud ## Artifacts specific to Google Cloud deployment
Throughout the code there are directories named common
. These contain code
which is common to multiple packages under the same parent directory. As an
example, //foo/common
contains code that may be used by other packages under
//foo
, so //foo/bar
and //foo/baz
. The code is not common to packages
under a different parent, i.e. //foo/common
should not be used by //bar
.
In this repository packages under //src/test
do not depend on other packages
in //src/test
. As a result test infrastructure code used to test multiple
packages is in a test-only package in //src/main
. One benefit of such a
structure is that the test infrastructure can also be tested the same way as
production code.
There is a lot that goes into the design of the system. Too much to cover in this README alone. More details are covered in:
- A System Design for Privacy-Preserving Reach and Frequency Estimation
- Privacy-Preserving Secure Cardinality and Frequency Estimation
This project makes wide use of the data protection technique known as differential privacy. So we have also included some useful introductions to said topic.
- Differential Privacy from the Ground Up
- Introduction to Differential Privacy (Slides)
- Differential Privacy and Randomized Response
See CONTRIBUTING.md