
Note, you can also join us on Slack.

Installation

See ChangeLog for recent changes.

Option 1: Run demo in Docker

We assume you have the Docker engine installed. Run the following commands, or replace "latest" with another version listed at https://hub.docker.com/r/sparkfhe/sparkfhe-standalone/tags

docker pull sparkfhe/sparkfhe-standalone:latest
docker run -it sparkfhe/sparkfhe-standalone:latest
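
For example, to pin a specific version instead of "latest" (the tag below is illustrative; pick a real one from the tags page above):

# Hypothetical tag name; substitute one listed on Docker Hub.
docker pull sparkfhe/sparkfhe-standalone:v1.0
docker run -it sparkfhe/sparkfhe-standalone:v1.0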

Option 2: Run demo in Kubernetes

Refer to our SparkFHE-Addon repository for more information.

Option 3: Direct installation on your system

Prerequisites:

  • wget (e.g., on macOS via Homebrew: brew install wget)

Run the following commands to obtain a distribution version of the SparkFHE project. Once finished, you should see a folder such as "spark-3.1.0-SNAPSHOT-bin-SparkFHE". We recommend installing the SparkFHE distribution in your home directory.

cd ~
wget https://sparkfhe.s3.amazonaws.com/TestDrive.bash
bash TestDrive.bash all
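
When the script finishes, a quick sanity check (using the folder name from above):

# Confirm the distribution folder exists in your home directory.
ls ~/spark-3.1.0-SNAPSHOT-bin-SparkFHE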

Next, run the following commands to install all shared-library dependencies, such as GoogleTest, GMP, NTL, HELib, SEAL, etc. Note that running install_shared_libraries.bash will take a while.

cd ~/spark-3.1.0-SNAPSHOT-bin-SparkFHE/SparkFHE-Addon/scripts/setup
bash install_shared_libraries.bash
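
Once the script completes, you can check that the libraries were installed (assuming they land in a "libSparkFHE" folder inside the distribution, as the note below suggests; adjust the path if your layout differs):

# List the shared-library folder (path is an assumption; see the note below).
ls ~/spark-3.1.0-SNAPSHOT-bin-SparkFHE/libSparkFHE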

Note, if you have access to our C++ shared library, you can also softlink the "libSparkFHE" folder here and skip this step. Also, if you are a developer of our C++ shared library, please softlink hadoop into /usr/local/; our CMake configuration file will look for libhdfs.so and the include files in this directory.

sudo ln -s ~/spark-3.1.0-SNAPSHOT-bin-SparkFHE/hadoop /usr/local/hadoop
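
To verify the link (libhdfs.so is the file CMake looks for, per the note above):

# Confirm the symlink resolves and the native HDFS library is present.
ls -l /usr/local/hadoop
find /usr/local/hadoop/ -name "libhdfs*" 2>/dev/null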

Option 4: Create a cluster on Cloudlab.us

  • You need to create an account on cloudlab.us and request to join my iotx project. Once you have created an account and requested to join, I will review and approve the request. Note, you need to add your public key to your cloudlab account because we will use password-less (key-based) access to all nodes.
  • Look for the project profile SparkFHE-Mesos-HDFS and instantiate a new experiment using this profile.
  • Once your experiment has started, you can follow these instructions to initialize your cluster.
  • ssh into the master node (a connection example follows the commands below) and run the following script to upload crypto parameters to the Hadoop distributed filesystem.
cd /spark-3.1.0-SNAPSHOT-bin-SparkFHE/SparkFHE-Addon/scripts/spark-submit/cluster
sudo bash uploadCryptoParameters.bash
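
Connecting to the master node typically looks like this (the username and hostname are placeholders; use the ones shown on your cloudlab.us experiment page):

# Key-based login, using the public key registered with your cloudlab account.
ssh your-username@node0.yourexperiment.yourproject.cloudlab.us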

Run Demo code

If you have followed the installation instructions above successfully, you are ready to give our demo code a run. There are two ways of doing this: running the demo code locally or in a clustered environment. If a job is split up into many slices, it will run using multi-threading (on a local machine) or distributed workers (in a cluster environment).

Run Demo in Local Environment

Running the demo locally is quite straightforward; just type in the following commands.

cd /spark-3.1.0-SNAPSHOT-bin-SparkFHE/SparkFHE-Addon/scripts/spark-submit/local
bash mySparkSubmit-HELIB-BGV-Batching.bash

In addition, you can try other libraries and schemes:

mySparkSubmit-HELIB-BGV-Batching.bash
mySparkSubmit-HELIB-BGV-Nonbatching.bash
mySparkSubmit-HELIB-CKKS-Batching.bash
mySparkSubmit-SEAL-BFV-Batching.bash
mySparkSubmit-SEAL-BFV-Nonbatching.bash
mySparkSubmit-SEAL-CKKS-Batching.bash
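
For example, from the same local scripts directory, you can run the SEAL CKKS variant:

bash mySparkSubmit-SEAL-CKKS-Batching.bash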

This demo will perform the following operations:

  • Basic arithmetic operations over plaintexts (testing whether Spark is able to use our library)
  • Key generation (produces a key pair in ~/spark-3.1.0-SNAPSHOT-bin-SparkFHE/gen/keys/)
  • Encrypt and decrypt (produces encryptions of 0 and 1, and two vectors of encrypted numbers; generated ciphertexts are in ~/spark-3.1.0-SNAPSHOT-bin-SparkFHE/gen/records/; see the listing commands after this list)
  • Basic arithmetic operations over ciphertexts (1+0, 1*0, 1-0)
  • Compute dot-product (or inner-product) over the two vectors of encrypted numbers
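
After a run, you can inspect the generated artifacts at the paths listed above:

# Key pair and ciphertexts produced by the demo.
ls ~/spark-3.1.0-SNAPSHOT-bin-SparkFHE/gen/keys/
ls ~/spark-3.1.0-SNAPSHOT-bin-SparkFHE/gen/records/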

Run Demo in Cluster Environment

If you would like to explore the option of running the above test examples in a cluster environment, additional steps are required. This assumes you have a cluster running on cloudlab.us.

cd /spark-3.1.0-SNAPSHOT-bin-SparkFHE/SparkFHE-Addon/scripts/spark-submit/cluster
bash mySparkSubmitMesosCluster.bash [MesosMasterFullURLName]
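
A concrete invocation might look like the following (the hostname is a placeholder; substitute your Mesos master's full URL name):

bash mySparkSubmitMesosCluster.bash node0.yourexperiment.yourproject.cloudlab.us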

You can visit http://[MesosMasterFullURLName]:5050 to see the submitted Spark jobs and http://[MesosMasterFullURLName]:9780 to see the keys and ciphertexts generated from the example runs on the Hadoop distributed filesystem.

Test Outputs

By default, the demo output (see Our Example Run) will contain both the SparkFHE output as well as the Spark output. Some may find it hard to see the SparkFHE output "sandwiched" between the Spark logs. If this is the case for you, feel free to hide the Spark output by appending "2>/dev/null" to redirect it to the null device, as shown below.
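
For example (this assumes, as the note above implies, that Spark writes its log messages to stderr):

bash mySparkSubmit-HELIB-BGV-Batching.bash 2>/dev/null

Getting back to the SparkFHE output, you should expect the following examples: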

Basic FHE arithmetic over two encrypted numbers

Test case(s):

enc(1) + enc(0)
enc(1) * enc(0)
enc(1) - enc(0)

SparkFHE outputs:

Homomorphic Addition:1
Homomorphic Multiplication:0
Homomorphic Subtraction:1

Dot-Product of two vectors of encrypted numbers

Test case(s):

vec_a: (0,1,2,3,4) 
vec_b: (4,3,2,1,0)

SparkFHE outputs:

Dot product: 10

This matches the plaintext computation: 0·4 + 1·3 + 2·2 + 3·1 + 4·0 = 10.

Upgrade the installed distribution

If you want to upgrade your SparkFHE distribution, run this command.

bash TestDrive.bash selfupdate

Note, if you have an older version of TestDrive.bash, you need to download a new one.

wget https://sparkfhe.s3.amazonaws.com/TestDrive.bash

Then, run the corresponding commands:

Upgrade SparkFHE-API, SparkFHE-Examples, and SparkFHE-Plugin

bash TestDrive.bash dependencies

Upgrade libSparkFHE shared library

bash TestDrive.bash lib

Upgrade SparkFHE-Addon

This can be done with git commands.

cd SparkFHE-Addon
git pull

Thank you for taking an interest in the SparkFHE project.

You may check the Errors&Fixes page if you run into any problems.

Feel free to contribute and leave a comment if you have any questions or suggestions, or join us on Slack.