
Setting up remote Spark job execution


To enable running Spark jobs from the Curation Interface (e.g. for restoring versions or committing changes), do the following:

  • Create an SSH key pair for the user that is running the Curation Interface API server
  • Copy the created public key (~/.ssh/id_rsa.pub) into the cluster's ~/.ssh/authorized_keys file (for the user that will run the Spark jobs)
  • Test the SSH login (e.g. ssh <user>@<cluster host>) and check that it works without a password. If it does not, enable public key authentication in the cluster's SSH server configuration or fix the permissions of the .ssh folder (a sketch of the key setup is shown after this list)
  • Customize the run_job.sh file in the Curation repository with your username and hostname (a hypothetical sketch of such a script is shown after this list)
  • On the cluster node, place a build of the Ingestion pipeline that contains the jobs to be run from Curation at ~/jars/curation_jobs.jar, and place the spark.sh script from the Ingestion repository at ~/scripts/spark.sh
  • Adjust the paths in the run_job.sh script if necessary (e.g. for a different user name)
  • Test the job execution by navigating to http://<curation host>:3000/api/run/versiondiff/667ccd90-5cc4-11e7-9047-dfcf226f2431,aa8ac8e0-5ca9-11e7-aea9-c37dbfcb3b83 (or by issuing the curl request shown below)
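
A minimal sketch of the SSH key setup described above; `<user>` and `<cluster host>` are placeholders for your environment, and `ssh-copy-id` is just one way to append the key (you can also append ~/.ssh/id_rsa.pub to the cluster's ~/.ssh/authorized_keys manually):

```bash
# Generate a key pair for the user running the Curation API server
# (press Enter to accept the default location ~/.ssh/id_rsa).
ssh-keygen -t rsa

# Copy the public key to the cluster user's authorized_keys.
# ssh-copy-id appends the key and sets the file permissions for you.
ssh-copy-id <user>@<cluster host>

# Verify that the login now works without a password prompt.
ssh <user>@<cluster host> 'echo "key-based login works"'
```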
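The actual run_job.sh lives in the Curation repository; the following is only a hypothetical sketch of what such a wrapper typically does (user, host, and the spark.sh invocation pattern are assumptions, not the real values), namely SSH into the cluster and start the job via the spark.sh script and the deployed jar:

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a run_job.sh wrapper -- the real script is in
# the Curation repository; adjust user, host, and paths to your cluster.
JOB_CLASS="$1"                     # Spark job to run, passed by the API
shift                              # any further arguments go to the job

CLUSTER_USER="<user>"              # assumption: cluster login from above
CLUSTER_HOST="<cluster host>"      # assumption: your cluster's hostname

# Paths on the cluster node (see the jar/script setup step above).
# The argument order of spark.sh is assumed; check the Ingestion
# repository for its actual interface.
ssh "$CLUSTER_USER@$CLUSTER_HOST" \
    "~/scripts/spark.sh ~/jars/curation_jobs.jar $JOB_CLASS $*"
```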
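Instead of a browser, one way to trigger the same test run is from the command line; the two UUIDs are the example version IDs from the step above:

```bash
# Trigger a versiondiff run via the Curation API (replace <curation host>
# with the host running the API server; port 3000 as configured above).
curl "http://<curation host>:3000/api/run/versiondiff/667ccd90-5cc4-11e7-9047-dfcf226f2431,aa8ac8e0-5ca9-11e7-aea9-c37dbfcb3b83"
```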