
Running Shark on EC2


This guide describes how to get Shark running on an EC2 cluster. It assumes you have already signed up for an Amazon EC2 account on the Amazon Web Services site.

Note: We will release an AMI that automates most of the following steps in the next week or two.

Launch a Mesos cluster using the EC2 deploy scripts in your local Mesos directory (/path/to/mesos/ec2). See https://github.com/mesos/mesos/wiki/EC2-Scripts for details on starting and stopping an EC2 cluster with these scripts. In general:

$ ./mesos-ec2 -k <keypair-name> -i <key-file> -s <num-slaves> launch <cluster-name>

where <keypair-name> is the name of your EC2 key pair (the name you gave it when you created it), <key-file> is the private key file for your key pair, <num-slaves> is the number of slave nodes to launch (try 1 at first), and <cluster-name> is the name to give to your cluster.
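
For example, to launch a one-slave test cluster (the key pair name, key file path, and cluster name below are placeholders; substitute your own):

$ ./mesos-ec2 -k my-keypair -i ~/my-keypair.pem -s 1 launch shark-test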

Log in to the Mesos master using the EC2 scripts in mesos/ec2:

$ ./mesos-ec2 -k key -i key.pem login <cluster-name>

Download Ant 1.8.2 from the Apache Ant site. Unzip the file, set $ANT_HOME (in .bash_profile) to the unzipped location, and add $ANT_HOME/bin to $PATH.
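
A sketch of this setup, assuming the apache-ant-1.8.2 binary .tar.gz distribution unpacked into /root (the download URL and install path are assumptions; adjust them to match the archive you fetch):

$ wget http://archive.apache.org/dist/ant/binaries/apache-ant-1.8.2-bin.tar.gz
$ tar xzf apache-ant-1.8.2-bin.tar.gz -C /root
$ echo 'export ANT_HOME=/root/apache-ant-1.8.2' >> ~/.bash_profile
$ echo 'export PATH=$ANT_HOME/bin:$PATH' >> ~/.bash_profile
$ source ~/.bash_profile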

Download and setup Hive:

$ export HIVE_DEV_HOME=/path/to/hive
$ git clone git://github.com/amplab/hive.git -b shark-0.7.0 $HIVE_DEV_HOME
$ cd $HIVE_DEV_HOME
$ ant package

Update Spark:

$ cd spark
$ git pull
$ sbt/sbt publish-local

Download Shark from GitHub, edit the configuration file, and build Shark. Make sure $HIVE_HOME is set to $HIVE_DEV_HOME/build/dist and $HADOOP_HOME to your HDFS directory, either in shark-env.sh or as environment variables (see the sample entries after the build commands below):

$ git clone git://github.com/amplab/shark
$ cd shark
$ vim conf/shark-env.sh
$ sbt/sbt products
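
The relevant lines in conf/shark-env.sh might look like the following sketch. The paths are assumptions: /root/hive matches the Hive checkout copied to the slaves below, and the Hadoop path should point at your own HDFS installation on the cluster.

export HIVE_HOME=/root/hive/build/dist
export HADOOP_HOME=/root/ephemeral-hdfs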

Copy all modified directories to the slave nodes using:

$ /root/mesos-ec2/copy-dir /root/hive
$ /root/mesos-ec2/copy-dir /root/shark

Any time a configuration file is modified, it must be copied to the slave nodes again.
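
For example, after editing conf/shark-env.sh you might re-sync just that directory (a sketch; copy-dir pushes the given path to every slave):

$ /root/mesos-ec2/copy-dir /root/shark/conf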