Running Shark on EC2
This guide describes how to get Shark running on an EC2 cluster. It assumes you have already signed up for an Amazon EC2 account on the Amazon Web Services site.
Note: We will release an AMI that automates most of the following steps in the next week or two.
Launch a Mesos cluster using the EC2 deploy scripts in your local Mesos directory (/path/to/mesos/ec2). See https://github.com/mesos/mesos/wiki/EC2-Scripts for details on starting and stopping an EC2 cluster with these scripts. In general:
$ ./mesos-ec2 -k <keypair-name> -i <key-file> -s <num-slaves> launch <cluster-name>
where <keypair-name> is the name of your EC2 key pair (the name you gave it when you created it), <key-file> is the private key file for your key pair, <num-slaves> is the number of slave nodes to launch (try 1 at first), and <cluster-name> is the name to give to your cluster.
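For example, a concrete invocation might look like the following (the keypair name, key file path, and cluster name here are placeholders for illustration; substitute your own):

```shell
# Hypothetical example: launch a cluster named "shark-test" with 1 slave,
# using a keypair called "my-keypair" whose private key is ~/my-keypair.pem.
./mesos-ec2 -k my-keypair -i ~/my-keypair.pem -s 1 launch shark-test
```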
Log in to the Mesos master using the EC2 scripts in mesos/ec2:
$ ./mesos-ec2 -k key -i key.pem login <cluster-name>
Download Ant 1.8.2 from the Apache Ant site. Unzip the file, set $ANT_HOME (in .bash_profile) to the unzipped location, and add $ANT_HOME/bin to $PATH.
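The Ant setup above can be sketched as the following lines appended to ~/.bash_profile (the install path is an assumption; point it at wherever you unzipped Ant):

```shell
# Assumed unzip location for Ant 1.8.2; adjust to your actual path.
export ANT_HOME=/root/apache-ant-1.8.2
# Make the ant command available on the shell's search path.
export PATH=$PATH:$ANT_HOME/bin
```

After re-sourcing .bash_profile, `ant -version` should report 1.8.2.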
Download and setup Hive:
$ export HIVE_DEV_HOME=/path/to/hive
$ git clone git://github.com/amplab/hive.git -b shark-0.7.0 $HIVE_DEV_HOME
$ cd $HIVE_DEV_HOME
$ ant package
Update Spark:
$ cd spark
$ git pull
$ sbt/sbt publish-local
Download Shark from GitHub, edit the configuration file, and build Shark. Make sure $HIVE_HOME is set to $HIVE_DEV_HOME/build/dist and $HADOOP_HOME to your HDFS directory, either in shark-env.sh or as environment variables:
$ git clone git://github.com/amplab/shark
$ cd shark
$ vim conf/shark-env.sh
$ sbt/sbt products
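As a sketch, the relevant lines of conf/shark-env.sh might look like the following (both paths below are assumptions for illustration; use your actual Hive checkout and Hadoop install locations):

```shell
# Assumed location of the Hive checkout from the earlier step.
export HIVE_DEV_HOME=/root/hive
# Shark expects HIVE_HOME to point at the built Hive distribution.
export HIVE_HOME=$HIVE_DEV_HOME/build/dist
# Placeholder Hadoop/HDFS directory; adjust to your cluster's layout.
export HADOOP_HOME=/root/ephemeral-hdfs
```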
Copy all modified directories to the slave nodes using:
$ /root/mesos-ec2/copy-dir /root/hive
$ /root/mesos-ec2/copy-dir /root/shark
Any time a configuration file is modified, it must be copied to the slave nodes again.