add Spark to the cluster #18

Open · wants to merge 41 commits into base: 2.7

Commits (changes from all 41 commits)
c737285
add Spark to the cluster config
gregbaker Jul 11, 2015
2b11ac2
configuration to allow running spark in standalone mode
gregbaker Jul 21, 2015
688da4c
clean up README URLs
gregbaker Jul 22, 2015
96050fb
basics of hive install
gregbaker Jul 28, 2015
ad76e72
upgrade openjdk from 6 to 7 (for hive)
gregbaker Jul 28, 2015
fe6f1bc
setup script for Hive
gregbaker Jul 28, 2015
f03dd77
decrease HDFS default replication so files not available on all nodes
gregbaker Jul 28, 2015
314dce4
update tools to current versions: Hadoop 2.7.1 et al
gregbaker Aug 18, 2015
bf6cc14
decrease HDFS default replication so files not available on all nodes
gregbaker Jul 28, 2015
8e8fab3
update tools to current versions: Hadoop 2.7.1 et al
gregbaker Aug 18, 2015
9f649bd
update README for version changes
gregbaker Aug 18, 2015
1c4c764
Merge branch '2.7' into hive
gregbaker Aug 18, 2015
383f700
fix hbase URL in readme
gregbaker Aug 21, 2015
c01a2eb
change hbase dfs replication to 2 (like the HDFS default, not all dat…
gregbaker Aug 21, 2015
ec6fdac
change hbase dfs replication to 2 (like the HDFS default, not all dat…
gregbaker Aug 21, 2015
65f6c4e
fix hbase URL in readme
gregbaker Aug 21, 2015
dfdc3eb
configure HBase so it works on single-node; enable by default
gregbaker Aug 21, 2015
601cbeb
configure HBase so it works on single-node; enable by default
gregbaker Aug 21, 2015
e9e5907
Add Phoenix manifest (but disable by default)
gregbaker Aug 21, 2015
abb943d
undo local changes from my Vagrantfiles
gregbaker Aug 21, 2015
81a2e78
undo local changes from my Vagrantfiles
gregbaker Aug 21, 2015
74fea08
fix filename for hbase regionfile
gregbaker Aug 22, 2015
c3cd707
fix filename for hbase regionfile
gregbaker Aug 22, 2015
dce709c
basics of hive install
gregbaker Jul 28, 2015
715c4e3
upgrade openjdk from 6 to 7 (for hive)
gregbaker Jul 28, 2015
2b09a1e
setup script for Hive
gregbaker Jul 28, 2015
ecb4e35
Add Phoenix manifest (but disable by default)
gregbaker Aug 21, 2015
5bdcded
add module to install Phoenix
gregbaker Sep 21, 2015
0388fdf
Merge branch 'hive' of github.com:gregbaker/vagrant-cascading-hadoop-…
gregbaker Sep 21, 2015
0df66bd
bump Spark version
gregbaker Sep 23, 2015
8536e06
update mapred staging directories so permissions work out
gregbaker Sep 23, 2015
02137b3
Fix a bug in vagrant file causing multiple initializations of puppet
AlekseiS Nov 28, 2015
4d5aa61
update versions to current
gregbaker Jul 31, 2016
16f67d4
un-fix mapred permissions, which break newer version
gregbaker Aug 18, 2016
c32fc9d
Merge branch 'fix_vagrant' of https://github.com/AlekseiS/vagrant-cas…
gregbaker Aug 18, 2016
6eed775
Merge pull request #1 from AlekseiS/fix_vagrant
gregbaker Aug 18, 2016
c65b99e
Merge branch '2.7' of github.com:gregbaker/vagrant-cascading-hadoop-c…
gregbaker Aug 18, 2016
232991e
fix puppet call in single-node vagrantfile
gregbaker Aug 18, 2016
cec58e0
upgrade to Ubuntu Vivid
gregbaker Aug 19, 2016
edde5fc
update erb syntax to silence warnings
gregbaker Aug 19, 2016
55df60f
fix puppet dependencies
gregbaker Aug 19, 2016
4 changes: 4 additions & 0 deletions .gitignore
@@ -1,6 +1,10 @@
*~
*.sw[a-z]
.vagrant
employees.tgz
hadoop-*.tar.gz
hadoop-*.tar.gz.mds
hbase-*.tar.gz
spark-*.tgz
apache-hive-*.tar.gz
phoenix-*.tar.gz
16 changes: 8 additions & 8 deletions README.md
@@ -36,9 +36,9 @@ This will set up 4 machines - `master`, `hadoop1`, `hadoop2` and `hadoop3`. Each
RAM. If this is too much for your machine, adjust the `Vagrantfile`.

The machines will be provisioned using [Puppet](http://puppetlabs.com/). All of them will have hadoop
(apache-hadoop-2.6.0) installed, ssh will be configured and local name resolution also works.
(apache-hadoop-2.7.1) installed, ssh will be configured and local name resolution also works.

Hadoop is installed in `/opt/hadoop-2.6.0` and all tools are in the `PATH`.
Hadoop is installed in `/opt/hadoop-2.7.1` and all tools are in the `PATH`.

The `master` machine acts as the namenode and the yarn resource manager, the 3 others are data nodes and run node
managers.
@@ -58,7 +58,7 @@ is required.

### Starting the cluster

This cluster uses the `ssh-into-all-the-boxes-and-start-things-up`-approach, which is fine for testing.
This cluster uses the `ssh`-into-all-the-boxes-and-start-things-up-approach, which is fine for testing.

Once all machines are up and provisioned, the cluster can be started. Log into the master, format hdfs and start the
cluster.
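
For reference, a minimal sketch of that start-up sequence using stock Hadoop 2.7 commands (the repo's own helper scripts may differ, and reaching `start-dfs.sh`/`start-yarn.sh` assumes `$HADOOP_HOME/sbin` is on the PATH, which the hadoop-path.sh template does not add explicitly):

    vagrant ssh master
    # one-time only: formatting wipes any existing HDFS data
    hdfs namenode -format
    # start the namenode plus the datanodes listed in the slaves file
    start-dfs.sh
    # start the resourcemanager and the nodemanagers
    start-yarn.sh
    # start the job history server so http://master.local:19888/ responds
    mr-jobhistory-daemon.sh start historyserver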
@@ -102,9 +102,9 @@ start up a new cluster.

You can access all services of the cluster with your web-browser.

* namenode: http://master.local:50070/dfshealth.jsp
* application master: http://master.local:8088/cluster
* job history server: http://master.local:19888/jobhistory
* namenode: http://master.local:50070/
* application master: http://master.local:8088/
* job history server: http://master.local:19888/
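
If the `.local` names resolve on your host (avahi is included in the manifests), a quick reachability check from the host looks like this (a sketch, not part of the repo):

    # the namenode UI should answer with HTML
    curl -s http://master.local:50070/ | head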

### Command line

@@ -131,7 +131,7 @@ the `PATH`. The SDK itself can be found in `/opt/CascadingSDK`.

### Driven

The SDK allows you to install the [Driven plugin for Cascading]((http://cascading.io/driven) , by simply running
The SDK allows you to install the [Driven plugin for Cascading](http://cascading.io/driven) , by simply running
`install-driven-plugin`. This will install the plugin for the vagrant user in `/home/vagrant/.cascading/.driven-plugin`.

Installing the plugin will cause every Cascading based application to send telemetry to `https://driven.cascading.io`.
@@ -157,7 +157,7 @@ The setup is fully distributed. `hadoop1`, `hadoop2` and `hadoop3` are running a
[zookeeper](http://zookeeper.apache.org) instance and a region-server each. The HBase master is running on the `master`
VM.

The webinterface of the HBase master is http://master.local:60010.
The webinterface of the HBase master is http://master.local:16010.

## Hacking & Troubleshooting & Tips & Tricks

15 changes: 7 additions & 8 deletions Vagrantfile
@@ -4,24 +4,23 @@
VAGRANTFILE_API_VERSION = "2"

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
config.vm.box = "cascading-hadoop-base"
config.vm.box_url = "http://files.vagrantup.com/precise64.box"
config.vm.box = "larryli/vivid64"

config.vm.provider :virtualbox do |vb|
vb.customize ["modifyvm", :id, "--cpus", "1", "--memory", "512"]
vb.customize ["modifyvm", :id, "--cpus", "1", "--memory", "1024"]
end

config.vm.provider "vmware_fusion" do |v, override|
override.vm.box_url = "http://files.vagrantup.com/precise64_vmware.box"
v.vmx["memsize"] = "512"
v.vmx["memsize"] = "1024"
v.vmx["numvcpus"] = "1"
end

config.vm.define :hadoop1 do |hadoop1|
hadoop1.vm.network "private_network", ip: "192.168.7.12"
hadoop1.vm.hostname = "hadoop1.local"

config.vm.provision :puppet do |puppet|
hadoop1.vm.provision :puppet do |puppet|
puppet.manifest_file = "datanode.pp"
puppet.module_path = "modules"
end
@@ -31,7 +30,7 @@ Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
hadoop2.vm.network "private_network", ip: "192.168.7.13"
hadoop2.vm.hostname = "hadoop2.local"

config.vm.provision :puppet do |puppet|
hadoop2.vm.provision :puppet do |puppet|
puppet.manifest_file = "datanode.pp"
puppet.module_path = "modules"
end
@@ -41,7 +40,7 @@ Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
hadoop3.vm.network "private_network", ip: "192.168.7.14"
hadoop3.vm.hostname = "hadoop3.local"

config.vm.provision :puppet do |puppet|
hadoop3.vm.provision :puppet do |puppet|
puppet.manifest_file = "datanode.pp"
puppet.module_path = "modules"
end
@@ -51,7 +50,7 @@ Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
master.vm.network "private_network", ip: "192.168.7.10"
master.vm.hostname = "master.local"

config.vm.provision :puppet do |puppet|
master.vm.provision :puppet do |puppet|
puppet.manifest_file = "master.pp"
puppet.module_path = "modules"
end
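
The `config.vm.provision` to `<machine>.vm.provision` changes above are the fix from commit 02137b3: a provisioner declared on `config` inside a `config.vm.define` block attaches globally, so each added machine accumulated another puppet run for every VM. Scoping the provisioner to the per-machine object runs puppet exactly once per box. Standard Vagrant commands to exercise this:

    # bring up and provision all four VMs; puppet should now run once per machine
    vagrant up
    # after editing manifests, re-run only the puppet provisioner on one box
    vagrant provision master --provision-with puppet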
2 changes: 2 additions & 0 deletions manifests/datanode.pp
@@ -1,4 +1,6 @@
include base
include hadoop
include hbase
include phoenix
include spark
include avahi
9 changes: 8 additions & 1 deletion manifests/master-single.pp
@@ -4,7 +4,14 @@
slaves_file => "puppet:///modules/hadoop/slaves-single",
hdfs_site_file => "puppet:///modules/hadoop/hdfs-site-single.xml"
}
class{ 'hbase':
regionservers_file => "puppet:///modules/hbase/regionservers-single",
hbase_site_file => "puppet:///modules/hbase/hbase-site-single.xml"
}

#include hbase
include hbase
#include hive
include phoenix
include spark
include avahi
include cascading
3 changes: 3 additions & 0 deletions manifests/master.pp
@@ -1,5 +1,8 @@
include base
include hadoop
include hbase
#include hive
include phoenix
include spark
include avahi
include cascading
2 changes: 1 addition & 1 deletion modules/base/manifests/init.pp
@@ -6,7 +6,7 @@
command => '/usr/bin/apt-get update',
}

package { "openjdk-6-jdk" :
package { "openjdk-7-jdk" :
ensure => present,
require => Exec['apt-get update']
}
2 changes: 1 addition & 1 deletion modules/cascading/files/ccsdk.sh
@@ -1,4 +1,4 @@
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export CASCADING_SDK_HOME=/opt/CascadingSDK

. $CASCADING_SDK_HOME/etc/setenv.sh
2 changes: 1 addition & 1 deletion modules/cascading/manifests/init.pp
@@ -10,7 +10,7 @@
# S3 can be slow at times hence a longer timeout
timeout => 1800,
unless => "ls /opt | grep CascadingSDK",
require => Package["openjdk-6-jdk"]
require => Package["openjdk-7-jdk"]
}

exec { "unpack_sdk" :
2 changes: 1 addition & 1 deletion modules/hadoop/files/hadoop-env.sh
@@ -24,7 +24,7 @@
# remote nodes.

# The java implementation to use.
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64

# The jsvc implementation to use. Jsvc is required to run secure datanodes.
#export JSVC_HOME=${JSVC_HOME}
6 changes: 3 additions & 3 deletions modules/hadoop/files/hdfs-site-single.xml
@@ -7,7 +7,7 @@
<description>The actual number of replications can be specified when the file is created.</description>
</property>
<property>
<name>dfs.permissions</name>
<name>dfs.permissions.enabled</name>
<value>false</value>
<description>
If "true", enable permission checking in HDFS.
@@ -18,11 +18,11 @@
</description>
</property>
<property>
<name>dfs.data.dir</name>
<name>dfs.datanode.data.dir</name>
<value>/srv/hadoop/datanode</value>
</property>
<property>
<name>dfs.name.dir</name>
<name>dfs.namenode.name.dir</name>
<value>/srv/hadoop/namenode</value>
</property>
</configuration>
8 changes: 4 additions & 4 deletions modules/hadoop/files/hdfs-site.xml
@@ -3,11 +3,11 @@
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
<value>2</value>
<description>The actual number of replications can be specified when the file is created.</description>
</property>
<property>
<name>dfs.permissions</name>
<name>dfs.permissions.enabled</name>
<value>false</value>
<description>
If "true", enable permission checking in HDFS.
@@ -18,11 +18,11 @@
</description>
</property>
<property>
<name>dfs.data.dir</name>
<name>dfs.datanode.data.dir</name>
<value>/srv/hadoop/datanode</value>
</property>
<property>
<name>dfs.name.dir</name>
<name>dfs.namenode.name.dir</name>
<value>/srv/hadoop/namenode</value>
</property>
</configuration>
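
Dropping `dfs.replication` from 3 to 2 on a 3-datanode cluster means each block lands on two of the three nodes, which is the "files not available on all nodes" behaviour the commits describe. Standard HDFS commands to inspect or override that per file (paths are illustrative):

    # show replication and block locations for everything under /user
    hdfs fsck /user -files -blocks -locations
    # pin one file back to 3 replicas and wait for re-replication
    hdfs dfs -setrep -w 3 /user/vagrant/important.dat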
4 changes: 2 additions & 2 deletions modules/hadoop/manifests/init.pp
@@ -1,6 +1,6 @@
class hadoop($slaves_file = undef, $hdfs_site_file = undef) {

$hadoop_version = "2.6.0"
$hadoop_version = "2.7.2"
$hadoop_home = "/opt/hadoop-${hadoop_version}"
$hadoop_tarball = "hadoop-${hadoop_version}.tar.gz"
$hadoop_tarball_checksums = "${hadoop_tarball}.mds"
@@ -41,7 +41,7 @@
timeout => 1800,
path => $path,
creates => "/vagrant/$hadoop_tarball",
require => [ Package["openjdk-6-jdk"], Exec["download_grrr"]]
require => [ Package["openjdk-7-jdk"], Exec["download_grrr"]]
}

exec { "download_checksum":
12 changes: 6 additions & 6 deletions modules/hadoop/templates/hadoop-path.sh.erb
@@ -1,10 +1,10 @@
export HADOOP_HOME_WARN_SUPPRESS="true"
export HADOOP_HOME=<%=hadoop_home%>
export HADOOP_HOME=<%= @hadoop_home %>
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_PREFIX=$HADOOP_HOME
export HADOOP_CONF_DIR=<%=hadoop_conf_dir%>
export YARN_CONF_DIR=<%=hadoop_conf_dir%>
export HADOOP_CONF_DIR=<%= @hadoop_conf_dir %>
export YARN_CONF_DIR=<%= @hadoop_conf_dir %>
export PATH=$HADOOP_HOME/bin:$PATH
export YARN_LOG_DIR=<%=yarn_log_dir%>
export HADOOP_LOG_DIR=<%=hadoop_log_dir%>
export HADOOP_MAPRED_LOG_DIR=<%=mapred_log_dir%>
export YARN_LOG_DIR=<%= @yarn_log_dir %>
export HADOOP_LOG_DIR=<%= @hadoop_log_dir %>
export HADOOP_MAPRED_LOG_DIR=<%= @mapred_log_dir %>
2 changes: 1 addition & 1 deletion modules/hbase/files/hbase-env.sh
@@ -27,7 +27,7 @@

# The java implementation to use. Java 1.6 required.
# export JAVA_HOME=/usr/java/jdk1.6.0/
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
# Extra Java CLASSPATH elements. Optional.
# export HBASE_CLASSPATH=

30 changes: 30 additions & 0 deletions modules/hbase/files/hbase-site-single.xml
@@ -0,0 +1,30 @@
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hbase.zookeeper.quorum</name>
<value>master.local</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/srv/zookeeper</value>
<description>Property from ZooKeeper's config zoo.cfg. The directory where the snapshot is stored. </description>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://master.local:9000/hbase</value>
<description>The directory shared by RegionServers.</description>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>false</value>
<description>The mode the cluster will be in. Possible values are
false: standalone and pseudo-distributed setups with managed Zookeeper
true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
</description>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
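
With `hbase.cluster.distributed` set to false, HBase runs in standalone mode: master, region server, and managed ZooKeeper in a single JVM on `master.local`. A quick smoke test in the stock `hbase shell` (the table name is hypothetical):

    hbase shell
    # inside the shell:
    status 'simple'
    create 'smoke', 'cf'
    put 'smoke', 'row1', 'cf:a', 'value1'
    scan 'smoke'
    disable 'smoke'
    drop 'smoke'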
4 changes: 4 additions & 0 deletions modules/hbase/files/hbase-site.xml
@@ -23,4 +23,8 @@
true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
</description>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>
1 change: 1 addition & 0 deletions modules/hbase/files/regionservers-single
@@ -0,0 +1 @@
master.local
28 changes: 20 additions & 8 deletions modules/hbase/manifests/init.pp
@@ -1,8 +1,20 @@
class hbase {
$hbase_version = "0.98.13"
$hbase_platform = "hadoop2"
$hbase_home = "/opt/hbase-${hbase_version}-${hbase_platform}"
$hbase_tarball = "hbase-${hbase_version}-${hbase_platform}-bin.tar.gz"
class hbase($regionservers_file = undef, $hbase_site_file = undef) {
$hbase_version = "1.2.2"
$hbase_home = "/opt/hbase-${hbase_version}"
$hbase_tarball = "hbase-${hbase_version}-bin.tar.gz"

if $regionservers_file == undef {
$_regionservers_file = "puppet:///modules/hbase/regionservers"
}
else {
$_regionservers_file = $regionservers_file
}
if $hbase_site_file == undef {
$_hbase_site_file = "puppet:///modules/hbase/hbase-site.xml"
}
else {
$_hbase_site_file = $hbase_site_file
}

file { "/srv/zookeeper":
ensure => "directory"
@@ -13,7 +25,7 @@
timeout => 1800,
path => $path,
creates => "/vagrant/$hbase_tarball",
require => [ Package["openjdk-6-jdk"], Exec["download_grrr"]]
require => [ Package["openjdk-7-jdk"], Exec["download_grrr"]]
}

exec { "unpack_hbase" :
@@ -25,7 +37,7 @@

file {
"${hbase_home}/conf/regionservers":
source => "puppet:///modules/hbase/regionservers",
source => $_regionservers_file,
mode => 644,
owner => root,
group => root,
@@ -34,7 +46,7 @@

file {
"${hbase_home}/conf/hbase-site.xml":
source => "puppet:///modules/hbase/hbase-site.xml",
source => $_hbase_site_file,
mode => 644,
owner => root,
group => root,
2 changes: 1 addition & 1 deletion modules/hbase/templates/hbase-path.sh.erb
@@ -1,3 +1,3 @@
export HBASE_HOME=<%=hbase_home%>
export HBASE_HOME=<%= @hbase_home %>
export HBASE_CONF_DIR=$HBASE_HOME/conf
export PATH=$HBASE_HOME/bin:$PATH
7 changes: 7 additions & 0 deletions modules/hive/files/prepare-hive.sh
@@ -0,0 +1,7 @@
#!/usr/bin/env bash

. /etc/profile

export HDFS_USER=hdfs

su - $HDFS_USER -c "$HADOOP_PREFIX/bin/hadoop fs -mkdir -p /tmp /user/hive/warehouse; $HADOOP_PREFIX/bin/hadoop fs -chmod g+w /tmp /user/hive/warehouse"
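
Presumably this script runs once on the master after HDFS is up, creating the Hive warehouse directories as the `hdfs` user. A hypothetical manual invocation and sanity check (the `/vagrant/...` path assumes Vagrant's default synced folder; puppet may stage the script elsewhere):

    # stage the warehouse directories as the hdfs user (script uses su, so run as root)
    sudo bash /vagrant/modules/hive/files/prepare-hive.sh
    # confirm /user/hive/warehouse exists, mirroring the script's own su pattern
    sudo su - hdfs -c "hadoop fs -ls /user/hive"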