add Spark to the cluster #18

Open: gregbaker wants to merge 41 commits into base: 2.7
Conversation

gregbaker

This adds Spark 1.4.0 to the cluster setup. I have tested it a little: Spark jobs can access HDFS files (as hdfs://master.local:9000/home/vagrant/...), and jobs can be sent out to the cluster with a command like this:

spark-submit --master yarn-cluster ...
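
A fuller smoke test along those lines might look like the sketch below; the $SPARK_HOME layout and the examples jar glob are assumptions about the stock Spark 1.4.0 binary distribution, not files in this PR:

    # Run the bundled SparkPi example on the YARN cluster as a smoke test.
    # $SPARK_HOME and the examples jar location assume the standard binary layout.
    spark-submit --master yarn-cluster \
      --class org.apache.spark.examples.SparkPi \
      "$SPARK_HOME"/lib/spark-examples-*.jar 10
    # Jobs read HDFS input with the same URI scheme noted above, e.g.
    # sc.textFile("hdfs://master.local:9000/home/vagrant/input.txt")  # hypothetical file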

The download required during provisioning is about 240 MB; I don't know if that's large enough that leaving the spark manifest commented out in manifests/master-single.pp would be the wiser choice.

I haven't updated the README: again, I'm not sure if it's worth advertising there.

@gregbaker (Author)

I have continued to add changes on my fork: lowered the HDFS replication factor (so files aren't stored on every node, which is more realistic) and updated the versions of the tools (to Hadoop 2.7.1 and current versions of the others). Certainly feel free to cherry-pick as necessary if these aren't considered relevant to this project's goals.
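
As a rough sketch of what the replication change involves (the file path here is hypothetical), per-file replication can be inspected and adjusted from the shell, while the cluster-wide default comes from the dfs.replication property in hdfs-site.xml:

    # Report the replication factor and block placement of a (hypothetical) file.
    hdfs fsck /home/vagrant/input.txt -files -blocks
    # Lower replication to 2 so blocks are no longer stored on every node;
    # newly written files follow the dfs.replication default in hdfs-site.xml.
    hdfs dfs -setrep -w 2 /home/vagrant/input.txt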

@tristanreid

Looks cool! I may fork off of this to add parquet-tools (https://github.com/Parquet/parquet-mr/tree/master/parquet-tools).
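
For context, parquet-tools is a small CLI shipped with parquet-mr; once its jar is on the cluster, typical usage would look something like the following (the jar name/version and the data file are hypothetical):

    # Print the schema and the first rows of a (hypothetical) Parquet file.
    hadoop jar parquet-tools-1.6.0.jar schema /home/vagrant/data.parquet
    hadoop jar parquet-tools-1.6.0.jar head -n 5 /home/vagrant/data.parquet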

@tristanreid

Greg, this is really great! One thing: HBase has moved from 1.1.1 to 1.1.2, and the build only works for me if I make that change in modules/hbase/manifests/init.pp and modules/phoenix/manifests/init.pp.
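
A quick way to apply that bump, assuming the version appears as a plain '1.1.1' string literal in both manifests (worth reviewing with git diff afterwards):

    # Bump the pinned HBase version in both Puppet manifests.
    # GNU sed's -i edits in place; the literal-match assumption is untested here.
    sed -i 's/1\.1\.1/1.1.2/g' modules/hbase/manifests/init.pp modules/phoenix/manifests/init.pp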
