Spark Maven template

The Spark Maven template image serves as a base image to build your own Maven application to run on a Spark cluster. See the big-data-europe/docker-spark README for a description of how to set up a Spark cluster.

Package your application using Maven

You can build and launch your Maven application on a Spark cluster by extending this image with your sources. The template uses Maven as its build tool, so make sure your application has a pom.xml file that specifies all of its dependencies.

The Maven package command must create an assembly JAR (or 'uber' JAR) containing your code and its dependencies. Spark and Hadoop dependencies should be listed with provided scope, since the cluster already supplies them at runtime. The Maven Shade Plugin can build such assembly JARs.
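A quick way to sanity-check the packaging is to inspect the JAR listing: your main class should be present, while Spark and Hadoop classes should be absent because they are in provided scope. The following is a minimal, heuristic sketch in POSIX shell. The function name check_assembly_listing is hypothetical, and it assumes you feed it a listing with one entry per line, as printed by jar tf.

```shell
#!/bin/sh
# Hypothetical helper: heuristic check of an assembly (uber) JAR listing.
# stdin: one JAR entry per line (e.g. the output of `jar tf app.jar`)
# $1:    entry path of the main class, e.g. eu/bde/my/Application.class
# Prints "ok" on success; otherwise explains the problem on stderr and fails.
check_assembly_listing() {
    main_class_entry="$1"
    listing=$(cat)

    # The main class must be packaged in the JAR (exact, fixed-string match).
    if ! printf '%s\n' "$listing" | grep -qxF "$main_class_entry"; then
        echo "missing main class: $main_class_entry" >&2
        return 1
    fi

    # Bundled Spark/Hadoop classes usually mean those dependencies
    # were not declared with provided scope in the pom.xml.
    if printf '%s\n' "$listing" | grep -qE '^org/apache/(spark|hadoop)/'; then
        echo "Spark/Hadoop classes bundled; declare them as provided" >&2
        return 1
    fi

    echo "ok"
}
```

For example, after mvn clean package you could run: jar tf target/my-app-1.0-SNAPSHOT-with-dependencies.jar | check_assembly_listing eu/bde/my/Application.class (the JAR name here is only an example; use the one your build produces).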

Extending the Spark Maven template with your application

Steps to extend the Spark Maven template

  1. Create a Dockerfile in the root folder of your project (the folder that also contains the pom.xml).
  2. Extend the Spark Maven template Docker image.
  3. Configure the following environment variables (unless the default values suffice):
      • SPARK_MASTER_NAME (default: spark-master)
      • SPARK_MASTER_PORT (default: 7077)
      • SPARK_APPLICATION_JAR_NAME (default: application-1.0)
      • SPARK_APPLICATION_MAIN_CLASS (default: my.main.Application)
      • SPARK_APPLICATION_ARGS (default: "")
  4. Build and run the image:
docker build --rm=true -t bde/spark-app .
docker run --name my-spark-app --link spark-master:spark-master -d bde/spark-app
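To make the role of these environment variables concrete, here is a sketch that assembles a spark-submit command line from them, falling back to the defaults listed above. It assumes the template eventually submits your JAR to a standalone master at spark://SPARK_MASTER_NAME:SPARK_MASTER_PORT; the function name build_submit_command and the /app/<name>.jar location are hypothetical, so check the template's own scripts for the exact invocation. The sketch only prints the command and runs nothing.

```shell
#!/bin/sh
# Hypothetical sketch: derive a spark-submit command from the template's
# environment variables. Defaults mirror the README; the /app/ JAR location
# is an assumption for illustration only. Prints the command, runs nothing.
build_submit_command() {
    master_name="${SPARK_MASTER_NAME:-spark-master}"
    master_port="${SPARK_MASTER_PORT:-7077}"
    jar_name="${SPARK_APPLICATION_JAR_NAME:-application-1.0}"
    main_class="${SPARK_APPLICATION_MAIN_CLASS:-my.main.Application}"
    app_args="${SPARK_APPLICATION_ARGS:-}"

    # Standalone master URLs have the form spark://HOST:PORT.
    cmd="spark-submit --class $main_class --master spark://$master_name:$master_port /app/$jar_name.jar"

    # Application arguments, if any, go after the JAR path.
    if [ -n "$app_args" ]; then
        cmd="$cmd $app_args"
    fi
    printf '%s\n' "$cmd"
}
```

With the values from the example Dockerfile further down, this prints spark-submit --class eu.bde.my.Application --master spark://spark-master:7077 /app/my-app-1.0-SNAPSHOT-with-dependencies.jar foo bar baz. The --link spark-master:spark-master flag in the docker run command is what makes the host name spark-master resolvable inside the application container.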

If you extend the Spark Maven template image directly, the sources in your project folder are automatically added to /usr/src/app. Otherwise, you have to add and package the sources yourself in your Dockerfile with these commands:

COPY . /usr/src/app
RUN cd /usr/src/app \
    && mvn clean package

If you override the template's CMD in your Dockerfile, make sure your command executes the /template.sh script as its final step.
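If you do override CMD, one option is to point it at a small wrapper script that performs your own setup and then hands control to /template.sh with exec, so the template script runs last and becomes the container's main process. This is a minimal sketch, assuming a hypothetical wrapper saved as /run-app.sh; the run_app function, its pre-step, and the TEMPLATE_SCRIPT variable (used only so the sketch can be tried outside the image) are illustrative and not part of the template.

```shell
#!/bin/sh
# Hypothetical wrapper (e.g. /run-app.sh) for an image whose CMD is overridden.
# Abort before reaching the template if any setup step fails.
set -e

run_app() {
    # Illustrative pre-step: replace with your own setup commands.
    echo "Preparing ${SPARK_APPLICATION_MAIN_CLASS:-my.main.Application}"

    # Must come last: replace this shell with the template script.
    # TEMPLATE_SCRIPT is a hypothetical override for trying the sketch locally;
    # assumes /template.sh is executable (otherwise call it via its interpreter).
    exec "${TEMPLATE_SCRIPT:-/template.sh}"
}
```

In the real wrapper you would end the file with a call to run_app, copy it into the image, make it executable, and reference it with something like CMD ["/run-app.sh"]. Keeping the template call as an exec on the last line guarantees nothing runs after it.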

Example Dockerfile

FROM bde2020/spark-maven-template:3.3.0-hadoop3.3

LABEL maintainer="Erika Pauwels, Gezim Sejdiu"

# Extending the template directly: sources are added to /usr/src/app automatically.
ENV SPARK_APPLICATION_JAR_NAME=my-app-1.0-SNAPSHOT-with-dependencies
ENV SPARK_APPLICATION_MAIN_CLASS=eu.bde.my.Application
ENV SPARK_APPLICATION_ARGS="foo bar baz"

Example application

See big-data-europe/demo-spark-sensor-data.