This repository contains a couple of examples of using Apache Spark to process social media data (JSON) into an abstract 'Interaction' that we want to analyse.
The data used in these examples came from streams of Facebook data provided by Datasift. While we cannot redistribute the data used in our demonstration, you can acquire it yourself using Datasift for around $5 a day.
If you'd like to import and use this project from Eclipse, make sure you have SBT 0.13+ installed and run the following:
sbt eclipse
This will generate the Eclipse project metadata and you can use File -> Import to load it into your workspace.
You can also submit this job to Apache Spark as a JAR file: use sbt assembly to build the project and spark-submit to run the job on your existing cluster.
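The build-and-submit steps could be wrapped in a small helper script along these lines. This is only a sketch: the main class name, the JAR path, and the master URL below are hypothetical placeholders rather than values taken from this project, so substitute the ones that match your build and cluster.

```shell
# All three values are hypothetical placeholders; replace them with your own.
MAIN_CLASS="com.example.InteractionJob"               # assumed entry point of the job
APP_JAR="target/scala-2.10/example-assembly-0.1.jar"  # assumed path of the assembled JAR
SPARK_MASTER="spark://master-host:7077"               # assumed URL of your Spark master

# Build a single JAR containing the project and its dependencies
# (requires the sbt-assembly plugin to be configured for the project).
build_jar() {
  sbt assembly
}

# Submit the assembled JAR to the existing cluster.
submit_job() {
  spark-submit --class "$MAIN_CLASS" --master "$SPARK_MASTER" "$APP_JAR"
}
```

After sourcing the script, running build_jar && submit_job builds the JAR and submits it only if the build succeeded.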