Benefault

An approach to task preemption in big data analytics platforms

What we have done

A simple shell script that monitors each node's metadata (e.g. disk access, network Tx/Rx) in a cluster
Reading and writing of checkpoint data (note: checkpointRead is private in Spark, so the function has to be packaged into org.apache.spark)
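
As an illustration of the monitoring item above, here is a minimal sketch of how such a script could sample network and disk counters on a Linux node. It is not the code in monitorScripts. The function names (`net_bytes`, `disk_sectors`, `delta`, `monitor`), the default device `sda`, and the output layout are all assumptions made for this example. It reads the standard `/proc/net/dev` and `/proc/diskstats` files.

```shell
#!/bin/sh
# Hypothetical node monitor sketch (not the repository's script).
# Counter files are parameters so the parsers can be run on saved samples.

# Print "rx_bytes tx_bytes" summed over all interfaces except loopback.
net_bytes() {
  awk 'NR > 2 {
         gsub(/:/, " ")              # "eth0:123" and "eth0: 123" both occur
         if ($1 != "lo") { rx += $2; tx += $10 }
       }
       END { printf "%.0f %.0f\n", rx, tx }' "${1:-/proc/net/dev}"
}

# Print "sectors_read sectors_written" for one block device (assumed: sda).
disk_sectors() {
  awk -v dev="${1:-sda}" '$3 == dev { r = $6; w = $10 }
       END { printf "%.0f %.0f\n", r, w }' "${2:-/proc/diskstats}"
}

# Given two "a b" samples (previous, current), print the pairwise difference.
delta() {
  set -- $1 $2
  echo "$(( $3 - $1 )) $(( $4 - $2 ))"
}

# Print one line per interval: host, epoch time, net deltas, disk deltas.
monitor() {
  interval="${1:-5}"; dev="${2:-sda}"; count="${3:-3}"
  prev_net=$(net_bytes); prev_disk=$(disk_sectors "$dev")
  i=0
  while [ "$i" -lt "$count" ]; do
    sleep "$interval"
    cur_net=$(net_bytes); cur_disk=$(disk_sectors "$dev")
    echo "$(hostname) $(date +%s) $(delta "$prev_net" "$cur_net") $(delta "$prev_disk" "$cur_disk")"
    prev_net=$cur_net; prev_disk=$cur_disk
    i=$((i + 1))
  done
}
```

After sourcing the file, `monitor 5 sda 3` would print three samples taken five seconds apart. Reporting deltas rather than raw counters is a design choice for this sketch: the `/proc` values are cumulative since boot, so the per-interval differences are what indicate current load on a node.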

The performance gain is 15-30%

Repository contents

decisionTree_spark1.6.1
groupbytest
monitorScripts
rddcp
sorting
word-chekcpoint-readWrite
wordcount
JCudaDFVectorAdd.scala
PhaseCount.scala
README.md
SortingCheckpoint.scala
SparkPageRank_Checkpointing.scala
Yarn-preempt.md
checkpoint.scala
checkpoint_example.scala
checkpoint_overhead.png
conf.txt
spark-checkpointing-master.zip
sparkcheckpointing.txt
systemcall.scala
wordcount.scala