HAMAKE

Introduction

Most non-trivial data processing scenarios with Hadoop require more than one MapReduce job. Such processing is usually data-driven, with the data funneled through a sequence of jobs. This processing model can be described in terms of dataflow programming: a directed graph with datasets as nodes, where each edge indicates a dependency between two or more datasets and is associated with a processing instruction (a Hadoop MapReduce job, a Pig Latin script, or an external command) that produces one dataset from the others. Using fuzzy timestamps to detect when a dataset needs to be updated, we can calculate the sequence in which tasks must be executed to bring all datasets up to date. Jobs that update independent datasets can be executed concurrently, taking advantage of your Hadoop cluster's full capacity. The dependency graph may even contain cycles; the resulting dependency loops can be resolved using dataset versioning.

These ideas inspired the creation of the HAMAKE utility. We tried to emphasize data and let developers express their goals in terms of dataflow (versus workflow). The data dependency graph is expressed using just two dataflow instructions, fold and foreach, providing a clear processing model similar to MapReduce, but at the dataset level. Another design goal was to create a simple-to-use utility that developers can start using right away, without complex installation or extensive learning.
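For illustration, here is a minimal sketch of what a hamakefile describing such a dataflow might look like. It is a hypothetical approximation: the element and attribute names below are illustrative assumptions, not the exact schema; see HamakeFileSyntaxReference for the real syntax.

    <!-- Hypothetical hamakefile sketch; element names are illustrative,
         see HamakeFileSyntaxReference for the actual schema. -->
    <project name="example">
      <!-- foreach: run the job once per input file whose output is stale -->
      <foreach name="tokenize">
        <input>
          <fileset id="raw" path="/data/raw" mask="*.txt"/>
        </input>
        <output>
          <file id="tokens" path="/data/tokens"/>
        </output>
        <mapreduce jar="jobs/tokenizer.jar" main="com.example.Tokenizer">
          <parameter><reference idref="raw"/></parameter>
          <parameter><reference idref="tokens"/></parameter>
        </mapreduce>
      </foreach>
      <!-- fold: aggregate the whole dataset into a single output -->
      <fold name="aggregate">
        <input>
          <file path="/data/tokens"/>
        </input>
        <output>
          <file path="/data/counts"/>
        </output>
        <pig script="scripts/count.pig"/>
      </fold>
    </project>

In this sketch, hamake would rerun "tokenize" only for inputs newer than their outputs, and rerun "aggregate" only when /data/tokens has changed, which is the dataset-level fold/foreach model described above.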

Key Features

  1. Lightweight utility - no complex installation required
  2. Based on the dataflow programming model
  3. Gentle learning curve
  4. Supports Amazon Elastic MapReduce
  5. Runs MapReduce jobs as well as Pig Latin scripts

Installation

To install Hamake, simply copy hamake-x.x.jar to a directory of your choice and make sure the hadoop command is on your $PATH. We also provide RPM and DEB packages.
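As a quick sketch, an installation and a first run might look like the following. The target directory, the hamakefile name, and the -f option are assumptions for illustration; see QuickStart and HamakeManual for the exact invocation.

    # Copy the jar to a directory of your choice (path is illustrative)
    cp hamake-x.x.jar /opt/hamake/

    # Verify that the hadoop launcher is on $PATH
    which hadoop

    # Launch hamake through the hadoop command; the -f flag naming the
    # hamakefile is an assumption -- see QuickStart for the exact options
    hadoop jar /opt/hamake/hamake-x.x.jar -f hamakefile.xml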

Documentation

  1. QuickStart
  2. HamakeFileSyntaxReference
  3. FAQ
  4. Hamake Comparison With Other Hadoop Workflow Engines
  5. HamakeManual
  6. Whitepaper
