Thomas Renner edited this page Feb 5, 2016 · 1 revision

Design

The goal is to design a time-based workload generator for Apache YARN.

Job definition

We need a file, e.g. XML, that stores the different jobs together with their characteristics. It should cover all parameters known from submitting a Flink job to YARN: the path to the framework's bin folder and the path to the job jar, including its arguments (e.g. data input/output paths, number of containers, memory, cores).
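A minimal sketch of how such a job file could look and be read. The element names (`frameworkBin`, `jar`, `memoryMb`, ...), the paths, and the `JobDefinition` class are assumptions made for illustration; the design does not fix a schema yet.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical job file: element names, attribute names and paths are illustrative only.
JOBS_XML = """
<jobs>
  <job name="JobA">
    <frameworkBin>/opt/flink/bin</frameworkBin>
    <jar>/jobs/wordcount.jar</jar>
    <arguments>
      <argument>hdfs:///input</argument>
      <argument>hdfs:///output</argument>
    </arguments>
    <containers>4</containers>
    <memoryMb>2048</memoryMb>
    <cores>2</cores>
  </job>
</jobs>
"""


@dataclass
class JobDefinition:
    name: str
    framework_bin: str
    jar: str
    arguments: list
    containers: int
    memory_mb: int
    cores: int


def parse_jobs(xml_text):
    """Return a dict that maps each job name to its JobDefinition."""
    root = ET.fromstring(xml_text)
    jobs = {}
    for node in root.findall("job"):
        job = JobDefinition(
            name=node.get("name"),
            framework_bin=node.findtext("frameworkBin"),
            jar=node.findtext("jar"),
            arguments=[arg.text for arg in node.findall("arguments/argument")],
            containers=int(node.findtext("containers")),
            memory_mb=int(node.findtext("memoryMb")),
            cores=int(node.findtext("cores")),
        )
        jobs[job.name] = job
    return jobs


jobs = parse_jobs(JOBS_XML)
```

Keying the result by job name lets the schedule plan refer to jobs by name only.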

Schedule plan

We need a way to express the schedule as a list of <JobX, StartTime> pairs, e.g. (JobA, 0sec); (JobB, 10sec);
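One possible way to read this notation is sketched below. It assumes that start times are whole seconds with a `sec` suffix, as in the example, and that job names contain only word characters; `parse_schedule` is a hypothetical helper name.

```python
import re

# Matches entries such as "(JobA, 0sec)"; the "sec" suffix is taken from the example notation.
ENTRY = re.compile(r"\(\s*(\w+)\s*,\s*(\d+)\s*sec\s*\)")


def parse_schedule(text):
    """Return (start_seconds, job_name) tuples ordered by start time."""
    entries = [(int(seconds), name) for name, seconds in ENTRY.findall(text)]
    return sorted(entries)


plan = parse_schedule("(JobB, 10sec); (JobA, 0sec);")
```

Putting the start time first in each tuple means a plain sort already yields the submission order.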

Runner

A component that submits jobs to a YARN cluster at their scheduled times. Idea: store the schedule plan in an array ordered by start time. After each submission, we need to remember the applicationId for the LogCollector.
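The idea could be sketched as follows. The schedule is assumed to be a list of (start_seconds, job_name) tuples, and `submit` is a hypothetical callable that performs the actual submission and returns the applicationId; how the submission itself is done is left open here.

```python
import time


def run_schedule(plan, submit, clock=time.monotonic, sleep=time.sleep):
    """Submit each job at its start offset and return (job_name, applicationId) pairs.

    plan   -- iterable of (start_seconds, job_name) tuples
    submit -- hypothetical callable: takes a job name, returns its applicationId
    clock, sleep -- injectable so the loop can be exercised without real waiting
    """
    started = clock()
    application_ids = []
    for start_seconds, job_name in sorted(plan):
        # Wait until the job's offset, measured from the start of the run.
        delay = start_seconds - (clock() - started)
        if delay > 0:
            sleep(delay)
        # Remember the applicationId so the LogCollector can use it later.
        application_ids.append((job_name, submit(job_name)))
    return application_ids
```

In this sketch `submit` is called synchronously, so a slow submission delays later jobs; if a job is already overdue it is submitted immediately.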

LogCollector

Collects the logs and results of all runs. This information can be retrieved using the YARN applicationId.
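A rough sketch of a collector built on the `yarn logs -applicationId` command of the YARN CLI. It assumes that log aggregation is enabled on the cluster and that the `yarn` binary is on the PATH; the function names and the output file naming are illustrative only.

```python
import subprocess
from pathlib import Path


def log_command(application_id):
    """Build the YARN CLI call that prints the aggregated logs of one application."""
    return ["yarn", "logs", "-applicationId", application_id]


def collect_logs(application_ids, out_dir, run=subprocess.run):
    """Write one log file per application into out_dir and return the file paths.

    application_ids -- iterable of (job_name, applicationId) pairs
    run             -- injectable for testing; defaults to subprocess.run
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for job_name, application_id in application_ids:
        result = run(log_command(application_id), capture_output=True, text=True)
        path = out_dir / f"{job_name}_{application_id}.log"
        path.write_text(result.stdout)
        paths.append(path)
    return paths
```

Job results (e.g. output data written by the job) are not covered by this sketch and would need a separate step.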
