
Workflow Composition

wongiseng edited this page Oct 9, 2014 · 1 revision

##Workflow Composition

In Ducktape, we compose an experiment by defining a collection of modules, their inputs, and how the modules are chained together. The experiment definition is written in YAML.

The YAML file first gives the workflow name and description, followed by a list of modules. Each module must provide its name, source and inputs. Inputs to a module can be either raw YAML values (strings, doubles, ints, bools or lists of these) or references to the outputs of other modules.

####Reference

A reference is when one module uses as one of its inputs the output of another module.
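To make the idea concrete, here is a minimal Python sketch of how references could be resolved before a module runs. It is an illustration only, not Ducktape's actual code: the function names `is_reference` and `resolve_inputs`, the shape of the `outputs` dictionary, and the placeholder output value are all assumptions. The only thing taken from this page is the `reference: ModuleName.outputName` form used in the workflow example.

```python
def is_reference(value):
    """Assumed convention: a reference is a mapping with a single 'reference' key."""
    return isinstance(value, dict) and set(value) == {"reference"}


def resolve_inputs(inputs, outputs):
    """Replace each reference with the named upstream output; raw values pass through.

    `outputs` is a hypothetical store: module name -> {output name -> value}.
    """
    resolved = {}
    for name, value in inputs.items():
        if is_reference(value):
            # "RDFDataSet.dataset" -> module "RDFDataSet", output "dataset"
            module_name, output_name = value["reference"].split(".", 1)
            resolved[name] = outputs[module_name][output_name]
        else:
            resolved[name] = value
    return resolved


# Placeholder standing in for whatever the upstream module really produced.
outputs = {"RDFDataSet": {"dataset": "<parsed RDF graph>"}}
inputs = {"dataset": {"reference": "RDFDataSet.dataset"}, "minSize": 0}
resolved = resolve_inputs(inputs, outputs)
```

In this sketch the raw value `minSize: 0` is passed through unchanged, while `dataset` is replaced by whatever the `RDFDataSet` module stored under its `dataset` output.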

####Sweep

A sweep occurs when the value supplied for a module input (either by reference or raw) is a list, while the module declares that input as a single value of the same type as the list entries. When a sweep is encountered, the execution branches: the module is executed once for each value in the list. If a module contains multiple sweeps, the execution branches once for each element of the Cartesian product of the individual sweeps.

Any module dependent on the output of a module which has been branched is executed once for each branch. If another sweep is encountered downstream of an existing sweep, a new branch is created for each value of the second sweep.
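The branching rule can be sketched in a few lines of Python. This is a hedged illustration rather than Ducktape's implementation: `expand_sweeps` and its `single_valued` parameter are hypothetical names, and the sketch assumes the module declares `iterations` and `depth` as single-valued inputs, which this page does not state explicitly.

```python
from itertools import product


def expand_sweeps(inputs, single_valued):
    """Return one concrete input dict per branch.

    `single_valued` names the inputs the module declares as taking one value;
    a list supplied for such an input is treated as a sweep.
    """
    sweep_names = [
        name for name, value in inputs.items()
        if name in single_valued and isinstance(value, list)
    ]
    sweep_values = [inputs[name] for name in sweep_names]

    branches = []
    # product() with no sweeps yields one empty combination: a single branch.
    for combo in product(*sweep_values):
        branch = dict(inputs)
        branch.update(zip(sweep_names, combo))
        branches.append(branch)
    return branches


inputs = {"iterations": [0, 2, 4], "depth": [1, 2], "dataset": "<dataset>"}
branches = expand_sweeps(inputs, single_valued={"iterations", "depth"})
```

With three values for `iterations` and two for `depth`, the sketch produces 3 × 2 = 6 branches. Under the rule described above, a downstream module that depends on this output would run once per branch, and a further sweep over k values downstream would presumably give 6 × k runs.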

Example workflow excerpt from Affiliation Prediction

####Workflow

workflow:
   name: "Affiliation Experiment Test"
   modules:

   - module:
      name: RDFDataSet
      source: org.data2semantics.exp.modules.RDFDataSetModule
      inputs:
         filename: "input.rdf"
         mimetype: "text/n3"
      
   - module:
      name: AffiliationDataSet
      source: org.data2semantics.exp.modules.AffiliationDataSetModule
      inputs: 
         dataset:
            reference: RDFDataSet.dataset
         minSize: 0
        [...]
            
   - module: 
      name: RDFWLSubTreeKernel
      source: org.data2semantics.exp.modules.RDFWLSubTreeKernelModule
      inputs:
         iterations: [0, 2, 4]
         depth: [1, 2]
         dataset:
            reference: RDFDataSet.dataset
         instances:
            reference: AffiliationDataSet.instances
         blacklist:
            reference: AffiliationDataSet.blacklist
            
            
 [...]