Vision for Cylc beyond 2018/2019 Priorities

Oliver Sanders edited this page Jun 13, 2018 · 4 revisions

The top priorities in 2018/2019 are to:

  • Migrate the GUIs away from PyGTK to a suitable framework, e.g. one based on web technology.
    • Reduce development overhead: PyGTK is much harder to work with than HTML-based technology.
    • Allow us to make use of a huge selection of off-the-shelf technology.
    • Allow users to access their suites via a web browser.
  • Migrate from Python 2 to Python 3. Drop Python 2 support before its end of life.
  • Migrate the rose suite-run and Rose Bush functionality to Cylc.
    • Rose Bush will be renamed Cylc Review.
    • Suite host selection and suite host health checks.
    • Suite installation on task host.
    • Suite directory clean service.
    • Suite directory installation.
    • Suite configuration header processing (Jinja2 constants) and start-up environment.
    • Suite discovery metadata.
    • Backward compatibility with rose-suite.conf and rose-suite.info?

This page raises some issues that we should attempt to solve beyond the current top priorities. Where do we want to take Cylc beyond 2020, amongst a sea of other workflow tools?

Key Issues

Dynamic workflow

There is a requirement to write complex, dynamic workflows that may need to change during their lifetime. For example, a suite may be required to generate different products over its lifetime, and those products may require modifications to various parts of the workflow graph.

This is currently very hard to do, because the configuration file is essentially a static data structure. Users rely on wrappers or Jinja2 templates to generate the initial suite configuration, and this logic can already be very difficult to read or maintain.

Programmatic changes at run time are also awkward to control, as they are currently made by editing the static configuration file and then running a reload or restart.
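To make the contrast with editing a static file concrete, here is a minimal sketch of a graph defined and changed in plain Python. It assumes a hypothetical in-memory graph object: `WorkflowGraph`, `add_edge`, `remove_task` and `build_graph` are invented names for illustration, not Cylc interfaces.

```python
class WorkflowGraph:
    """Hypothetical in-memory workflow graph (not a Cylc API)."""

    def __init__(self):
        # Map each task name to the set of tasks it depends on.
        self.prerequisites = {}

    def add_edge(self, upstream, downstream):
        # Register both tasks and record that downstream depends on upstream.
        self.prerequisites.setdefault(upstream, set())
        self.prerequisites.setdefault(downstream, set()).add(upstream)

    def remove_task(self, name):
        # Drop the task and any dependency other tasks had on it.
        self.prerequisites.pop(name, None)
        for prereqs in self.prerequisites.values():
            prereqs.discard(name)

    def tasks(self):
        return sorted(self.prerequisites)


def build_graph(products):
    # Ordinary Python logic in place of Jinja2 loops: one branch per product.
    graph = WorkflowGraph()
    for product in products:
        graph.add_edge("model", f"process_{product}")
        graph.add_edge(f"process_{product}", f"publish_{product}")
    return graph


graph = build_graph(["rain", "wind"])

# Requirements change mid-run: retire one product and add another,
# as API calls rather than an edit-and-reload cycle.
graph.remove_task("process_wind")
graph.remove_task("publish_wind")
graph.add_edge("model", "process_snow")
graph.add_edge("process_snow", "publish_snow")
print(graph.tasks())
# ['model', 'process_rain', 'process_snow', 'publish_rain', 'publish_snow']
```

The sketch leaves open how a scheduler would apply such a change safely to tasks that are already running.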

Modular workflow

As with any application beyond a certain size, large workflows need better modularity. For example, we need the ability to:

  • Separate a large workflow into smaller logical modules, with clear interfaces to connect them.
  • Reuse common sub-workflows.
  • Cycle/iterate modules in dimensions beyond a single global date-time/integer.
  • Cycle/iterate tasks in a scope local to a module.
  • Import different modules based on upstream or external events.
  • Plug in sub-workflows for portability, e.g. research to production, inter-compute-platform, inter-site, etc.
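One way to picture a module with a clear interface is sketched below. It assumes a hypothetical `Module` type, invented for illustration and not an existing Cylc feature. Each module declares the inputs it consumes and the outputs it exposes. Its internal task names are namespaced by a prefix, so the same module can be reused more than once, e.g. once per site, and modules are connected only through their declared interfaces.

```python
from dataclasses import dataclass


@dataclass
class ModuleInstance:
    """A module placed in the workflow under a unique prefix."""
    edges: list
    inputs: dict
    outputs: dict


@dataclass
class Module:
    """Hypothetical reusable sub-workflow with an explicit interface."""
    edges: list    # internal (upstream, downstream) task pairs
    inputs: dict   # interface input name -> internal task consuming it
    outputs: dict  # interface output name -> internal task producing it

    def instantiate(self, prefix):
        # Namespace internal task names so that instances cannot clash.
        def scoped(task):
            return f"{prefix}.{task}"

        return ModuleInstance(
            edges=[(scoped(a), scoped(b)) for a, b in self.edges],
            inputs={k: scoped(v) for k, v in self.inputs.items()},
            outputs={k: scoped(v) for k, v in self.outputs.items()},
        )


def connect(producer, output, consumer, input_name):
    # Instances are joined only through their declared interfaces.
    return (producer.outputs[output], consumer.inputs[input_name])


forecast = Module(
    edges=[("prep", "model")],
    inputs={"obs": "prep"},
    outputs={"fields": "model"},
)
postproc = Module(
    edges=[("regrid", "plot")],
    inputs={"fields": "regrid"},
    outputs={"images": "plot"},
)

# Reuse the same forecast module twice, e.g. once per site.
site_a = forecast.instantiate("site_a")
site_b = forecast.instantiate("site_b")
post = postproc.instantiate("post")

graph = site_a.edges + site_b.edges + post.edges
graph.append(connect(site_a, "fields", post, "fields"))
print(graph)
```

Because `connect` only looks at `inputs` and `outputs`, a module's internals could be swapped, e.g. a research version for a production one, without touching the modules around it.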

Workflow Representation

Some workflows have now become too large to visualise with cylc graph. As part of the modularity requirement, the representation of workflows containing sub-workflows should simplify large graphs, for example by collapsing each sub-workflow into a single node, so that they can be presented in a more human-friendly manner.
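A minimal sketch of the kind of simplification meant here follows. It assumes tasks are named `module.task`, a convention invented for this example. Each sub-workflow collapses to one node, edges internal to a collapsed module disappear, and the user can choose which modules to keep expanded. `collapse` is a hypothetical helper name.

```python
def collapse(edges, expand=()):
    """Collapse 'module.task' nodes to their module, except modules in expand."""

    def node(task):
        module, _, _ = task.partition(".")
        # Keep full task resolution only for modules the user expanded.
        return task if module in expand else module

    collapsed = set()
    for upstream, downstream in edges:
        a, b = node(upstream), node(downstream)
        if a != b:  # edges internal to a collapsed module vanish
            collapsed.add((a, b))
    return sorted(collapsed)


edges = [
    ("obs.fetch", "obs.qc"),
    ("obs.qc", "model.prep"),
    ("model.prep", "model.run"),
    ("model.run", "post.regrid"),
    ("post.regrid", "post.plot"),
]
print(collapse(edges))
# [('model', 'post'), ('obs', 'model')]
print(collapse(edges, expand={"model"}))
# [('model.prep', 'model.run'), ('model.run', 'post'), ('obs', 'model.prep')]
```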

Small tasks in parallel

In the current system, an ideal task is a unit of work that requires a substantial amount of computing resource on a cluster. Each task is implemented as a job submitted to a batch scheduler (or workload manager).

It is natural for users to define their tasks as logical units, even when each unit requires little computing resource. It should therefore be possible to configure the runtime of a logical group of small tasks (a sub-graph or a family of tasks) so that the group is submitted as a single unit to the batch system, while each member is still regarded as an individual task in the workflow.

In addition, users are increasingly comfortable writing their task logic in Python. The current system only supports configuring task logic as Bash scripts. It would be more efficient if the system could configure and launch task logic written in Python directly, without going through a Bash wrapper.
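The two ideas above can be combined in a sketch of a single batch job that runs a family of small tasks written as plain Python functions, with no Bash wrapper, while still recording each task's outcome individually. Everything here is hypothetical. `run_family` is an invented name. The `report` callback stands in for whatever messaging back to the scheduler would really look like.

```python
import traceback


def run_family(tasks, report):
    """Run small tasks in one process, reporting each as an individual task.

    tasks: mapping of task name -> zero-argument Python callable.
    report: callback taking (task name, status); a stand-in for a
            hypothetical message to the scheduler.
    """
    for name, func in tasks.items():
        report(name, "running")
        try:
            func()
        except Exception:
            # One failing member should not take down the rest of the family.
            traceback.print_exc()
            report(name, "failed")
        else:
            report(name, "succeeded")


def make_thumbnail():
    pass  # placeholder for real task logic


def check_sizes():
    raise ValueError("file too small")


messages = []
run_family(
    {"make_thumbnail": make_thumbnail, "check_sizes": check_sizes},
    report=lambda name, status: messages.append((name, status)),
)
print(messages)
```

This sketch assumes the family members are independent. A real design would also need to respect dependencies within the family, e.g. by running members in topological order.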

Other issues:

  • How do we avoid flooding the file system with lots of files in a workflow with many small tasks?
  • How do we handle communication from lots of small tasks?

Flow of data

There is a requirement to represent tasks as consumers of inputs and emitters of outputs. Ideally, the system should be able to calculate dependencies from the flow of data.

If a downstream task depends on the outputs of a branched workflow, the system should allow the task to locate its inputs in a straightforward way.
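Dependency calculation from data flow could work roughly as in the sketch below. It assumes each task declares the named data it consumes and produces. The task and data names, the `work/<task>/` layout and the helper functions are all invented for illustration. Dependencies are inferred by matching inputs to their producers, and the same producer lookup tells a downstream task where its inputs live, even when they come from different branches. Ordering uses `graphlib.TopologicalSorter` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical declarations: task -> (data consumed, data produced).
TASKS = {
    "fetch_obs": ((), ("obs.nc",)),
    "model_a": (("obs.nc",), ("fc_a.nc",)),
    "model_b": (("obs.nc",), ("fc_b.nc",)),
    "blend": (("fc_a.nc", "fc_b.nc"), ("blend.nc",)),
}


def infer_dependencies(tasks):
    """Derive task -> upstream tasks purely from declared data flow."""
    producer = {}
    for task, (_, outputs) in tasks.items():
        for data in outputs:
            if data in producer:
                raise ValueError(
                    f"{data} produced by both {producer[data]} and {task}"
                )
            producer[data] = task
    # Inputs with no producer are treated as external to the workflow.
    deps = {
        task: {producer[d] for d in inputs if d in producer}
        for task, (inputs, _) in tasks.items()
    }
    return deps, producer


def locate_inputs(task, tasks, producer):
    # Hypothetical layout: each task writes its outputs under work/<task>/.
    return {d: f"work/{producer[d]}/{d}" for d in tasks[task][0]}


deps, producer = infer_dependencies(TASKS)
print(list(TopologicalSorter(deps).static_order()))
print(locate_inputs("blend", TASKS, producer))
```

Because the branches are inferred rather than written as graph edges, adding a third model branch would only require declaring its inputs and outputs.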

Is it desirable to support message passing between tasks, beyond I/O to fixed paths on disk?

Maintenance of the code base: need for a better architecture

Separate the code base into logical units:

  • A kernel to run a workflow, in which both the configuration and the state of the workflow are data structures.
  • A shell for users to define or configure the workflow, and to interact with the kernel.
  • Client-server API and logic.
  • Job management API and logic.
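The proposed split can be sketched as follows. The sketch assumes a hypothetical `Kernel` class, invented for illustration and not a description of how Cylc is currently structured. Configuration and state are plain data structures, and the kernel only transforms state. A "shell", here just a function, defines the workflow and drives the kernel through a narrow API that a client-server layer or job manager could also call.

```python
from dataclasses import dataclass, field


@dataclass
class Config:
    """Static workflow definition: task -> set of tasks it depends on."""
    graph: dict


@dataclass
class State:
    """Mutable run state: task -> status."""
    status: dict = field(default_factory=dict)


class Kernel:
    """Runs a workflow; knows nothing about GUIs, CLIs or batch systems."""

    def __init__(self, config):
        self.config = config
        self.state = State({task: "waiting" for task in config.graph})

    def ready(self):
        # Waiting tasks whose prerequisites have all succeeded.
        return sorted(
            task
            for task, deps in self.config.graph.items()
            if self.state.status[task] == "waiting"
            and all(self.state.status[d] == "succeeded" for d in deps)
        )

    def set_status(self, task, status):
        # Single entry point for state changes, e.g. from job messages.
        self.state.status[task] = status


def shell():
    # The "shell": defines the workflow and interacts with the kernel.
    kernel = Kernel(Config({"prep": set(), "model": {"prep"}, "post": {"model"}}))
    while ready := kernel.ready():
        for task in ready:
            # A job-management layer would submit the real job here.
            kernel.set_status(task, "succeeded")
    return kernel.state.status


print(shell())
```

Keeping configuration and state as data makes them easy to serialise, inspect from a client, or modify programmatically, which also bears on the dynamic workflow requirement above.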

Misc

Opportunities and challenges

  • Cloud platforms.
  • Containers.
  • Dask and similar environments.
  • Other workflow tools.
  • Object file system.

Other improvements

Database support improvements:

  • Support other RDBMSs?
  • NoSQL?

Logging improvement.