
Cylc Refactor Proposal (2014)

Matt Shin edited this page Jun 12, 2018 · 47 revisions

NOTE this page is out of date - many issues have been addressed since 2014.

Issues

Load on Server

The scheduler currently becomes inefficient when a suite has several thousand live task proxies (this generally corresponds to several thousand tasks per cycle point of the graph, with more if runahead is allowed and there are inter-cycle dependencies).

Reloading a large suite can take a long time and use a lot of memory. (Stopping and restarting is faster. How come?)

The scheduler uses a significant amount of CPU even when it should be idle. For example, when a large suite is stopping and waiting for a single job to complete, the scheduler appears to continue consuming CPU.

  • Is this still the case?
  • Less so: cylc-6 uses 6-7 times less CPU than cylc-5 in busy suites. (Ben Fitzpatrick)
  • When the suite is idle, it should use almost no CPU.

See also #107, #108, #184, #788, #987.

Scheduling Algorithm

Dependency matching is currently one of the main bottlenecks. On each pass through the main loop, every task proxy registers its completed outputs with the "broker", then every task proxy tries to satisfy its prerequisites via the broker. The broker then dumps everything and we start again.

Output strings are determined by the graph. They contain the generating task's ID to ensure that matching prerequisites can only be satisfied by that task, but the matching itself is done indiscriminately: each task queries every other task (albeit via the broker, so this doesn't scale too badly).

Instead, each task could query just the specific tasks that are known (from the graph) to generate the outputs it needs. To do this, task proxies could hold a dict that associates prerequisites with the IDs of the outputting tasks (or perhaps the reverse); and the task pool could be a dict of task proxies indexed by task ID, so that we don't have to iterate through a list to find specific tasks.
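The lookup-based approach above can be sketched as follows (class and field names are hypothetical, not Cylc's actual API): the pool is a dict keyed by task ID, and a reverse index built from the graph maps each output string to the tasks that depend on it, so satisfying an output touches only its known dependents.

```python
# Sketch of graph-directed dependency matching (hypothetical names,
# not Cylc's real classes).

class TaskProxy:
    def __init__(self, task_id, prereqs):
        self.task_id = task_id
        self.prereqs = set(prereqs)   # output strings this task waits on
        self.satisfied = set()

    def is_ready(self):
        return self.prereqs <= self.satisfied


# Task pool indexed by task ID: no iteration needed to find a task.
pool = {
    "model.1": TaskProxy("model.1", []),
    "post.1": TaskProxy("post.1", ["model.1:succeeded"]),
    "archive.1": TaskProxy("archive.1", ["model.1:succeeded"]),
}
# Reverse index from the graph: output string -> dependent task IDs.
dependents = {"model.1:succeeded": ["post.1", "archive.1"]}

def complete_output(output):
    """Satisfy only the tasks known (from the graph) to need this output."""
    for task_id in dependents.get(output, []):
        pool[task_id].satisfied.add(output)

complete_output("model.1:succeeded")
assert pool["post.1"].is_ready() and pool["archive.1"].is_ready()
```

With this shape, the cost of an output event is proportional to the number of its dependents, not to the size of the task pool.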

Event-driven Scheduling

Currently we execute the main loop continually in the suite daemon even if nothing is happening, although some flags are set, e.g. to indicate that a task changed state, so much of the work is avoided when it is not needed.

An event-driven model would be more efficient: the main loop should simply look for (1) pending job messages, (2) pending user commands, and (3) the wall clock (if there are clock-triggered tasks). If no event has occurred since the last pass, sleep.

Moreover (see the replacement of the indiscriminate task pool model above), we should be able to determine exactly which tasks are affected by which events, and thereby avoid a lot of iteration through the pool.
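A minimal sketch of such an event-driven loop, assuming a simple in-process event queue (all names here are illustrative, not Cylc's actual internals): the loop blocks until an event arrives, and only needs a timeout when clock-triggered tasks may become ready on their own.

```python
import queue

# Job messages and user commands land here (illustrative only).
events = queue.Queue()

def main_loop_pass(clock_triggers_pending=False):
    """One pass: handle a pending event, else sleep until one arrives."""
    try:
        # Block indefinitely unless clock-triggered tasks may fire,
        # in which case wake periodically to check the wall clock.
        timeout = 1.0 if clock_triggers_pending else None
        event = events.get(timeout=timeout)
    except queue.Empty:
        return "check-clock-triggers"
    return "handled:" + event

events.put("job-message:model.1:succeeded")
assert main_loop_pass() == "handled:job-message:model.1:succeeded"
```

When the queue is empty and no clock triggers are pending, the process sleeps in the blocking `get()` and consumes essentially no CPU.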

Data Model and Persistent Layer

The current historical-persistence layer was added as an afterthought, so it does not completely document the suite throughout its lifetime:

  • Environment and changes.
  • Configuration and changes.
  • Runtime state and changes.
  • User interactions.
  • Changes to other items in the suite?

The SQLite database can be locked by a reader, which can cause the suite to die.
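One possible mitigation, shown here as an assumption rather than Cylc's actual fix: open the database with a busy timeout and WAL journalling, so readers no longer block the writer and transient locks are waited out instead of failing immediately.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "suite.db")

# Writer: wait up to 5 s on a lock, and use WAL so concurrent readers
# do not block writes (each connection needs its own timeout).
conn = sqlite3.connect(path, timeout=5.0)
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("CREATE TABLE task_states (name TEXT, cycle TEXT, status TEXT)")
conn.execute("INSERT INTO task_states VALUES ('model', '1', 'succeeded')")
conn.commit()

# Reader on a separate connection sees committed data without
# holding a lock that could kill the writer.
reader = sqlite3.connect(path, timeout=5.0)
rows = reader.execute("SELECT status FROM task_states").fetchall()
assert rows == [("succeeded",)]
```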

It is not clear to users which suite items to back up in order to guarantee a good restart. Do we need all of these?

  • Suite environment.
  • Suite definition.
  • State file.
  • Suite runtime database.
  • Job logs and status files.
  • Other items, which may be modified in the interim.

The current data model is difficult to serialise because it mixes everything together:

  • Site/user configuration.
  • Runtime options.
  • Suite configuration.
  • Runtime states.
  • Functional logic.

(This also causes much unnecessary getting and setting of data throughout the logic.)
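To illustrate the kind of separation being proposed, a runtime-state record that holds only data (field names here are illustrative) is trivial to serialise and to pass between functions, with configuration and functional logic kept elsewhere:

```python
import json
from dataclasses import dataclass, asdict

# Pure-data runtime state, separate from configuration and logic
# (a sketch; these are not Cylc's actual fields).
@dataclass
class TaskState:
    name: str
    cycle: str
    status: str
    submit_num: int = 0

state = TaskState("model", "20140101T00", "running", 1)

blob = json.dumps(asdict(state))               # trivially serialisable
restored = TaskState(**json.loads(blob))       # and round-trippable
assert restored == state
```

Because the record carries no behaviour, the same object can back the persistence layer, the communication layer, and in-memory scheduling without conversion.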

Task proxies are dynamically generated classes with several layers of inheritance. This is undesirable and restricts the names that can be given to tasks.

Runtime files are not in one place.

  • It is not obvious to users what they should housekeep and/or archive.

See also #372, #421, #423, #705, #846, #864, #975.

Communication Layer

An API is needed for job messages, user queries and commands. The current use of Pyro limits what we can do:

  • Only single passphrase authentication.
  • Once you are in, you can do everything.
  • Object RPC instead of RESTful API design.
  • There is no clear API to:
    • send job messages (except via cylc command).
    • send user queries. (A query does not change the suite state.)
    • send user commands. (A command asks the suite to perform an action.)
  • Unable to use mainstream technology built around HTTP and other common protocols.
    • SMTP would be a useful protocol to support, as just about any load-balancing system at any site is able to send emails.

It would also be good to have proper feedback from the server for client-side commands like cylc reload, such as a completion message or information about what changes were made.

See also #72, #124, #126, #537, #969, #970, #1186, #1265.

Job Submission and Management

Inefficient host selection via rose host-select.

  • Multiple SSH commands to multiple hosts or login nodes for every job.
  • While individually insignificant, this time adds up when we start running a large number of jobs at the same time, e.g. large ensembles.

Multiple SSH invocations of almost identical commands to submit jobs to the queueing system.

  • This may create unnecessary load on suite hosts and job hosts.

See #1505.

Submission error output currently goes to log/suite/err, where it can be lost in the noise. Users are often puzzled when they have a submission failure.

  • Similar issue with event hooks. DONE

It is not easy to archive a single cycle of log files due to the log/job/$TASK.$CYCLE.$SUBMIT_NUM* naming convention.

  • In addition, *.1 to *.8 are the traditional file extensions for Unix manual pages.
  • It is not easy to compare logs between suites.
  • The submit number is not documented in the job script.
  • Users would find it easier with a hierarchy based on log/job/$CYCLE/.

Ditto for items in work/$TASK.$CYCLE/.

  • Users would find it easier with a hierarchy based on work/$CYCLE/$TASK/. DONE
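The contrast can be sketched with hypothetical path helpers; with the per-cycle hierarchy, archiving or housekeeping a cycle becomes a single directory operation rather than a glob over mixed file names:

```python
import os

# Illustrative helpers contrasting the current flat naming with the
# proposed per-cycle hierarchy (names and layout are assumptions).
def job_log_flat(task, cycle, submit_num):
    return "log/job/{0}.{1}.{2}.out".format(task, cycle, submit_num)

def job_log_by_cycle(task, cycle, submit_num):
    return os.path.join(
        "log", "job", cycle, task, "%02d" % submit_num, "job.out")

assert job_log_flat("model", "2014010100", 1) == \
    "log/job/model.2014010100.1.out"
# Everything for a cycle lives under one directory, log/job/2014010100/:
assert job_log_by_cycle("model", "2014010100", 1) == \
    "log/job/2014010100/model/01/job.out"
```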

Rose Integration

Rose and other third-party toolkits and frameworks.

Rose provides these functionalities, which should probably be part of Cylc:

  • Suite installation (rose suite-run).
  • Suite host selection (rose suite-run, etc).
    • In the future, provide a way to migrate a suite to a different host, e.g. if the current host is scheduled for a reboot in the middle of a run.
  • Job log management and event handling (rose suite-hook).
  • Suite clean (rose suite-clean).
  • Locate suites on multiple hosts (rose suite-scan).
    • cylc gsummary works, but cylc scan doesn't.
    • Other relevant cylc commands, e.g. gcylc and cylc stop, should do the same.
  • Browser of suite logs via HTTP (Rose Bush).
    • Need a new name. What about cylc moth (MOnitor Tasks via HTTP)?

Users are unable to call Rose functionality via cylc gui:

  • Restart or reload a suite, or re-trigger a task, with or without reinstalling configurations for suites and/or applications.
  • Launch rose config-edit.
  • Launch Rose Bush.

Users have to hard-wire the Rose environment in job scripts. See also #511.

Actions

Following discussions, we agreed on the following:

  1. Investigate how to improve suite runtime performance (CPU usage, memory usage, etc):

    • Activity can start as soon as possible.
    • Implement performance quick wins. (2014-Q3?)
    • Propose a new and more scalable architecture for the future. (2014-Q3/4?)
      • More event driven.
      • Boss-worker processes. DONE (#1012)
      • Functional architecture.
  2. Propose a new data model and data-persistence layer. (2014-Q3?)

    • Data model will be able to represent and fully document the runtime of a suite.
      • Task and job states.
      • Changes to the suite.
      • User commands.
    • Data model will be easy to serialise.
    • Data model will be easy to pass between functions.
    • Data model will be memory efficient.
    • The persistence layer will be friendly to write and query.
  3. Propose new communication layer API. (2014-Q3/4 after data model activity?)

    • RESTful HTTP API, which will allow us to use common web technologies.
    • Job message via HTTP POST. Unique one-time authentication token for each job.
    • For sites that do not allow outward communication from jobs to suites:
      • Support job message via email.
      • Support HTTP and SMTP proxies.
    • User commands via HTTP POST. (Tell the suite to perform an action.)
      • Require authentication. Use a system based on public/private key pairs?
    • User queries via HTTP GET. (Read-only, no change to suite.)
  4. Propose log/job/ and work/ directory restructure before cylc 6. (2014-Q3)

  5. Propose changes to job submission log and event hook log locations. (2014-Q3/4)

  6. Propose changes to migrate rose suite-hook and Rose Bush functionalities into cylc. (2014-Q4)

    • New configurations to ask the suite to pull job logs back from job hosts.
    • New configurations to send email notification on events.
    • New configurations to shut down on events.
    • Suite to populate job log database table.
    • Rose Bush -> cylc moth? (MOnitor Tasks via HTTP?)
    • Issue opened for hook functionalities at #992.
    • Issue opened for log (and possibly work) prune functionalities at #994.
  7. Propose changes to allow cylc commands to look for suites on configured suite hosts. (2014-Q3/Q4)

    • N.B. cylc gsummary already does this.
    • Issue opened at #1050.
    • See also #578.
  8. Migrate rose suite-run and rose suite-clean functionalities. (2014-Q4/2015?)
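As an illustration of item 3's unique one-time token for job messages, here is a minimal sketch (the names and the token lifecycle are assumptions, not a settled design): the suite issues a token per job at submission time, and an incoming message is accepted only if it presents that token.

```python
import hmac
import secrets

# token store: (task, cycle, submit_num) -> issued token
issued = {}

def issue_token(job_key):
    """Issued by the suite at job submission time."""
    token = secrets.token_hex(16)
    issued[job_key] = token
    return token

def receive_message(job_key, token, message):
    """Accept a job message only with the matching one-time token."""
    expected = issued.get(job_key)
    if expected is None or not hmac.compare_digest(expected, token):
        return False  # unknown job, or bad/reused token
    del issued[job_key]  # spend the token so it cannot be replayed
    return True

key = ("model", "2014010100", 1)
tok = issue_token(key)
assert receive_message(key, tok, "succeeded") is True
assert receive_message(key, tok, "succeeded") is False  # replay rejected
```

A real design would need to scope the token to the job's whole lifetime (a job sends several messages), but the principle is the same: authentication is per job, not a single suite-wide passphrase.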