Merge pull request #15 from CompEpigen/dev
prepare for v0.1.3
KerstenBreuer authored Aug 8, 2019
2 parents 1550f1e + a5050a3 commit 2bd146b
Showing 5 changed files with 106 additions and 38 deletions.
131 changes: 99 additions & 32 deletions README.md
@@ -3,7 +3,7 @@
## Background and Scope:
The Common Workflow Language (CWL) makes it possible to wrap and link up bioinformatic software in a standardized and portable way. However, setting up and operating a CWL-based workflow management system can be a labor-intensive challenge for many data-driven laboratories. To this end, we developed CWLab: a framework for simplified, graphical deployment of CWL.

CWLab allows life-science researchers with all levels of computational proficiency to create, execute and monitor jobs for CWL-wrapped tools and workflows. Input parameters for large sample batches are specified using a simple HTML form or a spreadsheet and are automatically validated. The integrated web server allows to remotely control the execution on clusters as well as single workstations. Moreover, automatic infrastructure provisioning and scaling for OpenStack-based clouds is being implemented. CWLab can also be used as a local desktop application that supports Linux, MacOS, and Windows by leveraging Docker containerization. Our Python-based framework is easy to set up and, via a flexible API, it can be integrated with any CWL runner and adapted to custom software environments.
CWLab allows life-science researchers with all levels of computational proficiency to create, execute and monitor jobs for CWL-wrapped tools and workflows. Input parameters for large sample batches are specified using a simple HTML form or a spreadsheet and are automatically validated. The integrated webserver allows to remotely control the execution on clusters as well as single workstations. Moreover, automatic infrastructure provisioning and scaling for OpenStack-based clouds is being implemented. CWLab can also be used as a local desktop application that supports Linux, MacOS, and Windows by leveraging Docker containerization. Our Python-based framework is easy to set up and, via a flexible API, it can be integrated with any CWL runner and adapted to custom software environments.

With CWLab, we would like to hide the complexity of workflow management so that scientific users can focus on their data analyses. This might promote the adoption of CWL in multi-professional life-science laboratories.

@@ -15,17 +15,42 @@ Installation can be done using pip:

Please see the section "Configuration" for a discussion of available options.

Start the web server with your custom configuration (or leave out the `--config` flag to use the default one):
Start the webserver with your custom configuration (or leave out the `--config` flag to use the default one):
`cwlab up --config config.yaml`

If you would like to use containers for dependency management, you need to install [Docker](https://docs.docker.com/install/) or a Docker-compatible containerization solution like [singularity](https://singularity.lbl.gov/) or [udocker](https://github.com/indigo-dc/udocker). To run on Windows or MacOs, please install the dedicated Docker versions: [Docker for Windows](https://docs.docker.com/docker-for-windows/), [Docker for Mac](https://docs.docker.com/docker-for-mac/)

The usage of the web interface should be self-explanatory with built-in instructions. The following section gives an overview of the basic usage scenario.

## Supported Systems:

CWLab is written in platform-agnostic Python and can therefore be executed on:
- **Linux**
- **MacOs**
- **Windows**\*

Any CWL runner that has a command-line interface can be integrated into CWLab in order to execute CWL workflows or tool-wrappers, such as:
- **cwltool** (the reference implementation) - https://github.com/common-workflow-language/cwltool
- **Toil** (UCSC) - https://github.com/DataBiosphere/toil
- **Cromwell** (Broad Institute) - https://github.com/broadinstitute/cromwell
- **Reana** (CERN) - https://reana.readthedocs.io/en/latest/index.html
- **CWLEXEC** (IBM) - https://github.com/IBMSpectrumComputing/cwlexec
(Please find a constantly updated list at: https://www.commonwl.org/#Implementations)

Therefore, CWLab can be used on any infrastructure supported by these CWL runners, including:
- **single workstations**
- **HPC clusters** (PBS, LSF, slurm, ...)
- **clouds** (AWS, GCP, Azure, OpenStack)

\***Please Note:**
Execution on Windows is only supported by cwltool, which talks to Docker for Windows. Therefore, CWL-wrapped tools and workflows which were originally designed for Linux/MacOs can be executed on Windows with a graphical interface provided by CWLab.

## Usage:

### Connect to the web interface:
Open a modern browser of your choice like Chrome, Firefox, Safari, or Edge (Internet Explorer might be partially incompatible).

Type in the URL of your web server. The URL depends on your configuration:
Type in the URL of your webserver. The URL depends on your configuration:

- If the webserver is running on the same machine and port 5000 is used (this is the default), type: `https://localhost:5000/`
- If CWLab is running on a remote machine in the same network, type in the machine's IP address and the used port. For instance, if the IP address is 172.22.0.1 and port 5000 is used:`https://172.22.0.1:5000/`
@@ -52,7 +77,7 @@ The workflow will be automatically validated:
### Create a new Job:
To run a workflow or tool with your data, you have to create a new job. One job may contain multiple runs (for instance multiple samples or conditions). CWLab will automatically present you a list of needed input parameters. For each parameter, you can choose whether to specify it globally (all runs of a job will get the same value) or per run.

- Click on the button "Create New Job" in the top bar and select the desired CWL document in the side bar
- Click on the button "Create New Job" in the top bar and select the desired CWL document in the sidebar
- Specify a descriptive job name (the job ID will be composed of the date, time, and the name)
- If the job shall contain multiple runs toggle the "runs per job" switch, then:
- Specify run names as a comma-separated list in the dedicated text field
@@ -82,7 +107,7 @@ This is an example screenshot for creating a job for an ATAC-seq workflow:

### Job execution:

- Click on "Job Execution & Results" in the top bar and choose the job of interest in the side bar
- Click on "Job Execution & Results" in the top bar and choose the job of interest in the sidebar
- Select the runs you want to start
- Select an execution profile (see the "Configuration" for details) and press "start"
- The execution status will be displayed in the run-list
@@ -97,10 +122,10 @@ An example screenshot of the execution interface:
![execution screenshot](https://github.com/CompEpigen/CWLab/blob/master/screenshots/execution.png?raw=true)

## Configuration:
CWLab is a highly versatile package and makes almost no assumptions on your hard- and software environment used for the execution of CWL. To adapt it to your system and use case, a set of configuration option is available:
CWLab is a highly versatile package and makes almost no assumptions on your hard- and software environment used for the execution of CWL. To adapt it to your system and use case, a set of configuration options is available:

- General configs, including:
- web server (hosting IP address and port, remotely or locally available, login protected or not)
- webserver (hosting IP address and port, remotely or locally available, login protected or not)
- paths of working directories
- Execution profiles:
This flexible API allows you to adapt CWLab to your local software environment and to integrate a CWL runner of your choice (such as Cwltool, Toil, or Cromwell).
@@ -115,10 +140,10 @@ To get an example config file, run the following command:
### General Configs:

- **WEB_SERVER_HOST**:
Specify the host or IP address on which the web server shall run. Use `localhost` for local usage on your machine only. Use `0.0.0.0` to allow remote accessibility by other machines in the same network.
Specify the host or IP address on which the webserver shall run. Use `localhost` for local usage on your machine only. Use `0.0.0.0` to allow remote accessibility by other machines in the same network.
*Default*: `localhost`
- **WEB_SERVER_PORT**:
Specify the port used by the web server.
Specify the port used by the webserver.
*Default*: 5000

- **TEMP_DIR**:
@@ -142,13 +167,15 @@ To get an example config file, run the following command:
*Default*: False

### Exec Profiles:
This is where you configure how to execute cwl jobs on your system. A profile consists of four steps: pre_exec, exec, eval, and post_exec (only exec required, the rest is optional). For each step you can specify commands that are executed in bash or cmd terminal.
This is where you configure how to execute cwl jobs on your system. A profile consists of four steps: pre_exec, exec, eval, and post_exec (only exec required, the rest is optional). For each step, you can specify commands that are executed in bash or cmd terminal.

You can define multiple execution profiles as shown in the config example below. This allows frontend users to choose between different execution options (e.g. using different CWL runners, different dependency management systems, or even choosing between multiple available batch execution infrastructures like lsf, pbs, ...). For each execution profile, the following configuration parameters are available (but only **shell** and **exec** are required):

- **shell**:
Specify which shell to use. For Linux or MacOS use `bash`. For Windows, use `cmd`.
Specify which shell to use. For Linux or MacOS use `bash`. For Windows, use `powershell`.
*Required*.
- **max_retries**:
Specify how many times the execution (all steps) is retried before marking a run as failed.
- **timeout**:
For each step in the execution profile, you can set a timeout limit.
*Default*:
@@ -182,13 +209,19 @@ You can define multiple execution profile as shown in the config example below.
- ``OUTPUT_DIR`` (the path of the run-specific output directory)
- ``LOG_FILE`` (the path of the log file that should receive the stdout and stderr of CWL runner)
- ``SUCCESS`` (if set to `False` the run will be marked as failed and terminated)
- The four steps will be executed in the same shell session and therefore can be treated as one connected script. (Between the steps, CWLab communicates the status to the database allowing the User to get status notifications via the front end).
- ``PYTHON_PATH`` (the path to the python interpreter used to run CWLab)
- The four steps will be executed in the same shell session and therefore can be treated as one connected script. (Between the steps, CWLab communicates the status to the database allowing the user to get status notifications via the front end).
- Thus you may define your own variables that will also be available in all downstream steps.
- At the end of each step, the exit code is checked. If it is non-zero, the run will be marked as failed. Please note: if a step consists of multiple commands and an intermediate command fails, this will not be recognized by CWLab as long as the final command of the step succeeds. To manually communicate failure to CWLab, please set the `SUCCESS` variable to `False`.
- The steps are executed using pexpect (https://pexpect.readthedocs.io/en/stable/overview.html), which also allows you to connect to a remote infrastructure via ssh (using an ssh key is recommended). Please be aware that the paths of files or directories specified in the input parameter YAML will not be adapted to the new host. We are working on solutions to achieve an automated path correction and/or upload functionality if the execution host is not the CWLab server host.
- On Windows, please be aware that each code block (contained in ``{...}``) has to be in one line.
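
The exit-code rule in the list above can be illustrated with a short Python sketch. This is not how CWLab runs steps (it uses pexpect); the sketch only passes two hypothetical step scripts to a shell via `subprocess` to show that an intermediate failure is masked when the final command succeeds. The helper name `step_exit_code` is an assumption made for this example, and `sh` is used instead of `bash` so that the sketch also runs on minimal systems.

```python
import subprocess


def step_exit_code(script: str) -> int:
    """Run a multi-command step in one shell and return the step's exit code.

    Hypothetical helper for illustration only; not part of CWLab.
    """
    return subprocess.run(["sh", "-c", script]).returncode


# The intermediate command `false` fails, but the final command `true`
# succeeds, so the step as a whole reports exit code 0.
masked = step_exit_code("false\ntrue")

# With `set -e`, the shell aborts at the first failing command,
# so the step reports a non-zero exit code.
strict = step_exit_code("set -e\nfalse\ntrue")

print("without set -e:", masked)
print("with set -e:", strict)
```

Starting a step with `set -e` is one possible way to surface intermediate failures in a bash profile; setting `SUCCESS` to `False` explicitly, as described above, remains the documented mechanism.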

### Example comfiguration file:
### Example configuration files:

Below, you can find example configurations for local execution of CWL workflows or tools with cwltool.

#### Linux / MacOs:

```yaml
WEB_SERVER_HOST: localhost
WEB_SERVER_PORT: 5000
@@ -202,28 +235,62 @@ INPUT_DIR: '/home/cwlab_user/cwlab/input'
DB_DIR: '/home/cwlab_user/cwlab/db'
EXEC_PROFILES:
  cwltool_local:
    shell: bash
    max_retries: 2
    timeout:
      pre_exec: 120
      exec: 86400
      eval: 120
      post_exec: 120
    exec: |
      cwltool --outdir "${OUTPUT_DIR}" "${CWL}" "${RUN_YAML}" \
        >> "${LOG_FILE}" 2>&1
    eval: |
      LAST_LINE=$(tail -n 1 ${LOG_FILE})
      if [[ "${LAST_LINE}" == *"Final process status is success"* ]]
      then
        SUCCESS=True
      else
        SUCCESS=False
        ERR_MESSAGE="cwltool failed - ${LAST_LINE}"
      fi
```

#### Windows:

```yaml
WEB_SERVER_HOST: localhost
WEB_SERVER_PORT: 5000
DEBUG: False
  cwltool_local:
    shell: bash
    timeout:
      pre_exec: 120
      exec: 86400
      eval: 120
      post_exec: 120
    exec: |
      cwltool --outdir "${OUTPUT_DIR}" "${CWL}" "${RUN_YAML}" \
        >> "${LOG_FILE}" 2>&1
    eval: |
      LAST_LINE=$(tail -n 1 ${LOG_FILE})
      if [[ "${LAST_LINE}" == *"Final process status is success"* ]]
      then
        SUCCESS=True
      else
        SUCCESS=False
        ERR_MESSAGE="cwltool failed - ${LAST_LINE}"
      fi
TEMP_DIR: '/home/cwlab_user/cwlab/temp'
CWL_DIR: '/home/cwlab_user/cwlab/cwl'
EXEC_DIR: '/home/cwlab_user/cwlab/exec'
INPUT_DIR: '/home/cwlab_user/cwlab/input'
DB_DIR: '/home/cwlab_user/cwlab/db'
EXEC_PROFILES:
  cwltool_windows:
    shell: powershell
    max_retries: 2
    timeout:
      pre_exec: 120
      exec: 86400
      eval: 120
      post_exec: 120
    exec: |
      . "${PYTHON_PATH}" -m cwltool --debug --default-container ubuntu:16.04 --outdir "${OUTPUT_DIR}" "${CWL}" "${RUN_YAML}" > "${LOG_FILE}" 2>&1
    eval: |
      $LAST_LINES = (Get-Content -Tail 2 "${LOG_FILE}")
      if ($LAST_LINES.Contains("Final process status is success")){$SUCCESS="True"}
      else {$SUCCESS="False"; $ERR_MESSAGE = "cwltool failed - ${LAST_LINE}"}
```
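
The "Exec Profiles" section states that only **shell** and **exec** are required in a profile, while the other steps are optional. A minimal Python sketch of such a check is given here, assuming the profile has already been loaded into a plain dictionary. The names `REQUIRED_KEYS`, `STEPS`, `missing_required_keys`, and `defined_steps` are hypothetical and are not part of CWLab.

```python
# Keys that the README describes as required for every execution profile.
REQUIRED_KEYS = ("shell", "exec")
# The four steps a profile may define, in execution order.
STEPS = ("pre_exec", "exec", "eval", "post_exec")


def missing_required_keys(profile):
    """Return the required keys that are absent from a profile dictionary."""
    return [key for key in REQUIRED_KEYS if key not in profile]


def defined_steps(profile):
    """Return the steps that a profile defines, in execution order."""
    return [step for step in STEPS if step in profile]


# A profile resembling the Linux / MacOs example (commands shortened).
example_profile = {
    "shell": "bash",
    "max_retries": 2,
    "exec": 'cwltool --outdir "${OUTPUT_DIR}" "${CWL}" "${RUN_YAML}"',
    "eval": "SUCCESS=True",
}

print(missing_required_keys(example_profile))
print(defined_steps(example_profile))
print(missing_required_keys({"shell": "powershell"}))
```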


## Documentation:

**Please note: Much more detailed documentation is on the way. In the meantime, please notify us if you have any questions (see the "Contact and Contribution" section). We are happy to help.**
2 changes: 1 addition & 1 deletion cwlab/__init__.py
@@ -1,7 +1,7 @@

from __future__ import absolute_import

__version__ = "0.1.2"
__version__ = "0.1.3"

import os
from flask import Flask
4 changes: 2 additions & 2 deletions cwlab/web_app/job_exec.py
@@ -129,12 +129,12 @@ def start_exec(): # returns all parmeter and its default mode (global/job spe
        if len(started_runs) > 0:
            messages.append({
                "type":"success",
                "text":"Successfully started execution for jobs: " + ", ".join(started_runs)
                "text":"Successfully started execution for runs: " + ", ".join(started_runs)
            })
        if len(already_running_runs) > 0:
            messages.append({
                "type":"warning",
                "text":"Following jobs are already running: " + ", ".join(already_running_runs) + ". To restart them, use terminate them first."
                "text":"Following runs are already running: " + ", ".join(already_running_runs) + ". To restart them, terminate them first."
            })
    except SystemExit as e:
        messages.append( {
5 changes: 3 additions & 2 deletions cwlab/xls2cwl_job/match_types.py
@@ -144,12 +144,13 @@ def get_type_matched_param_values( param_values, configs, validate_paths=True, s
        param_value = param_values[param_name]
        if configs[param_name]["type"] == "helper":
            continue
        if param_values[param_name][0] == "" and len(param_values[param_name]) == 1:
        if len(param_value) == 0 or (len(param_values[param_name]) == 1 and param_value[0] == ""):
            if configs[param_name]["null_allowed"]:
                param_values_type_matched[param_name] = None
                continue
            else:
                sys.exit( print_pref + " parameter \"" + param_name + "\" failes type matching: " +
                    "parameter was empty (\"\") but null is not allowed.")
                    "parameter was empty but null is not allowed.")
        try:
            param_values_type_matched[param_name] = match_type(param_name, param_values, configs,
                validate_paths, search_paths, search_subdirs, input_dir)
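
The changed condition in this hunk can be read as the following standalone sketch. The helper names `is_empty_param` and `old_check` are hypothetical and not part of CWLab; they mirror the new and the old condition to show why the old one, which indexes `param_value[0]` before checking the length, raises an `IndexError` for an empty list.

```python
def is_empty_param(param_value):
    """New check: an empty list or a single empty string counts as empty."""
    return len(param_value) == 0 or (len(param_value) == 1 and param_value[0] == "")


def old_check(param_value):
    """Old check: indexes the first element before looking at the length."""
    return param_value[0] == "" and len(param_value) == 1


# The old check cannot handle an empty list because of the early index access.
try:
    old_check([])
    old_raises = False
except IndexError:
    old_raises = True

print(is_empty_param([]))
print(is_empty_param([""]))
print(is_empty_param(["sample.bam"]))
print(old_raises)
```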
2 changes: 1 addition & 1 deletion setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@

setup(
name='cwlab',
version='0.1.2',
version='0.1.3',
description='A platform-agnostic, cloud-ready framework for simplified deployment of the Common Workflow Language using a graphical web interface',
long_description=open(README).read(),
long_description_content_type="text/x-rst",
