Update Step Functions Example (#79)
Bug Fixes
* explicitly stop ecs before starting ebs autoscale on /var/lib/docker
* move nextflow additions to before the ecs agent is started
* add missing steps documented in https://docs.docker.com/storage/storagedriver/btrfs-driver/

Improvements
* add SSM agent and permissions to Batch hosts to enable SSM capabilities like Session Manager, facilitating troubleshooting via shell access without needing an EC2 key pair
* refactor containers and job defs
* use host bind mounted awscli
* use job def environment variables for execution options
* use common entrypoint script for all containers
* update sfn example to use dynamic parallelism
* remove unneeded parameters from job definitions
* update example workflow input
* update build dependencies
* explicitly add pip
* unpin cfn-lint so it stays up to date
* use common build script for tooling containers
* add container build template
* refactor step functions stack into separate templates
* create a generic workflow template that uses nested templates to build individual containers and the state machine for the workflow
* simplify the workflow definition templates; container builds and IAM role creation happen in parent templates
* add UpdateReplacePolicy for S3 Buckets

Documentation Updates
* update nextflow documentation
  * fix a couple of inconsistencies
  * improve flow and clarity
  * typo fixes
* update step functions docs
  * update images
  * add more details on job definition and sfn task
  * add more details on the example workflow
  * fix job output prefix in example input
  * update workflow completion time
  * add more detailed explanations of important job def parts and how they translate into sfn task code.
wleepang authored Dec 21, 2019
1 parent 3218203 commit 708e176
Showing 31 changed files with 1,318 additions and 769 deletions.
1 change: 1 addition & 0 deletions _scripts/test.sh
@@ -3,6 +3,7 @@
set -e

# check cfn templates for errors
cfn-lint --version
cfn-lint src/templates/**/*.template.yaml

# make sure that site can build
28 changes: 21 additions & 7 deletions docs/orchestration/nextflow/nextflow-overview.md
@@ -8,7 +8,8 @@ Nextflow can be run either locally or on a dedicated EC2 instance. The latter i

## Full Stack Deployment

The following CloudFormation template will launch an EC2 instance pre-configured for using Nextflow.
_For the impatient:_
The following CloudFormation template will create all the resources you need to run Nextflow using the architecture shown above. It combines the CloudFormation stacks referenced below in the [Requirements](#requirements) section.

| Name | Description | Source | Launch Stack |
| -- | -- | :--: | -- |
@@ -20,14 +21,24 @@ When the above stack is complete, you will have a preconfigured Batch Job Defini

To get started using Nextflow on AWS you'll need the following setup in your AWS account:

* The core set of resources (S3 Bucket, IAM Roles, AWS Batch) described in the [Getting Started](../../../core-env/introduction) section.
* A containerized `nextflow` executable that pulls configuration and workflow definitions from S3
* The core set of resources (S3 Bucket, IAM Roles, AWS Batch) described in the [Core Environment](../../../core-env/introduction) section.

If you are in a hurry, you can create the complete Core Environment using the following CloudFormation template:

| Name | Description | Source | Launch Stack |
| -- | -- | :--: | :--: |
{{ cfn_stack_row("GWFCore (Existing VPC)", "GWFCore-Full", "aws-genomics-root-novpc.template.yaml", "Create EC2 Launch Templates, AWS Batch Job Queues and Compute Environments, a secure Amazon S3 bucket, and IAM policies and roles within an **existing** VPC. _NOTE: You must provide a VPC ID and subnet IDs_.") }}

!!! note
The CloudFormation template above does **not** create a new VPC; instead, it creates the associated resources in an existing VPC of your choosing, or in your default VPC. To automate creating a new VPC to isolate your resources, you can use the [AWS VPC QuickStart](https://aws.amazon.com/quickstart/architecture/vpc/).

* A containerized `nextflow` executable with a custom entrypoint script that draws configuration information from AWS Batch supplied environment variables
* The AWS CLI installed in job instances using `conda`
* A Batch Job Definition that runs a Nextflow head node
* An IAM Role for the Nextflow head node job that allows it access to AWS Batch
* (optional) An S3 Bucket to store your Nextflow workflow definitions
* An IAM Role for the Nextflow head node job that allows it to submit AWS Batch jobs
* (optional) An S3 Bucket to store your Nextflow session cache

The last five items above are created by the following CloudFormation template:
The five items above are created by the following CloudFormation template:

| Name | Description | Source | Launch Stack |
| -- | -- | :--: | -- |
@@ -181,6 +192,9 @@ chown -R ec2-user:ec2-user $USER/miniconda
rm Miniconda3-latest-Linux-x86_64.sh
```

!!! note
The actual Launch Template used in the [Core Environment](../../core-env/introduction.md) does a couple more things, such as installing additional resources for [managing space for the job](../../core-env/create-custom-compute-resources.md).

### Batch job definition

An AWS Batch Job Definition for the containerized Nextflow described above is shown below.
@@ -374,7 +388,7 @@ You can customize these job definitions to incorporate additional environment va
!!! important
    Instances provisioned using the Nextflow-specific EC2 Launch Template configure `/var/lib/docker` in the host instance to use automatically [expandable scratch space](../../../core-env/create-custom-compute-resources/), allowing containerized jobs to stage as much data as needed without running into disk space limits.

### Running the workflow
### Running workflows

To run a workflow, you submit a `nextflow` Batch job to the appropriate Batch Job Queue.
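A minimal sketch of such a submission with the AWS CLI is shown below. The job definition name (`nextflow`), the queue name, and the workflow project passed as the container command are assumptions here; substitute the names created by your deployment.

```bash
# Sketch only: submit the Nextflow head-node job with the AWS CLI.
# The job definition, queue, and workflow project are placeholders --
# replace them with the values from your stack.
aws batch submit-job \
    --job-name nf-workflow-hello \
    --job-queue <your-job-queue-name> \
    --job-definition nextflow \
    --container-overrides command=nextflow-io/hello
```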

175 changes: 141 additions & 34 deletions docs/orchestration/step-functions/step-functions-overview.md
@@ -41,6 +41,7 @@ State machines that use AWS Batch for job execution and send events to CloudWatc
"Version": "2012-10-17",
"Statement": [
{
"Sid": "enable submitting batch jobs",
"Effect": "Allow",
"Action": [
"batch:SubmitJob",
@@ -64,9 +65,39 @@ State machines that use AWS Batch for job execution and send events to CloudWatc
}
```

For more complex workflows that nest other workflows or require more involved input parsing, you also need permissions to start Step Functions State Machine executions and to invoke Lambda functions:

```json
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "enable calling lambda functions",
"Effect": "Allow",
"Action": [
"lambda:InvokeFunction"
],
"Resource": "*"
},
{
"Sid": "enable calling other step functions",
"Effect": "Allow",
"Action": [
"states:StartExecution"
],
"Resource": "*"
},
...
]
}
```

!!! note
    All `Resource` values in the policy statements above can be scoped more narrowly if needed.

## Step Functions State Machine

Workflows in AWS Step Functions are built using [Amazon States Language](https://docs.aws.amazon.com/step-functions/latest/dg/concepts-amazon-states-language.html) (ASL), a declarative, JSON-based, structured language used to define your state machine, a collection of states that can do work (Task states), determine which states to transition to next (Choice states), stop an execution with an error (Fail states), and so on.
Workflows in AWS Step Functions are built using [Amazon States Language](https://docs.aws.amazon.com/step-functions/latest/dg/concepts-amazon-states-language.html) (ASL), a declarative, JSON-based, structured language used to define a "state-machine". An AWS Step Functions State-Machine is a collection of states that can do work (Task states), determine which states to transition to next (Choice states), stop an execution with an error (Fail states), and so on.

### Building workflows with AWS Step Functions

@@ -123,9 +154,7 @@ Step Functions [ASL documentation](https://docs.aws.amazon.com/step-functions/la

### Batch Job Definitions

It is recommended to have [Batch Job Definitions](https://docs.aws.amazon.com/batch/latest/userguide/job_definitions.html) created for your tooling prior to building a Step Functions state machine. These can then be referenced in state machine `Task` states by their respective ARNs.

Step Functions will use the Batch Job Definition to define compute resource requirements and parameter defaults for the Batch Job it submits.
[AWS Batch Job Definitions](https://docs.aws.amazon.com/batch/latest/userguide/job_definitions.html) are used to define compute resource requirements and parameter defaults for an AWS Batch Job. These are then referenced in state machine `Task` states by their respective ARNs.

An example Job Definition for the `bwa-mem` sequence aligner is shown below:

@@ -134,58 +163,85 @@ An example Job Definition for the `bwa-mem` sequence aligner is shown below:
"jobDefinitionName": "bwa-mem",
"type": "container",
"parameters": {
"InputReferenceS3Prefix": "s3://<bucket-name>/reference",
"InputFastqS3Path1": "s3://<bucket-name>/<sample-name>/fastq/read1.fastq.gz",
"InputFastqS3Path2": "s3://<bucket-name>/<sample-name>/fastq/read2.fastq.gz",
"OutputS3Prefix": "s3://<bucket-name>/<sample-name>/aligned"
"threads": "8"
},
"containerProperties": {
"image": "<dockerhub-user>/bwa-mem:latest",
"vcpus": 8,
"memory": 32000,
"command": [
"Ref::InputReferenceS3Prefix",
"Ref::InputFastqS3Path1",
"Ref::InputFastqS3Path2",
"Ref::OutputS3Prefix",
"bwa", "mem",
"-t", "Ref::threads",
"-p",
"reference.fasta",
"sample_1.fastq.gz"
],
"volumes": [
{
"host": {
"sourcePath": "/scratch"
},
"name": "scratch"
},
{
"host": {
"sourcePath": "/opt/miniconda"
},
"name": "aws-cli"
}
],
"environment": [
{
"name": "REFERENCE_URI",
"value": "s3://<bucket-name>/reference/*"
},
{
"name": "INPUT_DATA_URI",
"value": "s3://<bucket-name>/<sample-name>/fastq/*.fastq.gz"
},
{
"name": "OUTPUT_DATA_URI",
"value": "s3://<bucket-name>/<sample-name>/aligned"
}
],
"environment": [],
"mountPoints": [
{
"containerPath": "/opt/work",
"sourceVolume": "scratch"
},
{
"containerPath": "/opt/miniconda",
"sourceVolume": "aws-cli"
}
],
"ulimits": []
}
}
```

!!! note
The Job Definition above assumes that `bwa-mem` has been containerized with an
`entrypoint` script that handles Amazon S3 URIs for input and output data
staging.
There are three key parts of the above definition to take note of.

Because data staging requirements can be unique to the tooling used, neither AWS Batch nor Step Functions handles this automatically.
* Command and Parameters

The **command** is a list of strings that will be sent to the container. This is the same as the `...` arguments that you would provide to a `docker run mycontainer ...` command.

    **Parameters** are placeholders that you define whose values are substituted when a job is submitted. In the example above, a `threads` parameter is defined with a default value of `8`. The job definition's `command` references this parameter with `Ref::threads` (see the sketch after this list).

!!! note
        Parameter references in the command list must be separate strings; concatenation with other parameter references or static values is not allowed.

* Environment

**Environment** defines a set of environment variables that will be available for the container. For example, you can define environment variables used by the container entrypoint script to identify data it needs to stage in.

* Volumes and Mount Points

    Together, **volumes** and **mountPoints** define what you would otherwise provide with a `-v hostpath:containerpath` option to a `docker run` command. They can be used to map host directories containing resources (e.g. data or tools) shared by all containers. In the example above, a `scratch` volume is mapped so that the container can use a larger disk on the host. A version of the AWS CLI installed with `conda` is also mapped into the container, giving it access to the CLI (e.g. to transfer data to and from S3) without building it into the image (see the sketch below).

!!! note
The `volumes` and `mountPoints` specifications allow the job container to
use scratch storage space on the instance it is placed on. This is equivalent
to the `-v host_path:container_path` option provided to a `docker run` call
at the command line.
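
As a rough illustration of how these pieces fit together, the sketch below submits a job from the example definition, overriding the `threads` parameter and one environment variable, and shows an approximate `docker run` equivalent of the resulting container invocation. The queue name is a placeholder; the other values are taken from the example definition above.

```bash
# Submit a job from the "bwa-mem" job definition, overriding the default
# value of the "threads" parameter (referenced in the command as Ref::threads)
# and one of the environment variables used by the entrypoint script.
aws batch submit-job \
    --job-name bwa-mem-sample1 \
    --job-queue <your-job-queue-name> \
    --job-definition bwa-mem \
    --parameters threads=16 \
    --container-overrides 'environment=[{name=OUTPUT_DATA_URI,value=s3://<bucket-name>/<sample-name>/aligned}]'

# The container invocation that results is roughly equivalent to:
docker run --rm \
    -v /scratch:/opt/work \
    -v /opt/miniconda:/opt/miniconda \
    <dockerhub-user>/bwa-mem:latest \
    bwa mem -t 16 -p reference.fasta sample_1.fastq.gz
```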

### State Machine Batch Job Tasks

Conveniently for genomics workflows, AWS Step Functions has built-in integration with AWS Batch (and [several other services](https://docs.aws.amazon.com/step-functions/latest/dg/concepts-connectors.html)), and provides snippets of code to make developing your state-machine
Batch tasks easier.
AWS Step Functions has built-in integration with AWS Batch (and [several other services](https://docs.aws.amazon.com/step-functions/latest/dg/concepts-connectors.html)), and provides snippets of code to make developing your state-machine tasks easier.

![Manage a Batch Job Snippet](images/sfn-batch-job-snippet.png)

@@ -202,7 +258,15 @@ would look like the following:
"JobDefinition": "arn:aws:batch:<region>:<account>:job-definition/bwa-mem:1",
"JobName": "bwa-mem",
"JobQueue": "<queue-arn>",
"Parameters.$": "$.bwa-mem.parameters"
"Parameters.$": "$.bwa-mem.parameters",
"Environment": [
{"Name": "REFERENCE_URI",
"Value.$": "$.bwa-mem.environment.REFERENCE_URI"},
{"Name": "INPUT_DATA_URI",
"Value.$": "$.bwa-mem.environment.INPUT_DATA_URI"},
{"Name": "OUTPUT_DATA_URI",
"Value.$": "$.bwa-mem.environment.OUTPUT_DATA_URI"}
]
},
"Next": "NEXT_TASK_NAME"
}
@@ -214,36 +278,79 @@ Inputs to a state machine that uses the above `BwaMemTask` would look like this:
{
"bwa-mem": {
"parameters": {
"InputReferenceS3Prefix": "s3://<bucket-name/><sample-name>/reference",
"InputFastqS3Path1": "s3://<bucket-name/><sample-name>/fastq/read1.fastq.gz",
"InputFastqS3Path2": "s3://<bucket-name/><sample-name>/fastq/read2.fastq.gz",
"OutputS3Prefix": "s3://<bucket-name/><sample-name>/aligned"
"threads": 8
},
"environment": {
"REFERENCE_URI": "s3://<bucket-name/><sample-name>/reference/*",
"INPUT_DATA_URI": "s3://<bucket-name/><sample-name>/fastq/*.fastq.gz",
"OUTPUT_DATA_URI": "s3://<bucket-name/><sample-name>/aligned"
}
},
...
}
}
```

When the Task state completes, Step Functions adds information to a new `status` key under `bwa-mem` in the JSON object. The complete object is passed on to the next state in the workflow.

## Example state machine

All of the above is created by the following CloudFormation template.
The following CloudFormation template creates container images, AWS Batch Job Definitions, and an AWS Step Functions State Machine for a simple genomics workflow using bwa, samtools, and bcftools.

| Name | Description | Source | Launch Stack |
| -- | -- | :--: | :--: |
{{ cfn_stack_row("AWS Step Functions Example", "SfnExample", "step-functions/sfn-example.template.yaml", "Create a Step Functions State Machine, Batch Job Definitions, and container images to run an example genomics workflow") }}
{{ cfn_stack_row("AWS Step Functions Example", "SfnExample", "step-functions/sfn-workflow.template.yaml", "Create a Step Functions State Machine, Batch Job Definitions, and container images to run an example genomics workflow") }}

!!! note
The stack above needs to create several IAM Roles. You must have administrative privileges in your AWS Account for this to succeed.

The example workflow is a simple secondary analysis pipeline that converts raw FASTQ files into VCFs with variants called for a list of chromosomes. It uses the following open source tools:

* `bwa-mem`: Burrows-Wheeler Aligner for aligning short sequence reads to a reference genome
* `samtools`: **S**equence **A**lignment **M**apping library for indexing and sorting aligned reads
* `bcftools`: **B**inary variant **C**all **F**ormat library for determining variants in sample reads relative to a reference genome

Read alignment, sorting, and indexing are performed sequentially by Step Functions Task states. Variant calling for each chromosome occurs in parallel using a Step Functions Map state and the sub-Task states therein. All tasks submit AWS Batch Jobs to perform the computational work using containerized versions of the tools listed above.

![example genomics workflow state machine](./images/sfn-example-mapping-state-machine.png)

The tooling containers used by the workflow share a [generic entrypoint script]({{ repo_url + "tree/master/src/containers" }}) that wraps the underlying tool and handles S3 data staging. It uses the AWS CLI to transfer objects and uses environment variables to identify which data inputs and outputs to stage.
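
A minimal sketch of that pattern is shown below. This is an illustration only, not the exact script in the repository; it assumes the `REFERENCE_URI`, `INPUT_DATA_URI`, and `OUTPUT_DATA_URI` environment variables from the job definition example earlier and that the AWS CLI is available on the container's `PATH`.

```bash
#!/bin/bash
# Sketch of a generic AWS entrypoint: stage inputs in from S3, run the wrapped
# tool, then stage results back out to S3. The real script in the repository
# may differ in structure and error handling.
set -e

WORKDIR=/opt/work
mkdir -p "$WORKDIR" && cd "$WORKDIR"

# stage in the reference and input data; the URIs may contain wildcards
# (e.g. s3://bucket/sample/fastq/*.fastq.gz)
aws s3 cp --recursive --exclude "*" --include "$(basename "$REFERENCE_URI")" "$(dirname "$REFERENCE_URI")/" .
aws s3 cp --recursive --exclude "*" --include "$(basename "$INPUT_DATA_URI")" "$(dirname "$INPUT_DATA_URI")/" .

# run the wrapped tool with the arguments supplied as the Batch job command
"$@"

# stage results back out (a real script would copy only the outputs)
aws s3 cp --recursive . "$OUTPUT_DATA_URI/"
```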

### Running the workflow

When the stack above completes, go to the stack's Outputs tab and copy the JSON string provided in `StateMachineInput`.

![cloud formation output tab](./images/cfn-stack-outputs-tab.png)
![example state-machine input](./images/cfn-stack-outputs-statemachineinput.png)

The input JSON will look like the following, but with the values for `queue` and `JOB_OUTPUT_PREFIX` prepopulated with resource names specific to the stack created by the CloudFormation template above:

```json
{
"params": {
"__comment__": {
"replace values for `queue` and `environment.JOB_OUTPUT_PREFIX` with values that match your resources": {
"queue": "Name or ARN of the AWS Batch Job Queue the workflow will use by default.",
"environment.JOB_OUTPUT_PREFIX": "S3 URI (e.g. s3://bucket/prefix) you are using for workflow inputs and outputs."
}
},
"queue": "default",
"environment": {
"REFERENCE_NAME": "Homo_sapiens_assembly38",
"SAMPLE_ID": "NIST7035",
"SOURCE_DATA_PREFIX": "s3://aws-batch-genomics-shared/secondary-analysis/example-files/fastq",
"JOB_OUTPUT_PREFIX": "s3://YOUR-BUCKET-NAME/PREFIX",
"JOB_AWS_CLI_PATH": "/opt/miniconda/bin"
},
"chromosomes": [
"chr19",
"chr20",
"chr21",
"chr22"
]
}
}
```
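
If you prefer the command line over the console steps that follow, you can also start and monitor an execution with the AWS CLI. A sketch, assuming the input above has been saved to `input.json` with the placeholder values substituted:

```bash
# Start an execution of the example state machine (the ARN is a placeholder)
aws stepfunctions start-execution \
    --state-machine-arn arn:aws:states:<region>:<account>:stateMachine:<state-machine-name> \
    --input file://input.json

# Check on its progress using the execution ARN returned by the command above
aws stepfunctions describe-execution \
    --execution-arn <execution-arn>
```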

Next, head to the AWS Step Functions console and select the state-machine that was created.

![select state-machine](./images/sfn-console-statemachine.png)
@@ -260,4 +367,4 @@ You will then be taken to the execution tracking page where you can monitor the

![execution tracking](./images/sfn-console-execution-inprogress.png)

The workflow takes approximately 5-6hrs to complete on `r4.2xlarge` SPOT instances.
The example workflow references a small demo dataset and takes approximately 20-30 minutes to complete.
3 changes: 2 additions & 1 deletion environment.yaml
@@ -3,8 +3,9 @@ channels:
- defaults
dependencies:
- python=3.6.6
- pip
- pip:
- cfn-lint==0.16.0
- cfn-lint
- fontawesome-markdown==0.2.6
- mkdocs==1.0.4
- mkdocs-macros-plugin==0.2.4
6 changes: 6 additions & 0 deletions src/containers/_common/README.md
@@ -0,0 +1,6 @@
# Common assets for tooling containers

These are assets that are used to build all tooling containers.

* `build.sh`: a generic build script that first builds a base image for a container, then builds an AWS-specific image
* `entrypoint.aws.sh`: a generic entrypoint script that wraps a call to a tool binary in the container with handlers for staging data from/to S3
9 changes: 9 additions & 0 deletions src/containers/_common/build.sh
@@ -0,0 +1,9 @@
#!/bin/bash

IMAGE_NAME=$1

# build the base image
docker build -t $IMAGE_NAME .

# build the image with an AWS specific entrypoint
docker build -t $IMAGE_NAME -f aws.dockerfile .
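
A usage sketch for the script above, assuming it is invoked from a tool's container directory that provides both a base `Dockerfile` and an `aws.dockerfile` (the directory layout shown is an assumption):

```bash
# Build the base and AWS-specific images for a hypothetical bwa container
cd src/containers/bwa
../_common/build.sh bwa
```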