Commit

v3.1.0 release (#192)
* move ecs configs to ecs-additions-common

* fix nextflow workflow cancel behavior

* increase docker / ecs stop timeout to allow nextflow cleanup activities
* fix error handling in cleanup() in nextflow container entrypoint script

* bump to cdk v1.102.0

* change examples source files

* enable docker authentication on cromwell

* change policy name

* change policy

* fix some typos and add policies

* fix typo on IAM

* add cromwell and gwf-core template

* scope down policies

* fix passing lists to nested stacks

* everything is AL2 compatible now

* change secret name and scope down permissions

* updated efs

* update to gp3

* updated EFS support in Nextflow

* fixed yaml errors

* updated efs deployment

* add message about s3 bucket location

* update nextflow mount script

* support existing EFS

* updated combine nextflow and core

* update readme

* updated with lint

* fix pr comments

* update ecs-config changes

* updated null efs param to none

* updated typos and missed string to number

* add nextflow helper script

* catch error if log not ready

* Update README.md

Updating README to say MIT-0 rather than Modified MIT now that MIT-0 is a standard SPDX tag.

* Cromwell install docs

* gwf-core auto update code pipeline

* bump mkdocs version

* update dependencies

* updating with changes to integrate use of FSx; also changed some other params to allow more options

* updated with latest cromwell jar from Henrique

* updated the Cromwell DB timeout from 5 seconds to 30 seconds due to issues encountered

Updated the Cromwell jar with the recent caching changes.

Added instance type options to the config, as per another PR raised last year; it would be helpful to have that option.

* Bump minimist from 1.2.5 to 1.2.6 in /src/aws-genomics-cdk

Bumps [minimist](https://github.com/substack/minimist) from 1.2.5 to 1.2.6.
- [Release notes](https://github.com/substack/minimist/releases)
- [Commits](https://github.com/substack/minimist/compare/1.2.5...1.2.6)

---
updated-dependencies:
- dependency-name: minimist
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

* updated the max vCPUs to 1000 as per Lee's recommendation

Co-authored-by: Itzik Paz <[email protected]>
Co-authored-by: Henrique Silva <[email protected]>
Co-authored-by: Friedman <[email protected]>
Co-authored-by: ajfriedman18 <[email protected]>
Co-authored-by: Henri Yandell <[email protected]>
Co-authored-by: Mark Schreiber <[email protected]>
Co-authored-by: patsarth_gfb <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
9 people authored Mar 30, 2022
1 parent 1676f13 commit 72adffa

Showing 49 changed files with 21,092 additions and 4,843 deletions.
11 changes: 10 additions & 1 deletion README.md
@@ -55,6 +55,15 @@ aws cloudformation create-stack \

```

## Shared File System Support

Amazon EFS is supported out of the box for `GWFCore` and `Nextflow`. You have two options for using EFS:

1. **Create a new EFS file system:** Set `CreateEFS` to `Yes` and include the total number of subnets.
2. **Use an existing EFS file system:** Specify the file system ID in the `ExistingEFS` parameter. The file system must be accessible from every subnet you specify.

Following successful deployment of `GWFCore`, when creating your Nextflow Resources, set `MountEFS` to `Yes`.
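
As an illustration, the EFS options can also be passed as stack parameters when deploying from the AWS CLI. This is a minimal sketch: `CreateEFS` and `ExistingEFS` are the parameters named above, while the stack name and the subnet-count parameter key are assumptions — check the template for the exact keys.

```shell
# Deploy GWFCore with a new EFS file system.
# NumberOfSubnets is an illustrative parameter key -- verify it in the template.
aws cloudformation create-stack \
    --stack-name gwfcore \
    --template-url https://aws-genomics-workflows.s3.amazonaws.com/latest/templates/gwfcore/gwfcore-root.template.yaml \
    --capabilities CAPABILITY_IAM CAPABILITY_AUTO_EXPAND \
    --parameters \
        ParameterKey=CreateEFS,ParameterValue=Yes \
        ParameterKey=NumberOfSubnets,ParameterValue=2
```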

## Building the documentation

The documentation is built using mkdocs.
@@ -76,4 +85,4 @@ $ mkdocs build

## License Summary

-This sample code is made available under a modified MIT license. See the LICENSE file.
+This library is licensed under the MIT-0 License. See the LICENSE file.
Binary file added docs/install-cromwell/images/screen1.png
Binary file added docs/install-cromwell/images/screen2.png
Binary file added docs/install-cromwell/images/screen3.png
Binary file added docs/install-cromwell/images/screen4.png
Binary file added docs/install-cromwell/images/screen5.png
112 changes: 112 additions & 0 deletions docs/install-cromwell/index.md
@@ -0,0 +1,112 @@
# Installing the Genomics Workflow Core and Cromwell

## Summary

This document demonstrates how an AWS user can provision, via CloudFormation, the infrastructure necessary to run Cromwell version 52 and later on AWS Batch with S3 as the object store. The instructions cover deployment into an existing VPC. There are two main steps: deploying the Genomics Workflow Core infrastructure, which can be used with Cromwell, Nextflow, and AWS Step Functions; and deploying the Cromwell server and related artifacts.
* * *

## Assumptions

1. The instructions assume you have an existing AWS account with credentials sufficient to deploy the infrastructure, or that you will use a CloudFormation service role with sufficient privileges (an admin role is recommended).
2. You have an existing VPC to deploy artifacts into. This VPC should have a minimum of two subnets with routes to the public internet. Private subnet routes may be through a NAT Gateway.

* * *

## Deployment of Genomics Workflow Core into an existing VPC

Take note of the ID of the VPC you will use and the IDs of the subnets that will host the AWS Batch worker nodes. We recommend using two or more private subnets.
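
If you do not have these IDs at hand, the AWS CLI can list them — a quick sketch (substitute your own VPC ID in the second command):

```shell
# List VPCs, then the subnets of the VPC you choose.
aws ec2 describe-vpcs \
    --query "Vpcs[].{Id:VpcId,Cidr:CidrBlock}" --output table
aws ec2 describe-subnets \
    --filters Name=vpc-id,Values=vpc-xxxxxxxx \
    --query "Subnets[].{Id:SubnetId,Az:AvailabilityZone}" --output table
```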

1. Open the CloudFormation console and select “**Create stack**” with new resources. Enter `https://aws-genomics-workflows.s3.amazonaws.com/latest/templates/gwfcore/gwfcore-root.template.yaml` as the Amazon S3 URL.

![](./images/screen1.png)

2. Select appropriate values for your environment, including the VPC and subnets you recorded above. We recommend leaving the Default and High Priority Min vCPU values at 0 so that the AWS Batch cluster runs no instances when no workflows are running. Max vCPU values may be increased if you expect to run large workloads that use many CPUs. Leave the Distribution Configuration values at their preset defaults.

![](./images/screen2.png)

3. Optionally add tags and click **Next**.

4. Review the parameters, acknowledge the Capabilities notifications, and click **Create Stack**.

![](./images/screen5.png)

The template now creates several nested stacks to deploy the required resources; this takes approximately 10 minutes. When it completes, you can proceed with the “[Deploy Cromwell Resources](#deploy-cromwell-resources)” section below.
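
If you prefer to watch progress from a terminal rather than the console, the CLI can poll for completion — a sketch, assuming `gwfcore` is the stack name you chose:

```shell
# Block until the root stack (and its nested stacks) finishes creating,
# then print the final status.
aws cloudformation wait stack-create-complete --stack-name gwfcore
aws cloudformation describe-stacks --stack-name gwfcore \
    --query "Stacks[0].StackStatus"
```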
* * *

## Deploy Cromwell Resources

1. Ensure all steps of the CloudFormation deployment of the Genomics Workflow Core have successfully completed before proceeding any further.
2. From the CloudFormation console select “**Create Stack**” and, if prompted, select “**With new resources (standard)**”.
3. Fill in the Amazon S3 URL with `https://aws-genomics-workflows.s3.amazonaws.com/latest/templates/cromwell/cromwell-resources.template.yaml`

![](./images/screen3.png)

4. Fill in appropriate values for the template. **For `GWFCoreNamespace`, use the namespace value you used in the section above.** Use the same VPC as in the previous step. To secure your Cromwell server, change the `SSH Address Range` and `HTTP Address Range` to trusted values; these are used when creating the server’s security group.
5. You may either use the latest version of Cromwell (recommended) or specify a version **52 or greater**.
6. Select a MySQL-compliant `Cromwell Database Password` that will be used for Cromwell’s metadata database. Select **Next**.

![](./images/screen4.png)

7. On the remaining two screens keep the defaults, acknowledge the IAM capabilities, and then click **Create Stack**.

Once the stack completes, an EC2 instance will be deployed running the Cromwell server. You can now proceed with “[Testing your deployment](#testing-your-deployment)”.
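
Before submitting workflows, you can verify the server is up by querying its version endpoint — a sketch, run from the instance itself or from a client in the trusted HTTP address range:

```shell
# Cromwell's engine endpoint reports the running version.
curl "http://localhost:8000/engine/v1/version"
```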
* * *

## Testing your Deployment

The following WDL file is a very simple workflow that can be used to test that all the components of the deployment are working together. Add the code block below to a file named `workflow.wdl`:

```
workflow helloWorld {
    call sayHello
}

task sayHello {
    command {
        echo "hello world"
    }
    output {
        String out = read_string(stdout())
    }
    runtime {
        docker: "ubuntu:latest"
        memory: "1 GB"
        cpu: 1
    }
}
```

This task can be submitted to the server’s REST endpoint using `curl`, either from a client that has access to the server’s Elastic IP or from within the server itself using `localhost`. The hostname of the server is also emitted as an output of the cromwell-resources CloudFormation template.

```
curl -X POST "http://localhost:8000/api/workflows/v1" \
  -H "accept: application/json" \
  -F "workflowSource=@workflow.wdl"
```
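
The hostname mentioned above can be read from the stack outputs with the CLI — a sketch, assuming `cromwell-resources` is the stack name; listing all outputs avoids guessing the exact output key:

```shell
# Print all outputs of the Cromwell resources stack, including the
# server hostname.
aws cloudformation describe-stacks --stack-name cromwell-resources \
    --query "Stacks[0].Outputs" --output table
```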

It can take a few minutes for AWS Batch to recognize there is a job in the work queue and to provision a worker to run it. You can monitor this in the AWS Batch console.
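
The same check can be made from the CLI — a sketch, assuming the name of the job queue created by GWFCore (`default-gwfcore` here is illustrative):

```shell
# List jobs waiting for capacity in the default queue.
aws batch list-jobs --job-queue default-gwfcore --job-status RUNNABLE
```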

You can also monitor the Cromwell server logs in CloudWatch. There will be a log group called `cromwell-server`. Once the run completes, you will see output similar to:

![](./images/screen5.png)
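
The same log group can also be followed from a terminal — a sketch, using the `cromwell-server` log group named above (requires AWS CLI v2):

```shell
# Stream the Cromwell server logs as they arrive.
aws logs tail cromwell-server --follow
```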

If the run is successful, subsequent runs will be “call cached”, meaning the results of the previous run are copied for all successful steps. If you resubmit the job, you will very quickly see the workflow succeed in the server logs, and no additional jobs will appear in the AWS Batch console. You can disable call caching for the job by adding an options file (for example, `options.json`) and submitting it with the run; this causes the workflow to be re-executed in full.

```json5
{
"write_to_cache": false,
"read_from_cache": false
}
```

```shell
curl -X POST "http://localhost:8000/api/workflows/v1" \
  -H "accept: application/json" \
  -F "workflowSource=@workflow.wdl" \
  -F "workflowOptions=@options.json"
```
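
To confirm whether a submission was call cached or fully re-run, you can poll Cromwell’s status endpoint with the workflow ID returned by the submit call — a sketch; substitute the `id` field from the submission response:

```shell
# The submit call returns JSON with an "id" field; query its status.
curl -X GET "http://localhost:8000/api/workflows/v1/<workflow-id>/status" \
  -H "accept: application/json"
```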

For a more realistic workflow, a WDL for simple variant calling using bwa-mem, samtools, and bcftools is available [here](https://github.com/wleepang/demo-genomics-workflow-wdl).

Clone the repo and submit the WDL file to Cromwell. The workflow uses default inputs from public data sources. If you want to override these inputs, modify the `inputs.json` file accordingly and submit it along with the workflow.
4 changes: 2 additions & 2 deletions requirements.txt
@@ -1,7 +1,7 @@
-mkdocs==1.0.4
+mkdocs==1.2.3
 mkdocs-macros-plugin==0.2.4
 mkdocs-markdownextradata-plugin==0.0.5
 mkdocs-material==3.1.0
 pymdown-extensions==6.0
 fontawesome-markdown==0.2.6
-cfn-lint==0.16.0
+cfn-lint==0.16.0
7 changes: 4 additions & 3 deletions src/aws-genomics-cdk/examples/batch-bwa-job.json
@@ -3,14 +3,15 @@
   "jobQueue": "genomics-default-queue",
   "jobDefinition": "bwa:1",
   "containerOverrides": {
-    "command": ["bwa mem -t 8 -p -o ${SAMPLE_ID}.sam ${REFERENCE_NAME}.fasta ${SAMPLE_ID}_*1*.fastq.gz"],
+    "command": ["bwa mem -t 8 -p -o ${SAMPLE_ID}.sam ${REFERENCE_NAME}.fasta ${SAMPLE_ID}_1*.fastq.gz"],
+    "memory": 32000,
     "environment": [{
       "name": "JOB_INPUTS",
-      "value": "s3://aws-batch-genomics-shared/secondary-analysis/example-files/fastq/NIST7035* s3://broad-references/hg38/v0/Homo_sapiens_assembly38.fasta*"
+      "value": "s3://1000genomes/pilot_data/data/NA12878/pilot3_unrecal/SRR014820_*.fastq.gz s3://broad-references/hg38/v0/Homo_sapiens_assembly38.fasta*"
     },
     {
       "name": "SAMPLE_ID",
-      "value": "NIST7035"
+      "value": "SRR014820"
     },
     {
       "name": "REFERENCE_NAME",
2 changes: 1 addition & 1 deletion src/aws-genomics-cdk/examples/batch-fastqc-job.json
@@ -6,7 +6,7 @@
     "command": ["fastqc *.gz"],
     "environment": [{
       "name": "JOB_INPUTS",
-      "value": "s3://aws-batch-genomics-shared/secondary-analysis/example-files/fastq/NIST7035_R*.fastq.gz"
+      "value": "s3://1000genomes/pilot_data/data/NA12878/pilot3_unrecal/SRR014820_*.fastq.gz"
     },
     {
       "name": "JOB_OUTPUTS",
2 changes: 1 addition & 1 deletion src/aws-genomics-cdk/examples/batch-gatk-htc.json
@@ -10,7 +10,7 @@
     },
     {
       "name": "SAMPLE_ID",
-      "value": "NIST7035"
+      "value": "SRR014820"
     },
     {
       "name": "REFERENCE_NAME",
6 changes: 3 additions & 3 deletions src/aws-genomics-cdk/examples/batch-minimap2-job.json
@@ -4,11 +4,11 @@
   "jobDefinition": "minimap2:1",
   "containerOverrides": {
     "vcpus": 8,
-    "memory": 16000,
-    "command": ["minimap2 -ax map-pb Homo_sapiens_assembly38.fasta NIST7035_R1_trim_samp-0p1.fastq.gz > NIST7035.sam"],
+    "memory": 32000,
+    "command": ["minimap2 -ax map-pb Homo_sapiens_assembly38.fasta SRR014820_1.fastq.gz > SRR014820.sam"],
     "environment": [{
       "name": "JOB_INPUTS",
-      "value": "s3://broad-references/hg38/v0/Homo_sapiens_assembly38.fasta s3://aws-batch-genomics-shared/secondary-analysis/example-files/fastq/NIST7035_R1_trim_samp-0p1.fastq.gz"
+      "value": "s3://broad-references/hg38/v0/Homo_sapiens_assembly38.fasta s3://1000genomes/pilot_data/data/NA12878/pilot3_unrecal/SRR014820_1.fastq.gz"
     },
     {
       "name": "JOB_OUTPUTS",
@@ -10,11 +10,11 @@
   ],
   "environment": [{
     "name": "JOB_INPUTS",
-    "value": "s3://YOUR-BUCKET-NAME/samples/NIST7035.bam"
+    "value": "s3://YOUR-BUCKET-NAME/samples/SRR014820.bam"
   },
   {
     "name": "SAMPLE_ID",
-    "value": "NIST7035"
+    "value": "SRR014820"
   },
   {
     "name": "JOB_OUTPUTS",
4 changes: 2 additions & 2 deletions src/aws-genomics-cdk/examples/batch-samtools-index.json
@@ -6,11 +6,11 @@
     "command": ["samtools index ${SAMPLE_ID}.bam"],
     "environment": [{
       "name": "JOB_INPUTS",
-      "value": "s3://YOUR-BUCKET-NAME/samples/NIST7035.bam"
+      "value": "s3://YOUR-BUCKET-NAME/samples/SRR014820.bam"
     },
     {
       "name": "SAMPLE_ID",
-      "value": "NIST7035"
+      "value": "SRR014820"
     },
     {
       "name": "JOB_OUTPUTS",
4 changes: 2 additions & 2 deletions src/aws-genomics-cdk/examples/batch-samtools-sort.json
@@ -6,11 +6,11 @@
     "command": ["samtools sort -@ 4 -o ${SAMPLE_ID}.bam ${SAMPLE_ID}.sam"],
     "environment": [{
       "name": "JOB_INPUTS",
-      "value": "s3://YOUR-BUCKET-NAME/samples/NIST7035.sam"
+      "value": "s3://YOUR-BUCKET-NAME/samples/SRR014820.sam"
     },
     {
       "name": "SAMPLE_ID",
-      "value": "NIST7035"
+      "value": "SRR014820"
     },
     {
       "name": "JOB_OUTPUTS",
4 changes: 2 additions & 2 deletions src/aws-genomics-cdk/lib/workflows/genomics-task-construct.ts
@@ -40,8 +40,8 @@ export default class GenomicsTask extends cdk.Construct {
   };
   const taskProps = {
     jobName: props.taskName,
-    jobDefinition: props.jobDefinition,
-    jobQueue: props.queue,
+    jobDefinitionArn: props.jobDefinition.jobDefinitionArn,
+    jobQueueArn: props.queue.jobQueueArn,
     containerOverrides: taskContainerProps,
     inputPath: "$",
     resultPath: "$.result"
@@ -78,7 +78,7 @@ export default class VariantCallingStateMachine extends cdk.Stack {
       ...defaultJobDefinitionProps,
       repository: "genomics/bwa",
       timeout: 600,
-      memoryLimit: 8000,
+      memoryLimit: 32000,
       vcpus: 8,
     });