This repository has been archived by the owner on Jan 5, 2021. It is now read-only.

Commit

1.4update (#4)
* Updating workflow to v1.4 from dsde-pipelines

* Added larger inputs and removed 'disable_sanity_check' parameter from json

* minor wording edits
bshifaw authored Apr 23, 2020
1 parent 64d2948 commit a0a49d2
Showing 11 changed files with 70 additions and 75 deletions.
10 changes: 6 additions & 4 deletions README.md
@@ -32,13 +32,15 @@ Indel discovery in human whole-genome sequencing data.
 - Does not work on versions < v23 due to output syntax

 ### Important Notes :
-- The provided JSON is meant to be a ready to use example JSON template of the workflow. It is the user’s responsibility to correctly set the reference and resource input variables using the [GATK Tool and Tutorial Documentations](https://gatk.broadinstitute.org/hc/en-us/categories/360002310591).
-- Relevant reference and resources bundles can be accessed in [Resource Bundle](https://gatk.broadinstitute.org/hc/en-us/articles/360036212652).
+- The provided JSON is a generic ready to use example template for the workflow. It is the user’s responsibility to correctly set the reference and resource variables for their own particular test case using the [GATK Tool and Tutorial Documentations](https://gatk.broadinstitute.org/hc/en-us/categories/360002310591).
 - Runtime parameters are optimized for Broad's Google Cloud Platform implementation.
 - For help running workflows on the Google Cloud Platform or locally please
 view the following tutorial [(How to) Execute Workflows from the gatk-workflows Git Organization](https://gatk.broadinstitute.org/hc/en-us/articles/360035530952).
-- The following material is provided by the GATK Team. Please post any questions or concerns to one of our forum sites : [GATK](https://gatk.broadinstitute.org/hc/en-us/community/topics) and [Terra](https://broadinstitute.zendesk.com/hc/en-us/community/topics/360000500432-General-Discussion).
-- Please visit the [User Guide](https://gatk.broadinstitute.org/hc/en-us) site for further documentation on our workflows and tools.
+- Please visit the [User Guide](https://gatk.broadinstitute.org/hc/en-us/categories/360002310591) site for further documentation on our workflows and tools.
+- Relevant reference and resources bundles can be accessed in [Resource Bundle](https://gatk.broadinstitute.org/hc/en-us/articles/360036212652).
+
+### Contact Us :
+- The following material is provided by the Data Science Platforum group at the Broad Institute. Please direct any questions or concerns to one of our forum sites : [GATK](https://gatk.broadinstitute.org/hc/en-us/community/topics) or [Terra](https://support.terra.bio/hc/en-us/community/topics/360000500432).

 ### LICENSING :
 Copyright Broad Institute, 2019 | BSD-3
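The README leaves it to the user to set the reference and resource inputs in the JSON template. As a minimal sketch (the helper names are hypothetical and not part of this repository), the Python below walks a parsed inputs JSON and lists every `gs://` value together with its key path. The result is a checklist of cloud paths to review or replace before running outside Broad's Google Cloud setup.

```python
import json
from typing import Any, Iterator, List, Tuple


def iter_gs_uris(node: Any, path: str = "") -> Iterator[Tuple[str, str]]:
    """Recursively yield (key_path, value) for every string value starting with gs://."""
    if isinstance(node, dict):
        for key, value in node.items():
            # Workflow input names already contain dots, so nested keys are appended with ".".
            child = f"{path}.{key}" if path else key
            yield from iter_gs_uris(value, child)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from iter_gs_uris(value, f"{path}[{index}]")
    elif isinstance(node, str) and node.startswith("gs://"):
        yield path, node


def list_cloud_inputs(json_text: str) -> List[Tuple[str, str]]:
    """Parse an inputs JSON document and return every gs:// path it references, in file order."""
    return list(iter_gs_uris(json.loads(json_text)))
```

Applied to the updated `WholeGenomeGermlineSingleSample.inputs.json`, the first pair should be the key path `WholeGenomeGermlineSingleSample.sample_and_unmapped_bams.flowcell_unmapped_bams[0]` with the first `gs://gatk-test-data/...` unmapped BAM.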
36 changes: 28 additions & 8 deletions WholeGenomeGermlineSingleSample.inputs.json
@@ -1,13 +1,34 @@
 {
 "WholeGenomeGermlineSingleSample.sample_and_unmapped_bams": {
-"sample_name": "NA12878_PLUMBING",
-"base_file_name": "NA12878_PLUMBING",
+"sample_name": "NA12878",
+"base_file_name": "NA1278",
 "flowcell_unmapped_bams": [
-"gs://dsde-data-na12878-public/H06HDADXX130110.1.ATCACGAT.20k_reads.bam",
-"gs://dsde-data-na12878-public/H06HDADXX130110.2.ATCACGAT.20k_reads.bam",
-"gs://dsde-data-na12878-public/H06JUADXX130110.1.ATCACGAT.20k_reads.bam"
+"gs://gatk-test-data/wgs_ubam/NA12878_24RG/small/HJYFJ.4.NA12878.downsampled.query.sorted.unmapped.bam",
+"gs://gatk-test-data/wgs_ubam/NA12878_24RG/small/HJYFJ.5.NA12878.downsampled.query.sorted.unmapped.bam",
+"gs://gatk-test-data/wgs_ubam/NA12878_24RG/small/HJYFJ.6.NA12878.downsampled.query.sorted.unmapped.bam",
+"gs://gatk-test-data/wgs_ubam/NA12878_24RG/small/HJYFJ.7.NA12878.downsampled.query.sorted.unmapped.bam",
+"gs://gatk-test-data/wgs_ubam/NA12878_24RG/small/HJYFJ.8.NA12878.downsampled.query.sorted.unmapped.bam",
+"gs://gatk-test-data/wgs_ubam/NA12878_24RG/small/HJYN2.1.NA12878.downsampled.query.sorted.unmapped.bam",
+"gs://gatk-test-data/wgs_ubam/NA12878_24RG/small/HK35M.1.NA12878.downsampled.query.sorted.unmapped.bam",
+"gs://gatk-test-data/wgs_ubam/NA12878_24RG/small/HK35M.2.NA12878.downsampled.query.sorted.unmapped.bam",
+"gs://gatk-test-data/wgs_ubam/NA12878_24RG/small/HK35M.3.NA12878.interval.filtered.query.sorted.unmapped.bam",
+"gs://gatk-test-data/wgs_ubam/NA12878_24RG/small/HK35M.4.NA12878.interval.filtered.query.sorted.unmapped.bam",
+"gs://gatk-test-data/wgs_ubam/NA12878_24RG/small/HK35M.5.NA12878.interval.filtered.query.sorted.unmapped.bam",
+"gs://gatk-test-data/wgs_ubam/NA12878_24RG/small/HK35M.6.NA12878.interval.filtered.query.sorted.unmapped.bam",
+"gs://gatk-test-data/wgs_ubam/NA12878_24RG/small/HK35M.7.NA12878.interval.filtered.query.sorted.unmapped.bam",
+"gs://gatk-test-data/wgs_ubam/NA12878_24RG/small/HK35M.8.NA12878.interval.filtered.query.sorted.unmapped.bam",
+"gs://gatk-test-data/wgs_ubam/NA12878_24RG/small/HK35N.1.NA12878.interval.filtered.query.sorted.unmapped.bam",
+"gs://gatk-test-data/wgs_ubam/NA12878_24RG/small/HK35N.2.NA12878.interval.filtered.query.sorted.unmapped.bam",
+"gs://gatk-test-data/wgs_ubam/NA12878_24RG/small/HK3T5.1.NA12878.interval.filtered.query.sorted.unmapped.bam",
+"gs://gatk-test-data/wgs_ubam/NA12878_24RG/small/HK3T5.2.NA12878.interval.filtered.query.sorted.unmapped.bam",
+"gs://gatk-test-data/wgs_ubam/NA12878_24RG/small/HK3T5.3.NA12878.interval.filtered.query.sorted.unmapped.bam",
+"gs://gatk-test-data/wgs_ubam/NA12878_24RG/small/HK3T5.4.NA12878.interval.filtered.query.sorted.unmapped.bam",
+"gs://gatk-test-data/wgs_ubam/NA12878_24RG/small/HK3T5.5.NA12878.interval.filtered.query.sorted.unmapped.bam",
+"gs://gatk-test-data/wgs_ubam/NA12878_24RG/small/HK3T5.6.NA12878.interval.filtered.query.sorted.unmapped.bam",
+"gs://gatk-test-data/wgs_ubam/NA12878_24RG/small/HK3T5.7.NA12878.interval.filtered.query.sorted.unmapped.bam",
+"gs://gatk-test-data/wgs_ubam/NA12878_24RG/small/HK3T5.8.NA12878.interval.filtered.query.sorted.unmapped.bam"
 ],
-"final_gvcf_base_name": "NA12878_PLUMBING",
+"final_gvcf_base_name": "NA12878",
 "unmapped_bam_suffix": ".bam"
 },

@@ -50,6 +71,5 @@
 "WholeGenomeGermlineSingleSample.papi_settings": {
 "preemptible_tries": 3,
 "agg_preemptible_tries": 3
-},
-"WholeGenomeGermlineSingleSample.UnmappedBamToAlignedBam.UnmappedBamToAlignedBam.CheckContamination.disable_sanity_check": true
+}
 }
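The updated template lists 24 unmapped BAMs and sets `unmapped_bam_suffix` to `".bam"`. The minimal Python sketch below previews the per-BAM base names and catches an entry that lacks the suffix. It assumes the workflow strips that suffix the way WDL's built-in `basename(path, suffix)` does. `wdl_basename` and `check_unmapped_bams` are hypothetical helpers, and the duplicate-name check is an extra precaution, not something the workflow is known to enforce.

```python
import posixpath
from typing import List


def wdl_basename(path: str, suffix: str = "") -> str:
    """Mimic WDL's basename(path, suffix): last path component, minus suffix if present."""
    name = posixpath.basename(path)
    if suffix and name.endswith(suffix):
        name = name[: -len(suffix)]
    return name


def check_unmapped_bams(bams: List[str], suffix: str) -> List[str]:
    """Return the suffix-stripped base name of each BAM, raising on suspicious entries."""
    missing = [bam for bam in bams if not bam.endswith(suffix)]
    if missing:
        raise ValueError(f"inputs not ending in {suffix!r}: {missing}")
    names = [wdl_basename(bam, suffix) for bam in bams]
    # Extra precaution (assumption, not a documented workflow rule):
    # identical base names would make per-BAM outputs hard to tell apart.
    duplicates = sorted({name for name in names if names.count(name) > 1})
    if duplicates:
        raise ValueError(f"duplicate base names: {duplicates}")
    return names
```

Feeding it the `flowcell_unmapped_bams` list and `unmapped_bam_suffix` from the JSON gives names such as `HJYFJ.4.NA12878.downsampled.query.sorted.unmapped`.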
21 changes: 7 additions & 14 deletions WholeGenomeGermlineSingleSample.wdl
@@ -28,24 +28,17 @@ version 1.0
 ## page at https://hub.docker.com/r/broadinstitute/genomes-in-the-cloud/ for detailed
 ## licensing information pertaining to the included programs.
-#import "./tasks/UnmappedBamToAlignedBam.wdl" as ToBam
-#import "./tasks/AggregatedBamQC.wdl" as AggregatedQC
-#import "./tasks/Qc.wdl" as QC
-#import "./tasks/BamToCram.wdl" as ToCram
-#import "./tasks/VariantCalling.wdl" as ToGvcf
-#import "./structs/GermlineStructs.wdl"
-import "https://raw.githubusercontent.com/gatk-workflows/gatk4-genome-processing-pipeline/1.0.0/tasks/UnmappedBamToAlignedBam.wdl" as ToBam
-import "https://raw.githubusercontent.com/gatk-workflows/gatk4-genome-processing-pipeline/1.0.0/tasks/AggregatedBamQC.wdl" as AggregatedQC
-import "https://raw.githubusercontent.com/gatk-workflows/gatk4-genome-processing-pipeline/1.0.0/tasks/Qc.wdl" as QC
-import "https://raw.githubusercontent.com/gatk-workflows/gatk4-genome-processing-pipeline/1.0.0/tasks/BamToCram.wdl" as ToCram
-import "https://raw.githubusercontent.com/gatk-workflows/gatk4-genome-processing-pipeline/1.0.0/tasks/VariantCalling.wdl" as ToGvcf
-import "https://raw.githubusercontent.com/gatk-workflows/gatk4-genome-processing-pipeline/1.0.0/structs/GermlineStructs.wdl"
+import "./tasks/UnmappedBamToAlignedBam.wdl" as ToBam
+import "./tasks/AggregatedBamQC.wdl" as AggregatedQC
+import "./tasks/Qc.wdl" as QC
+import "./tasks/BamToCram.wdl" as ToCram
+import "./tasks/VariantCalling.wdl" as ToGvcf
+import "./structs/GermlineStructs.wdl"

 # WORKFLOW DEFINITION
 workflow WholeGenomeGermlineSingleSample {

-String pipeline_version = "1.3"
+String pipeline_version = "1.4"

 input {
 SampleAndUnmappedBams sample_and_unmapped_bams
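This commit swaps every import pinned to the `1.0.0` release URL for a relative path. The new paths are written relative to the importing file: `./tasks/...` from the root workflow, and `./Qc.wdl` or `../structs/...` from files in `tasks/`. A minimal Python sketch of doing that rewrite mechanically follows. `localize_import` is a hypothetical helper, and it assumes the repository layout visible in this diff (root WDL plus `tasks/` and `structs/` directories).

```python
import posixpath
import re

# Pinned-release URL prefix used by the imports this commit removes.
PINNED_PREFIX = (
    "https://raw.githubusercontent.com/gatk-workflows/"
    "gatk4-genome-processing-pipeline/1.0.0/"
)
IMPORT_RE = re.compile(
    r'^(\s*import\s+")' + re.escape(PINNED_PREFIX) + r'([^"]+)(".*)$'
)


def localize_import(line: str, importing_file: str) -> str:
    """Rewrite a pinned-URL import as a path relative to importing_file.

    importing_file and the URL tail are both treated as repository-relative
    POSIX paths; any other line is returned unchanged.
    """
    match = IMPORT_RE.match(line)
    if match is None:
        return line
    head, target, tail = match.groups()
    start = posixpath.dirname(importing_file) or "."
    relative = posixpath.relpath(target, start)
    if not relative.startswith("."):
        relative = "./" + relative  # match the "./tasks/..." style used in the diff
    return f"{head}{relative}{tail}"
```

For the root workflow, the pinned `tasks/Qc.wdl` import becomes `import "./tasks/Qc.wdl" as QC`. Calling it with `importing_file="tasks/Alignment.wdl"` turns the pinned structs import into `import "../structs/GermlineStructs.wdl"`, matching the hunks in this commit.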
7 changes: 2 additions & 5 deletions tasks/AggregatedBamQC.wdl
@@ -15,11 +15,8 @@ version 1.0
 ## page at https://hub.docker.com/r/broadinstitute/genomes-in-the-cloud/ for detailed
 ## licensing information pertaining to the included programs.
-#import "./Qc.wdl" as QC
-#import "../structs/GermlineStructs.wdl"
-import "https://raw.githubusercontent.com/gatk-workflows/gatk4-genome-processing-pipeline/1.0.0/tasks/Qc.wdl" as QC
-import "https://raw.githubusercontent.com/gatk-workflows/gatk4-genome-processing-pipeline/1.0.0/structs/GermlineStructs.wdl"
+import "./Qc.wdl" as QC
+import "../structs/GermlineStructs.wdl"

 # WORKFLOW DEFINITION
 workflow AggregatedBamQC {
4 changes: 1 addition & 3 deletions tasks/Alignment.wdl
@@ -15,9 +15,7 @@ version 1.0
 ## page at https://hub.docker.com/r/broadinstitute/genomes-in-the-cloud/ for detailed
 ## licensing information pertaining to the included programs.
-#import "../structs/GermlineStructs.wdl"
-import "https://raw.githubusercontent.com/gatk-workflows/gatk4-genome-processing-pipeline/1.0.0/structs/GermlineStructs.wdl"
+import "../structs/GermlineStructs.wdl"

 # Get version of BWA
 task GetBwaVersion {
6 changes: 4 additions & 2 deletions tasks/BamProcessing.wdl
@@ -107,12 +107,13 @@ task MarkDuplicates {
 # This can be desirable if you don't mind the estimated library size being wrong and optical duplicate detection is taking >7 days and failing
 String? read_name_regex
 Int memory_multiplier = 1
+Int additional_disk = 20
 }

 # The merged bam will be smaller than the sum of the parts so we need to account for the unmerged inputs and the merged output.
 # Mark Duplicates takes in as input readgroup bams and outputs a slightly smaller aggregated bam. Giving .25 as wiggleroom
 Float md_disk_multiplier = 3
-Int disk_size = ceil(md_disk_multiplier * total_input_size) + 20
+Int disk_size = ceil(md_disk_multiplier * total_input_size) + additional_disk

 Float memory_size = 7.5 * memory_multiplier
 Int java_memory_size = (ceil(memory_size) - 2)
@@ -271,10 +272,11 @@ task ApplyBQSR {
 Int preemptible_tries
 String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.0.10.1"
 Int memory_multiplier = 1
+Int additional_disk = 20
 }

 Float ref_size = size(ref_fasta, "GiB") + size(ref_fasta_index, "GiB") + size(ref_dict, "GiB")
-Int disk_size = ceil((size(input_bam, "GiB") * 3 / bqsr_scatter) + ref_size) + 20
+Int disk_size = ceil((size(input_bam, "GiB") * 3 / bqsr_scatter) + ref_size) + additional_disk

 Int memory_size = ceil(3500 * memory_multiplier)
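In `MarkDuplicates` and `ApplyBQSR` the fixed `+ 20` disk padding becomes the overridable input `additional_disk`, which defaults to 20. The small Python sketch below transcribes the sizing expressions from these two hunks so the effect of overriding `additional_disk` or `memory_multiplier` can be checked before a run. Input sizes are in GiB, as returned by WDL's `size(..., "GiB")`. The example sizes are made up. Treating `disk_size` as the task's local-disk request is an assumption, because the runtime blocks are not part of this diff.

```python
import math


def markduplicates_resources(total_input_gib: float, memory_multiplier: int = 1,
                             additional_disk: int = 20) -> dict:
    """Mirror the MarkDuplicates sizing expressions shown in the diff."""
    md_disk_multiplier = 3
    disk_size = math.ceil(md_disk_multiplier * total_input_gib) + additional_disk
    memory_size = 7.5 * memory_multiplier
    java_memory_size = math.ceil(memory_size) - 2
    return {
        "disk_size": disk_size,
        "memory_size": memory_size,
        "java_memory_size": java_memory_size,
    }


def applybqsr_resources(input_bam_gib: float, ref_gib: float, bqsr_scatter: int,
                        memory_multiplier: int = 1, additional_disk: int = 20) -> dict:
    """Mirror ApplyBQSR: each scatter shard gets 3x the BAM size split across shards, plus refs."""
    disk_size = math.ceil((input_bam_gib * 3 / bqsr_scatter) + ref_gib) + additional_disk
    memory_size = math.ceil(3500 * memory_multiplier)
    return {"disk_size": disk_size, "memory_size": memory_size}
```

For a hypothetical 50.2 GiB of read-group BAMs, MarkDuplicates asks for `ceil(150.6) + 20 = 171`. Raising `additional_disk` to 100 gives 251 without touching the multiplier.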

7 changes: 2 additions & 5 deletions tasks/BamToCram.wdl
@@ -1,10 +1,7 @@
 version 1.0

-#import "./Utilities.wdl" as Utils
-#import "./Qc.wdl" as QC
-import "https://raw.githubusercontent.com/gatk-workflows/gatk4-genome-processing-pipeline/1.0.0/tasks/Utilities.wdl" as Utils
-import "https://raw.githubusercontent.com/gatk-workflows/gatk4-genome-processing-pipeline/1.0.0/tasks/Qc.wdl" as QC
+import "./Utilities.wdl" as Utils
+import "./Qc.wdl" as QC

 workflow BamToCram {

9 changes: 6 additions & 3 deletions tasks/Qc.wdl
@@ -344,10 +344,11 @@ task ValidateSamFile {
 Boolean? is_outlier_data
 Int preemptible_tries
 Int memory_multiplier = 1
+Int additional_disk = 20
 }
 Float ref_size = size(ref_fasta, "GiB") + size(ref_fasta_index, "GiB") + size(ref_dict, "GiB")
-Int disk_size = ceil(size(input_bam, "GiB") + ref_size) + 20
+Int disk_size = ceil(size(input_bam, "GiB") + ref_size) + additional_disk
 Int memory_size = ceil(7 * memory_multiplier)
 Int java_memory_size = (memory_size - 1) * 1000
@@ -426,10 +427,11 @@ task CollectRawWgsMetrics {
 Int read_length
 Int preemptible_tries
 Int memory_multiplier = 1
+Int additional_disk = 20
 }
 Float ref_size = size(ref_fasta, "GiB") + size(ref_fasta_index, "GiB")
-Int disk_size = ceil(size(input_bam, "GiB") + ref_size) + 20
+Int disk_size = ceil(size(input_bam, "GiB") + ref_size) + additional_disk
 Int memory_size = ceil((if (disk_size < 110) then 5 else 7) * memory_multiplier)
 String java_memory_size = (memory_size - 1) * 1000
@@ -468,10 +470,11 @@ task CollectHsMetrics {
 File bait_interval_list
 Int preemptible_tries
 Int memory_multiplier = 1
+Int additional_disk = 20
 }
 Float ref_size = size(ref_fasta, "GiB") + size(ref_fasta_index, "GiB")
-Int disk_size = ceil(size(input_bam, "GiB") + ref_size) + 20
+Int disk_size = ceil(size(input_bam, "GiB") + ref_size) + additional_disk
 # Try to fit the input bam into memory, within reason.
 Int rounded_bam_size = ceil(size(input_bam, "GiB") + 0.5)
 Int rounded_memory_size = ceil((if (rounded_bam_size > 10) then 10 else rounded_bam_size) * memory_multiplier)
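The three `Qc.wdl` hunks make the same disk change. In `CollectRawWgsMetrics`, memory is chosen from `disk_size`, and `disk_size` now includes `additional_disk`, so raising the disk padding can also raise that task's memory. The Python sketch below transcribes the expressions from these hunks. Sizes are in GiB. The memory units are fixed by runtime blocks not shown in this diff, and the example sizes are made up. `ref_gib` stands for whichever reference files each task sums: `ValidateSamFile` also adds `ref_dict`.

```python
import math


def qc_disk_size(input_bam_gib: float, ref_gib: float, additional_disk: int = 20) -> int:
    """Disk expression shared by ValidateSamFile, CollectRawWgsMetrics and CollectHsMetrics."""
    return math.ceil(input_bam_gib + ref_gib) + additional_disk


def validate_sam_java_memory(memory_multiplier: int = 1) -> int:
    """ValidateSamFile: memory_size = ceil(7 * multiplier); Java gets (memory_size - 1) * 1000."""
    memory_size = math.ceil(7 * memory_multiplier)
    return (memory_size - 1) * 1000


def raw_wgs_memory(disk_size: int, memory_multiplier: int = 1) -> int:
    """CollectRawWgsMetrics: 5 when disk_size is below 110, otherwise 7, times the multiplier."""
    return math.ceil((5 if disk_size < 110 else 7) * memory_multiplier)


def hs_memory(input_bam_gib: float, memory_multiplier: int = 1) -> int:
    """CollectHsMetrics: round the BAM size up after adding 0.5, cap at 10, then scale."""
    rounded_bam_size = math.ceil(input_bam_gib + 0.5)
    return math.ceil(min(rounded_bam_size, 10) * memory_multiplier)
```

As an example, take an 80 GiB BAM with 3.2 GiB of reference files. At the default padding, `disk_size` is 104, so `CollectRawWgsMetrics` gets 5. Overriding `additional_disk` to 30 gives 114, which crosses the 110 threshold and switches the task to 7.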
13 changes: 4 additions & 9 deletions tasks/SplitLargeReadGroup.wdl
@@ -15,15 +15,10 @@ version 1.0
 ## page at https://hub.docker.com/r/broadinstitute/genomes-in-the-cloud/ for detailed
 ## licensing information pertaining to the included programs.
-#import "./Alignment.wdl" as Alignment
-#import "./BamProcessing.wdl" as Processing
-#import "./Utilities.wdl" as Utils
-#import "../structs/GermlineStructs.wdl" as Structs
-import "https://raw.githubusercontent.com/gatk-workflows/gatk4-genome-processing-pipeline/1.0.0/tasks/Alignment.wdl" as Alignment
-import "https://raw.githubusercontent.com/gatk-workflows/gatk4-genome-processing-pipeline/1.0.0/tasks/BamProcessing.wdl" as Processing
-import "https://raw.githubusercontent.com/gatk-workflows/gatk4-genome-processing-pipeline/1.0.0/tasks/Utilities.wdl" as Utils
-import "https://raw.githubusercontent.com/gatk-workflows/gatk4-genome-processing-pipeline/1.0.0/structs/GermlineStructs.wdl" as Structs
+import "./Alignment.wdl" as Alignment
+import "./BamProcessing.wdl" as Processing
+import "./Utilities.wdl" as Utils
+import "../structs/GermlineStructs.wdl" as Structs

 workflow SplitLargeReadGroup {

19 changes: 6 additions & 13 deletions tasks/UnmappedBamToAlignedBam.wdl
@@ -16,19 +16,12 @@ version 1.0
 ## page at https://hub.docker.com/r/broadinstitute/genomes-in-the-cloud/ for detailed
 ## licensing information pertaining to the included programs.
-#import "./Alignment.wdl" as Alignment
-#import "./SplitLargeReadGroup.wdl" as SplitRG
-#import "./Qc.wdl" as QC
-#import "./BamProcessing.wdl" as Processing
-#import "./Utilities.wdl" as Utils
-#import "../structs/GermlineStructs.wdl" as Structs
-import "https://raw.githubusercontent.com/gatk-workflows/gatk4-genome-processing-pipeline/1.0.0/tasks/Alignment.wdl" as Alignment
-import "https://raw.githubusercontent.com/gatk-workflows/gatk4-genome-processing-pipeline/1.0.0/tasks/SplitLargeReadGroup.wdl" as SplitRG
-import "https://raw.githubusercontent.com/gatk-workflows/gatk4-genome-processing-pipeline/1.0.0/tasks/Qc.wdl" as QC
-import "https://raw.githubusercontent.com/gatk-workflows/gatk4-genome-processing-pipeline/1.0.0/tasks/BamProcessing.wdl" as Processing
-import "https://raw.githubusercontent.com/gatk-workflows/gatk4-genome-processing-pipeline/1.0.0/tasks/Utilities.wdl" as Utils
-import "https://raw.githubusercontent.com/gatk-workflows/gatk4-genome-processing-pipeline/1.0.0/structs/GermlineStructs.wdl" as Structs
+import "./Alignment.wdl" as Alignment
+import "./SplitLargeReadGroup.wdl" as SplitRG
+import "./Qc.wdl" as QC
+import "./BamProcessing.wdl" as Processing
+import "./Utilities.wdl" as Utils
+import "../structs/GermlineStructs.wdl" as Structs

 # WORKFLOW DEFINITION
 workflow UnmappedBamToAlignedBam {
13 changes: 4 additions & 9 deletions tasks/VariantCalling.wdl
@@ -1,14 +1,9 @@
 version 1.0

-#import "./GermlineVariantDiscovery.wdl" as Calling
-#import "./Qc.wdl" as QC
-#import "./Utilities.wdl" as Utils
-#import "./BamProcessing.wdl" as BamProcessing
-import "https://raw.githubusercontent.com/gatk-workflows/gatk4-genome-processing-pipeline/1.0.0/tasks/GermlineVariantDiscovery.wdl" as Calling
-import "https://raw.githubusercontent.com/gatk-workflows/gatk4-genome-processing-pipeline/1.0.0/tasks/Qc.wdl" as QC
-import "https://raw.githubusercontent.com/gatk-workflows/gatk4-genome-processing-pipeline/1.0.0/tasks/Utilities.wdl" as Utils
-import "https://raw.githubusercontent.com/gatk-workflows/gatk4-genome-processing-pipeline/1.0.0/tasks/BamProcessing.wdl" as BamProcessing
+import "./GermlineVariantDiscovery.wdl" as Calling
+import "./Qc.wdl" as QC
+import "./Utilities.wdl" as Utils
+import "./BamProcessing.wdl" as BamProcessing

 workflow VariantCalling {
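After this commit every active import in the repository is a relative path. The workflow therefore only works if the `tasks/` and `structs/` files sit where those paths point. The minimal Python check below runs against a local clone. It assumes each path resolves against the directory of the importing file, which is the convention the rewritten imports follow. `unresolved_imports` is a hypothetical helper. It skips commented-out `#import` lines and any URL imports.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Matches active import statements only; "#import" comment lines do not match.
IMPORT_RE = re.compile(r'^[ \t]*import[ \t]+"([^"]+)"', re.MULTILINE)


def unresolved_imports(root: Path) -> List[Tuple[str, str]]:
    """Return (wdl_file, import_path) pairs whose relative target file does not exist."""
    problems = []
    for wdl in sorted(root.rglob("*.wdl")):
        for target in IMPORT_RE.findall(wdl.read_text()):
            if "://" in target:
                continue  # URL imports are left to the workflow engine
            if not (wdl.parent / target).is_file():
                problems.append((wdl.relative_to(root).as_posix(), target))
    return problems
```

An empty list from `unresolved_imports(Path("."))` at the repository root means every relative import points at an existing file.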

