Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TheiaCov] wfs add percentage_mapped_reads #641

Open
wants to merge 31 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 29 commits
Commits
Show all changes
31 commits
Select commit Hold shift + click to select a range
c64b57b
Added percentage_mapped_reads output to ivar_consensus.wdl and update…
fraser-combe Sep 9, 2024
93e2d89
update mapped reads trying read_float
fraser-combe Sep 10, 2024
5298eff
get read numbers from stats file
fraser-combe Sep 10, 2024
85a8cc8
get read numbers from stats filev2
fraser-combe Sep 10, 2024
9c2cb18
change from bc to awk for calculation
fraser-combe Sep 10, 2024
c57edc4
update awk
fraser-combe Sep 10, 2024
76b62ce
metric output txt instead of csv
fraser-combe Sep 10, 2024
e6afcdd
reswitchack to read_string output t
fraser-combe Sep 10, 2024
dba965b
percentage mapped reads based on trimmed bam file theiacov_ont
fraser-combe Sep 16, 2024
dd7b3e7
update theiacov-ont for mapped reads
fraser-combe Sep 17, 2024
b803c7a
Merge remote-tracking branch 'origin/main' into fc-theiacov-mapped-re…
fraser-combe Oct 2, 2024
f6d5393
pass output ivar cons mapped reads to wf for terra output
fraser-combe Oct 2, 2024
f888c65
perc mapped reads output flu track PE, ONT and clearlabs and doc update
fraser-combe Oct 3, 2024
92733ea
updated namings outputs cov_ONT and removed extra call assembly metrics
fraser-combe Oct 3, 2024
72db285
change naming output stat n coverage task
fraser-combe Oct 3, 2024
69b4666
update flu mapped reads perc variable name
fraser-combe Oct 3, 2024
b9320d7
make theiacov_ont conditional output flu mapped reads
fraser-combe Oct 3, 2024
0aeb272
wdl does not support if cond in output change to select first
fraser-combe Oct 3, 2024
4846b1e
wdl does not support if cond in output change to select first
fraser-combe Oct 3, 2024
33ab026
combine flu and non flu into same mapped reads output
fraser-combe Oct 3, 2024
7070c62
correct assembled reads call
fraser-combe Oct 21, 2024
e23b19a
update mdsums
fraser-combe Oct 21, 2024
b790973
update clearlabs for statncov call
fraser-combe Oct 21, 2024
58ac2ee
float?
fraser-combe Oct 21, 2024
45568de
mdsums and pe wf update flu irma defined
fraser-combe Oct 21, 2024
5454111
mdsum pe
fraser-combe Oct 21, 2024
6493640
move to flue track
fraser-combe Oct 22, 2024
56a4a30
tidy output pe and ont
fraser-combe Oct 22, 2024
108612b
update strings and provide default values
fraser-combe Oct 24, 2024
842691a
clean tab/spaces echo
fraser-combe Oct 31, 2024
cd32e74
my fav commit mdsums!
fraser-combe Oct 31, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions docs/workflows/genomic_characterization/theiacov.md
Original file line number Diff line number Diff line change
Expand Up @@ -1104,6 +1104,7 @@ All TheiaCoV Workflows (not TheiaCoV_FASTA_Batch)
| pangolin_notes | String | Lineage notes as determined by Pangolin | CL, FASTA, ONT, PE, SE |
| pangolin_versions | String | All Pangolin software and database versions | CL, FASTA, ONT, PE, SE |
| percent_reference_coverage | Float | Percent coverage of the reference genome after performing primer trimming; calculated as assembly_length_unambiguous / length of the reference genome (SC2: 29903) x 100 | CL, FASTA, ONT, PE, SE |
| percentage_mapped_reads | String |Percentage of reads that successfully aligned to the reference genome. This value is calculated by number of mapped reads / total number of reads x 100. | ONT, PE, SE |
| primer_bed_name | String | Name of the primer bed files used for primer trimming | CL, ONT, PE, SE |
| primer_trimmed_read_percent | Float | Percentage of read data with primers trimmed as determined by iVar trim | PE, SE |
| qc_check | String | The results of the QC Check task | CL, FASTA, ONT, PE, SE |
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -14,11 +14,11 @@ task stats_n_coverage {
samtools --version | head -n1 | tee VERSION

samtools stats ~{bamfile} > ~{samplename}.stats.txt

samtools coverage ~{bamfile} -m -o ~{samplename}.cov.hist
samtools coverage ~{bamfile} -o ~{samplename}.cov.txt
samtools flagstat ~{bamfile} > ~{samplename}.flagstat.txt

# Extracting coverage, depth, meanbaseq, and meanmapq
coverage=$(cut -f 6 ~{samplename}.cov.txt | tail -n 1)
depth=$(cut -f 7 ~{samplename}.cov.txt | tail -n 1)
meanbaseq=$(cut -f 8 ~{samplename}.cov.txt | tail -n 1)
Expand All @@ -33,6 +33,34 @@ task stats_n_coverage {
echo $depth | tee DEPTH
echo $meanbaseq | tee MEANBASEQ
echo $meanmapq | tee MEANMAPQ

# Parsing stats.txt for total and mapped reads
total_reads=$(grep "^SN" ~{samplename}.stats.txt | grep "raw total sequences:" | cut -f 3)
mapped_reads=$(grep "^SN" ~{samplename}.stats.txt | grep "reads mapped:" | cut -f 3)

# Check for empty values and set defaults to avoid errors
if [ -z "$total_reads" ]; then total_reads="1"; fi # Avoid division by zero
if [ -z "$mapped_reads" ]; then mapped_reads="0"; fi

# Calculate the percentage of mapped reads
percentage_mapped_reads=$(awk "BEGIN {printf \"%.2f\", ($mapped_reads / $total_reads) * 100}")

# If the percentage calculation fails, default to 0.0
if [ -z "$percentage_mapped_reads" ]; then percentage_mapped_reads="0.0"; fi

# Output the result
echo $percentage_mapped_reads | tee PERCENTAGE_MAPPED_READS

#output all metrics in one txt file
# Output header row (for CSV)
echo "Statistic Value" > ~{samplename}_metrics.txt
fraser-combe marked this conversation as resolved.
Show resolved Hide resolved

# Output each statistic as a row
echo "Coverage $coverage" >> ~{samplename}_metrics.txt
echo "Depth $depth" >> ~{samplename}_metrics.txt
echo "Mean Base Quality $meanbaseq" >> ~{samplename}_metrics.txt
echo "Mean Mapping Quality $meanmapq" >> ~{samplename}_metrics.txt
echo "Percentage Mapped Reads $percentage_mapped_reads" >> ~{samplename}_metrics.txt
>>>
output {
String date = read_string("DATE")
Expand All @@ -45,6 +73,9 @@ task stats_n_coverage {
Float depth = read_string("DEPTH")
Float meanbaseq = read_string("MEANBASEQ")
Float meanmapq = read_string("MEANMAPQ")
Float percentage_mapped_reads = read_string("PERCENTAGE_MAPPED_READS")
File metrics_txt = "~{samplename}_metrics.txt"

}
runtime {
docker: docker
Expand Down
4 changes: 2 additions & 2 deletions tests/workflows/theiacov/test_wf_theiacov_clearlabs.yml
Original file line number Diff line number Diff line change
Expand Up @@ -318,7 +318,7 @@
- path: miniwdl_run/call-pangolin4/work/clearlabs.pangolin_report.csv
md5sum: 151390c419b00ca44eb83e2bbfb96996
- path: miniwdl_run/call-stats_n_coverage/command
md5sum: 51da320ddc7de2ffeb263f0ddd85ced6
md5sum: 976ccf8be4e09ec5df581c834b5d5792
- path: miniwdl_run/call-stats_n_coverage/inputs.json
contains: ["bamfile", "samplename"]
- path: miniwdl_run/call-stats_n_coverage/outputs.json
Expand Down Expand Up @@ -350,7 +350,7 @@
- path: miniwdl_run/call-stats_n_coverage/work/clearlabs.stats.txt
md5sum: bfed5344c91ce6f4db1f688cac0a3ab9
- path: miniwdl_run/call-stats_n_coverage_primtrim/command
md5sum: a84f90b8877babe54bf8c068d244fbe8
md5sum: 5b211043f0514f94bbfd7a282d50d7c4
- path: miniwdl_run/call-stats_n_coverage_primtrim/inputs.json
contains: ["bamfile", "samplename"]
- path: miniwdl_run/call-stats_n_coverage_primtrim/outputs.json
Expand Down
4 changes: 2 additions & 2 deletions tests/workflows/theiacov/test_wf_theiacov_illumina_pe.yml
Original file line number Diff line number Diff line change
Expand Up @@ -205,7 +205,7 @@
md5sum: 511e696afe25f8b96a84d68ec5a8af8a
# stats n coverage primer trim
- path: miniwdl_run/call-ivar_consensus/call-stats_n_coverage_primtrim/command
md5sum: 260c3887be6d99b18caf6d3914c5737f
md5sum: eb1b72ffa6068223720fe4afd745cdf9
- path: miniwdl_run/call-ivar_consensus/call-stats_n_coverage_primtrim/inputs.json
contains: ["bamfile", "samplename"]
- path: miniwdl_run/call-ivar_consensus/call-stats_n_coverage_primtrim/outputs.json
Expand Down Expand Up @@ -288,7 +288,7 @@
md5sum: 6c63395a125f8618334b8af2de4e2d88
# stats n coverage
- path: miniwdl_run/call-ivar_consensus/call-stats_n_coverage/command
md5sum: 1ffac4cc3e9bdd84a0f9228e8e5ca5d9
md5sum: 6b32ff212b080a57198d3f1b464bddda
- path: miniwdl_run/call-ivar_consensus/call-stats_n_coverage/inputs.json
contains: ["bamfile", "samplename"]
- path: miniwdl_run/call-ivar_consensus/call-stats_n_coverage/outputs.json
Expand Down
4 changes: 2 additions & 2 deletions tests/workflows/theiacov/test_wf_theiacov_illumina_se.yml
Original file line number Diff line number Diff line change
Expand Up @@ -157,7 +157,7 @@
md5sum: 603c3cbc771ca910b96d3c032aafe7c9
# stats n coverage primer trim
- path: miniwdl_run/call-ivar_consensus/call-stats_n_coverage_primtrim/command
md5sum: 67cac223adcf059a9dfaa9f28ed34f68
md5sum: 4945b2eb1bb9c419cdb6da7e78e7229d
- path: miniwdl_run/call-ivar_consensus/call-stats_n_coverage_primtrim/inputs.json
contains: ["bamfile", "samplename"]
- path: miniwdl_run/call-ivar_consensus/call-stats_n_coverage_primtrim/outputs.json
Expand Down Expand Up @@ -240,7 +240,7 @@
md5sum: 03c5ecf22fdfdb6b240ac3880281a056
# stats n coverage
- path: miniwdl_run/call-ivar_consensus/call-stats_n_coverage/command
md5sum: 3dacccb252429a0ff46c079a75a09377
md5sum: 798a1a5da8115c0e3a2f0f7895816e62
- path: miniwdl_run/call-ivar_consensus/call-stats_n_coverage/inputs.json
contains: ["bamfile", "samplename"]
- path: miniwdl_run/call-ivar_consensus/call-stats_n_coverage/outputs.json
Expand Down
4 changes: 2 additions & 2 deletions tests/workflows/theiacov/test_wf_theiacov_ont.yml
Original file line number Diff line number Diff line change
Expand Up @@ -232,7 +232,7 @@
md5sum: 32c0be4fb7f3030bf9c74c0a836d4f2e
- path: miniwdl_run/call-raw_check_reads/work/_miniwdl_inputs/0/ont.fastq.gz
- path: miniwdl_run/call-stats_n_coverage/command
md5sum: 93414eacbbb9d7c4813bb54a8a507078
md5sum: 194a32a05aca8867a2d400465838945d
- path: miniwdl_run/call-stats_n_coverage/inputs.json
contains: ["bamfile", "samplename"]
- path: miniwdl_run/call-stats_n_coverage/outputs.json
Expand All @@ -257,7 +257,7 @@
- path: miniwdl_run/call-stats_n_coverage/work/ont.flagstat.txt
- path: miniwdl_run/call-stats_n_coverage/work/ont.stats.txt
- path: miniwdl_run/call-stats_n_coverage_primtrim/command
md5sum: c6e7de70dfdbb1858229e44777b84110
md5sum: 7246100df31a6508fe16757463e7c5a2
- path: miniwdl_run/call-stats_n_coverage_primtrim/inputs.json
contains: ["bamfile", "samplename"]
- path: miniwdl_run/call-stats_n_coverage_primtrim/outputs.json
Expand Down
2 changes: 2 additions & 0 deletions workflows/theiacov/wf_theiacov_clearlabs.wdl
Original file line number Diff line number Diff line change
Expand Up @@ -207,6 +207,8 @@ workflow theiacov_clearlabs {
Int number_Degenerate = consensus_qc.number_Degenerate
Int number_Total = consensus_qc.number_Total
Float percent_reference_coverage = consensus_qc.percent_reference_coverage
# Percentage mapped reads
Float percentage_mapped_reads = stats_n_coverage.percentage_mapped_reads
# SC2 specific coverage outputs
Float? sc2_s_gene_mean_coverage = gene_coverage.sc2_s_gene_depth
Float? sc2_s_gene_percent_coverage = gene_coverage.sc2_s_gene_percent_coverage
Expand Down
5 changes: 3 additions & 2 deletions workflows/theiacov/wf_theiacov_illumina_pe.wdl
Original file line number Diff line number Diff line change
Expand Up @@ -144,7 +144,7 @@ workflow theiacov_illumina_pe {
trim_primers = trim_primers
}
}
# perform flu-specific tasks
# for flu organisms call flu_track
if (organism_parameters.standardized_organism == "flu") {
call run_flu_track.flu_track {
input:
Expand Down Expand Up @@ -439,6 +439,7 @@ workflow theiacov_illumina_pe {
# QC_Check Results
String? qc_check = qc_check_task.qc_check
File? qc_standard = qc_check_task.qc_standard

# Capture percentage_mapped_reads from ivar_consensus task or flu_track task
String percentage_mapped_reads = select_first([ivar_consensus.percentage_mapped_reads, flu_track.percentage_mapped_reads,""])
}
}
2 changes: 2 additions & 0 deletions workflows/theiacov/wf_theiacov_illumina_se.wdl
Original file line number Diff line number Diff line change
Expand Up @@ -317,5 +317,7 @@ workflow theiacov_illumina_se {
# QC_Check Results
String? qc_check = qc_check_task.qc_check
File? qc_standard = qc_check_task.qc_standard
# Capture percentage_mapped_reads from ivar_consensus task
String? percentage_mapped_reads = ivar_consensus.percentage_mapped_reads
}
}
4 changes: 3 additions & 1 deletion workflows/theiacov/wf_theiacov_ont.wdl
Original file line number Diff line number Diff line change
Expand Up @@ -149,7 +149,7 @@ workflow theiacov_ont {
standardized_organism = organism_parameters.standardized_organism,
seq_method = seq_method
}
}
}
# nanoplot for basic QC metrics
call nanoplot_task.nanoplot as nanoplot_raw {
input:
Expand Down Expand Up @@ -427,5 +427,7 @@ workflow theiacov_ont {
# QC_Check Results
String? qc_check = qc_check_task.qc_check
File? qc_standard = qc_check_task.qc_standard
# Non-flu specific outputs
String percentage_mapped_reads = select_first([flu_track.percentage_mapped_reads, stats_n_coverage.percentage_mapped_reads,""])
}
}
11 changes: 10 additions & 1 deletion workflows/utilities/wf_flu_track.wdl
Original file line number Diff line number Diff line change
Expand Up @@ -112,6 +112,14 @@ workflow flu_track {
docker = assembly_metrics_docker
}
}
# Calculate the percentage of mapped reads for flu samples
if (defined(irma.seg_ha_bam) || defined(irma.seg_na_bam)) {
call assembly_metrics.stats_n_coverage as flu_stats_n_coverage {
input:
samplename = samplename,
bamfile = select_first([irma.seg_ha_bam, irma.seg_na_bam])
}
}
# combine HA & NA assembly coverages
String ha_na_assembly_coverage_string = "HA: " + select_first([ha_assembly_coverage.depth, ""]) + ", NA: " + select_first([na_assembly_coverage.depth, ""])
# ABRICATE will run if assembly is provided, or was generated with IRMA
Expand Down Expand Up @@ -249,13 +257,14 @@ workflow flu_track {
File? irma_mp_segment_fasta = irma.seg_mp_assembly
File? irma_np_segment_fasta = irma.seg_np_assembly
File? irma_ns_segment_fasta = irma.seg_ns_assembly

Array[File] irma_assemblies = irma.irma_assemblies
Array[File] irma_vcfs = irma.irma_vcfs
Array[File] irma_bams = irma.irma_bams
File? irma_ha_bam = irma.seg_ha_bam
File? irma_na_bam = irma.seg_na_bam
String ha_na_assembly_coverage = ha_na_assembly_coverage_string
# calulate mapped reads percentage for flu samples
Float? percentage_mapped_reads = flu_stats_n_coverage.percentage_mapped_reads
# GenoFLU outputs
String? genoflu_version = genoflu.genoflu_version
String? genoflu_genotype = genoflu.genoflu_genotype
Expand Down
4 changes: 3 additions & 1 deletion workflows/utilities/wf_ivar_consensus.wdl
Original file line number Diff line number Diff line change
Expand Up @@ -153,6 +153,8 @@ workflow ivar_consensus {
String meanmapq_trim = select_first([stats_n_coverage_primtrim.meanmapq, stats_n_coverage.meanmapq,""])
String assembly_mean_coverage = select_first([stats_n_coverage_primtrim.depth, stats_n_coverage.depth,""])
String samtools_version_stats = stats_n_coverage.samtools_version


# Assembly metrics
String percentage_mapped_reads = select_first([stats_n_coverage_primtrim.percentage_mapped_reads, stats_n_coverage.percentage_mapped_reads,""])
}
}