[TheiaProk] Adds stxtyper to merlin_magic and TheiaProk wfs (#525)
* added WDL task for testing StxTyper. ran successfully with miniwdl

* fix typo in stxtyper task output

* added stxtyper to merlin_magic subwf and theiaprok_illumina_pe. NOT TESTED locally. it may not work

* added log file output to stxtyper task, merlin_magic subwf, and theiaprok_illumina_pe wf. Also added code to catch when stxtyper output TSV is one line meaning no hits were found

* big change to stxtyper outputs. removed many and replaced with 2 outputs: String stxtyper_hits and Int stxtyper_num_hits. updated merlin_magic and theiaprok_illumina_PE workflow and tested successfully w miniwdl

* update stxtyper_num_hits output when 0 hits are found

* move stxtyper call block in merlin_magic wf so that all Escherichia and Shigella (including sonnei) are run through stxtyper

* update to more recent stxtyper docker image that includes 2 new output columns; downsize disk_size to 50 GB default instead of 100 (it doesn't need much as it runs on 1 assembly)

* reduced down to 1 cpu and 2 GB memory as I think it runs single-threaded; preemptible on since this task doesn't run for much more than 2 min; created outputs in case no hits are found; started re-writing parsing code for when hits are detected & finished the bit for complete operons. tested successfully with miniwdl

* big update to stxtyper output parsing. runs successfully, but still needs work to iron out bugs. want to save progress and push a commit

* updated merlin_magic with stxtyper optional inputs so they are exposed to user; major changes to stxtyper parsing again; added a new optional input & 1 output, removed 1 output; still needs work

* updated one stxtyper output string name; updated merlin_magic subwf with new stxtyper outputs; updated theiaprok_illumina_pe with new stxtyper outputs; still need to rework part of parsing for stxtyper but theiaprok ran successfully w miniwdl

* more updates to stxtyper output parsing. tested on many samples w miniwdl successfully

* shorten optional input variable name for stxtyper

* revert to 3 maxRetries stxtyper task

* stxtyper increase memory request to 4 GB

* add stxtyper outputs to theiaprok_fasta wf. tested successfully w miniwdl

* major update to stxtyper WDL task. updated docker image to v1.0.24; added boolean for enabling built-in stxtyper debugging; removed many outputs and added stxtyper_all_hits output; also renamed a few outputs for clarity and consistency; commented out unused portions of code which will be deleted later

* updated inputs/outputs for stxtyper in merlin_magic subwf, updated theiaprok ilmn pe wf outputs; adjusted order outputs in stxtyper task for clarity

* merlin_magic: moved stxtyper call block outside of Escherichia/shigella call block. added optional input call_stxtyper so user can run tool regardless of merlin_tag and GAMBIT_predicted_taxon. tested successfully w miniwdl, need to test in Terra

* cleanup unused code from stxtyper task; adjust code block for when no hits are found and output files are created

* update theiaprok fasta, ilmn SE, ONT with stxtyper outputs. need to test in Terra

* update CI

* update CI

* added TheiaProk workflow inputs and stxtyper block describing the tool to the theiaprok documentation. also fixed a minor comment typo in stxtyper task file

* added stxtyper outputs to theiaprok docs page

* updated theiaprok diagram to include stxtyper under Escherichia spp. and Shigella spp specific tasks

* added punctuation

* update ci

* remove stxtyper_log File output from workflows as it's only slightly useful for debugging and not useful to end user. also removed mention in the docs

* update ci

* update to static link in docs

* updated the correct/most up-to-date theiaprok diagram

* update theiaprok diagram

thank you @sage-wright for help on updating this diagram!
kapsakcj authored Oct 24, 2024
1 parent d4f3da8 commit 3e8c447
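The header-only check mentioned in the commit message ("output TSV is one line meaning no hits were found") can be sketched roughly as follows. This is a minimal illustration, assuming the StxTyper TSV always starts with exactly one header line; the helper name `count_hits` and the mock files are hypothetical and not part of the task.

```shell
#!/usr/bin/env bash
set -euo pipefail

# count_hits FILE (hypothetical helper): print the number of data rows,
# i.e. all lines minus the single header line, never going below zero.
count_hits() {
  awk 'END { n = NR - 1; print (n > 0 ? n : 0) }' "$1"
}

workdir="$(mktemp -d)"

# Mock report with a header only, which is how "no hits" is assumed to look.
printf 'name\ttarget_contig\tstx_type\toperon\n' > "${workdir}/no_hits.tsv"

# Mock report with a header plus two hit rows (values are illustrative).
cp "${workdir}/no_hits.tsv" "${workdir}/two_hits.tsv"
printf 'sampleA\tcontig1\tstx1a\tCOMPLETE\n' >> "${workdir}/two_hits.tsv"
printf 'sampleA\tcontig2\tstx2c\tCOMPLETE\n' >> "${workdir}/two_hits.tsv"

no_hits="$(count_hits "${workdir}/no_hits.tsv")"
two_hits="$(count_hits "${workdir}/two_hits.tsv")"

echo "no_hits.tsv: ${no_hits} hit(s)"
echo "two_hits.tsv: ${two_hits} hit(s)"
```

Counting with `awk` instead of `wc -l` avoids a negative count on an empty file and still counts a final row that lacks a trailing newline.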
Showing 10 changed files with 254 additions and 22 deletions.
Binary file modified docs/assets/figures/TheiaProk.png
75 changes: 57 additions & 18 deletions docs/workflows/genomic_characterization/theiaprok.md

Large diffs are not rendered by default.

122 changes: 122 additions & 0 deletions tasks/species_typing/escherichia_shigella/task_stxtyper.wdl
@@ -0,0 +1,122 @@
version 1.0

task stxtyper {
input {
File assembly
String samplename
Boolean enable_debugging = false # Additional messages are printed and files in $TMPDIR are not removed after running
String docker = "us-docker.pkg.dev/general-theiagen/staphb/stxtyper:1.0.24"
Int disk_size = 50
Int cpu = 1
Int memory = 4
}
command <<<
# fail task if any commands below fail since there's lots of bash conditionals below (AGH!)
set -eo pipefail

# capture version info
stxtyper --version | tee VERSION.txt

# NOTE: by default stxtyper uses $TMPDIR or /tmp, so if we run into issues we may need to adjust in the future. Could potentially use PWD as the TMPDIR.
echo "DEBUG: TMPDIR is set to: $TMPDIR"

echo "DEBUG: running StxTyper now..."
# run StxTyper on assembly; may need to add/remove options in the future if they change
# NOTE: stxtyper can accept gzipped assemblies, so no need to unzip
stxtyper \
--nucleotide ~{assembly} \
--name ~{samplename} \
--output ~{samplename}_stxtyper.tsv \
~{true='--debug' false='' enable_debugging} \
--log ~{samplename}_stxtyper.log

# parse output TSV
echo "DEBUG: Parsing StxTyper output TSV..."

# check for output file with only 1 line (meaning no hits found); exit cleanly if so
if [ "$(wc -l < ~{samplename}_stxtyper.tsv)" -eq 1 ]; then
echo "No hits found by StxTyper" > stxtyper_hits.txt
echo "0" > stxtyper_num_hits.txt
echo "DEBUG: No hits found in StxTyper output TSV. Exiting task with exit code 0 now."

# create empty output files
touch stxtyper_all_hits.txt stxtyper_complete_operons.txt stxtyper_partial_hits.txt stxtyper_stx_frameshifts_or_internal_stop_hits.txt stx_novel_hits.txt
# put "none" into all of them so task does not fail
echo "None" | tee stxtyper_all_hits.txt stxtyper_complete_operons.txt stxtyper_partial_hits.txt stxtyper_stx_frameshifts_or_internal_stop_hits.txt stx_novel_hits.txt
exit 0
fi

# check for output file with more than 1 line (meaning hits found); count lines & parse output TSV if so
if [ "$(wc -l < ~{samplename}_stxtyper.tsv)" -gt 1 ]; then
echo "Hits found by StxTyper. Counting lines & parsing output TSV now..."
# count number of lines in output TSV (excluding header)
wc -l < ~{samplename}_stxtyper.tsv | awk '{print $1-1}' > stxtyper_num_hits.txt
# remove header line
sed '1d' ~{samplename}_stxtyper.tsv > ~{samplename}_stxtyper_noheader.tsv

##### parse output TSV #####
### complete operons
echo "DEBUG: Parsing complete operons..."
awk -F'\t' -v OFS=, '$4 == "COMPLETE" {print $3}' ~{samplename}_stxtyper.tsv | paste -sd, - | tee stxtyper_complete_operons.txt
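# if no complete operons were written, write "None" to file for output string
# NOTE: check the parsed file rather than grepping the TSV for 'COMPLETE', which would also match 'COMPLETE_NOVEL'
if [ "$(grep --silent 'stx' stxtyper_complete_operons.txt; echo $?)" -gt 0 ]; then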
echo "None" > stxtyper_complete_operons.txt
fi

### complete_novel operons
echo "DEBUG: Parsing complete novel hits..."
awk -F'\t' -v OFS=, '$4 == "COMPLETE_NOVEL" {print $3}' ~{samplename}_stxtyper.tsv | paste -sd, - | tee stx_novel_hits.txt
# if grep for COMPLETE_NOVEL fails, write "None" to file for output string
if [ "$(grep --silent 'COMPLETE_NOVEL' ~{samplename}_stxtyper.tsv; echo $?)" -gt 0 ]; then
echo "None" > stx_novel_hits.txt
fi

### partial hits (to any gene in stx operon)
echo "DEBUG: Parsing stxtyper partial hits..."
# explanation: if "operon" column contains "PARTIAL" (either PARTIAL or PARTIAL_CONTIG_END possible); print either "stx1" or "stx2" or "stx1,stx2"
awk -F'\t' -v OFS=, '$4 ~ "PARTIAL.*" {print $3}' ~{samplename}_stxtyper.tsv | sort | uniq | paste -sd, - | tee stxtyper_partial_hits.txt
# if no stx partial hits found, write "None" to file for output string
if [ "$(grep --silent 'stx' stxtyper_partial_hits.txt; echo $?)" -gt 0 ]; then
echo "None" > stxtyper_partial_hits.txt
fi

### frameshifts or internal stop codons in stx genes
echo "DEBUG: Parsing stx frameshifts or internal stop codons..."
# explanation: if "operon" column is "FRAMESHIFT" or "INTERNAL_STOP", print the stx type in a sorted/unique list
awk -F'\t' -v OFS=, '$4 == "FRAMESHIFT" || $4 == "INTERNAL_STOP" {print $3}' ~{samplename}_stxtyper.tsv | sort | uniq | paste -sd, - | tee stxtyper_stx_frameshifts_or_internal_stop_hits.txt
# if no frameshifts or internal stop codons found, write "None" to file for output string
if [ "$(grep --silent -E 'FRAMESHIFT|INTERNAL_STOP' ~{samplename}_stxtyper.tsv; echo $?)" -gt 0 ]; then
echo "None" > stxtyper_stx_frameshifts_or_internal_stop_hits.txt
fi

echo "DEBUG: generating stx_type_all string output now..."
# sed removes any "None" lines; sort and uniq then drop duplicate lines (each input file holds one comma-separated line)
# paste joins the remaining lines into a single comma-separated string
cat stxtyper_complete_operons.txt stxtyper_partial_hits.txt stxtyper_stx_frameshifts_or_internal_stop_hits.txt stx_novel_hits.txt | sed '/None/d' | sort | uniq | paste -sd, - > stxtyper_all_hits.txt

fi
echo "DEBUG: Finished parsing StxTyper output TSV."
>>>
output {
File stxtyper_report = "~{samplename}_stxtyper.tsv"
File stxtyper_log = "~{samplename}_stxtyper.log"
String stxtyper_docker = docker
String stxtyper_version = read_string("VERSION.txt")
# outputs parsed from stxtyper output TSV
Int stxtyper_num_hits = read_int("stxtyper_num_hits.txt")
String stxtyper_all_hits = read_string("stxtyper_all_hits.txt")
String stxtyper_complete_operon_hits = read_string("stxtyper_complete_operons.txt")
String stxtyper_partial_hits = read_string("stxtyper_partial_hits.txt")
String stxtyper_frameshifts_or_internal_stop_hits = read_string("stxtyper_stx_frameshifts_or_internal_stop_hits.txt")
String stxtyper_novel_hits = read_string("stx_novel_hits.txt")
}
runtime {
docker: "~{docker}"
memory: "~{memory} GB"
cpu: cpu
disks: "local-disk " + disk_size + " SSD"
disk: disk_size + " GB"
preemptible: 1 # does not take long (usually <3 min) to run stxtyper on 1 genome, preemptible is fine
maxRetries: 3
}
}
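The per-class parsing in the task above can be sketched outside WDL as follows. This is an illustration under stated assumptions, not the task's actual code: the column order (stx type in column 3, operon class in column 4) is taken from the `awk` field numbers used in the task, while the helper `hits_for_class`, the mock rows, and the stx type values are hypothetical. The sketch matches the operon class exactly, so `COMPLETE` does not pick up `COMPLETE_NOVEL` rows, and it splits on commas before de-duplicating the combined list.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical mock of a StxTyper TSV (header plus three hits).
workdir="$(mktemp -d)"
tsv="${workdir}/mock_stxtyper.tsv"
printf 'name\ttarget_contig\tstx_type\toperon\n' > "${tsv}"
printf 'sampleA\tcontig1\tstx2a\tCOMPLETE_NOVEL\n' >> "${tsv}"
printf 'sampleA\tcontig2\tstx1\tPARTIAL_CONTIG_END\n' >> "${tsv}"
printf 'sampleA\tcontig3\tstx1\tPARTIAL\n' >> "${tsv}"

# hits_for_class CLASSES FILE (hypothetical helper): print a sorted, unique,
# comma-separated list of stx types whose operon column equals one of the
# '|'-separated CLASSES, or "None" when nothing matches.
hits_for_class() {
  local classes="$1" file="$2" hits
  hits="$(awk -F'\t' -v re="^(${classes})\$" 'NR > 1 && $4 ~ re {print $3}' "${file}" \
    | sort -u | paste -sd, -)"
  echo "${hits:-None}"
}

complete="$(hits_for_class 'COMPLETE' "${tsv}")"
novel="$(hits_for_class 'COMPLETE_NOVEL' "${tsv}")"
partial="$(hits_for_class 'PARTIAL|PARTIAL_CONTIG_END' "${tsv}")"
disrupted="$(hits_for_class 'FRAMESHIFT|INTERNAL_STOP' "${tsv}")"

# Combine all classes: split on commas, drop "None", de-duplicate, re-join.
all_hits="$(printf '%s\n' "${complete}" "${novel}" "${partial}" "${disrupted}" \
  | tr ',' '\n' | sed '/^None$/d' | sort -u | paste -sd, -)"
all_hits="${all_hits:-None}"

echo "complete:  ${complete}"
echo "novel:     ${novel}"
echo "partial:   ${partial}"
echo "disrupted: ${disrupted}"
echo "all_hits:  ${all_hits}"
```

With the mock rows above, the complete and disrupted classes come back as `None`, the novel class as `stx2a`, the partial class as `stx1`, and the combined string as `stx1,stx2a`.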
4 changes: 2 additions & 2 deletions tests/workflows/theiaprok/test_wf_theiaprok_illumina_pe.yml
@@ -631,9 +631,9 @@
- path: miniwdl_run/wdl/tasks/utilities/data_export/task_broad_terra_tools.wdl
md5sum: 4d69a6539b68503af9f3f1c2787ff920
- path: miniwdl_run/wdl/workflows/theiaprok/wf_theiaprok_illumina_pe.wdl
- md5sum: 6d9dd969e2144ca23f2a0e101e6b6966
+ md5sum: 3cb5c86b15e931b0c0b98ed784386438
- path: miniwdl_run/wdl/workflows/utilities/wf_merlin_magic.wdl
- md5sum: 670f990128063eb3c7b3fa49302f08b7
+ md5sum: ea5cff6eff8c2c42046cf2eae6f16b6f
- path: miniwdl_run/wdl/workflows/utilities/wf_read_QC_trim_pe.wdl
contains: ["version", "QC", "output"]
- path: miniwdl_run/workflow.log
4 changes: 2 additions & 2 deletions tests/workflows/theiaprok/test_wf_theiaprok_illumina_se.yml
@@ -594,9 +594,9 @@
- path: miniwdl_run/wdl/tasks/utilities/data_export/task_broad_terra_tools.wdl
md5sum: 4d69a6539b68503af9f3f1c2787ff920
- path: miniwdl_run/wdl/workflows/theiaprok/wf_theiaprok_illumina_se.wdl
- md5sum: 5aa25e4fad466f92c96a7c138aca0d20
+ md5sum: fdb66b59ac886501a4ae90a25cefd633
- path: miniwdl_run/wdl/workflows/utilities/wf_merlin_magic.wdl
- md5sum: 670f990128063eb3c7b3fa49302f08b7
+ md5sum: ea5cff6eff8c2c42046cf2eae6f16b6f
- path: miniwdl_run/wdl/workflows/utilities/wf_read_QC_trim_se.wdl
md5sum: d11bfe33fdd96eab28892be5a01c1c7d
- path: miniwdl_run/workflow.log
10 changes: 10 additions & 0 deletions workflows/theiaprok/wf_theiaprok_fasta.wdl
@@ -576,6 +576,16 @@ workflow theiaprok_fasta {
File? virulencefinder_report_tsv = merlin_magic.virulencefinder_report_tsv
String? virulencefinder_docker = merlin_magic.virulencefinder_docker
String? virulencefinder_hits = merlin_magic.virulencefinder_hits
# stxtyper
File? stxtyper_report = merlin_magic.stxtyper_report
String? stxtyper_docker = merlin_magic.stxtyper_docker
String? stxtyper_version = merlin_magic.stxtyper_version
Int? stxtyper_num_hits = merlin_magic.stxtyper_num_hits
String? stxtyper_all_hits = merlin_magic.stxtyper_all_hits
String? stxtyper_complete_operons = merlin_magic.stxtyper_complete_operon_hits
String? stxtyper_partial_hits = merlin_magic.stxtyper_partial_hits
String? stxtyper_stx_frameshifts_or_internal_stop_hits = merlin_magic.stxtyper_stx_frameshifts_or_internal_stop_hits
String? stxtyper_novel_hits = merlin_magic.stxtyper_novel_hits
# Listeria Typing
File? lissero_results = merlin_magic.lissero_results
String? lissero_version = merlin_magic.lissero_version
10 changes: 10 additions & 0 deletions workflows/theiaprok/wf_theiaprok_illumina_pe.wdl
@@ -819,6 +819,16 @@ workflow theiaprok_illumina_pe {
File? virulencefinder_report_tsv = merlin_magic.virulencefinder_report_tsv
String? virulencefinder_docker = merlin_magic.virulencefinder_docker
String? virulencefinder_hits = merlin_magic.virulencefinder_hits
# stxtyper
File? stxtyper_report = merlin_magic.stxtyper_report
String? stxtyper_docker = merlin_magic.stxtyper_docker
String? stxtyper_version = merlin_magic.stxtyper_version
Int? stxtyper_num_hits = merlin_magic.stxtyper_num_hits
String? stxtyper_all_hits = merlin_magic.stxtyper_all_hits
String? stxtyper_complete_operons = merlin_magic.stxtyper_complete_operon_hits
String? stxtyper_partial_hits = merlin_magic.stxtyper_partial_hits
String? stxtyper_stx_frameshifts_or_internal_stop_hits = merlin_magic.stxtyper_stx_frameshifts_or_internal_stop_hits
String? stxtyper_novel_hits = merlin_magic.stxtyper_novel_hits
# Shigella sonnei Typing
File? sonneityping_mykrobe_report_csv = merlin_magic.sonneityping_mykrobe_report_csv
File? sonneityping_mykrobe_report_json = merlin_magic.sonneityping_mykrobe_report_json
10 changes: 10 additions & 0 deletions workflows/theiaprok/wf_theiaprok_illumina_se.wdl
@@ -758,6 +758,16 @@ workflow theiaprok_illumina_se {
File? virulencefinder_report_tsv = merlin_magic.virulencefinder_report_tsv
String? virulencefinder_docker = merlin_magic.virulencefinder_docker
String? virulencefinder_hits = merlin_magic.virulencefinder_hits
# stxtyper
File? stxtyper_report = merlin_magic.stxtyper_report
String? stxtyper_docker = merlin_magic.stxtyper_docker
String? stxtyper_version = merlin_magic.stxtyper_version
Int? stxtyper_num_hits = merlin_magic.stxtyper_num_hits
String? stxtyper_all_hits = merlin_magic.stxtyper_all_hits
String? stxtyper_complete_operons = merlin_magic.stxtyper_complete_operon_hits
String? stxtyper_partial_hits = merlin_magic.stxtyper_partial_hits
String? stxtyper_stx_frameshifts_or_internal_stop_hits = merlin_magic.stxtyper_stx_frameshifts_or_internal_stop_hits
String? stxtyper_novel_hits = merlin_magic.stxtyper_novel_hits
# Shigella sonnei Typing
File? sonneityping_mykrobe_report_csv = merlin_magic.sonneityping_mykrobe_report_csv
File? sonneityping_mykrobe_report_json = merlin_magic.sonneityping_mykrobe_report_json
10 changes: 10 additions & 0 deletions workflows/theiaprok/wf_theiaprok_ont.wdl
@@ -730,6 +730,16 @@ workflow theiaprok_ont {
File? virulencefinder_report_tsv = merlin_magic.virulencefinder_report_tsv
String? virulencefinder_docker = merlin_magic.virulencefinder_docker
String? virulencefinder_hits = merlin_magic.virulencefinder_hits
# stxtyper
File? stxtyper_report = merlin_magic.stxtyper_report
String? stxtyper_docker = merlin_magic.stxtyper_docker
String? stxtyper_version = merlin_magic.stxtyper_version
Int? stxtyper_num_hits = merlin_magic.stxtyper_num_hits
String? stxtyper_all_hits = merlin_magic.stxtyper_all_hits
String? stxtyper_complete_operons = merlin_magic.stxtyper_complete_operon_hits
String? stxtyper_partial_hits = merlin_magic.stxtyper_partial_hits
String? stxtyper_stx_frameshifts_or_internal_stop_hits = merlin_magic.stxtyper_stx_frameshifts_or_internal_stop_hits
String? stxtyper_novel_hits = merlin_magic.stxtyper_novel_hits
# Shigella sonnei Typing
File? sonneityping_mykrobe_report_csv = merlin_magic.sonneityping_mykrobe_report_csv
File? sonneityping_mykrobe_report_json = merlin_magic.sonneityping_mykrobe_report_json
31 changes: 31 additions & 0 deletions workflows/utilities/wf_merlin_magic.wdl
@@ -8,6 +8,7 @@ import "../../tasks/species_typing/escherichia_shigella/task_serotypefinder.wdl"
import "../../tasks/species_typing/escherichia_shigella/task_shigatyper.wdl" as shigatyper_task
import "../../tasks/species_typing/escherichia_shigella/task_shigeifinder.wdl" as shigeifinder_task
import "../../tasks/species_typing/escherichia_shigella/task_sonneityping.wdl" as sonneityping_task
import "../../tasks/species_typing/escherichia_shigella/task_stxtyper.wdl" as stxtyper_task
import "../../tasks/species_typing/escherichia_shigella/task_virulencefinder.wdl" as virulencefinder_task
import "../../tasks/species_typing/haemophilus/task_hicap.wdl" as hicap_task
import "../../tasks/species_typing/klebsiella/task_kleborate.wdl" as kleborate_task
@@ -218,6 +219,13 @@ workflow merlin_magic {
Float? virulencefinder_coverage_threshold
Float? virulencefinder_identity_threshold
String? virulencefinder_database
# stxtyper options
Boolean call_stxtyper = false # set to true to run stxtyper on any bacterial sample
Boolean? stxtyper_enable_debug
String? stxtyper_docker_image
Int? stxtyper_disk_size
Int? stxtyper_cpu
Int? stxtyper_memory
}
# theiaprok
if (merlin_tag == "Acinetobacter baumannii") {
@@ -241,6 +249,19 @@ workflow merlin_magic {
docker = abricate_abaum_docker_image
}
}
# stxtyper is special & in its own conditional block because it should automatically be run on Escherichia and Shigella species; but optionally run on ANY bacterial sample if the user wants to screen for Shiga toxin genes
if (merlin_tag == "Escherichia" || merlin_tag == "Shigella sonnei" || call_stxtyper == true ) {
call stxtyper_task.stxtyper {
input:
assembly = assembly,
samplename = samplename,
docker = stxtyper_docker_image,
disk_size = stxtyper_disk_size,
cpu = stxtyper_cpu,
memory = stxtyper_memory,
enable_debugging = stxtyper_enable_debug
}
}
if (merlin_tag == "Escherichia" || merlin_tag == "Shigella sonnei" ) {
# tools specific to ALL Escherichia and Shigella species
#
@@ -755,6 +776,16 @@ workflow merlin_magic {
File? virulencefinder_report_tsv = virulencefinder.virulencefinder_report_tsv
String? virulencefinder_docker = virulencefinder.virulencefinder_docker
String? virulencefinder_hits = virulencefinder.virulencefinder_hits
# stxtyper
File? stxtyper_report = stxtyper.stxtyper_report
String? stxtyper_docker = stxtyper.stxtyper_docker
String? stxtyper_version = stxtyper.stxtyper_version
Int? stxtyper_num_hits = stxtyper.stxtyper_num_hits
String? stxtyper_all_hits = stxtyper.stxtyper_all_hits
String? stxtyper_complete_operon_hits = stxtyper.stxtyper_complete_operon_hits
String? stxtyper_partial_hits = stxtyper.stxtyper_partial_hits
String? stxtyper_stx_frameshifts_or_internal_stop_hits = stxtyper.stxtyper_frameshifts_or_internal_stop_hits
String? stxtyper_novel_hits = stxtyper.stxtyper_novel_hits
# Shigella sonnei Typing
File? sonneityping_mykrobe_report_csv = sonneityping.sonneityping_mykrobe_report_csv
File? sonneityping_mykrobe_report_json = sonneityping.sonneityping_mykrobe_report_json
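The gating condition added to merlin_magic (run stxtyper for the `Escherichia` and `Shigella sonnei` tags, or for any sample when `call_stxtyper` is true) can be mirrored in a small shell sketch. This is only a restatement of the WDL boolean for illustration; the function `should_run_stxtyper` is hypothetical, and the `Klebsiella pneumoniae` tag is used purely as an example of a non-Escherichia sample.

```shell
#!/usr/bin/env bash
set -euo pipefail

# should_run_stxtyper MERLIN_TAG CALL_STXTYPER (hypothetical helper)
# Returns 0 (run) when the tag is "Escherichia" or "Shigella sonnei", or when
# the user override is "true"; returns 1 (skip) otherwise.
should_run_stxtyper() {
  local merlin_tag="$1" call_stxtyper="${2:-false}"
  case "${merlin_tag}" in
    "Escherichia"|"Shigella sonnei") return 0 ;;
  esac
  [ "${call_stxtyper}" = "true" ]
}

# Walk through a few illustrative tag/override combinations.
for args in "Escherichia|false" "Shigella sonnei|false" \
            "Klebsiella pneumoniae|false" "Klebsiella pneumoniae|true"; do
  tag="${args%%|*}"
  flag="${args##*|}"
  if should_run_stxtyper "${tag}" "${flag}"; then
    echo "${tag} (call_stxtyper=${flag}): run stxtyper"
  else
    echo "${tag} (call_stxtyper=${flag}): skip stxtyper"
  fi
done
```

Keeping the override as a separate term in the condition means the default (`call_stxtyper = false`) leaves behavior unchanged for every other organism.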
