[TheiaProk] Adds stxtyper to merlin_magic and TheiaProk wfs (#525)
* added WDL task for testing StxTyper. ran successfully with miniwdl
* fix typo in stxtyper task output
* added stxtyper to merlin_magic subwf and theiaprok_illumina_pe. NOT TESTED locally. it may not work
* added log file output to stxtyper task, merlin_magic subwf, and theiaprok_illumina_pe wf. Also added code to catch when stxtyper output TSV is one line meaning no hits were found
* big change to stxtyper outputs. removed many and replaced with 2 outputs: String stxtyper_hits and Int stxtyper_num_hits. updated merlin_magic and theiaprok_illumina_PE workflow and tested successfully w miniwdl
* update stxtyper_num_hits output when 0 hits are found
* move stxtyper call block in merlin_magic wf so that all Escherichia and Shigella (including sonnei) are run through stxtyper
* update to more recent stxtyper docker image that includes 2 new output columns; downsize disk_size to 50 gb default instead of 100 (it doesn't need much as it runs on 1 assembly)
* reduced down to 1 cpu and 2 GB memory as I think it runs singlethreaded; preemptible on since this task doesn't run for much more than 2 min; created outputs in case no hits are found; started re-writing parsing code for when hits are detected & finished the bit for complete operons. tested successfully with miniwdl
* big update to stxtyper output parsing. runs successfully, but still needs work to iron out bugs. want to save progress and push a commit
* updated merlin_magic with stxtyper optional inputs so they are exposed to user; major changes to stxtyper parsing again; added a new optional input & 1 output, removed 1 output; still needs work
* updated one stxtyper output string name; updated merlin_magic subwf with new stxtyper outputs; updated theiaprok_illumina_pe with new stxtyper outputs; still need to rework part of parsing for stxtyper but theiaprok ran successfully w miniwdl
* more updates to stxtyper output parsing. tested on many samples w miniwdl successfully
* shorten optional input variable name for stxtyper
* revert to 3 maxRetries stxtyper task
* stxtyper increase memory request to 4 GB
* add stxtyper outputs to theiaprok_fasta wf. tested successfully w miniwdl
* major update to stxtyper WDL task. updated docker image to v1.0.24; added boolean for enabling built-in stxtyper debugging; removed many outputs and added stxtyper_all_hits output; also renamed a few outputs for clarity and consistency; commented out unused portions of code which will be deleted later
* updated inputs/outputs for stxtyper in merlin_magic subwf, updated theiaprok ilmn pe wf outputs; adjusted order outputs in stxtyper task for clarity
* merlin_magic: moved stxtyper call block outside of Escherichia/shigella call block. added optional input call_stxtyper so user can run tool regardless of merlin_tag and GAMBIT_predicted_taxon. tested successfully w miniwdl, need to test in Terra
* cleanup unused code from stxtyper task; adjust code block for when no hits are found and output files are created
* update theiaprok fasta, ilmn SE, ONT with stxtyper outputs. need to test in Terra
* update CI
* update CI
* added TheiaProk workflow inputs and stxtyper block describing the tool to the theiaprok documentation. also fixed a minor comment typo in stxtyper task file
* added stxtyper outputs to theiaprok docs page
* updated theiaprok diagram to include stxtyper under Escherichia spp. and Shigella spp. specific tasks
* added punctuation
* update ci
* remove stxtyper_log File output from workflows as it's only slightly useful for debugging and not useful to end user. also removed mention in the docs
* update ci
* update to static link in docs
* updated the correct/most up-to-date theiaprok diagram
* update theiaprok diagram

thank you @sage-wright for help on updating this diagram!
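Several of the commits above mention testing with miniwdl. A minimal sketch of what such a local run might look like, assuming miniwdl and Docker are installed and that `sample1.fasta` is a hypothetical assembly in the working directory. The input names (`assembly`, `samplename`, `enable_debugging`) come from the task's `input` block; the command is only printed here rather than executed.

```shell
#!/usr/bin/env bash
# Hypothetical local paths; adjust to your checkout and data.
wdl="tasks/species_typing/escherichia_shigella/task_stxtyper.wdl"

# miniwdl accepts task inputs as name=value pairs on the command line.
cmd=(miniwdl run "$wdl" assembly=sample1.fasta samplename=sample1 enable_debugging=true)

# Print the command for review; run it with "${cmd[@]}" once miniwdl and Docker are available.
printf '%s ' "${cmd[@]}"
echo
```

Leaving `enable_debugging` out falls back to the task default of `false`.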
Showing 10 changed files with 254 additions and 22 deletions.
122 changes: 122 additions & 0 deletions
tasks/species_typing/escherichia_shigella/task_stxtyper.wdl
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,122 @@ | ||
version 1.0 | ||
|
||
task stxtyper { | ||
input { | ||
File assembly | ||
String samplename | ||
Boolean enable_debugging = false # Additional messages are printed and files in $TMPDIR are not removed after running | ||
String docker = "us-docker.pkg.dev/general-theiagen/staphb/stxtyper:1.0.24" | ||
Int disk_size = 50 | ||
Int cpu = 1 | ||
Int memory = 4 | ||
} | ||
command <<< | ||
# fail task if any commands below fail since there's lots of bash conditionals below (AGH!) | ||
set -eo pipefail | ||
|
||
# capture version info | ||
stxtyper --version | tee VERSION.txt | ||
|
||
# NOTE: by default stxtyper uses $TMPDIR or /tmp, so if we run into issues we may need to adjust in the future. Could potentially use PWD as the TMPDIR. | ||
echo "DEBUG: TMPDIR is set to: $TMPDIR" | ||
|
||
echo "DEBUG: running StxTyper now..." | ||
# run StxTyper on assembly; may need to add/remove options in the future if they change | ||
# NOTE: stxtyper can accept gzipped assemblies, so no need to unzip | ||
stxtyper \ | ||
--nucleotide ~{assembly} \ | ||
--name ~{samplename} \ | ||
--output ~{samplename}_stxtyper.tsv \ | ||
~{true='--debug' false='' enable_debugging} \ | ||
--log ~{samplename}_stxtyper.log | ||
|
||
# parse output TSV | ||
echo "DEBUG: Parsing StxTyper output TSV..." | ||
|
||
# check for output file with only 1 line (meaning no hits found); exit cleanly if so | ||
if [ "$(wc -l < ~{samplename}_stxtyper.tsv)" -eq 1 ]; then | ||
echo "No hits found by StxTyper" > stxtyper_hits.txt | ||
echo "0" > stxtyper_num_hits.txt | ||
echo "DEBUG: No hits found in StxTyper output TSV. Exiting task with exit code 0 now." | ||
|
||
# create empty output files | ||
touch stxtyper_all_hits.txt stxtyper_complete_operons.txt stxtyper_partial_hits.txt stxtyper_stx_frameshifts_or_internal_stop_hits.txt stx_novel_hits.txt | ||
# put "none" into all of them so task does not fail | ||
echo "None" | tee stxtyper_all_hits.txt stxtyper_complete_operons.txt stxtyper_partial_hits.txt stxtyper_stx_frameshifts_or_internal_stop_hits.txt stx_novel_hits.txt | ||
exit 0 | ||
fi | ||
|
||
# check for output file with more than 1 line (meaning hits found); count lines & parse output TSV if so | ||
if [ "$(wc -l < ~{samplename}_stxtyper.tsv)" -gt 1 ]; then | ||
echo "Hits found by StxTyper. Counting lines & parsing output TSV now..." | ||
# count number of lines in output TSV (excluding header) | ||
wc -l < ~{samplename}_stxtyper.tsv | awk '{print $1-1}' > stxtyper_num_hits.txt | ||
# remove header line | ||
sed '1d' ~{samplename}_stxtyper.tsv > ~{samplename}_stxtyper_noheader.tsv | ||
|
||
##### parse output TSV ##### | ||
### complete operons | ||
echo "DEBUG: Parsing complete operons..." | ||
awk -F'\t' -v OFS=, '$4 == "COMPLETE" {print $3}' ~{samplename}_stxtyper.tsv | paste -sd, - | tee stxtyper_complete_operons.txt | ||
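# if no complete operons were parsed, write "None" to file for output string (check the parsed file rather than the TSV, since grepping the TSV for COMPLETE also matches COMPLETE_NOVEL) | ||
if [ "$(grep --silent 'stx' stxtyper_complete_operons.txt; echo $?)" -gt 0 ]; then | ||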
echo "None" > stxtyper_complete_operons.txt | ||
fi | ||
|
||
### complete_novel operons | ||
echo "DEBUG: Parsing complete novel hits..." | ||
awk -F'\t' -v OFS=, '$4 == "COMPLETE_NOVEL" {print $3}' ~{samplename}_stxtyper.tsv | paste -sd, - | tee stx_novel_hits.txt | ||
# if grep for COMPLETE_NOVEL fails, write "None" to file for output string | ||
if [ "$(grep --silent 'COMPLETE_NOVEL' ~{samplename}_stxtyper.tsv; echo $?)" -gt 0 ]; then | ||
echo "None" > stx_novel_hits.txt | ||
fi | ||
|
||
### partial hits (to any gene in stx operon) | ||
echo "DEBUG: Parsing stxtyper partial hits..." | ||
# explanation: if "operon" column contains "PARTIAL" (either PARTIAL or PARTIAL_CONTIG_END possible); print either "stx1" or "stx2" or "stx1,stx2" | ||
awk -F'\t' -v OFS=, '$4 ~ "PARTIAL.*" {print $3}' ~{samplename}_stxtyper.tsv | sort | uniq | paste -sd, - | tee stxtyper_partial_hits.txt | ||
# if no stx partial hits found, write "None" to file for output string | ||
if [ "$(grep --silent 'stx' stxtyper_partial_hits.txt; echo $?)" -gt 0 ]; then | ||
echo "None" > stxtyper_partial_hits.txt | ||
fi | ||
|
||
### frameshifts or internal stop codons in stx genes | ||
echo "DEBUG: Parsing stx frameshifts or internal stop codons..." | ||
# explanation: if "operon" column is "FRAMESHIFT" or "INTERNAL_STOP", print the stx type (column 3) in a sorted/unique list | ||
awk -F'\t' -v OFS=, '$4 == "FRAMESHIFT" || $4 == "INTERNAL_STOP" {print $3}' ~{samplename}_stxtyper.tsv | sort | uniq | paste -sd, - | tee stxtyper_stx_frameshifts_or_internal_stop_hits.txt | ||
# if no frameshifts or internal stop codons found, write "None" to file for output string | ||
if [ "$(grep --silent -E 'FRAMESHIFT|INTERNAL_STOP' ~{samplename}_stxtyper.tsv; echo $?)" -gt 0 ]; then | ||
echo "None" > stxtyper_stx_frameshifts_or_internal_stop_hits.txt | ||
fi | ||
|
||
echo "DEBUG: generating stxtyper_all_hits string output now..." | ||
# sort and uniq so there are no duplicate lines; then paste into a single comma-separated line | ||
# sed is to remove any instances of "None" from the output | ||
cat stxtyper_complete_operons.txt stxtyper_partial_hits.txt stxtyper_stx_frameshifts_or_internal_stop_hits.txt stx_novel_hits.txt | sed '/None/d' | sort | uniq | paste -sd, - > stxtyper_all_hits.txt | ||
|
||
fi | ||
echo "DEBUG: Finished parsing StxTyper output TSV." | ||
>>> | ||
output { | ||
File stxtyper_report = "~{samplename}_stxtyper.tsv" | ||
File stxtyper_log = "~{samplename}_stxtyper.log" | ||
String stxtyper_docker = docker | ||
String stxtyper_version = read_string("VERSION.txt") | ||
# outputs parsed from stxtyper output TSV | ||
Int stxtyper_num_hits = read_int("stxtyper_num_hits.txt") | ||
String stxtyper_all_hits = read_string("stxtyper_all_hits.txt") | ||
String stxtyper_complete_operon_hits = read_string("stxtyper_complete_operons.txt") | ||
String stxtyper_partial_hits = read_string("stxtyper_partial_hits.txt") | ||
String stxtyper_frameshifts_or_internal_stop_hits = read_string("stxtyper_stx_frameshifts_or_internal_stop_hits.txt") | ||
String stxtyper_novel_hits = read_string("stx_novel_hits.txt") | ||
} | ||
runtime { | ||
docker: "~{docker}" | ||
memory: "~{memory} GB" | ||
cpu: cpu | ||
disks: "local-disk " + disk_size + " SSD" | ||
disk: disk_size + " GB" | ||
preemptible: 1 # does not take long (usually <3 min) to run stxtyper on 1 genome, preemptible is fine | ||
maxRetries: 3 | ||
} | ||
} |
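The parsing logic in the command block can be tried outside WDL. The sketch below is not the task's code. It assumes the column layout used by the task's awk calls (column 3 holds the stx type, column 4 the operon status), uses made-up rows and header names, and, unlike the task, deduplicates every category with `sort -u`. The helper name `stx_hits_for` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: summarize a StxTyper-style TSV roughly the way the task's command block does.
set -eo pipefail

# Print a comma-separated list of stx types whose operon column (4) matches
# the given regex, or "None" when no data row matches. The header row is skipped.
stx_hits_for() {
  local pattern="$1" tsv="$2" hits
  hits="$(awk -F'\t' -v pat="$pattern" 'NR > 1 && $4 ~ pat {print $3}' "$tsv" | sort -u | paste -sd, -)"
  echo "${hits:-None}"
}

# Build a toy report: one header line plus two illustrative hits.
tsv="$(mktemp)"
printf 'name\ttarget_contig\tstx_type\toperon\n' > "$tsv"
printf 'sample1\tcontig_1\tstx2a\tCOMPLETE\n' >> "$tsv"
printf 'sample1\tcontig_7\tstx1\tPARTIAL_CONTIG_END\n' >> "$tsv"

# Number of hits is the line count minus the header.
num_hits=$(( $(wc -l < "$tsv") - 1 ))

# Anchored patterns keep COMPLETE from also matching COMPLETE_NOVEL.
complete="$(stx_hits_for '^COMPLETE$' "$tsv")"
partial="$(stx_hits_for '^PARTIAL' "$tsv")"
novel="$(stx_hits_for '^COMPLETE_NOVEL$' "$tsv")"

echo "num_hits=$num_hits complete=$complete partial=$partial novel=$novel"
rm -f "$tsv"
```

Matching on the operon column with anchored patterns, instead of grepping the whole TSV, avoids treating a `COMPLETE_NOVEL` row as evidence of a `COMPLETE` operon.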