-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
added 2 QC thresholds to ANI task to reduce false positives #168
Conversation
…and memory. added logic for comparing ANI_HIGHEST_PERCENTAGE to ani_threshold and only outputting the name of the match if the threshold is surpassed. tested successfully in miniwdl.
todos:
|
…le in theiaprok illumina pe wf
…ult. default ani_threshold is now 85. Also added logic to only output ani_top_species_match if both thresholds are surpassed.
After adding in the 2nd threshold ( Enteric species were output as the and close relatives (but different genera) had warning messages in the Now to resolve these conflicts 🥲 |
…and memory. added logic for comparing ANI_HIGHEST_PERCENTAGE to ani_threshold and only outputting the name of the match if the threshold is surpassed. tested successfully in miniwdl.
…le in theiaprok illumina pe wf
…ult. default ani_threshold is now 85. Also added logic to only output ani_top_species_match if both thresholds are surpassed.
…c_health_bioinformatics into cjk-ani-threshold
OK, I've rebased this branch to be on top of the most recent Likely will have to adjust CI hashes, but this cleans up the commit history so that all ANI related code changes in this PR occur "now" in the commit history, instead of back when I initially started this dev branch |
…ct resolution (my bad!)
Once the CI is updated and running successfully, I will relaunch tests on the various workflow to ensure functionality is retained after rebasing this branch |
FYI I have merged origin/main into this branch which totally offsets/nullifies the rebasing I did. Should make the "files changed" section much simpler and easier to review code changes |
Tests ran after the making (hopefully) the final commits:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code looks great. Will approve upon successful tests:
- SE: https://app.terra.bio/#workspaces/theiagen-validations/Theiagen_Wright_PHBG_Sandbox/job_history/30e07d66-de76-48be-8381-d3cae68453b5
- PE: https://app.terra.bio/#workspaces/theiagen-validations/Theiagen_Wright_PHBG_Sandbox/job_history/0ebe47d7-e1a1-4648-8206-bf18dcf18bdb
- FASTA: https://app.terra.bio/#workspaces/theiagen-validations/Theiagen_Wright_PHBG_Sandbox/job_history/9e543b1d-4be3-4f99-9217-f9f97f2cfd00
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
⭐
FYI: leaving in a draft state until further testing is done and feedback is received. Code changes are pretty much finished.We may end up adjusting the default threshold ofadjusted to 80.092.0
to a different value.All testing is finished. This PR is ready for review 👀
🛠️ Changes Being Made
tasks/quality_control/task_mummer_ani.wdl
changesani_threshold = 80.0
as per CDC EDLB/PulseNet standardpercent_bases_aligned_threshold = 70.0
as per CDC EDLB/PulseNet standardcpus
andmemory
Integer optional inputs..txt
suffixes to output files to help differentiate between these and bash variablesANI_HIGHEST_PERCENTAGE
toani_threshold
and only outputting the name of the match if both thresholds are surpassed. If threshold is not surpassed, it will print a message sayingANI top species match did not surpass the user-defined threshold of ~{ani_threshold}
ANI_HIGHEST_PERCENT_BASES_ALIGNED
topercent_bases_aligned_threshold
and only outputting the name of the match if both thresholds are surpassed. If threshold is not surpassed, it will print a message sayingANI percent bases aligned did not surpass the user-defined threshold of ~{percent_bases_aligned_threshold}
ani_docker
TheiaProk Illumina PE, SE, ONT, and FASTA workflow changes:
ani_docker
to all workflowsani_docker
string output to call block forexport_taxon_tables
tasktasks/utilities/task_broad_terra_tools.wdl
/export taxon tables taskani_docker
string input to this task🧠 Context and Rationale
Request from CDPH PulseNet group.
For example: we do not want to see an Enterobacter cloacae sample show the top ANI hit of
Salmonella_enterica
because that can be confusing to the user.Adding this threshold will filter out a large majority, if not ALL false positive top matches where the species is not represented in the database (RGDv2, enteric pathogens) yet ANI & genetic relatedness is high enough to show a result.
📋 Workflow/Task Steps
Inputs
new optional inputs:
ani_threshold
with default value of85.00
percent_bases_aligned_threshold
with default value of70.0
cpus
with default value of4
memory
with default value of8
Outputs
New required output:
ani_docker
🧪 Testing
Locally
tested locally with miniwdl
Terra
Successfully ran TheiaProk_Illumina_PE (and export_taxon_tables behaved as expected): https://app.terra.bio/#workspaces/theiagen-validations/curtis-sandbox-theiagen-validations/job_history/af44a279-9918-4d0d-823a-5dccdf65aec0
Tests TODO:
🔬 Quality checks
Pull Request (PR) checklist: