[TheiaCov] wfs add percentage_mapped_reads #641

fraser-combe · 2024-10-03T17:16:44Z

This PR closes #507

🗑️ This dev branch should be deleted after merging to main.

🧠 Summary

This PR adds a percentage_mapped_reads output to several TheiaCov workflows, covering both flu and non-flu organisms so that the output is consistent across workflows.

⚡ Impacted Workflows/Tasks

theiacov_ont.wdl
theiacov_illumina_pe.wdl
theiacov_illumina_se.wdl
theiacov_clearlabs.wdl

This PR may lead to different results in pre-existing outputs: No
This PR uses an element that could cause duplicate runs to have different results: No

🛠️ Changes

Added unified output for percentage_mapped_reads across theiacov_ont, theiacov_illumina_pe, theiacov_illumina_se, and theiacov_clearlabs workflows.
Consolidated flu and non-flu percentage mapped reads using select_first to ensure a single output variable for mapped reads.
Refined logic for flu and non-flu workflows to ensure correct handling of percentage_mapped_reads.
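
To illustrate the consolidation described above, here is a minimal Python sketch of how WDL's select_first behaves: it returns the first defined value in an array of optionals and fails if none is defined. The variable names flu_percentage_mapped_reads and non_flu_percentage_mapped_reads, and the order in which they are listed, are hypothetical and are not taken from the workflows.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def select_first(values: Sequence[Optional[T]]) -> T:
    """Mimic WDL's select_first: return the first value that is not None."""
    for value in values:
        if value is not None:
            return value
    # WDL also errors when no element of the array is defined.
    raise ValueError("select_first: no defined value in array")


# Hypothetical values: only one track (flu or non-flu) runs for a sample,
# so the other track's output is undefined (None).
flu_percentage_mapped_reads = None
non_flu_percentage_mapped_reads = 98.7

percentage_mapped_reads = select_first(
    [flu_percentage_mapped_reads, non_flu_percentage_mapped_reads]
)
print(percentage_mapped_reads)
```

Because only one track produces a value for a given sample, the single output column is populated by whichever track ran.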

⚙️ Algorithm

No major algorithmic changes were introduced, but the logic for flu and non-flu organisms in calculating percentage_mapped_reads was updated to call different tasks.
For iVar-based workflows (theiacov_illumina_pe, theiacov_illumina_se), the percentage is parsed from the samtools flagstat file.
For non-iVar workflows, the assembled_reads_percent task is used to pass in BAM files and calculate mapped reads.
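
As a rough illustration of parsing the percentage from a samtools flagstat file, the Python sketch below extracts the value from the "mapped" line. It is not the command used in the task (the commit history mentions awk); the function name and the sample flagstat text are assumptions made for this example.

```python
import re
from typing import Optional

# Matches the overall "mapped" line, e.g. "950 + 0 mapped (95.00% : N/A)".
# The "primary mapped" line does not match because "primary" follows the counts.
_MAPPED_LINE = re.compile(r"^\d+ \+ \d+ mapped \(([\d.]+)%", re.MULTILINE)


def parse_percentage_mapped_reads(flagstat_text: str) -> Optional[float]:
    """Return the mapped-read percentage, or None if it is absent or N/A."""
    match = _MAPPED_LINE.search(flagstat_text)
    return float(match.group(1)) if match else None


# Illustrative flagstat excerpt (made-up numbers).
example_flagstat = """\
1000 + 0 in total (QC-passed reads + QC-failed reads)
1000 + 0 primary
0 + 0 secondary
0 + 0 supplementary
950 + 0 mapped (95.00% : N/A)
950 + 0 primary mapped (95.00% : N/A)
"""

print(parse_percentage_mapped_reads(example_flagstat))  # 95.0
```

Returning None when the line is missing or reports N/A keeps an absent value distinct from a real 0%.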

➡️ Inputs

No new inputs were added.

⬅️ Outputs

The following outputs were updated or added:

percentage_mapped_reads:
For non-flu organisms, calculated from either ivar_consensus.percentage_mapped_reads (for iVar-based workflows) or stats_n_coverage.percentage_mapped_reads (for ONT workflows).

🧪 Testing

Tested both flu and non-flu cases across workflows, ensuring the correct mapping of percentage_mapped_reads.

TheiaCov_ONT (non flu)
https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Combe_Sandbox/job_history/a158d7fc-aac7-4c9e-8782-e8d96afe059a

TheiaCov_ONT (flu)
https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Combe_Sandbox/job_history/fd988d1f-5a75-4b7f-b1e5-ce293a345a13

TheiaCov_illumina_pe (flu)
https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Combe_Sandbox/job_history/3ca0d049-31b4-4179-b2ce-20753946f40c/ec42f3a3-7b30-4756-aa72-39640adb92a9

TheiaCov_illumina_pe (non-flu)
https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Combe_Sandbox/job_history/cd90a924-8fe9-48c3-814e-d4fce8453632

TheiaCov_illumina_se
https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Combe_Sandbox/job_history/dcfb1141-a483-4eb3-b280-7377a1df4a4e

Suggested Scenarios for Reviewer to Test

Run workflows with flu and non-flu samples to ensure the correct assignment of percentage_mapped_reads.

Validate the unified logic correctly handles percentage_mapped_reads for both flu and non-flu workflows, particularly for iVar-based and non-iVar-based workflows.

🔬 Final Developer Checklist

The workflow/task has been tested and results, including file contents, are as anticipated
The CI/CD has been adjusted and tests are passing (Theiagen developers)
Code changes follow the style guide
Documentation and/or workflow diagrams have been updated if applicable (Theiagen developers only)

🎯 Reviewer Checklist

All changed results have been confirmed
You have tested the PR appropriately (see the testing guide for more information)
All code adheres to the style guide
MD5 sums have been updated
The PR author has addressed all comments
The documentation has been updated

sage-wright

Great start, just needs a couple changes to clean things up a bit

fraser-combe · 2024-10-22T19:23:34Z

Confirmed the mapped reads output after moving it into the flu track workflow.

sage-wright · 2024-10-24T20:35:53Z

Running tests on illumina pe flu here and various ont here.

One last request: could you tidy up some of the extra newlines at the end of the tasks in the output sections?

sage-wright · 2024-10-24T20:47:29Z

Also, I was wrong: you do need to provide a default value for percentage_mapped_reads in case read_screen fails (see here).

Could you coerce all of these non-optional outputs into Strings and provide "" as the default value at the end of the select_first for the workflows? That way, when read_screen fails, the column isn't populated with a 0, which would be inaccurate.

In ClearLabs, remove the ? quantifier from the Float
In SE, change the Float? to a String?
In PE, change the Float to a String and add a , "" to the end of the select_first block
In ONT, change the Float to a String and add a , "" to the end of the select_first block
In ivar_consensus, change it to a String and add , "" to the end of the select_first (like we do for meanbaseq, meanmapq, assembly_mean_coverage, etc.)

Thanks!
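
The effect of the String coercion with a "" default can be sketched in Python as follows. This is only an illustration of the requested behavior, not the WDL itself; the function name, its parameters, and the flu-before-non-flu ordering are hypothetical.

```python
from typing import Optional


def percentage_mapped_reads_output(
    flu_value: Optional[float], non_flu_value: Optional[float]
) -> str:
    """Return the first defined percentage as a string, or "" if neither exists.

    Mirrors a select_first whose final element is "": when no upstream task
    ran (for example after a failed read screen), the column stays empty
    instead of reporting a misleading 0.
    """
    for value in (flu_value, non_flu_value):
        if value is not None:
            return str(value)
    return ""


print(repr(percentage_mapped_reads_output(None, 97.5)))  # '97.5'
print(repr(percentage_mapped_reads_output(None, None)))  # ''
```

An empty string says "not computed", whereas 0 would claim that no reads mapped.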

sage-wright · 2024-10-24T21:00:47Z

just as a side note, do you have any examples where it's not 100%?

fraser-combe · 2024-10-24T21:21:53Z

> just as a side note, do you have any examples where it's not 100%?

No, I'd have to try to find data, unless you have any data that might produce this kind of result.

fraser-combe · 2024-10-24T21:35:54Z

Those updates should be passing tests now. Let me know if you need me to find more testing data after your tests have completed.

fraser-combe added 20 commits September 9, 2024 16:36

Added percentage_mapped_reads output to ivar_consensus.wdl and updated stats_n_coverage task

c64b57b

update mapped reads trying read_float

93e2d89

get read numbers from stats file

5298eff

get read numbers from stats filev2

85a8cc8

change from bc to awk for calculation

9c2cb18

update awk

c57edc4

metric output txt instead of csv

76b62ce

reswitchack to read_string output t

e6afcdd

percentage mapped reads based on trimmed bam file theiacov_ont

dba965b

update theiacov-ont for mapped reads

dd7b3e7

Merge remote-tracking branch 'origin/main' into fc-theiacov-mapped-reads-dev

b803c7a

pass output ivar cons mapped reads to wf for terra output

f6d5393

perc mapped reads output flu track PE, ONT and clearlabs and doc update

f888c65

updated namings outputs cov_ONT and removed extra call assembly metrics

92733ea

change naming output stat n coverage task

72db285

update flu mapped reads perc variable name

69b4666

make theiacov_ont conditional output flu mapped reads

b9320d7

wdl does not support if cond in output change to select first

0aeb272

wdl does not support if cond in output change to select first

4846b1e

combine flu and non flu into same mapped reads output

33ab026

fraser-combe requested a review from a team as a code owner October 3, 2024 17:16

fraser-combe force-pushed the fc-theiacov-mapped-reads-dev branch from 0dc9640 to 33ab026 October 4, 2024 12:47

sage-wright marked this pull request as draft October 16, 2024 17:53

fraser-combe added 6 commits October 21, 2024 11:35

correct assembled reads call

7070c62

update mdsums

e23b19a

update clearlabs for statncov call

b790973

float?

58ac2ee

mdsums and pe wf update flu irma defined

45568de

mdsum pe

5454111

fraser-combe marked this pull request as ready for review October 21, 2024 21:30

sage-wright reviewed Oct 22, 2024

workflows/theiacov/wf_theiacov_ont.wdl

sage-wright reviewed Oct 22, 2024

workflows/theiacov/wf_theiacov_illumina_pe.wdl

sage-wright reviewed Oct 22, 2024

workflows/theiacov/wf_theiacov_ont.wdl

sage-wright reviewed Oct 22, 2024

workflows/theiacov/wf_theiacov_illumina_pe.wdl

sage-wright requested changes Oct 22, 2024

fraser-combe added 2 commits October 22, 2024 13:21

move to flue track

6493640

tidy output pe and ont

56a4a30

fraser-combe requested a review from sage-wright October 23, 2024 19:37

sage-wright reviewed Oct 24, 2024

workflows/theiacov/wf_theiacov_clearlabs.wdl

update strings and provide default values

108612b

fraser-combe requested a review from sage-wright October 24, 2024 21:35

sage-wright reviewed Oct 25, 2024

tasks/quality_control/basic_statistics/task_assembly_metrics.wdl

clean tab/spaces echo

842691a

fraser-combe requested a review from sage-wright October 31, 2024 20:28

my fav commit mdsums!

cd32e74
