Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TheiaCov] wfs add percentage_mapped_reads #641

Open
wants to merge 31 commits into
base: main
Choose a base branch
from

Conversation

fraser-combe
Copy link
Contributor

@fraser-combe fraser-combe commented Oct 3, 2024

This PR closes #507

🗑️ This dev branch should be deleted after merging to main.

🧠 Summary

This PR adds outputs for percentage_mapped_reads to various workflows, specifically targeting reads for flu and non-flu organisms, ensuring consistency in outputs.

⚡ Impacted Workflows/Tasks

theiacov_ont.wdl
theiacov_illumina_pe.wdl
theiacov_illumina_se.wdl
theiacov_clearlabs.wdl

This PR may lead to different results in pre-existing outputs: No
This PR uses an element that could cause duplicate runs to have different results: No

🛠️ Changes

Added unified output for percentage_mapped_reads across theiacov_ont, theiacov_illumina_pe, theiacov_illumina_se, and theiacov_clearlabs workflows.
Consolidated flu and non-flu percentage mapped reads using select_first to ensure a single output variable for mapped reads.
Refined logic for flu and non-flu workflows to ensure correct handling of percentage_mapped_reads.

⚙️ Algorithm

  • No major algorithmic changes were introduced, but the logic for flu and non-flu organisms in calculating percentage_mapped_reads was updated to call different tasks.

  • For iVar-based workflows (theiacov_illumina_pe, theiacov_illumina_se), the percentage is parsed from the samtools flagstat file.

  • For non-iVar workflows, the assembled_reads_percent task is used to pass in BAM files and calculate mapped reads.

➡️ Inputs

No new inputs were added.

⬅️ Outputs

The following outputs were updated or added:

percentage_mapped_reads:
For non-flu organisms, calculated using either ivar_consensus.percentage_mapped_reads (for iVar-based workflows) or from stats_n_coverage.percentage_mapped_reads for ONT workflows.

🧪 Testing

Tested both flu and non-flu cases across workflows, ensuring the correct mapping of percentage_mapped_reads.

TheiaCov_ONT (non flu)
https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Combe_Sandbox/job_history/a158d7fc-aac7-4c9e-8782-e8d96afe059a
image

TheiaCov_ONT (flu)
https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Combe_Sandbox/job_history/fd988d1f-5a75-4b7f-b1e5-ce293a345a13
image

ThieaCov_illumina_pe (flu)
https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Combe_Sandbox/job_history/3ca0d049-31b4-4179-b2ce-20753946f40c/ec42f3a3-7b30-4756-aa72-39640adb92a9
image

TheiaCov_illumina_pe (non-flu)
https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Combe_Sandbox/job_history/cd90a924-8fe9-48c3-814e-d4fce8453632
image

TheiaCov_illumina_se
https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Combe_Sandbox/job_history/dcfb1141-a483-4eb3-b280-7377a1df4a4e

Suggested Scenarios for Reviewer to Test

Run workflows with flu and non-flu samples to ensure the correct assignment of percentage_mapped_reads.

Validate the unified logic correctly handles percentage_mapped_reads for both flu and non-flu workflows, particularly for iVar-based and non-iVar-based workflows.

🔬 Final Developer Checklist

  • The workflow/task has been tested and results, including file contents, are as anticipated
  • The CI/CD has been adjusted and tests are passing (Theiagen developers)
  • Code changes follow the style guide
  • Documentation and/or workflow diagrams have been updated if applicable (Theiagen developers only)

🎯 Reviewer Checklist

  • All changed results have been confirmed
  • You have tested the PR appropriately (see the testing guide for more information)
  • All code adheres to the style guide
  • MD5 sums have been updated
  • The PR author has addressed all comments
  • The documentation has been updated

@fraser-combe fraser-combe requested a review from a team as a code owner October 3, 2024 17:16
@sage-wright sage-wright marked this pull request as draft October 16, 2024 17:53
@fraser-combe fraser-combe marked this pull request as ready for review October 21, 2024 21:30
Copy link
Member

@sage-wright sage-wright left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great start, just needs a couple changes to clean things up a bit

@fraser-combe
Copy link
Contributor Author

confirmed mapped reads output after moving into flu track wf,
image

@sage-wright
Copy link
Member

Running tests on illumina pe flu here and various ont here.

One last request: could you tidy up some of the extra newlines at the end of the tasks in the output sections?

@sage-wright
Copy link
Member

sage-wright commented Oct 24, 2024

Also I was wrong and you do need to provide a default value for the percentage_mapped_reads in case the read_screen fails (see here).

Could you coerce all of these non-optional outputs into Strings and provide "" as the default value at the end of the select_first for the workflows? That way, when the read_screen fails, there isn't a 0 populated to the column since that would be an inaccurate statement.

  • in ClearLabs, remove the ? quantifier off the float
  • in SE, change the Float? to a String?
  • in PE, change the Float to a String and add a , "" to the end of the select_first block
  • in ONT, change the Float to a String and add a , "" to the end of the select_first block
  • in ivarconsensus, change change it to be a String and add , "" to the end of the select_first (like we do for meanbaseq and meanmapq, assembly_mean_coverage, etc.

Thanks!

@sage-wright
Copy link
Member

just as a side note, do you have any examples where it's not 100%?

@fraser-combe
Copy link
Contributor Author

just as a side note, do you have any examples where it's not 100%?

No Id have to try and find data, unless you have any potential data that may provide this kind of result

@fraser-combe
Copy link
Contributor Author

Those updates should be passing tests now, let me know if you need me to find more testing data after your tests have completed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

add percentage_mapped_reads output to TheiaCoV wfs
2 participants