-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deprecate the use of cg_pipeline for nanoplot stats #164
Conversation
Nanoplot to estimate coverage (similar to cg_pipeline) TODO:
|
Remove redundancy of fastq_scan |
* make mash genome size estimation optional * float contigs_gfa to table * rename gfa * add skip_mash to theiacov_ont * update md5sum --------- Co-authored-by: cimendes <[email protected]>
…_health_bioinformatics into im-theiaprok-ont
TheiaProk Testing44 sample dataset run with main branch: https://app.terra.bio/#workspaces/theiagen-validations/Theiagen_Mendes_Sandbox/job_history/17fd616b-cf16-453c-9280-4f7a594c8e67 44 sample dataset run with commit id 3129921: https://app.terra.bio/#workspaces/theiagen-validations/Theiagen_Mendes_Sandbox/job_history/d4ef8d7d-fd99-4ef8-8400-31c1949593af Deprecated columns:
New columns:
|
TheiaCoV Testing72 sample dataset with main branch: https://app.terra.bio/#workspaces/theiagen-validations/Theiagen_Mendes_Sandbox/job_history/fe417b76-f79a-4004-aab4-5c792d44ffd9 72 sample dataset with commit id d367b10: https://app.terra.bio/#workspaces/theiagen-validations/Theiagen_Mendes_Sandbox/job_history/16e494be-0547-4403-b187-2398b10e8fc4 Deprecated columns:
New columns:
|
With the exception of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great work team!
Partially closes # 158
🛠️ Changes Being Made
This PR removes cg_pipeline as the tool for read statistics calculation (mean quality score, among others) and instead replaces it by nanoplot stats.
The following fields were removed from the outputs in TheiaProk:
cg_pipeline_report_raw
cg_pipeline_docker
cg_pipeline_report_clean
number_raw_reads
number_clean_reads
fastq_scan_version
fastq_scan_report
The following fields were introduced in the datatable in TheiaProk:
nanoplot_version
nanoplot_docker
nanoplot_html_raw
nanoplot_tsv_raw
nanoplot_num_reads_raw1
nanoplot_r1_mean_readlength_raw
nanoplot_r1_mean_q_raw
nanoplot_html_clean
nanoplot_tsv_clean
nanoplot_num_reads_clean1
nanoplot_r1_mean_readlength_clean
nanoplot_r1_mean_q_clean
This PR also brings TheiaCoV_ONT and TheiaProk_ONT into alignment regarding workflow logic and tools used.
🧠 Context and Rationale
From @emmadoughty
📋 Workflow/Task Steps
Nanoplot performs statistical analysis of read length distribution and read quality. It now completely replaces cg_pipeline on both raw and clean read quality assessments. The nanoplot task was shown some love by exposing docker, memory and CPU arguments, and exposing docker as output.
Inputs
N/A
Outputs
The following fields were introduced in both TheiaCoV and TheiaProk:
nanoplot_version
nanoplot_docker
nanoplot_html_raw
nanoplot_tsv_raw
(nanoplot_)num_reads_raw1
(nanoplot_)r1_mean_readlength_raw
(nanoplot_)r1_mean_q_raw
nanoplot_html_clean
nanoplot_tsv_clean
(nanoplot_)num_reads_clean1
(nanoplot_)r1_mean_readlength_clean
(nanoplot_)r1_mean_q_clean
For the above variables in TheiaCoV, the
nanoplot_
prefix was not included as to not alter existing output names for users.In TheiaCoV only:
est_coverage_raw
est_coverage_clean
🧪 Testing
Locally
All tests passed locally
Terra
Tested on 44 ONT data samples on terra: https://app.terra.bio/#workspaces/theiagen-validations/Theiagen_Mendes_Sandbox/job_history/f6f822af-6a5d-4efc-a8bb-e118fcadc756TheiaProk_ONT: https://app.terra.bio/#workspaces/theiagen-validations/Theiagen_Doughty_Sandbox/job_history/526ac0e1-78fa-4cd3-916a-e904ae716865TheiaProk_ONT: https://app.terra.bio/#workspaces/theiagen-validations/Theiagen_Doughty_Sandbox/job_history/8cfb3210-55d2-4a40-8966-0e895075f5b2
TheiaCoV_ONT: https://app.terra.bio/#workspaces/cdc-terra-resources/Theiagen_Wright_SC2_Sandbox/job_history/ed0f7349-5f21-4721-a106-0e0281971de0
🔬 Quality checks
Pull Request (PR) checklist: