v1.0.0
Public Health Bioinformatics v1.0.0 Release Notes
This major release offers stable and validated versions of Theiagen's Terra-accessible WDL workflows in a single repository.
About Public Health Bioinformatics
The Public Health Bioinformatics repository hosts bioinformatics workflows for characterization, epidemiology, and sharing of pathogen genomes. More information about these workflows is available via the Theiagen Public Resources Documentation.
Due to numerous code redundancies across the PHVG, PHBG, and Terra Utilities repositories, updating and maintaining these repositories became error-prone and time-consuming. By (1) consolidating these repositories, (2) implementing stricter organization, and (3) enforcing the style guide, the PHB repository is now easier to read, maintain, and modify.
Major changes
All workflows now include the suffix _PHB to differentiate them from their previous incarnations in the PHBG, PHVG, and Terra Utilities repositories. A PHB Dockstore collection has been made to host these workflows. When importing these workflows from Dockstore, please remember to import the version with the _PHB suffix.
New workflows
Several new workflows have been created (please see the linked documentation for more information):
- Augur_PHB (and Augur_Prep_PHB): these workflows perform phylogenetic inference using Nextstrain's Augur pipeline. Unlike the PHVG versions (TheiaCoV_Augur_Prep, TheiaCoV_Augur_DistanceTree, and TheiaCoV_Augur_Run), which were restricted to SARS-CoV-2, the PHB versions can also be run on non-SARS-CoV-2 viral pathogens, e.g., West Nile virus or mpox.
- TheiaProk_FASTA_PHB and TheiaProk_ONT_PHB: these workflows extend the TheiaProk workflow series to accept assemblies and Oxford Nanopore read data as input.
- Assembly_Fetch_PHB: this workflow downloads a reference assembly from NCBI, either (1) by a provided assembly accession number or (2) by identifying the closest reference genome to a query assembly.
- Snippy_Variants_PHB and Snippy_Tree_PHB: these workflows use Snippy to identify variants (Snippy_Variants) and use those variants to produce a phylogenetic tree (Snippy_Tree).
- Snippy_Streamline_PHB: this workflow is an all-in-one approach to generating a reference-based phylogeny using the Snippy tools. By default, it runs Snippy_Variants and Snippy_Tree, but will optionally run Assembly_Fetch if a reference genome is not provided.
- Lyve_Set_PHB: this workflow runs the Lyve-SET pipeline developed by Lee Katz.
- TheiaValidate_PHB: this workflow performs basic comparisons between user-designated columns in two separate tables. It is intended to determine whether any differences exist between version releases or between two workflows. It produces a summary PDF and an Excel spreadsheet that lists the values for any columns that do not have matching content for a sample.
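The kind of column comparison described for TheiaValidate_PHB can be illustrated with a minimal Python sketch. This is not the workflow's actual implementation: the function name `compare_columns`, the in-memory table layout, and the column names are hypothetical, and the sketch produces neither the PDF summary nor the Excel spreadsheet. It only shows the idea of reporting, per sample, the designated columns whose values differ between two tables.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory table: sample name -> {column name -> value}
Table = Dict[str, Dict[str, str]]


def compare_columns(
    table_a: Table, table_b: Table, columns: List[str]
) -> List[Tuple[str, str, str, str]]:
    """Return (sample, column, value_a, value_b) for every mismatch."""
    mismatches = []
    # Assumption: only samples present in both tables are compared.
    for sample in sorted(set(table_a) & set(table_b)):
        for column in columns:
            # A missing column is treated as an empty value in this sketch.
            value_a = table_a[sample].get(column, "")
            value_b = table_b[sample].get(column, "")
            if value_a != value_b:
                mismatches.append((sample, column, value_a, value_b))
    return mismatches


# Illustrative data only; column names and values are made up.
old_release = {
    "sample1": {"predicted_taxon": "Escherichia coli", "assembly_length": "5100000"},
    "sample2": {"predicted_taxon": "Salmonella enterica", "assembly_length": "4800000"},
}
new_release = {
    "sample1": {"predicted_taxon": "Escherichia coli", "assembly_length": "5100000"},
    "sample2": {"predicted_taxon": "Salmonella enterica", "assembly_length": "4800420"},
}

mismatches = compare_columns(
    old_release, new_release, ["predicted_taxon", "assembly_length"]
)
for sample, column, value_a, value_b in mismatches:
    print(f"{sample}\t{column}\t{value_a}\t{value_b}")
```

Restricting the comparison to user-designated columns keeps fields that are expected to change between releases (for example, version strings) out of the report.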
Deprecated workflows
Several workflows are not included in the PHB repository and will receive no further development updates. However, these workflows will remain available in their original repositories.
- Mercury_PE_Prep, Mercury_SE_Prep, and Mercury_Batch (PHVG); the Mercury_Prep_N_Batch_PHB workflow offers similar functionality and capabilities
- TheiaCoV_WWVC (PHVG); the Freyja workflows are available for wastewater sequencing analysis
- TheiaCoV_Validate (PHVG); the TheiaValidate_PHB workflow offers expanded capabilities
- TheiaCoV_Augur_Prep, TheiaCoV_Augur_DistanceTree, and TheiaCoV_Augur_Run (PHVG); the Augur_Prep_PHB and Augur_PHB workflows offer expanded capabilities
- Import_SE_reads, Import_PE_reads, BAM_to_FASTQ_SE, and BAM_to_FASTQ_PE (Terra Utilities)
- The Kleborate, SerotypeFinder, TBProfiler_Illumina_PE, and TBProfiler_ONT standalone workflows (PHBG)
Implementation of a style guide
To ensure consistency across the repository, a style guide has been implemented and will continue to be enforced.
Documentation updates
The documentation for PHB v1.0.0 has been reorganized to help users identify which workflows may suit their needs. Documentation has been created for every workflow in the repository and includes lists of required and optional inputs, all potential outputs, details regarding the workflows, and tips for successful analysis and usage.
What's Changed
- v0.2.0 by @kevinlibuit, @michellescribner, @cimendes, @kapsakcj, @jrotieno, @rpetit3, @emmadoughty, @frankambrosio3
- add theiacov gha by @rpetit3 in #77
- Update default dockers by @sage-wright in #86
- Prevent assemblies & ONT data from Vibrio submodules by @sage-wright in #87
- make assembly_fasta optional by @sage-wright in #89
- allow not present columns in validation criteria to be ignored by @sage-wright in #90
- Quasitools Bug Discovery and Destruction by @sage-wright in #91
- fix path by @sage-wright in #93
Full Changelog: v0.2.0...v1.0.0