This repository has been archived by the owner on May 13, 2020. It is now read-only.

This repo is archived, but these workflows are still available in the GATK repository under the scripts directory. The workflows are also organized in Dockstore in the GATK Best Practices Workflows collection.

gatk-workflows/gatk4-pathseq

gatk4-pathseq

Purpose :

This repo contains workflows for computational pathogen discovery using PathSeq, a pipeline in the Genome Analysis Toolkit (GATK) for detecting microbial organisms in short-read deep sequencing samples taken from a host organism.

Additional Resources:

pathseq-pipeline

Runs the PathSeq pipeline

Requirements/expectations :

  • BAM
    • File must pass validation by ValidateSamFile
    • All reads must have an RG tag
    • One or more read groups, all belonging to a single sample (SM)
  • Host and microbe reference files available in the GATK Resource Bundle
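The read-group expectations above can be spot-checked before launching the workflow. The following is a minimal sketch, not part of this repo: it assumes the BAM has already been converted to SAM text (for example with `samtools view -h`), and the function name `check_read_groups` is hypothetical. It does not replace ValidateSamFile, which performs far more checks.

```python
def check_read_groups(sam_lines):
    """Return a list of read-group problems found in SAM-format text lines.

    Hypothetical helper: checks only that every read carries an RG tag
    declared in the header and that all read groups share one sample (SM).
    """
    rg_to_sample = {}
    problems = []
    for line_no, line in enumerate(sam_lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            # Header line; only @RG records matter for this check.
            if line.startswith("@RG\t"):
                fields = dict(
                    f.split(":", 1) for f in line.split("\t")[1:] if ":" in f
                )
                if "ID" not in fields:
                    problems.append(f"line {line_no}: @RG record has no ID")
                else:
                    rg_to_sample[fields["ID"]] = fields.get("SM")
            continue
        # Alignment record: optional tags start at the 12th column.
        tags = line.split("\t")[11:]
        rg = next((t[5:] for t in tags if t.startswith("RG:Z:")), None)
        if rg is None:
            problems.append(f"line {line_no}: read has no RG tag")
        elif rg not in rg_to_sample:
            problems.append(f"line {line_no}: RG '{rg}' is not in the header")
    samples = set(rg_to_sample.values())
    if len(samples) != 1 or None in samples:
        found = sorted(str(s) for s in samples)
        problems.append(f"expected exactly one sample (SM), found {found}")
    return problems
```

An empty list means the sampled lines satisfy the two read-group expectations; any other result lists the offending lines.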

Output :

  • BAM file containing microbe-mapped reads and reads of unknown sequence
  • Tab-separated value (.tsv) file of taxonomic abundance scores
  • Picard-style metrics files for the filter and scoring phases of the pipeline
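As a rough illustration of working with the abundance table, the sketch below ranks rows of a tab-separated file by a score column. The column name `score` and the function name `top_taxa` are assumptions made for this example; check the header of your actual .tsv output before relying on them.

```python
import csv
import io


def top_taxa(tsv_text, score_column="score", limit=5):
    """Return the highest-scoring rows of a tab-separated table.

    Hypothetical helper: `score_column` is an assumed header name and
    should be adjusted to match the real scores file.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # Skip rows where the score field is missing or empty.
    rows = [row for row in reader if row.get(score_column) not in (None, "")]
    rows.sort(key=lambda row: float(row[score_column]), reverse=True)
    return rows[:limit]
```

To use it on a file, read the file's contents into a string first, for example `top_taxa(open(path).read())`, where `path` is the location of your scores file.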

pathseq-build-microbe-reference

Builds a microbe reference for use with PathSeq

Requirements/expectations :

  • FASTA file containing microbe sequences from NCBI RefSeq

Output :

  • FASTA index and dictionary files
  • GATK BWA-MEM index image
  • PathSeq taxonomy file

pathseq-build-host-reference

Builds a host reference for use with PathSeq

Requirements/expectations :

  • FASTA file containing host sequences

Output :

  • FASTA index and dictionary files
  • GATK BWA-MEM index image
  • PathSeq Kmer file

Software version notes

  • GATK 4 or later
  • Cromwell version support
    • Successfully tested on v36
    • Does not work on versions earlier than v23 due to output syntax

Important Notes :

Contact Us :

  • The following material is provided by the Data Science Platform group at the Broad Institute. Please direct any questions or concerns to one of our forum sites: GATK or Terra.

LICENSING

Copyright Broad Institute, 2018 | BSD-3

This script is released under the WDL source code license (BSD-3) (see LICENSE in https://github.com/broadinstitute/wdl). Note however that the programs it calls may be subject to different licenses. Users are responsible for checking that they are authorized to run all programs before running this script.
