This repository has been archived by the owner on May 13, 2020. It is now read-only.

gatk4-pathseq

Purpose :

This repo contains workflows for computational pathogen discovery using PathSeq, a pipeline in the Genome Analysis Toolkit (GATK) for detecting microbial organisms in short-read deep sequencing samples taken from a host organism.

Additional Resources:

pathseq-pipeline

Runs the PathSeq pipeline

Requirements/expectations :

  • BAM
    • File must pass validation by ValidateSamFile
    • All reads must have an RG tag
    • One or more read groups, all belonging to a single sample (SM)
  • Host and microbe reference files available in the GATK Resource Bundle
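
The read-group expectations above can be checked before submitting a workflow. The following is a minimal sketch, not part of this repo: it assumes the BAM has already been converted to SAM text (for example with `samtools view -h`), and the helper name `check_read_groups` is hypothetical. It does not replace ValidateSamFile; it only covers the RG and SM conditions.

```python
def check_read_groups(sam_lines):
    """Return a list of problems found in SAM text lines (header plus reads)."""
    rg_to_sample = {}  # read group ID -> SM value (None if SM is missing)
    problems = []
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        if line.startswith("@"):
            # Header line: only @RG records matter for this check.
            if fields[0] == "@RG":
                tags = dict(f.split(":", 1) for f in fields[1:] if ":" in f)
                rg_to_sample[tags.get("ID")] = tags.get("SM")
            continue
        # Alignment line: optional tags start at the 12th column.
        rg = next((f[5:] for f in fields[11:] if f.startswith("RG:Z:")), None)
        if rg is None:
            problems.append(f"read {fields[0]} has no RG tag")
        elif rg not in rg_to_sample:
            problems.append(f"read {fields[0]} uses undefined read group {rg}")
    # All read groups must name the same, non-missing sample.
    samples = set(rg_to_sample.values())
    if len(samples) != 1 or None in samples:
        found = sorted(str(s) for s in samples)
        problems.append(f"expected exactly one sample (SM), found {found}")
    return problems
```

An empty list means every read carries an RG tag that is defined in the header and all read groups share one sample.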

Output :

  • BAM file containing microbe-mapped reads and reads of unknown sequence
  • Tab-separated value (.tsv) file of taxonomic abundance scores
  • Picard-style metrics files for the filter and scoring phases of the pipeline
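
As an illustration of working with the abundance scores output, the sketch below ranks taxa from a tab-separated file. The column names `name` and `score` are assumptions about the file header, not something this README specifies, so check them against your own output and pass different names if needed. The function name `top_taxa` and the sample rows are invented for the example.

```python
import csv
import io


def top_taxa(tsv_text, score_column="score", name_column="name", n=5):
    """Return the n highest-scoring (name, score) pairs from TSV text."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    # Sort by the numeric score, highest first.
    rows.sort(key=lambda r: float(r[score_column]), reverse=True)
    return [(r[name_column], float(r[score_column])) for r in rows[:n]]


# Synthetic rows for illustration only; not real pipeline output.
example = "name\tscore\ntaxon_a\t2.5\ntaxon_b\t10.0\ntaxon_c\t0.5\n"
ranked = top_taxa(example, n=2)
```

To use it on a real file, read the `.tsv` contents into a string and pass that string in place of `example`.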

pathseq-build-microbe-reference

Builds a microbe reference for use with PathSeq

Requirements/expectations :

  • FASTA file containing microbe sequences from NCBI RefSeq

Output :

  • FASTA index and dictionary files
  • GATK BWA-MEM index image
  • PathSeq taxonomy file

pathseq-build-host-reference

Builds a host reference for use with PathSeq

Requirements/expectations :

  • FASTA file containing host sequences

Output :

  • FASTA index and dictionary files
  • GATK BWA-MEM index image
  • PathSeq Kmer file

Software version notes

  • GATK 4 or later
  • Cromwell version support
    • Successfully tested on v36
    • Does not work on versions < v23 due to output syntax

Important Notes :

Contact Us :

  • The following material is provided by the Data Science Platform group at the Broad Institute. Please direct any questions or concerns to one of our forum sites: GATK or Terra.

LICENSING

Copyright Broad Institute, 2018 | BSD-3

This script is released under the WDL source code license (BSD-3) (see LICENSE in https://github.com/broadinstitute/wdl). Note, however, that the programs it calls may be subject to different licenses. Users are responsible for checking that they are authorized to run all programs before running this script.