Installation & Testing
NGS-Pipe is a pipeline for the core analysis of DNA and RNA sequencing samples generated in the context of precision oncology. One of its main design goals is to provide an easy-to-use and robust toolkit for users with bioinformatics expertise. Like any other pipeline, NGS-Pipe relies on underlying software that has to be installed before an analysis can be performed. In this section we describe different options for installing the bioinformatics software needed for the analysis. We also show how to execute the pipeline with the example data that we provide.
The pipeline comprises a large number of software tools, ranging from aligners to quality control tools to variant callers. We believe that there are currently two viable options for installing or providing these tools in your environment.
Conda is a package manager that automatically installs software and encapsulates it in an environment. The bioconda channel provides a large number of bioinformatics tools, covering the majority of those needed by NGS-Pipe, so we provide conda environment files for DNA and RNA analysis. We recommend using conda for the installation.
Installing the tools by hand is also possible but cumbersome. You will be responsible for finding each tool in the correct version and installing it on your own system. You will also need to adjust the tool paths in the Snakemake config files.
We have decided not to integrate our pipeline with Docker. Docker is a neat tool for packaging software and its dependencies into a simple container. However, running Docker containers in an HPC environment has several drawbacks, such as privilege escalation and performance overhead. These issues can be addressed, e.g., by "translating" the container to Singularity. Overall, though, the effort needed to make a Docker container HPC-ready is similar to that of installing the tools by hand on bare metal.
A large fraction of the tools required by NGS-Pipe is covered by conda and the bioconda channel, and installing them takes a single command.
All tools required for the analysis of RNASeq experiments are provided by conda. They are installed into a conda environment, which is then activated.
#The RNA environment (environments/rna_environment.yaml)
channels:
- bioconda
- conda-forge
- defaults
dependencies:
- fastqc=0.11.5
- samtools=1.2
- star=2.5.3a
- trimmomatic=0.36
- subread=1.5.2
- snakemake=3.13.3
#Install tools from rna_environment.yaml
conda env create -n ngs-pipe-rna --file environments/rna_environment.yaml
#Activate environment
conda activate ngs-pipe-rna
After the environment is activated, all tools are available on the command line and ready to be executed by the pipeline.
We provide example data and a test script that show how the raw data has to be formatted and how to execute the pipeline.
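Calling `conda activate` directly works in an interactive shell. In a non-interactive script it usually only works after conda's shell hook has been loaded. Below is a minimal sketch, assuming conda 4.6 or later. `activate_ngs_env` and `check_tools` are hypothetical helpers, not part of NGS-Pipe. The executable names in the usage note (e.g. `STAR`, and `featureCounts` from the subread package) are assumptions about what the bioconda packages install.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of NGS-Pipe) for using the conda
# environments from a script.

# activate_ngs_env: load conda's shell functions, which `conda activate`
# needs in non-interactive shells (assumes conda >= 4.6), then activate
# the named environment.
activate_ngs_env() {
  eval "$(conda shell.bash hook)" || return 1
  conda activate "$1"
}

# check_tools: print every executable that is not on PATH to stderr and
# return 1 if at least one is missing, 0 otherwise.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `activate_ngs_env ngs-pipe-rna && check_tools fastqc samtools STAR trimmomatic featureCounts snakemake` reports any tool from the RNA environment that did not end up on `PATH`.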
#1. Go to examples folder:
cd examples/rna
#2. Download test data: We provide an additional snakemake pipeline to
# download test sequences, databases and adapter files:
./run_prepare_data_locally.sh
# This will download 8 test data sets, the adapters, the human reference
# and build the STAR database index
#3. Execute the RNASeq Pipeline:
./run_analysis_locally.sh
# This will execute: RAW-->Trimmomatic-->STAR-->FeatureCounts
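The analysis step depends on the downloaded data and the STAR index. It can therefore help to chain the two scripts so that the analysis only starts if the preparation finished successfully. Below is a minimal sketch. The function name `run_example` is hypothetical, and it only assumes that the two scripts shown above exist in the given example folder. The same call should work for the DNA example folder.

```shell
#!/usr/bin/env bash
# run_example: hypothetical wrapper that runs the preparation and analysis
# scripts of an example folder in order and stops at the first failure.
run_example() {
  local dir="$1"
  # Use a subshell so the caller's working directory is left unchanged.
  (
    cd "$dir" || exit 1
    ./run_prepare_data_locally.sh || { echo "data preparation failed" >&2; exit 1; }
    ./run_analysis_locally.sh || { echo "analysis failed" >&2; exit 1; }
  )
}
```

Run it from the repository root with the matching environment activated, e.g. `run_example examples/rna`.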
All core tools required for the analysis of DNA sequencing experiments are provided by conda. They are installed into a conda environment, which is then activated. However, some tools are not available via conda and need to be installed by hand (see the list below).
#The DNA environment (environments/dna_environment.yaml).
#The commented-out dependencies are not needed for the example data and can be enabled when required
channels:
- bioconda
- conda-forge
- defaults
dependencies:
- snakemake=3.13.3
- fastqc=0.11.5
- samtools=1.4
- trimmomatic=0.36
- bwa=0.7.15
- picard=2.9.2
- gatk=3.5
- varscan=2.4.2
- qualimap=2.2
- sra-tools=2.8.1
#- bowtie2=2.3.2
#- yara=0.9.6
#- snpeff=4.3
#- snpsift=4.3
#- freebayes=1.1.0
#- somatic-sniper=1.0.5.0
#- pindel=0.2.5b8
#- bioconductor-deepsnv=1.20.0
#- vardict-java=1.4.10
#- vardict=2017.04.18
#Install tools from dna_environment.yaml
conda env create -n ngs-pipe-dna --file environments/dna_environment.yaml
#Activate environment
conda activate ngs-pipe-dna
After the environment is activated, all tools are available on the command line and ready to be executed by the pipeline.
We provide example data and a test script that show how the raw data has to be formatted and how to execute the pipeline. However, not all tools can be installed via conda, so the test script executes only a subset of the pipeline. The full pipeline can be executed once all required tools are installed.
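The dependencies that are commented out in dna_environment.yaml can be enabled later without recreating the environment. Uncomment the entry, then run `conda env update` against the existing environment. Below is a minimal sketch. `enable_dependency` is a hypothetical helper that only removes the leading `#` from a line of the form `#- name=version`. freebayes is used purely as an example.

```shell
#!/usr/bin/env bash
# enable_dependency: hypothetical helper that uncomments a disabled entry
# such as "#- freebayes=1.1.0" in a conda environment file.
# It writes through a temporary file to avoid the GNU/BSD differences of
# `sed -i`, then copies the result back so the file keeps its permissions.
enable_dependency() {
  local file="$1" name="$2" tmp
  # Fail if the package has no commented-out entry in the file.
  grep -q "^#- ${name}=" "$file" || return 1
  tmp="$(mktemp)" || return 1
  sed "s/^#- ${name}=/- ${name}=/" "$file" > "$tmp" &&
    cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

For example: `enable_dependency environments/dna_environment.yaml freebayes && conda env update -n ngs-pipe-dna --file environments/dna_environment.yaml`.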
#1. Go to examples folder:
cd examples/dna
#2. Download test data: We provide an additional snakemake pipeline to
# download test sequences, databases and adapter files:
./run_prepare_data_locally.sh
# This will download 6 test data sets, the adapters, regions file,
# the human reference and build the BWA database index
#3. Execute the DNA Pipeline:
./run_analysis_locally.sh
# This will execute: RAW --> QC(Trimmomatic) --> Mapping(BWA) --> Sort(Picard)
# --> Merge(Picard) --> Remove Secondary Alignments(Samtools) --> MarkDuplicates(Picard)
# --> RemoveDuplicates(Samtools) --> SNV Calling (VarScan2)
Tool | Optional/Mandatory | Comment | Version |
---|---|---|---|
GATK | Mandatory | The jar needs to be registered by conda | 3.5 |
JointSNVMix | Optional | | 0.75 |
JointSNVMix2 | Optional | | current |
Seqpurge | Optional | | current |
mutect | Optional | | current |
dindel | Optional | | current |
rankCombineVariants | Optional | | current |
bicseq2 | Optional | | current |
annovar | Optional | | current |
facets | Optional | | current |
somaticseq | Optional | | v2.1.2 |
strelka | Optional | | current |
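For GATK 3.x, the bioconda package does not ship the GenomeAnalysisTK jar itself. After downloading the jar from the Broad Institute, you have to register it with the conda package. Below is a minimal sketch. It assumes the registration script that the bioconda gatk 3.x package installs, which to our knowledge is called `gatk-register` in older builds such as 3.5 and `gatk3-register` in later ones. `register_gatk_jar` is a hypothetical helper, and the jar path in the usage note is a placeholder.

```shell
#!/usr/bin/env bash
# register_gatk_jar: hypothetical helper that registers a locally downloaded
# GenomeAnalysisTK jar with the bioconda gatk 3.x package. It assumes the
# active environment provides gatk-register (older builds) or
# gatk3-register (newer builds).
register_gatk_jar() {
  local jar="$1" cmd
  if [[ ! -f "$jar" ]]; then
    echo "jar not found: $jar" >&2
    return 1
  fi
  for cmd in gatk-register gatk3-register; do
    if command -v "$cmd" >/dev/null 2>&1; then
      # Return the exit status of the registration script.
      "$cmd" "$jar"
      return
    fi
  done
  echo "no GATK registration script found in the active environment" >&2
  return 1
}
```

With ngs-pipe-dna activated, call e.g. `register_gatk_jar /path/to/GenomeAnalysisTK.jar`.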