This repository provides a tool to call copy number variations (CNVs) from Whole Exome Sequencing (WES) or targeted sequencing data using ExomeDepth
. The tool is containerised using Docker
to eliminate compatibility issues and allows for analysis of multiple samples simultaneously with a set of baseline samples.
Follow these steps to set up and run the pipeline using Docker
. While using Docker
is recommended, you can run it without Docker
.
You will need to have Docker
installed on your system. For installation instructions, visit Docker's official website at https://www.docker.com/.
Warning
While it is possible to run this without Docker, we strongly recommend against it due to potential compatibility issues with dependencies. If you still wish to proceed without Docker, you can find the necessary information here: without docker
To run this you will need to have Docker
installed on your system. Docker
is a platform that allows you to create, deploy, and run applications in containers. More information about Docker
can be found at https://www.docker.com/. Follow the instructions below to install Docker
if you have not already:
-
Open a Terminal: Use your Linux distribution's package manager to install
Docker
. The command varies depending on the distribution:Ubuntu/Debian:
sudo apt-get install docker-ce docker-ce-cli containerd.io
Fedora:
sudo dnf -y install docker-ce
CentOS:
sudo yum install docker-ce docker-ce-cli containerd.io
-
Start the Docker Service: Use
sudo systemctl start docker
to start the Docker service. -
Verify Installation: Check if Docker is installed correctly by typing
docker --version
in the terminal.
The first step is to pull the Docker image from Docker Hub. This ensures you have the latest version of the runexomedepth
image.
docker pull murphydaviducl/runexomedepth:latest
This command downloads the latest version of the runexomedepth
Docker image to your local machine.
You need to have specific directories on your host machine that will be mounted into the Docker container. These directories will be used for input data and output results.
For example:
/mnt/example
: Directory for general data./mnt/example-2
: Another directory for additional data./mnt/test_data
: A third directory for more data./data/working/
: Directory where output files will be stored.
Make sure these directories exist or replace them with paths that suit your environment.
Note
Mounting Folders in Docker
Mounting folders in Docker allows you to share a directory from your computer (the host) with a specific location inside a Docker container. This makes the files accessible to the software running inside the container.
Why mount a folder?
To allow the container to access, read, or modify files on your computer. By default, the container does not have access to your files unless you explicitly share them.
How to mount a folder
Use the -v
option in the Docker command to specify:
- The folder on your computer (host) that you want to share.
- The location inside the container where you want the folder to appear.
Use the following command to run the Docker container with the required directories mounted. This command will start an interactive shell inside the container.
docker run -v /host/data1:/container/data1 \
-v /host/data2:/container/data2 \
-v /host/test_data:/container/test_data \
-v /host/working_dir:/container/working_dir \
-it murphydaviducl/runexomedepth:latest /bin/bash
v /host/data1:/container/data1
: Maps/host/data1
from your machine to/container/data1
in the container.v /host/working_dir:/container/working_dir
: Maps/host/working_dir
on the host to/container/working_dir
in the container.it
: Runs the container interactively, allowing you to interact with the shell.murphydaviducl/runexomedepth:latest
: Specifies the Docker image./bin/bash
: Starts a Bash shell inside the container.
Change the directory to the location where your data and scripts are stored inside the container.
cd /data/
This command navigates to the /data/
directory, which was mounted from the host machine.
To run the analysis, you need:
- a set of
BAM
files for which to call CNVs - one sample perBAM
file - a set of
BAM
files to use as the baseline - one sample perBAM
file - indexed
BAM
files (BAI
) for all the aboveBAM
files - a
BED
file of the target region of your exome capture or targeted sequencing data. If this is not supplied hg19 will be used. - an annotation file (
GTF
/GFF
). This needs to match the build of your targets. If this is not supplied ensembl version 71 (hg19) will be used. These can be downloaded from Ensembl or GENCODE.
Here is the latest GENCODE annotation: https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_46/gencode.v46.annotation.gtf.gz
Here is the latest ensembl annotation: https://ftp.ensembl.org/pub/release-112/gtf/homo_sapiens/Homo_sapiens.GRCh38.112.gtf.gz
It is advisable to do the indexing of the BAM files prior to running the pipeline. If the dates of the index files are older than the BAM files, ExomeDepth will throw an error. Indexing of BAM files can be done by:
samtools index input.bam # for a single sample or;
samtools index -M *.bam # multiple samples
The main script to call CNVs with is called ExomeDepth.R
. Make sure you have the following input data prior to running it:
Parameter | Description |
---|---|
--targets |
bed file with exon targets. This is optional and if none is given hg19 will be used |
--annotation |
GTF/GFF file with gene coordinates. This is optional and if none is given ensembl version 71 (hg19) will be used |
--test-samples |
TSV file with paths to the BAM files to call CNVs for, one per line |
--baseline-samples |
TSV file with paths to the BAM files used for the baseline, one per line |
--output-directory |
path to output directory |
Example of --test-samples
and --baseline-samples
required TSV files looks like this:
/path/to/test_sample1.bam
/path/to/test_sample2.bam
/path/to/test_sample3.bam
Example files can also be found here test_samples.tsv and here baseline_samples.tsv
Once you have the required input data, follow these steps to run the ExomeDepth.R
script:
Rscript ExomeDepth.R \
--targets /path/to/targets.bed \
--annotation /path/to/annotation.gff \
--test-samples test_samples.tsv \
--baseline-samples baseline_samples.tsv \
--output-directory /path/to/output_folder/
The output will be a CSV file with called CNVs, a stderr and a stdout log file per sample stored in the working directory.
working directory
|--- {sample}_CNV.csv # A CSV file with called CNVs per sample
|--- {sample}_stderr.log # stdout to a sample-specific output log file
|--- {sample}_stdout.log # stderr to a sample-specific output log file
If using this tool please cite the original ExomeDepth manuscript
Plagnol, Vincent, et al. "A robust model for read count data in exome sequencing experiments and implications for copy number variant calling." Bioinformatics 28.21 (2012). https://doi.org/10.1093/bioinformatics/bts526