https://hydra-genetics-cnv-sv.readthedocs.io
Snakemake module containing steps to call copy number variants and structural variants
The module contains rules to call CNVs (copy number variants) and SVs (structural variants), as well as a rule to merge CNV and SV calls.
In order to use this module, the following dependencies are required:
Input data should be added to samples.tsv and units.tsv.
The following information needs to be added to these files:
Column Id | Description |
---|---|
samples.tsv | |
sample | unique sample/patient id, one per row |
tumor_content | ratio of tumor cells to total cells |
units.tsv | |
sample | same sample/patient id as in samples.tsv |
type | data type identifier (one letter), can be one of T (Tumor), N (Normal), R (RNA) |
platform | type of sequencing platform, e.g. NovaSeq |
machine | specific machine id, e.g. NovaSeq instruments have @Axxxxx |
flowcell | identifier of the flowcell used |
lane | flowcell lane number |
barcode | sequence library barcode/index, connect forward and reverse indices by +, e.g. ATGC+ATGC |
fastq1/2 | absolute path to forward and reverse reads |
adapter | adapter sequences to be trimmed, separated by comma |
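
As a quick sanity check before running the workflow, the column requirements above could be verified with a small script. This is only a sketch, not part of the module: the function name `check_units` is hypothetical, and it assumes that `fastq1/2` means two columns named `fastq1` and `fastq2` and that the one-letter types are `T`, `N` and `R`.

```python
import csv
import io

# Column names follow the table above. Treating "fastq1/2" as two columns
# (fastq1, fastq2) and the one-letter types as T, N, R are assumptions.
REQUIRED_UNITS_COLUMNS = [
    "sample", "type", "platform", "machine", "flowcell",
    "lane", "barcode", "fastq1", "fastq2", "adapter",
]
ALLOWED_TYPES = {"T", "N", "R"}


def check_units(handle, known_samples):
    """Return a list of human-readable problems found in a units.tsv stream."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_UNITS_COLUMNS if c not in header]
    if missing:
        return ["missing columns: " + ", ".join(missing)]
    problems = []
    for row in reader:
        line_no = reader.line_num  # line number in the underlying file
        if row["sample"] not in known_samples:
            problems.append(f"line {line_no}: unknown sample {row['sample']!r}")
        if row["type"] not in ALLOWED_TYPES:
            problems.append(f"line {line_no}: unexpected type {row['type']!r}")
    return problems


# Usage with an in-memory example; sample ids and paths are made up.
example = io.StringIO(
    "\t".join(REQUIRED_UNITS_COLUMNS) + "\n"
    "NA12878\tT\tNovaSeq\t@A00000\tFC1\t1\tATGC+ATGC\t/data/r1.fq.gz\t/data/r2.fq.gz\tAAAA,TTTT\n"
    "NA00001\tX\tNovaSeq\t@A00000\tFC1\t1\tATGC+ATGC\t/data/r1.fq.gz\t/data/r2.fq.gz\tAAAA,TTTT\n"
)
problems = check_units(example, known_samples={"NA12878"})
for problem in problems:
    print(problem)
```

In practice `known_samples` would be the set of ids read from the `sample` column of samples.tsv.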
A reference .fasta file should be specified in config.yaml in the section reference, under fasta. In addition, the file should be indexed using samtools faidx and the path of the resulting index file added under fai. A bed file containing the covered regions should be added under design_bed.
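
Because the design bed file and the reference index have to describe the same contigs, a mismatch (for example `chr1` versus `1`) is a common source of failures. The following is a minimal sketch of such a consistency check, not something the module provides; the function names are hypothetical, and it relies only on the standard `.fai` layout, where the first two tab-separated columns are the contig name and its length.

```python
def fai_contigs(fai_lines):
    """Map contig name -> length from the lines of a samtools faidx index."""
    contigs = {}
    for line in fai_lines:
        if not line.strip():
            continue
        name, length = line.rstrip("\n").split("\t")[:2]
        contigs[name] = int(length)
    return contigs


def bed_problems(bed_lines, contigs):
    """List bed intervals that fall outside the contigs of the reference."""
    problems = []
    for line_no, line in enumerate(bed_lines, start=1):
        # Skip blank lines and the optional bed header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        if chrom not in contigs:
            problems.append(f"line {line_no}: contig {chrom!r} not in reference")
        elif int(end) > contigs[chrom]:
            problems.append(f"line {line_no}: end {end} exceeds length of {chrom}")
    return problems


# Usage with in-memory lines; real input would come from open(path).
fai = ["chr1\t1000\t6\t60\t61\n"]
bed = ["chr1\t10\t100\n", "chr2\t0\t50\n", "chr1\t900\t1100\n"]
issues = bed_problems(bed, fai_contigs(fai))
```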
A panel of normals (PoN) must be configured to be able to run CNVkit and GATK CNV; workflows for this can be found at hydra-genetics/references. Instructions for the tools can be found at.
For ExomeDepth, read count data must be generated. More information can be found at the ExomeDepth repo.
Could be a gnomAD vcf file filtered on population allele frequencies above 0.001.
PureCN is used to estimate tumor purity from the data. It can be run with different segmentation methods in combination with different variant files as input; see purecn in the schemas for details.
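
The allele frequency filtering mentioned above could be sketched as follows. This is an illustration under stated assumptions rather than the procedure used by the authors: it assumes the frequency is stored in an INFO key named `AF`, keeps a multi-allelic record if any of its alleles passes, and reads uncompressed vcf text. The function names are hypothetical.

```python
def max_info_af(info, key="AF"):
    """Return the largest value of INFO key `key`, or None if absent."""
    for field in info.split(";"):
        if field.startswith(key + "="):
            values = []
            for value in field[len(key) + 1:].split(","):
                try:
                    values.append(float(value))
                except ValueError:
                    continue  # e.g. "." for a missing value
            return max(values) if values else None
    return None


def filter_vcf_lines(lines, min_af=0.001):
    """Yield header lines and records whose allele frequency exceeds min_af."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 8:
            continue  # not a valid vcf record
        af = max_info_af(columns[7])
        if af is not None and af > min_af:
            yield line


# Usage with a few made-up records.
vcf = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tG\t.\tPASS\tAC=5;AF=0.05\n",
    "chr1\t200\t.\tC\tT\t.\tPASS\tAF=0.0001\n",
    "chr1\t300\t.\tG\tA,C\t.\tPASS\tAF=0.0005,0.2\n",
    "chr1\t400\t.\tT\tC\t.\tPASS\tAC=1\n",
]
kept = list(filter_vcf_lines(vcf))
```

For a full gnomAD release a dedicated tool is likely a better fit than line-by-line Python; the sketch is only meant to make the filtering criterion concrete.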
The workflow repository contains a small test dataset in .tests/integration, which can be run like so:

    $ cd .tests/integration
    $ snakemake -s ../../Snakefile -j1 --configfile config.yaml --use-singularity
To use this module in your workflow, follow the description in the snakemake docs.
Add the module to your Snakefile like so:

    module cnv_sv:
        snakefile:
            github(
                "hydra-genetics/cnv_sv",
                path="workflow/Snakefile",
                tag="v0.1.0",
            )
        config:
            config


    use rule * from cnv_sv as cnv_sv_*
Latest:
- alignment:v0.4.0
See COMPATIBLITY.md file for a complete list of module compatibility.
The following output files should be targeted via another rule:
File | Description |
---|---|
cnv_sv/svdb_query/{sample}_{type}.{tc_method}.svdb_query.vcf | vcf with merged CNV and SV |
cnv_sv/{caller}_vcf/{sample}_{type}.{tc_method}.vcf | vcf file for each caller |
cnv_sv/exomedepth_call/{sample}_{type}.txt | CNV calls from exomedepth |
cnv_sv/pindel_vcf/{sample}_{type}.no_tc.vcf | SV calls from pindel |
cnv_sv/tiddit/{sample}_{type}.vcf | SV calls from tiddit |
cnv_sv/cnvpytor/{sample}_{type}.vcf | SV calls from cnvpytor |
cnv_sv/manta_run_workflow_t/{sample}/results/variants/tumorSV.vcf.gz | vcf file with CNV and SV calls from Manta |
cnv_sv/manta_run_workflow_tn/{sample}/results/variants/somaticSV.vcf.gz | vcf file with CNV and SV calls from Manta |
cnv_sv/manta_run_workflow_n/{sample}/results/variants/candidateSV.vcf.gz | vcf file with CNV and SV calls from Manta |
cnv_sv/smn_caller/{sample}_{type}.tsv | CNV calling in the SMN gene with smncopynumbercaller |
cnv_sv/expansionhunter/{sample}_{type}.vcf | vcf file with repeat expansions from expansionhunter |
cnv_sv/reviewer/{sample}_{type}/*.svg | directory of svg files of read pileups from reviewer |
cnv_sv/automap/{sample}_{type}/{sample}_{type}.vcf | vcf file with regions of homozygosity (ROHs) from automap |
cnv_sv/purecn_purity_file/{sample}_{type}.purity.txt | text file with estimated purity from purecn |
cnv_sv/upd/{sample}_{type}.upd_regions.bed | bed file of upd regions |
cnv_sv/upd/{sample}_{type}.upd_sites.bed | bed file of upd informative sites |
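
To target these files from another rule, the wildcards have to be filled in for each sample and unit type. The sketch below shows one way to build such a list of paths in plain Python, which could then be passed as the input of a top-level rule. The function name `cnv_sv_targets`, the choice of the two templates, and the `tc_method` value `pathology` are illustrative assumptions; only the path patterns themselves come from the table above.

```python
# Two path patterns copied from the table above; any other row works the same way.
TEMPLATES = [
    "cnv_sv/svdb_query/{sample}_{type}.{tc_method}.svdb_query.vcf",
    "cnv_sv/pindel_vcf/{sample}_{type}.no_tc.vcf",
]


def cnv_sv_targets(units, tc_method, templates=TEMPLATES, unit_type="T"):
    """Expand the templates for every (sample, type) pair of the given type."""
    seen = set()
    targets = []
    for sample, utype in units:
        if utype != unit_type:
            continue
        for template in templates:
            # Unused keyword arguments are ignored by str.format.
            path = template.format(sample=sample, type=utype, tc_method=tc_method)
            if path not in seen:  # several lanes may share one sample/type
                seen.add(path)
                targets.append(path)
    return targets


# Usage: (sample, type) pairs as they would appear in units.tsv.
units = [("s1", "T"), ("s1", "N"), ("s1", "T")]
targets = cnv_sv_targets(units, tc_method="pathology")
```

Deduplicating matters because units.tsv has one row per lane, so the same sample and type can occur several times.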
Because an integration test cannot be created without a full dataset, PureCN is not subjected to integration testing, and we cannot guarantee that it will work.
Because an integration test cannot be created without a large dataset, SMNCopyNumberCaller is not subjected to integration testing, and we cannot guarantee that it will work.