hydra-genetics/cnv_sv

https://hydra-genetics-cnv-sv.readthedocs.io

Snakemake module containing steps to call copy number variants and structural variants

💬 Introduction

This module contains rules for calling CNVs (copy number variants) and SVs (structural variants), as well as a rule for merging the CNV and SV calls.

❗ Dependencies

In order to use this module, the following dependencies are required:

🎒 Preparations

Sample and unit data

Input data should be added to `samples.tsv` and `units.tsv`. The following information needs to be added to these files:

Column Id	Description
`samples.tsv`
sample	unique sample/patient id, one per row
tumor_content	ratio of tumor cells to total cells
`units.tsv`
sample	same sample/patient id as in `samples.tsv`
type	data type identifier (one letter), can be one of `T` (tumor), `N` (normal) or `R` (RNA)
platform	type of sequencing platform, e.g. `NovaSeq`
machine	specific machine id, e.g. NovaSeq instruments have `@Axxxxx`
flowcell	identifier of the flowcell used
lane	flowcell lane number
barcode	sequence library barcode/index, connect forward and reverse indices by `+`, e.g. `ATGC+ATGC`
fastq1/2	absolute path to forward and reverse reads
adapter	adapter sequences to be trimmed, separated by comma
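Before launching the workflow, the two sheets can be cross-checked with a few lines of Python. This is a minimal sketch, not part of the module: the function names are hypothetical, it assumes the one-letter type codes are `T`, `N` and `R`, and it treats `fastq1`/`fastq2` as two separate columns. The module's own schemas remain the authoritative validation.

```python
import csv
import posixpath

# Assumed one-letter data type codes: T = tumor, N = normal, R = RNA.
ALLOWED_TYPES = {"T", "N", "R"}


def read_tsv(lines):
    """Parse tab-separated lines (an open file or a list of strings) into dicts."""
    return list(csv.DictReader(lines, delimiter="\t"))


def check_sample_sheets(sample_lines, unit_lines):
    """Return problems found in samples.tsv / units.tsv (hypothetical helper)."""
    samples = read_tsv(sample_lines)
    units = read_tsv(unit_lines)
    problems = []
    known_samples = {row["sample"] for row in samples}

    for row in samples:
        tumor_content = float(row["tumor_content"])
        # tumor_content is a ratio, so it should lie between 0 and 1.
        if not 0.0 <= tumor_content <= 1.0:
            problems.append(f"{row['sample']}: tumor_content {tumor_content} outside [0, 1]")

    for row in units:
        sample = row["sample"]
        if sample not in known_samples:
            problems.append(f"{sample}: listed in units.tsv but not in samples.tsv")
        if row["type"] not in ALLOWED_TYPES:
            problems.append(f"{sample}: unexpected type {row['type']!r}")
        for column in ("fastq1", "fastq2"):
            # The table above asks for absolute (POSIX) paths to the reads.
            if not posixpath.isabs(row[column]):
                problems.append(f"{sample}: {column} is not an absolute path")
    return problems
```

Typical use is `with open("samples.tsv") as s, open("units.tsv") as u: print(check_sample_sheets(s, u))`; an empty list means no problems were detected by these particular checks.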

Reference data

A reference FASTA file should be specified in `config.yaml` in the `reference` section, under `fasta`. The file must also be indexed with `samtools faidx`, and the path to the resulting index file added under `fai`. A BED file containing the covered regions should be added under `design_bed`.
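The reference section can be sanity-checked with a small Python helper operating on the already-loaded config (for example the `config` dict available inside a Snakefile). This is a sketch under assumptions: the helper names are hypothetical, and it assumes `fasta`, `fai` and `design_bed` all sit under `reference`. By default `samtools faidx` writes the index next to the FASTA with a `.fai` suffix, which `default_fai_path` reproduces.

```python
# Entries assumed to be required under the `reference` section of config.yaml.
REQUIRED_REFERENCE_KEYS = ("fasta", "fai", "design_bed")


def default_fai_path(fasta_path):
    """Path where `samtools faidx <fasta>` writes its index by default."""
    return fasta_path + ".fai"


def check_reference_config(config):
    """List required reference entries that are missing or empty (hypothetical helper)."""
    reference = config.get("reference") or {}
    return [
        f"reference.{key} is not set"
        for key in REQUIRED_REFERENCE_KEYS
        if not reference.get(key)
    ]
```

A custom index location is still valid; `default_fai_path` only tells you where the index ends up when `samtools faidx` is run without extra options.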

Panel of normals (PoN) and read count

A panel of normals (PoN) must be configured in order to run CNVkit and GATK CNV; workflows for creating it can be found at hydra-genetics/references. Instructions for the tools can be found at.

For ExomeDepth, read count data must be generated. More information can be found in the ExomeDepth repository.
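Conceptually, read count data is one count per target region and sample. The sketch below illustrates the idea for a single sample in plain Python. It is not how ExomeDepth computes its counts (which are derived from BAM files and may assign reads to regions differently); the function name and the choice to count a read by its start position are assumptions made for illustration.

```python
def count_reads_per_region(regions, read_starts):
    """Count reads whose start falls inside each target region (illustrative only).

    regions: list of (chrom, start, end) tuples, 0-based half-open as in BED.
    read_starts: iterable of (chrom, pos) tuples with 0-based positions.
    Returns one count per region, in the same order as `regions`.
    """
    counts = [0] * len(regions)
    for chrom, pos in read_starts:
        for i, (region_chrom, start, end) in enumerate(regions):
            # BED intervals include `start` but exclude `end`.
            if chrom == region_chrom and start <= pos < end:
                counts[i] += 1
    return counts
```

The nested loop keeps the idea visible; real tools use indexed BAM access instead of scanning every read against every region.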

VCF used to calculate variant allele frequencies

This could, for example, be a gnomAD VCF file filtered to variants with a population allele frequency above 0.001.
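The filtering step can be sketched in Python as follows, assuming the frequency is stored in the INFO key `AF` (as in gnomAD VCFs; population-specific keys have other names). Multi-allelic records carry one comma-separated value per ALT allele, and this sketch keeps a record if any of them passes. The function names are hypothetical; on a real bgzipped VCF, `bcftools view -i 'INFO/AF>0.001'` is a more practical way to do the same.

```python
def max_allele_frequency(info):
    """Return the largest AF value in a VCF INFO string, or None if AF is absent."""
    for field in info.split(";"):
        key, _, value = field.partition("=")
        if key == "AF" and value:
            # One AF per ALT allele; "." marks a missing value.
            freqs = [float(v) for v in value.split(",") if v != "."]
            return max(freqs) if freqs else None
    return None


def filter_vcf_lines(lines, min_af=0.001):
    """Yield header lines and the records whose AF is above `min_af`."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        columns = line.rstrip("\n").split("\t")
        af = max_allele_frequency(columns[7])  # column 8 is INFO
        if af is not None and af > min_af:
            yield line
```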

Purecn

PureCN is used to estimate tumor purity from the data. It can be run with different segmentation methods in combination with different variant files as input; see the `purecn` section of the config schemas for details.

✅ Testing

The workflow repository contains a small test dataset in `.tests/integration`, which can be run as follows:

$ cd .tests/integration
$ snakemake -s ../../Snakefile -j1 --configfile config.yaml --use-singularity
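When the test is launched from a script, for example in CI, a dry run first can catch configuration errors cheaply. The sketch below assembles the same command as an argument list and optionally appends Snakemake's `-n` (`--dry-run`) flag. The helper names are hypothetical, and actually running it requires Snakemake and Singularity to be installed.

```python
import subprocess


def integration_test_command(cores=1, dry_run=False):
    """Build the snakemake call used for the integration test."""
    cmd = [
        "snakemake",
        "-s", "../../Snakefile",
        "-j", str(cores),
        "--configfile", "config.yaml",
        "--use-singularity",
    ]
    if dry_run:
        cmd.append("-n")  # only list the jobs that would run
    return cmd


def run_integration_test(repo_root=".", cores=1, dry_run=False):
    """Run the test from inside .tests/integration, raising on a non-zero exit code."""
    subprocess.run(
        integration_test_command(cores, dry_run),
        cwd=f"{repo_root}/.tests/integration",
        check=True,
    )
```

Passing the command as a list (rather than a shell string) avoids quoting issues with paths.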

🚀 Usage

To use this module in your workflow, follow the description in the Snakemake documentation and add the module to your Snakefile as follows:

module cnv_sv:
    snakefile:
        github(
            "hydra-genetics/cnv_sv",
            path="workflow/Snakefile",
            tag="v0.1.0",
        )
    config:
        config


use rule * from cnv_sv as cnv_sv_*

Compatibility

Latest:

alignment:v0.4.0

See the COMPATIBLITY.md file for a complete list of compatible module versions.

Output files

The following output files should be targeted via another rule:

File	Description
`cnv_sv/svdb_query/{sample}_{type}.{tc_method}.svdb_query.vcf`	vcf with merged CNV and SV
`cnv_sv/{caller}_vcf/{sample}_{type}.{tc_method}.vcf`	vcf file for each caller
`cnv_sv/exomedepth_call/{sample}_{type}.txt`	CNV calls from exomedepth
`cnv_sv/pindel_vcf/{sample}_{type}.no_tc.vcf`	SV calls from pindel
`cnv_sv/tiddit/{sample}_{type}.vcf`	SV calls from tiddit
`cnv_sv/cnvpytor/{sample}_{type}.vcf`	CNV calls from CNVpytor
`cnv_sv/manta_run_workflow_t/{sample}/results/variants/tumorSV.vcf.gz`	vcf file with CNV and SV calls from Manta (tumor only)
`cnv_sv/manta_run_workflow_tn/{sample}/results/variants/somaticSV.vcf.gz`	vcf file with somatic CNV and SV calls from Manta (tumor/normal)
`cnv_sv/manta_run_workflow_n/{sample}/results/variants/candidateSV.vcf.gz`	vcf file with candidate CNV and SV calls from Manta (normal only)
`cnv_sv/smn_caller/{sample}_{type}.tsv`	CNV calls for the SMN genes from SMNCopyNumberCaller
`cnv_sv/expansionhunter/{sample}_{type}.vcf`	vcf file with repeat expansions from expansionhunter
`cnv_sv/reviewer/{sample}_{type}/*.svg`	directory of SVG files with read pileups from REViewer
`cnv_sv/automap/{sample}_{type}/{sample}_{type}.vcf`	vcf file with regions of homozygosity (ROHs) from automap
`cnv_sv/purecn_purity_file/{sample}_{type}.purity.txt`	text file with estimated purity from purecn
`cnv_sv/upd/{sample}_{type}.upd_regions.bed`	bed file of uniparental disomy (UPD) regions
`cnv_sv/upd/{sample}_{type}.upd_sites.bed`	bed file of UPD-informative sites
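Inside a Snakefile, targets like these are usually generated with Snakemake's `expand()` in a `rule all`. The plain-Python sketch below does the equivalent for a few of the patterns in the table, so the list of files can be inspected before running. The `(sample, type)` pairs and the `tc_method` value `"pathology"` used in the example are assumptions for illustration, not values defined by this module.

```python
from itertools import product

# A subset of the output patterns from the table above.
OUTPUT_PATTERNS = [
    "cnv_sv/svdb_query/{sample}_{type}.{tc_method}.svdb_query.vcf",
    "cnv_sv/exomedepth_call/{sample}_{type}.txt",
    "cnv_sv/pindel_vcf/{sample}_{type}.no_tc.vcf",
]


def expand_targets(patterns, units, tc_methods):
    """Fill every pattern for each (sample, type) pair and tc_method.

    Patterns without a {tc_method} field ignore that value, so the
    duplicates they produce are removed with a set.
    """
    targets = set()
    for pattern in patterns:
        for (sample, unit_type), tc_method in product(units, tc_methods):
            targets.add(pattern.format(sample=sample, type=unit_type, tc_method=tc_method))
    return sorted(targets)
```

For example, `expand_targets(OUTPUT_PATTERNS, [("S1", "T")], ["pathology"])` yields one file per pattern for the tumor sample `S1`.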

🧑‍⚖️ Rule Graph

Disclaimer

Because an integration test for PureCN cannot be created without a full dataset, PureCN is not covered by integration testing, and we cannot guarantee that it will work.

Because an integration test for SMNCopyNumberCaller cannot be created without a large dataset, SMNCopyNumberCaller is not covered by integration testing, and we cannot guarantee that it will work.
