https://hydra-genetics-snv-indels.readthedocs.io
The module contains rules to call variants from `.bam` files per chromosome, merge the resulting `.vcf` files, fix the allele frequency field, and then decompose and normalize the variants, before finally combining the results from the different callers using an ensemble approach. Available callers are Mutect2, Freebayes, VarDict and Haplotypecaller. Mutect2 is also used to generate a genomic `.vcf` file.
In order to use this module, the following dependencies are required:
Note! Releases of snv_indels <= v0.2.0 need `tabulate<0.9.0` added to `requirements.txt`.
Input data should be added to `samples.tsv` and `units.tsv`. The following information needs to be added to these files:
Column Id | Description |
---|---|
`samples.tsv` | |
sample | unique sample/patient id, one per row |
tumor_content | ratio of tumor cells to total cells |
`units.tsv` | |
sample | same sample/patient id as in `samples.tsv` |
type | data type identifier (one letter), can be one of T (Tumor), N (Normal) or R (RNA) |
platform | type of sequencing platform, e.g. NovaSeq |
machine | specific machine id, e.g. NovaSeq instruments have @Axxxxx |
flowcell | identifier of the flowcell used |
lane | flowcell lane number |
barcode | sequence library barcode/index, connect forward and reverse indices by `+`, e.g. ATGC+ATGC |
fastq1/2 | absolute paths to forward and reverse reads |
adapter | adapter sequences to be trimmed, separated by comma |
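
As a minimal sketch of how a `units.tsv` file could be checked against the columns listed above before running the workflow, the following uses only the Python standard library. It is not part of the module: the function name `check_units` is hypothetical, the files are assumed to be tab-separated, `fastq1/2` is assumed to stand for two columns named `fastq1` and `fastq2`, and the one-letter type codes are assumed to be `T`, `N` and `R`. The example row contains made-up values.

```python
import csv
import io

# Column names from the units.tsv table; "fastq1/2" is assumed to mean two columns.
UNITS_COLUMNS = [
    "sample", "type", "platform", "machine", "flowcell",
    "lane", "barcode", "fastq1", "fastq2", "adapter",
]
# Assumed one-letter codes for Tumor, Normal and RNA.
VALID_TYPES = {"T", "N", "R"}


def check_units(handle, known_samples):
    """Return a list of problems found in a tab-separated units table."""
    reader = csv.DictReader(handle, delimiter="\t")
    present = set(reader.fieldnames or [])
    missing = [column for column in UNITS_COLUMNS if column not in present]
    if missing:
        return [f"missing columns: {missing}"]
    problems = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        if row["sample"] not in known_samples:
            problems.append(f"line {line_no}: unknown sample {row['sample']!r}")
        if row["type"] not in VALID_TYPES:
            problems.append(f"line {line_no}: invalid type {row['type']!r}")
    return problems


# Hypothetical single-row table with an invalid type code.
example = io.StringIO(
    "\t".join(UNITS_COLUMNS) + "\n"
    + "\t".join([
        "S1", "X", "NovaSeq", "@A00001", "FC1", "1", "ATGC+ATGC",
        "/data/S1_R1.fastq.gz", "/data/S1_R2.fastq.gz", "AAA,CCC",
    ]) + "\n"
)
print(check_units(example, known_samples={"S1"}))
# prints ["line 2: invalid type 'X'"]
```

The set passed as `known_samples` would normally be the values of the `sample` column in `samples.tsv`, so that every unit refers to a declared sample.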
A reference `.fasta` file should be specified in `config.yaml` in the section `reference`, under the key `fasta`. In addition, the file should be indexed using `samtools faidx` and the path of the resulting file added to the stanza `fai`. A bed file containing the covered regions should be added to `design_bed`.
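
The reference settings described above can be illustrated with a small check on an already loaded configuration. This is only a sketch: it assumes that `fasta`, `fai` and `design_bed` all sit under the `reference` section, which the text states explicitly only for `fasta`; the function name `missing_reference_keys` and all paths are hypothetical. The `.fai` path follows the usual `samtools faidx` behaviour of writing the index next to the input as `<fasta>.fai`.

```python
def missing_reference_keys(config):
    """Return the reference keys that are absent or empty in a loaded config."""
    reference = config.get("reference") or {}
    # Assumption: all three keys are nested under "reference".
    required = ("fasta", "fai", "design_bed")
    return [key for key in required if not reference.get(key)]


# Hypothetical paths, for illustration only.
example_config = {
    "reference": {
        "fasta": "reference/genome.fasta",
        "fai": "reference/genome.fasta.fai",
        "design_bed": "reference/design.bed",
    }
}
print(missing_reference_keys(example_config))
# prints []
```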
The workflow repository contains a small test dataset in `.tests/integration`, which can be run like so:

    $ cd .tests/integration
    $ snakemake -s ../../Snakefile -j1 --use-singularity

To use this module in your workflow, follow the description in the snakemake docs. Add the module to your `Snakefile` like so:

    module snv_indels:
        snakefile:
            github(
                "hydra-genetics/snv_indels",
                path="workflow/Snakefile",
                tag="v0.1.0",
            )
        config:
            config


    use rule * from snv_indels as snv_indels_*

Latest:
- alignment:v0.5.1
See the `COMPATIBLITY.md` file for a complete list of module compatibility.
The following output files should be targeted via another rule:
File | Description |
---|---|
`snv_indels/bcbio_variation_recall_ensemble/{sample}_{type}.ensembled.vcf.gz` | combined `.vcf` generated by ensemble |
`snv_indels/{caller}/{sample}_{type}.normalized.sorted.vcf.gz` | sorted `.vcf.gz` for each caller |
`snv_indels/gatk_mutect2_gvcf/{sample}_{type}.merged.g.vcf.gz` | genomic `.vcf` |
`snv_indels/deepvariant/{sample}_{type}.merged.vcf.gz` | deepvariant `.vcf.gz` |
`snv_indels/deepvariant/{sample}_{type}.merged.g.vcf.gz` | genomic `.g.vcf.gz` for deepvariant |
`snv_indels/glnexus/{sample}_{type}.vcf.gz` | trio `.vcf.gz` with proband sample id generated by glnexus from deeptrio genomic `.g.vcf.gz` |
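
Targeting these files from another rule amounts to filling in the `{sample}`, `{type}` and `{caller}` wildcards. The sketch below shows one way to build such a list in plain Python, using three of the path patterns from the table. The function name `target_files` is hypothetical, and the caller directory name `vardict` in the usage example is an assumption, since the values accepted for `{caller}` are not listed here.

```python
# Path patterns copied from the output table.
OUTPUT_TEMPLATES = [
    "snv_indels/bcbio_variation_recall_ensemble/{sample}_{type}.ensembled.vcf.gz",
    "snv_indels/gatk_mutect2_gvcf/{sample}_{type}.merged.g.vcf.gz",
]
PER_CALLER_TEMPLATE = "snv_indels/{caller}/{sample}_{type}.normalized.sorted.vcf.gz"


def target_files(units, callers):
    """Build output paths for (sample, type) pairs and the given caller directories."""
    targets = []
    for sample, unit_type in units:
        for template in OUTPUT_TEMPLATES:
            targets.append(template.format(sample=sample, type=unit_type))
        for caller in callers:
            targets.append(
                PER_CALLER_TEMPLATE.format(caller=caller, sample=sample, type=unit_type)
            )
    return targets


# Hypothetical sample id and caller directory name.
for path in target_files([("S1", "T")], ["vardict"]):
    print(path)
# prints:
# snv_indels/bcbio_variation_recall_ensemble/S1_T.ensembled.vcf.gz
# snv_indels/gatk_mutect2_gvcf/S1_T.merged.g.vcf.gz
# snv_indels/vardict/S1_T.normalized.sorted.vcf.gz
```

In a Snakefile, a list like this could serve as the `input` of a top-level rule, with the `(sample, type)` pairs read from `units.tsv`.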