Sniffles2 is a fast structural variant (SV) caller for long-read sequencing. It accurately detects SVs at the germline, somatic, and population level from PacBio and Oxford Nanopore read data.
To call SVs from long read alignments (PacBio / ONT), you can use:
sniffles -i mapped_input.bam -v output.vcf
For improved calling in repetitive regions, Sniffles2 accepts a tandem repeat annotations file using the option --tandem-repeats annotations.bed. Sniffles2-compatible tandem repeat annotations for human references can be downloaded from the annotations/ folder.
See sniffles --help or the sections below for full usage information.
You can install Sniffles2 with either pip or conda:
pip install sniffles
or
conda install sniffles=2.5.2
If you previously installed Sniffles1 using conda and want to upgrade to Sniffles2, you can use:
conda update sniffles=2.5.2
Requirements:
- Python ==3.10.15
- pysam >=0.21.0
- edlib >=1.3.9
- psutil >=5.9.4
Sniffles2 was tested on:
- python ==3.10.12
- pysam ==0.21.0
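To compare an existing environment against the minimum package versions listed above, something like the following sketch may help. It is not part of Sniffles2; the helper names `parse_version` and `check_requirements` are hypothetical, and the version parsing is deliberately simple (numeric dotted parts only).

```python
import sys
from importlib import metadata

# Minimum versions copied from the requirements list above.
MINIMUMS = {"pysam": (0, 21, 0), "edlib": (1, 3, 9), "psutil": (5, 9, 4)}


def parse_version(text):
    """Turn '0.21.0' into (0, 21, 0); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    while len(parts) < 3:  # pad so that '0.21' compares like '0.21.0'
        parts.append(0)
    return tuple(parts)


def check_requirements(minimums=MINIMUMS):
    """Return {package: (installed_version_or_None, meets_minimum)}."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (installed, parse_version(installed) >= minimum)
    return report


if __name__ == "__main__":
    print("Python", ".".join(map(str, sys.version_info[:3])))
    for name, (installed, ok) in check_requirements().items():
        status = "ok" if ok else "check"
        print(f"{name}: {installed or 'not installed'} ({status})")
```

Pre-release or local version suffixes (for example `1.3.9.post1`) are truncated at the first non-numeric part, so treat the result as a quick sanity check rather than a strict resolver.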
Please cite our papers:
- Sniffles v2: https://www.nature.com/articles/s41587-023-02024-y
- Sniffles v1: https://www.nature.com/articles/s41592-018-0001-7
- To output deletion (DEL SV) sequences, the reference genome (.fasta) must be specified using e.g. --reference reference.fasta
- Sniffles2 supports optionally specifying tandem repeat region annotations (.bed), which can improve calling in these regions, using e.g. --tandem-repeats annotations.bed. Sniffles2-compatible tandem repeat annotations for human references can be found in the annotations/ folder.
- Sniffles2 is fully parallelized and uses 4 threads by default. This value can be adapted using e.g. --threads 4 as an option. Memory requirements will increase with the number of threads used.
- To output read names in SNF and VCF files, the --output-rnames option is required.
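When Sniffles2 is driven from a script, the options above can be assembled programmatically. The sketch below is illustrative only: `build_sniffles_command` is a hypothetical helper, not part of Sniffles2, and it uses only the flags named in this README.

```python
import shlex


def build_sniffles_command(bam, vcf, reference=None, tandem_repeats=None,
                           threads=None, output_rnames=False):
    """Build an argument list for a single-sample Sniffles2 call."""
    cmd = ["sniffles", "--input", bam, "--vcf", vcf]
    if reference:
        # Needed to output deletion (DEL SV) sequences.
        cmd += ["--reference", reference]
    if tandem_repeats:
        # Optional .bed annotations for repetitive regions.
        cmd += ["--tandem-repeats", tandem_repeats]
    if threads is not None:
        # More threads also means higher memory use.
        cmd += ["--threads", str(threads)]
    if output_rnames:
        cmd.append("--output-rnames")
    return cmd


cmd = build_sniffles_command(
    "mapped_input.bam",
    "output.vcf",
    reference="reference.fasta",
    tandem_repeats="annotations.bed",
    threads=8,
    output_rnames=True,
)
print(shlex.join(cmd))
```

With Sniffles2 installed and the input files in place, the resulting list could be passed to `subprocess.run(cmd, check=True)`. Building a list rather than a single string avoids shell-quoting problems with unusual file names.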
Multi-sample SV calling using Sniffles2 population mode works in two steps:
- Call SV candidates and create an associated .snf file for each sample:
sniffles --input sample1.bam --snf sample1.snf
- Combined calling using multiple .snf files into a single .vcf:
sniffles --input sample1.snf sample2.snf ... sampleN.snf --vcf multisample.vcf
Alternatively, for step 2 you can supply a .tsv file containing a list of .snf files, with custom sample IDs in an optional second column (one sample per line), e.g.:
- Combined calling using a .tsv as sample list:
sniffles --input snf_files_list.tsv --vcf multisample.vcf
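The two population-mode steps and the .tsv sample list can be sketched in a short script. This is a minimal illustration, assuming the list is tab-separated with no header line; `write_snf_list` and `population_commands` are hypothetical helper names, and the sample ID `patient_A` is made up.

```python
import csv
import tempfile
from pathlib import Path


def write_snf_list(path, samples):
    """Write one line per sample: .snf path, then an optional sample ID."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for snf_path, sample_id in samples:
            writer.writerow([snf_path, sample_id] if sample_id else [snf_path])


def population_commands(bams, tsv_path, vcf="multisample.vcf"):
    """Return (step 1 commands, step 2 command) as argument lists."""
    # Step 1: one .snf file per sample, named after the input .bam.
    step1 = [
        ["sniffles", "--input", bam, "--snf", str(Path(bam).with_suffix(".snf"))]
        for bam in bams
    ]
    # Step 2: combined calling from the .tsv sample list.
    step2 = ["sniffles", "--input", str(tsv_path), "--vcf", vcf]
    return step1, step2


with tempfile.TemporaryDirectory() as tmp:
    tsv = Path(tmp) / "snf_files_list.tsv"
    # The second sample has no custom ID, so its line has a single column.
    write_snf_list(tsv, [("sample1.snf", "patient_A"), ("sample2.snf", None)])
    tsv_text = tsv.read_text()

step1, step2 = population_commands(
    ["sample1.bam", "sample2.bam"], "snf_files_list.tsv"
)
```

Each step 1 command can run independently per sample, so the per-sample .snf files only need to be produced once and can be reused whenever new samples are added to the list.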
To call mosaic SVs, the --mosaic option should be added, i.e.:
sniffles --input mapped_input.bam --vcf output.vcf --mosaic
Example command to determine the genotype of each SV in input_known_svs.vcf for sample.bam and write the re-genotyped SVs to output_genotypes.vcf:
sniffles --input sample.bam --genotype-vcf input_known_svs.vcf --vcf output_genotypes.vcf
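After re-genotyping, a quick tally of genotypes can serve as a sanity check on the output. The sketch below reads plain-text VCF records and counts the GT value of the first sample column. It assumes a standard single-sample VCF layout (FORMAT in column 9, sample in column 10); `tally_genotypes` is a hypothetical helper, and the records in `TOY_RECORDS` are invented purely for illustration, not real Sniffles2 output.

```python
from collections import Counter


def tally_genotypes(lines):
    """Count GT values of the first sample column in VCF records."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # record has no sample column
        keys = fields[8].split(":")
        values = fields[9].split(":")
        if "GT" in keys:
            counts[values[keys.index("GT")]] += 1
    return counts


# Made-up records, for illustration only.
TOY_RECORDS = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE\n",
    "chr1\t1000\tsv1\tN\t<DEL>\t60\tPASS\tSVTYPE=DEL\tGT:GQ\t0/1:40\n",
    "chr1\t5000\tsv2\tN\t<INS>\t60\tPASS\tSVTYPE=INS\tGT:GQ\t1/1:55\n",
    "chr2\t2000\tsv3\tN\t<DEL>\t60\tPASS\tSVTYPE=DEL\tGT:GQ\t0/1:30\n",
]

genotype_counts = tally_genotypes(TOY_RECORDS)
```

For a real run, the same function could be applied to an open file handle, e.g. `with open("output_genotypes.vcf") as handle: tally_genotypes(handle)`.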
- .bam or .cram files containing long-read alignments (e.g. from minimap2 or ngmlr) are supported as input
- .vcf.gz (bgzipped+tabix indexed) output is supported
- Simultaneous output of both .vcf and .snf file (for multi-sample calling) is supported
- We have developed a plotting tool for Sniffles2: https://github.com/farhangus/sniffle2_plot
- We also provide the VCF files and scripts used for the manuscript: https://github.com/smolkmo/Sniffles2-Supplement
- Supplementary tables: https://github.com/smolkmo/Sniffles2-Supplement/blob/main/Supplemetary%20tables.xlsx