Remove gene-specific primer sequences from SAM/BAM alignments of PCR amplicons by soft-clipping
Download latest version in a ZIP package
- SAMtools (at least version 1.3.1)
- GNU Parallel (at least version 20130522)
bamclipper.sh
soft-clips gene-specific primers from BAM alignment file based on genomic coordinates of primer pairs in BEDPE format.
./bamclipper.sh -b BAM -p BEDPE [-n NTHREAD] [-s SAMTOOLS] [-g GNUPARALLEL] [-u UPSTREAM] [-d DOWNSTREAM]
Given a BAM file called NAME.bam, a new BAM file (NAME.primerclipped.bam) and its associated index (NAME.primerclipped.bam.bai) will be generated in the current working directory.
Notes: For the sake of performance and simplicity, soft-clipping is performed solely based on genomic coordinates without involving the underlying sequence. Reference sequence names and coordinates of BAM and BEDPE are assumed to be derived from identical reference sequences (e.g. hg19).
Required arguments
- -b FILE: indexed BAM alignment file
- -p FILE: BEDPE file of primer pair locations
Options
- -n INT: number of threads for clipprimer.pl (the workhorse Perl script of BAMClipper) and samtools sort [1]
- -s FILE: path to samtools executable [samtools]
- -g FILE: path to gnu parallel executable [parallel]
- -u INT: number of nucleotide upstream to 5' most nucleotide of primer (in addition to 5' most nucleotide of primer) for assigning alignments to primers based on the alignment starting position. [1]
- -d INT: number of nucleotide downstream to 5' most nucleotide of primer (in addition to 5' most nucleotide of primer) for assigning alignments to primers based on the alignment starting position. [5]
# Clip primers by BAMClipper
>./bamclipper.sh -b examples/SRR2075598.bam -p examples/trusight_myeloid.bedpe -n 4
# done!
# SRR2075598.primerclipped.bam and its index SRR2075598.primerclipped.bam.bai are generated.
# the new SRR2075598.primerclipped.bam should be identical to the provided example (compare checksum of alignments and ignore headers)
>samtools view SRR2075598.primerclipped.bam | md5sum
6a431457fd6e892646c17d1c3029c24e -
>samtools view examples/SRR2075598.primerclipped.bam | md5sum
6a431457fd6e892646c17d1c3029c24e -
# An example line of primer pair BEDPE file (an amplicon targeting ASXL1)
>grep 31022896 examples/trusight_myeloid.bedpe
chr20 31022896 31022921 chr20 31023096 31023123
Details of demo data:
- examples/SRR2075598.bam: original BWA-MEM v0.7.7 alignments of a TruSight Myeloid panel dataset (first 100 read pairs of SRR2075598) (reference: hg19)
- examples/SRR2075598.primerclipped.bam: primer-clipped alignments
- examples/trusight_myeloid.bedpe: primer locations derived from manufacturer's TruSight-Myeloid-Manifest.txt
Au CH, Ho DN, Kwong A, Chan TL and Ma ESK, 2017. BAMClipper: removing primers from alignments to minimize false-negative mutations in amplicon next-generation sequencing. Scientific Reports 7:1567 (doi:10.1038/s41598-017-01703-6)