A tool for the analysis of reference mismatches in high-throughput sequencing data from DNA samples. Unlike other tools, it can evaluate the portions of reads that overlap with specified regions (e.g. repeats).
conda: conda install -c bioconda tasmanian-mismatch
pip: pip install tasmanian-mismatch
The main goal is to identify systematic mismatches that might confound SNPs or other variants that should or should not be associated with biological outcomes. We noticed that a set of regions, which are not necessarily misplaced in the reference genome, have dramatic effects on this analysis. We therefore provide a way of splitting the reads that overlap these regions and recording the information in separate tables, so that neither intersecting nor non-intersecting reads are filtered out and the researcher gets a more accurate picture of the influence of these regions on the observed artifacts.
samtools view bam | run_intersections [OPTIONS] | run_tasmanian [OPTIONS]
- Classification of each base of the read as overlapping (either contained or boundary; see figure below) or non-overlapping with the regions of interest listed in a bed/bedgraph file.
- Positional analysis of artifacts, split by read 1 and read 2.
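
The classification in the list above can be illustrated with a small, self-contained sketch. This is not tasmanian's own code. It assumes one plausible reading of the terms: a read is "contained" when it lies entirely inside a region, "boundary" when it only partly overlaps one, and "non-overlapping" otherwise. The function names `classify_read` and `classify_bases` are hypothetical, and intervals are taken to be half-open as in BED files.

```python
from typing import List, Tuple

# Half-open interval [start, end), following the BED convention.
Interval = Tuple[int, int]


def classify_read(read_start: int, read_end: int, regions: List[Interval]) -> str:
    """Label a read relative to a list of regions (hypothetical helper).

    Returns 'contained' if the read lies fully inside some region,
    'boundary' if it only partially overlaps a region,
    and 'non-overlapping' otherwise.
    """
    # A read fully inside any single region counts as contained.
    for start, end in regions:
        if read_start >= start and read_end <= end:
            return "contained"
    # Otherwise, any partial overlap makes it a boundary read.
    for start, end in regions:
        if read_start < end and read_end > start:
            return "boundary"
    return "non-overlapping"


def classify_bases(read_start: int, read_end: int, regions: List[Interval]) -> List[bool]:
    """For each reference position covered by the read, report whether
    it falls inside any region (True) or outside all regions (False)."""
    return [
        any(start <= pos < end for start, end in regions)
        for pos in range(read_start, read_end)
    ]


if __name__ == "__main__":
    regions = [(100, 200)]  # illustrative region, not real data
    print(classify_read(120, 150, regions))
    print(classify_read(90, 110, regions))
    print(classify_bases(98, 102, regions))
```

Keeping a per-base flag, rather than discarding boundary reads, is what would let mismatches inside and outside a region be tallied in separate tables.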
The output includes tables for manipulating and plotting the data and a built-in report for quick access to the data (see figure below).
Contributions are welcome and encouraged.
tasmanian artifact metrics tool is open source software released under the GNU License.