get Multi-Nucleotide Variants

Paula Ruiz-Rodriguez¹ and Mireia Coscolla¹
¹ Institute for Integrative Systems Biology (I²SysBio), University of Valencia-CSIC, Valencia, Spain

get_MNV is a tool that identifies Multi-Nucleotide Variants (MNVs) within the same codon in genomic sequences. An MNV occurs when two or more Single Nucleotide Variants (SNVs) fall within the same codon, so the codon may encode an amino acid different from the one predicted when each SNV is annotated on its own. Annotation programs such as ANNOVAR or SnpEff are primarily designed to work with individual SNVs and might therefore overlook the actual amino acid change resulting from an MNV. get_MNV addresses this limitation and makes the interpretation of genetic variants more complete.
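To make the difference between per-SNV and per-codon annotation concrete, here is a minimal Python sketch. It is an illustration only, not get_MNV's own code: the reference codon, the two substitutions, and the helper `apply_snvs` are hypothetical, and the codon table is the standard genetic code built with a common compact idiom.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def apply_snvs(codon, snvs):
    """Return the codon after applying {offset: alt_base} substitutions (offset 0-2)."""
    bases = list(codon)
    for offset, alt in snvs.items():
        bases[offset] = alt
    return "".join(bases)


ref_codon = "CGT"        # hypothetical reference codon (Arg, R)
snvs = {0: "T", 1: "C"}  # two hypothetical SNVs hitting the same codon

# Annotating each SNV on its own, as an SNV-only annotator would.
separate = [CODON_TABLE[apply_snvs(ref_codon, {off: alt})] for off, alt in snvs.items()]
# Annotating both SNVs together, as an MNV.
combined = CODON_TABLE[apply_snvs(ref_codon, snvs)]

print("reference:", CODON_TABLE[ref_codon])  # R (Arg)
print("per-SNV:  ", separate)                # ['C', 'P'] (Cys, Pro)
print("as MNV:   ", combined)                # S (Ser)
```

Neither per-SNV prediction (Cys or Pro) matches the amino acid actually encoded when both changes sit on the same read (Ser), which is the situation the tool is meant to catch.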

IMPORTANT: This script works only with SNVs called against a reference. Insertions and deletions that modify the reading frame are not currently supported.

💾 Features

MNV Identification: Detects SNVs occurring within the same codon and reclassifies them as MNVs.
Accurate Amino Acid Change Calculation: Computes the resulting amino acid changes based on genomic reads.
Integration with BAM and VCF Files: Supports input from VCF files for variants and optional BAM files for aligned reads.
Quality Analysis: Allows setting a minimum Phred quality threshold to filter out low-quality reads.

🛠️ Installation

You can install get_MNV with conda or mamba (Unix/macOS), or by downloading the binary file (Unix):

🐍 Using conda

conda install -c bioconda get_mnv

🐍 Using mamba

mamba install -c bioconda get_mnv

📨 Using binary

wget https://github.com/PathoGenOmics-Lab/get_MNV/releases/download/1.0.0/get_mnv

📎 Usage

get_mnv [OPTIONS] --vcf <VCF_FILE> --fasta <FASTA_FILE> --genes <GENES_FILE>

🗃️ Options:

-v, --vcf <VCF_FILE>: VCF file containing the SNVs. (Required)
-b, --bam <BAM_FILE>: BAM file with aligned reads. (Optional)
-f, --fasta <FASTA_FILE>: FASTA file with the reference sequence. (Required)
-g, --genes <GENES_FILE>: File containing gene information. (Required)
-q, --quality: Minimum Phred quality score (default: 20). (Optional)

Example:

get_mnv \
  --vcf variants.vcf \
  --bam reads.bam \
  --fasta reference.fasta \
  --genes genes.txt \
  --quality 30
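As a reminder of what the quality threshold in the example above means, the short Python sketch below converts a Phred score Q into an error probability using the standard definition P = 10^(-Q/10). The function name is hypothetical, and the sketch says nothing about how get_MNV applies the threshold internally.

```python
def phred_to_error_probability(q):
    """Probability that a base call is wrong, given its Phred quality score."""
    return 10 ** (-q / 10)


for q in (20, 30):
    print(f"Q{q}: error probability {phred_to_error_probability(q):.4f}")
# Q20: error probability 0.0100
# Q30: error probability 0.0010
```

Raising the threshold from the default 20 to 30 therefore keeps only data with an expected error rate of at most 1 in 1,000 instead of 1 in 100.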

Input File Formats

VCF File: Should contain the identified SNVs.
BAM File: (Optional) Genomic reads aligned to the reference sequence.
FASTA File: Reference genomic sequence.
Gene File: A tab-delimited text file with one gene per line and four columns (GeneName, GeneStart, GeneEnd, Strand):

Rv0007_Rv0007	9914	10828	+
ileT_Rvnt01	10887	10960	+
alaT_Rvnt02	11112	11184	+
Rv0008c_Rv0008c	11874	12311	-
ppiA_Rv0009	12468	13016	+
Rv0010c_Rv0010c	13133	13558	-
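The sketch below shows one plausible way to use such a gene file to decide which SNVs share a codon. It is an assumption-laden illustration, not the tool's implementation: it assumes coordinates are 1-based and inclusive, that codons on the `-` strand are counted from GeneEnd, and the SNV positions and function names are made up for the example.

```python
import csv
import io
from collections import defaultdict


def parse_genes(text):
    """Parse tab-delimited lines: GeneName, GeneStart, GeneEnd, Strand."""
    genes = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        name, start, end, strand = row
        genes.append((name, int(start), int(end), strand))
    return genes


def codon_number(pos, start, end, strand):
    """1-based codon number of a genomic position inside a gene."""
    offset = pos - start if strand == "+" else end - pos
    return offset // 3 + 1


def group_by_codon(positions, genes):
    """Map (gene, codon number) to the sorted SNV positions falling in that codon."""
    groups = defaultdict(list)
    for pos in sorted(positions):
        for name, start, end, strand in genes:
            if start <= pos <= end:
                groups[(name, codon_number(pos, start, end, strand))].append(pos)
    return dict(groups)


# Two genes taken from the example gene file above.
GENES_TEXT = "Rv0008c_Rv0008c\t11874\t12311\t-\nppiA_Rv0009\t12468\t13016\t+\n"
genes = parse_genes(GENES_TEXT)

snv_positions = [12468, 12469, 12480, 12310, 12311]  # hypothetical SNV positions
groups = group_by_codon(snv_positions, genes)

for (gene, codon), members in groups.items():
    label = "MNV candidate" if len(members) > 1 else "SNP"
    print(gene, codon, members, label)
```

Positions that end up in the same (gene, codon) group are the ones that would need to be translated together rather than one at a time.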

🎴 Output

The program generates a TSV file named <vcf_filename>.MNV.tsv containing the following information:

Gene: Name of the gene.
Positions: Positions of the variants.
Base Changes: Nucleotide base changes.
AA Changes: Resulting amino acid changes.
SNP AA Changes: Amino acid changes if considering individual SNVs.
Variant Type: Type of variant (SNP, MNV, or SNP/MNV).
Change Type: Type of change at the protein level (Synonymous, Non-synonymous, Stop gained).
SNP Reads: (If BAM provided) Count of reads supporting each SNP.
MNV Reads: (If BAM provided) Count of reads supporting the MNV.

Example:

Gene	Positions	Base Changes	AA Changes	SNP AA Changes	Variant Type	Change Type	SNP Reads	MNV Reads
Rv0095c_Rv0095c	104838	T	Asp126Glu	Asp126Glu	SNP	Non-synonymous	0	16
Rv0095c_Rv0095c	104941,104942	T,G	Gly92Gln	Gly92Glu; Gly92Arg	MNV	Non-synonymous	0,0	25
esxL_Rv1198	1341044	C	His13His	His13His	SNP	Synonymous	0	41
esxL_Rv1198	1341083	G	Ala26Ala	Ala26Ala	SNP	Synonymous	0	12
esxL_Rv1198	1341102,1341103	T,C	Arg33Ser	Arg33Cys; Arg33Pro	MNV	Non-synonymous	0,0	11
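For downstream analysis, the TSV can be read with any tabular parser. The following Python sketch, which is only an illustration, loads three of the example rows above from an in-memory string (in practice you would open the `.MNV.tsv` file instead) and lists the MNV rows next to the amino acid changes that per-SNV annotation would have reported.

```python
import csv
import io

HEADER = ["Gene", "Positions", "Base Changes", "AA Changes", "SNP AA Changes",
          "Variant Type", "Change Type", "SNP Reads", "MNV Reads"]
ROWS = [
    ["Rv0095c_Rv0095c", "104838", "T", "Asp126Glu", "Asp126Glu",
     "SNP", "Non-synonymous", "0", "16"],
    ["Rv0095c_Rv0095c", "104941,104942", "T,G", "Gly92Gln", "Gly92Glu; Gly92Arg",
     "MNV", "Non-synonymous", "0,0", "25"],
    ["esxL_Rv1198", "1341102,1341103", "T,C", "Arg33Ser", "Arg33Cys; Arg33Pro",
     "MNV", "Non-synonymous", "0,0", "11"],
]
TSV_TEXT = "\n".join("\t".join(fields) for fields in [HEADER, *ROWS]) + "\n"

# Parse the table into one dict per row, keyed by column name.
rows = list(csv.DictReader(io.StringIO(TSV_TEXT), delimiter="\t"))

# Keep only rows reclassified as MNVs.
mnvs = [row for row in rows if row["Variant Type"] == "MNV"]

for row in mnvs:
    per_snv = [change.strip() for change in row["SNP AA Changes"].split(";")]
    print(row["Gene"], row["Positions"], row["AA Changes"], "instead of", per_snv)
```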

📉 Limitations

The script currently works only with SNVs compared against a reference sequence.
Insertions and deletions that modify the reading frame are not supported in this version.

✨ Contributors

get_MNV is developed with ❤️ by:

Paula Ruiz-Rodriguez
💻 🔬 🤔 🔣 🎨 🔧

Mireia Coscolla
🔍 🤔 🧑‍🏫 🔬 📓

This project follows the all-contributors specification (emoji key).

Fun

3D model logo

Click for the stl file
