mapquik
is an ultra-fast read mapper based on
Prerequisites: a working Rust environment (https://rustup.rs/).
Clone the repository and run
rustup install nightly
cargo +nightly build --release
The nightly version of cargo is required because mapquik uses experimental language features (such as SIMD and intrinsics).
target/release/mapquik <reads.fq> --reference <reference.fa>
mapquik takes a single FASTA/FASTQ file of reads (gzip-compressed or not) as input. Multi-line sequences and sequences with lowercase characters are not supported.
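Because multi-line and lowercase sequences are not supported, a FASTA file may need to be normalized before mapping. The following is a minimal Rust sketch of such a preprocessing step, assuming an uncompressed FASTA file already loaded into memory. `normalize_fasta` is a hypothetical helper written for illustration only; it is not part of mapquik and does not handle FASTQ or gzip input.

```rust
/// Hypothetical helper (not part of mapquik): rewrites FASTA text so that each
/// record's sequence sits on a single line and contains only uppercase letters.
fn normalize_fasta(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut seq_open = false; // true while a sequence line is being accumulated

    for raw in input.lines() {
        let line = raw.trim_end(); // also drops a trailing '\r' from CRLF files
        if line.is_empty() {
            continue;
        }
        if line.starts_with('>') {
            // New header: close the previous record's single sequence line first.
            if seq_open {
                out.push('\n');
                seq_open = false;
            }
            out.push_str(line);
            out.push('\n');
        } else {
            // Sequence line: append it uppercased and without a newline, so that
            // multi-line sequences are concatenated into one line.
            out.push_str(&line.to_ascii_uppercase());
            seq_open = true;
        }
    }
    if seq_open {
        out.push('\n');
    }
    out
}

fn main() {
    let input = ">read1\nacgt\nACGTn\n>read2\nttga\n";
    print!("{}", normalize_fasta(input));
}
```

Here `read1`, which spans two lowercase/mixed-case lines, becomes a single uppercase line `ACGTACGTN`. Working on a whole string keeps the sketch short; a real preprocessing step for large files would stream the input instead.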
The output of mapquik is a regular PAF file.
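PAF is a tab-separated format whose first 12 columns are mandatory (query name, length, start, end; relative strand; target name, length, start, end; number of residue matches; alignment block length; mapping quality), optionally followed by SAM-style tags. As a sketch of consuming the output downstream, the hypothetical `parse_paf_line` below reads those 12 columns and ignores any extra tags. It is illustrative only, not mapquik code, and it makes no assumption about how mapquik computes the values in columns 10 to 12.

```rust
/// The 12 mandatory PAF columns. Coordinates are 0-based, with end positions exclusive.
#[allow(dead_code)]
#[derive(Debug)]
struct PafRecord {
    query_name: String,
    query_len: u64,
    query_start: u64,
    query_end: u64,
    strand: char, // '+' or '-'
    target_name: String,
    target_len: u64,
    target_start: u64,
    target_end: u64,
    matches: u64,   // number of residue matches
    block_len: u64, // alignment block length
    mapq: u8,       // mapping quality (255 means missing)
}

/// Hypothetical helper: parses one PAF line, ignoring optional tags after column 12.
fn parse_paf_line(line: &str) -> Result<PafRecord, String> {
    let f: Vec<&str> = line.trim_end().split('\t').collect();
    if f.len() < 12 {
        return Err(format!("expected at least 12 columns, found {}", f.len()));
    }
    // Parse column i (0-based) as an unsigned integer, reporting a 1-based column index.
    let num = |i: usize| -> Result<u64, String> {
        f[i].parse::<u64>().map_err(|e| format!("column {}: {}", i + 1, e))
    };
    let strand = match f[4] {
        "+" => '+',
        "-" => '-',
        other => return Err(format!("invalid strand: {}", other)),
    };
    Ok(PafRecord {
        query_name: f[0].to_string(),
        query_len: num(1)?,
        query_start: num(2)?,
        query_end: num(3)?,
        strand,
        target_name: f[5].to_string(),
        target_len: num(6)?,
        target_start: num(7)?,
        target_end: num(8)?,
        matches: num(9)?,
        block_len: num(10)?,
        mapq: f[11].parse::<u8>().map_err(|e| format!("column 12: {}", e))?,
    })
}

fn main() {
    let line = "read1\t1000\t10\t990\t+\tchr1\t5000\t100\t1080\t950\t980\t60";
    match parse_paf_line(line) {
        Ok(rec) => println!(
            "{} -> {}:{}-{} ({}), mapq {}",
            rec.query_name, rec.target_name, rec.target_start, rec.target_end, rec.strand, rec.mapq
        ),
        Err(e) => eprintln!("error: {}", e),
    }
}
```

The example line is made up for illustration. Returning `Result<_, String>` keeps the sketch dependency-free; a larger tool would likely define its own error type.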
An example reference genome and a script to simulate reads using pbsim are provided in the example/ folder. To run mapquik on a small set of 100 reads, type:
cd example && bash run_ecoli.sh
This runs both mapquik and minimap2 on 100 simulated reads and reports the output of paftools.js mapeval for both PAF files.
To simulate a larger set of reads using pbsim and map them, type:
bash simulate_pbsim.sh && bash run_ecoli_full.sh
For further information on usage and parameters, run
target/release/mapquik -h
for a one-line summary of each flag, or run
target/release/mapquik --help
for a detailed explanation of each flag.
All scripts used to generate the figures and tables in the paper can be found in the experiments/ folder. In particular, the simulate_chm13.sh and simulate_maize.sh scripts can be used in the same way to simulate reads.
To obtain and map DeepConsensus reads, first run
wget https://storage.googleapis.com/brain-genomics-public/research/deepconsensus/data/v0.3/assembly_analysis/fastqs/HG002_24kb_2SMRT_cells.dc.v0.3.q20.fastq.gz
gunzip -c HG002_24kb_2SMRT_cells.dc.v0.3.q20.fastq.gz | grep -v TOTAL > dc.hg002.fastq
Then map the reads with mapquik to a reference genome reference.fa in your working directory using
target/release/mapquik dc.hg002.fastq --reference reference.fa -p mapquik-dc
mapquik is freely available under the MIT License.
- Barış Ekim, supervised by Bonnie Berger at the Computer Science and Artificial Intelligence Laboratory (CSAIL) at Massachusetts Institute of Technology (MIT)
- Rayan Chikhi at the Department of Computational Biology at Institut Pasteur
mapquik is not yet published. For now, please cite our original mdBG article: Minimizer-space de Bruijn graphs: Whole-genome assembly of long reads in minutes on a personal computer (2021).
@article{mdbg,
  author = {Ekim, Bar{\i}{\c s} and Berger, Bonnie and Chikhi, Rayan},
  title = {Minimizer-space de Bruijn graphs: Whole-genome assembly of long reads in minutes on a personal computer},
  journal = {Cell Systems},
  volume = {12},
  number = {10},
  pages = {958--968},
  year = {2021},
  doi = {10.1016/j.cels.2021.08.009},
  publisher = {Elsevier}
}
Should you have any inquiries, please contact Barış Ekim at baris [at] mit [dot] edu, or Rayan Chikhi at rchikhi [at] pasteur [dot] fr.