Indexing Huge Datasets

Jouni Siren edited this page Dec 3, 2018 · 14 revisions

General

The index construction guidelines work best with datasets no larger than the 1000 Genomes Project (about 2500 human samples). This page describes techniques that may help when indexing larger datasets.

GBWT construction for tens of thousands of samples

GCSA construction for a dense graph
