File Types

Adam Novak edited this page May 27, 2022 · 15 revisions

Glossary of vg-related File Types

The vg ecosystem uses many file formats. Some are new and not yet used consistently; others are old but still required for some less-popular operations.

Some of these are described in more detail on the Index Types page.

Reference Formats

These formats store genome references, which define the coordinate spaces in which genomic analyses are carried out.

Name Description Extension Purpose Status Notes
VG Protobuf
GFA
HashGraph
PackedGraph
Memory-Mapped PackedGraph
ODGI (vg flavor)
VG JSON
Indexed VG Protobuf
FASTA

Read and Alignment Formats

These formats store short or long reads from DNA sequencing machines, and can describe how they fit into references.

Name Description Extension Purpose Status Notes
GAM Protobuf
GAF
Indexed GAM
GAM JSON
GAMP Protobuf
GAMP JSON
BAM
SAM
FASTQ
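
To make the simplest read format concrete, the following sketch iterates over FASTQ records. It assumes the common layout of exactly four lines per record (header, sequence, separator, qualities) with no line-wrapped sequences. The function name `read_fastq` and the sample reads are hypothetical.

```python
from typing import Iterator, Tuple


def read_fastq(text: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, qualities) from four-line FASTQ records."""
    lines = text.splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("expected a multiple of four lines")
    for i in range(0, len(lines), 4):
        header, sequence, separator, qualities = lines[i:i + 4]
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(sequence) != len(qualities):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        # Drop the leading "@" from the header to get the read name.
        yield header[1:], sequence, qualities


# Two made-up reads, used only to exercise the reader.
EXAMPLE_FASTQ = "@read1\nGATTACA\n+\nIIIIIII\n@read2\nACGT\n+\n!!!!\n"
records = list(read_fastq(EXAMPLE_FASTQ))
```

The alignment formats in this table carry the same read data plus information on how each read fits into a reference, so they need format-specific tooling rather than a line-based reader like this one.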

Sample Information Formats

These formats can describe individual people or other organisms and how their genomes fit into or differ from references.

Name Description Extension Purpose Status Notes
GBWT
GBZ
VCF
Pack File
Pileup Protobuf
Pileup JSON
Locus Protobuf
Locus JSON

Miscellaneous Formats

These formats store other kinds of information, or are precomputed indexes to speed up operations on other data.

Name Description Extension Purpose Status Notes
Distance Index (v1)
Distance Index (v2)
GCSA
Minimizer Index
BED
Snarl Protobuf
Snarl JSON
SnarlTraversal Protobuf
SnarlTraversal JSON
Node ID Translation
VG Protobuf Index
GAM Index
FASTA Index
BAM Index
Tabix VCF Index