Commit

simplify readme
brentp committed Jun 13, 2022
1 parent 01d2b44 commit b05c6b8
Showing 1 changed file with 4 additions and 8 deletions.
12 changes: 4 additions & 8 deletions README.md
@@ -1,24 +1,20 @@
## Echtvar: Really, truly rapid variant annotation and filtering
[![Rust](https://github.com/brentp/echtvar/actions/workflows/ci.yml/badge.svg)](https://github.com/brentp/echtvar/actions/workflows/ci.yml)

Echtvar enables rapid annotation of variants with huge population datasets and
it supports filtering on those values. It chunks the genome into 1<<20 (~1 million
) bases, [encodes each variant into a 32 bit integer](https://github.com/brentp/echtvar/blob/02774b8d1cd3703b65bd2c8d7aab93af05b7940f/src/lib/var32.rs#L9-L21) (with a [supplemental table](https://github.com/brentp/echtvar/blob/02774b8d1cd3703b65bd2c8d7aab93af05b7940f/src/lib/var32.rs#L33-L38)
Echtvar enables rapid variant annotation and filtering with huge population datasets.
It chunks the genome into 1<<20 (~1 million) bases,
[encodes each variant into a 32 bit integer](https://github.com/brentp/echtvar/blob/02774b8d1cd3703b65bd2c8d7aab93af05b7940f/src/lib/var32.rs#L9-L21) (with a [supplemental table](https://github.com/brentp/echtvar/blob/02774b8d1cd3703b65bd2c8d7aab93af05b7940f/src/lib/var32.rs#L33-L38)
for those that can't fit due to large REF and/or ALT alleles). It uses the zip format, [delta
encoding](https://en.wikipedia.org/wiki/Delta_encoding)
and [integer compression
](https://lemire.me/blog/2017/09/27/stream-vbyte-breaking-new-speed-records-for-integer-compression/)
to create a compact and searchable format of any integer, float, or low-cardinality string columns
selected from the population file.
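
The chunking and delta-encoding ideas described above can be illustrated with a minimal sketch. This is not echtvar's implementation (which is written in Rust and packs the alleles into the 32-bit value as well, per the linked `var32.rs`); it only shows how a position might be split into a 1<<20-base chunk plus an offset, and how sorted offsets within a chunk become small deltas that compress well. The names `split_position`, `delta_encode`, and `delta_decode` are hypothetical.

```python
# Illustrative sketch only; names and layout are assumptions, not echtvar's API.
CHUNK_BITS = 20                      # 1 << 20 is roughly 1 million bases per chunk
CHUNK_MASK = (1 << CHUNK_BITS) - 1   # low 20 bits hold the offset within a chunk


def split_position(pos):
    """Return (chunk index, offset within chunk) for a genomic position."""
    return pos >> CHUNK_BITS, pos & CHUNK_MASK


def delta_encode(sorted_values):
    """Store each value as its difference from the previous value."""
    deltas = []
    previous = 0
    for value in sorted_values:
        deltas.append(value - previous)
        previous = value
    return deltas


def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    values = []
    total = 0
    for delta in deltas:
        total += delta
        values.append(total)
    return values


if __name__ == "__main__":
    positions = [1_048_570, 1_048_580, 1_048_600, 2_097_200]
    for pos in positions:
        print(pos, split_position(pos))
    offsets = [4, 24, 100]
    print(delta_encode(offsets))
```

Because the offsets inside a chunk are sorted, the deltas are small non-negative integers, which is the kind of input that integer-compression schemes handle efficiently.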

Once created, an echtvar (zip) file can be used to annotate variants in a VCF (or
An echtvar (zip) file can be used to annotate and filter variants in a VCF (or
BCF) file at a rate of >1 million variants per second (most of the time is spent
reading and writing VCF/BCF, so this number depends on the particular file).

A filter expression can be applied so that only variants that meet the
expression are written. Since `echtvar` is so fast, writing the output is a bottleneck
so filtering can actually *increase* the speed.

Read more at the [why of echtvar](https://github.com/brentp/echtvar/wiki/why).

### Getting started.