prep for release
brentp committed Jan 20, 2022
1 parent 625a6f6 commit 8bd6464
Showing 3 changed files with 31 additions and 8 deletions.
17 changes: 11 additions & 6 deletions README.md
@@ -6,14 +6,14 @@
Echtvar enables rapid annotation of variants with huge population datasets and
it supports filtering on those values. It chunks the genome into 1<<20 (~1 million
) bases, [encodes each variant into a 32 bit integer](https://github.com/brentp/echtvar/blob/02774b8d1cd3703b65bd2c8d7aab93af05b7940f/src/lib/var32.rs#L9-L21) (with a [supplemental table](https://github.com/brentp/echtvar/blob/02774b8d1cd3703b65bd2c8d7aab93af05b7940f/src/lib/var32.rs#L33-L38)
-for those that can't fit due to large REF and/or ALT alleles). It uses [delta
+for those that can't fit due to large REF and/or ALT alleles). It uses the zip format, [delta
encoding](https://en.wikipedia.org/wiki/Delta_encoding)
and [integer compression
](https://lemire.me/blog/2017/09/27/stream-vbyte-breaking-new-speed-records-for-integer-compression/)
to create a compact and searchable format of any integer or float columns
selected from the population file.
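
The chunking and delta-encoding steps described above can be sketched roughly as follows. This is a minimal illustration, not echtvar's actual code: the real 32-bit layout (which also packs REF and ALT) lives in the linked `var32.rs`, and the names `CHUNK_BITS`, `split_position`, `delta_encode` and `delta_decode` are hypothetical.

```rust
/// Assumed for illustration: a chunk spans 1 << 20 bases, so the low
/// 20 bits of a position address a base inside its chunk.
const CHUNK_BITS: u32 = 20;

/// Split a 0-based genomic position into (chunk index, offset within chunk).
fn split_position(pos: u32) -> (u32, u32) {
    (pos >> CHUNK_BITS, pos & ((1 << CHUNK_BITS) - 1))
}

/// Delta-encode a list of values: the first value is stored as-is and each
/// later value as its difference from the previous one. For sorted input
/// the differences are small, which suits integer compression.
fn delta_encode(values: &[u32]) -> Vec<u32> {
    let mut prev = 0u32;
    values
        .iter()
        .map(|&v| {
            let d = v.wrapping_sub(prev);
            prev = v;
            d
        })
        .collect()
}

/// Invert `delta_encode` with a running (wrapping) sum.
fn delta_decode(deltas: &[u32]) -> Vec<u32> {
    let mut acc = 0u32;
    deltas
        .iter()
        .map(|&d| {
            acc = acc.wrapping_add(d);
            acc
        })
        .collect()
}
```

For example, `split_position((1 << 20) + 5)` gives chunk 1 and offset 5, and `delta_encode(&[100, 103, 110])` gives `[100, 3, 7]`.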

-Once created, an echtvar file can be used to annotate variants in a VCF (or
+Once created, an echtvar (zip) file can be used to annotate variants in a VCF (or
BCF) file at a rate of >1 million variants per second (most of the time is spent
reading and writing VCF/BCF, so this number depends on the particular file).

@@ -29,7 +29,7 @@ and then the file can be re-used for the `annotate` step with each new query fil
```
echtvar \
encode \
-echtvar-gnomad-v3.zip \
+gnomad.v3.1.2.echtvar.zip \
conf.json \
$input_population_vcf[s]
# conf.json defines the columns to pull from $input_vcf, and how to name and encode them.
# $input_population_vcf[s] can be split by chromosome or all in a single file.
@@ -39,14 +39,19 @@ name and encode them
See below for a description of the json file that defines which columns are
pulled from the population VCF.

+> you can get a pre-made 6.8GB echtvar file from gnomad v3.1.2 (hg38 whole genomes) with this command:
+> ```
+> curl -L -o gnomad.v3.1.2.echtvar.zip https://surfdrive.surf.nl/files/index.php/s/O4mehMM7b3cmK9s/download
+> ```
Annotate a VCF with an echtvar file and only output variants where `gnomad_popmax_af`
from the echtvar file is < 0.01.

```
echtvar annotate \
--o $cohort.echtvar-annotated.bcf \
--a gnomad.echtvar \
--i 'gnomad_af < 0.01' \
+-o $cohort.echtvar-annotated.filtered.bcf \
+-a gnomad.v3.1.2.echtvar.zip \
+-i 'gnomad_popmax_af < 0.01' \
$cohort.input.bcf
```

18 changes: 18 additions & 0 deletions examples/gnomad.v3.1.2.json
@@ -0,0 +1,18 @@
// this is the file used to create the files linked from echtvar releases.
// when using that echtvar file, each of the "aliases" listed here will be added
// to the output file.
[
{"field": "AC", "alias": "gnomad_ac"},
{"field": "AN", "alias": "gnomad_an"},
{"field": "nhomalt", "alias": "gnomad_nhomalt"},
{"field": "AF", "alias": "gnomad_af", multiplier: 2000000},

{"field": "AC_popmax", "alias": "gnomad_popmax_ac"},
{"field": "AN_popmax", "alias": "gnomad_popmax_an"},
{"field": "nhomalt_popmax", "alias": "gnomad_popmax_nhomalt"},
{"field": "AF_popmax", "alias": "gnomad_popmax_af", multiplier: 2000000},

{"field": "AF_controls_and_biobanks", "alias": "gnomad_controls_and_biobanks_af", multiplier: 2000000},
{"field": "nhomalt_controls_and_biobanks", "alias": "gnomad_controls_and_biobanks_nhomalt"},
]
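
As a rough illustration of what a `multiplier` like the one above implies, the sketch below assumes that a float field is scaled by the multiplier and rounded to an integer before storage, then divided back on read. That reading is an assumption rather than something this commit states, and `quantize` and `dequantize` are hypothetical names, not echtvar functions.

```rust
/// Assumed behaviour: scale a float by `multiplier` and round to the nearest
/// integer so it can be stored in an integer column. Negative inputs
/// saturate to 0 under Rust's float-to-int cast rules.
fn quantize(value: f64, multiplier: u32) -> u32 {
    (value * multiplier as f64).round() as u32
}

/// Recover an approximate float from the stored integer.
fn dequantize(stored: u32, multiplier: u32) -> f64 {
    stored as f64 / multiplier as f64
}
```

Under this assumption a multiplier of 2000000 resolves allele frequencies in steps of 1/2000000 (5e-7), and anything smaller than half a step rounds to zero.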

4 changes: 2 additions & 2 deletions src/main.rs
@@ -8,13 +8,13 @@ extern crate fasteval;
use commands::{annotate_cmd, encoder_cmd};
use std::error::Error;

-const VERSION: &str = "0.0.1";
+const VERSION: &str = env!("CARGO_PKG_VERSION");

fn main() -> Result<(), Box<dyn Error>> {
let mut app = clap_app!(echtvar =>
(version: VERSION)
(author: "Brent Pedersen <[email protected]")
-(about:"variant encoding and annotation")
+(about: "variant encoding and annotation")
(@subcommand encode =>
(about: "create an echtvar file from a population VCF/BCF")
(@arg OUTPUT: +required "output zip file")
