Add metrics docs (#41)
* Add metrics intro
egor-dolzhenko authored Jan 14, 2022
1 parent d02c534 commit 7cd3be5
Showing 5 changed files with 99 additions and 46 deletions.
76 changes: 30 additions & 46 deletions README.md
@@ -5,15 +5,22 @@ tandem repeats. REViewer requires a BAMlet with graph-realigned reads generated
by [ExpansionHunter](https://github.com/Illumina/ExpansionHunter) and the
corresponding variant catalog.

![Introductory example](docs/images/intro-example.png)

## License

REViewer is provided under the terms and conditions of the [GPLv3 license](LICENSE.txt).
It relies on several third party packages provided under other open source licenses,
please see [COPYRIGHT.txt](COPYRIGHT.txt) for additional details.

## Building
## Installation

The simplest way of obtaining REViewer is by downloading a Linux binary
corresponding to the latest release from the
[Releases page](https://github.com/Illumina/REViewer/releases). The link to the
binary is located in the *Assets* section.

REViewer can be built from source using the standard CMake commands.
REViewer can also be built from source with CMake.

```shell script
cd REViewer
@@ -38,56 +45,33 @@ REViewer \
--output-prefix <Prefix for the output files>
```

Note that the BAMlet generated by ExpansionHunter (`--reads` parameter) must be sorted and indexed.

## Examples of read pileups

[This document](docs/examples.md) describes read pileups corresponding to correctly and incorrectly
genotyped repeats. It is a good starting point for learning how to use the program.
Note that the BAMlet generated by ExpansionHunter (`--reads` parameter) must be
sorted and indexed.

## Companion tool
## Introductory guides

[reviewer #2](https://github.com/broadinstitute/reviewer2) is a companion tool
for REViewer developed by [Ben Weisburd](https://github.com/bw2) that provides a
convenient way to assess large quantities of read pileups.

## Overview of the method
- [A blog post describing the method](https://www.illumina.com/science/genomics-research/reviewer-visualizing-alignments-short-reads-long-repeat.html)
- [Examples of read pileups](docs/examples.md) corresponding to correctly and
incorrectly genotyped repeats

REViewer is designed to display alignments of reads generated by ExpansionHunter (Figure 3,
boxes 1-3). These alignments are obtained by realigning reads originating in the target region
to the corresponding sequence graph encoding one or more repeats located there. REViewer then
constructs putative haplotype sequences using genotypes produced by ExpansionHunter and then
selects a pair of haplotypes that have the highest consistency with the read alignments (Figure 3,
boxes 4-6). (This step is skipped for repeats on haploid chromosomes.) Next, REViewer determines
the set of possible alignment positions for each read pair on each haplotype. For example, a read
pair originating within a flanking sequence shared by both haplotypes has exactly one alignment
position on each haplotype (Figure 3, Box 7a) while a read pair whose both mates are comprised
of the repeat sequence has multiple possible origins on haplotypes with sufficiently long repeats
(Figure 3, Box 7b). To generate a read pileup, REViewer selects one alignment position at random
for each read pair. This step is repeated a specified number of times (10,000 by default) to
generate multiple pileups. The pileup with the most even coverage of each allele is selected
for visualization (Figure 3, Box 8).
## Reference documentation

![Workflow outline](docs/images/workflow.png)
- [Overview of the method and its limitations](docs/method-overview.md)
- [Description of the quality metrics reported by REViewer](docs/metrics.md)

This algorithm is based on the idea that if a given locus is sequenced well and each constituent
repeat is genotyped correctly, then it is possible to distribute the reads to achieve an even
coverage of each haplotype. (Although many reads may not be assigned to the correct haplotype of
origin, especially in cases when the repeats are homozygous, and the resulting haplotypes are
identical.) Conversely, if the size of a repeat is significantly overestimated or underestimated,
no assignment of reads will result in an even pileup making the genotyping error easy to notice.
## Companion tools

## Limitations
- [FlipBook](https://github.com/broadinstitute/flipbook) is an image server for
REViewer developed by [Ben Weisburd](https://github.com/bw2). It provides a
convenient way to inspect large quantities of read pileups.

REViewer is a tool for assessing consistency of sequencing data with repeat genotypes produced by
ExpansionHunter. It provides a mechanism for reviewing the evidence supporting a genotype call in
clinical settings and to identify problematic corner cases to drive future development. The read
pileup plots generated by REViewer may contain inaccuracies: The repeats may not be phased correctly
(e.g., when repeats are located far apart from each other) and read pairs consistent with both
haplotypes will often be assigned to the incorrect haplotype. Also, the current version of REViewer
visualizes repeats whose span does not exceed the fragment length and longer repeats are capped at
the fragment length.
- [Review BAMs](https://gitlab.com/andreassh/review-bams) is a script that allows
applying REViewer to a regular BAM file (by running ExpansionHunter in the
background). It was developed by [Andreas Halman](https://gitlab.com/andreassh).

## External links
## Citation

- [A blog post describing the method](https://www.illumina.com/science/genomics-research/reviewer-visualizing-alignments-short-reads-long-repeat.html)
- Dolzhenko E, Weisburd B, Garikano K, Rajan Babu IS, and colleagues,
[REViewer: Haplotype-resolved visualization of read alignments in and around
tandem repeats](https://www.biorxiv.org/content/10.1101/2021.10.20.465046v1),
bioRxiv, 2021
Binary file added docs/images/allele-depth-example.png
Binary file added docs/images/intro-example.png
43 changes: 43 additions & 0 deletions docs/method-overview.md
@@ -0,0 +1,43 @@
# Overview of the method

REViewer is designed to display alignments of reads generated by ExpansionHunter
(Figure 3, boxes 1-3). These alignments are obtained by realigning reads
originating in the target region to the corresponding sequence graph encoding one
or more repeats located there. REViewer constructs putative haplotype sequences
from the genotypes produced by ExpansionHunter and then selects the pair of
haplotypes that is most consistent with the read alignments (Figure 3,
boxes 4-6). (This step is skipped for repeats on haploid chromosomes.) Next,
REViewer determines the set of possible alignment positions for each read pair
on each haplotype. For example, a read pair originating within a flanking
sequence shared by both haplotypes has exactly one alignment position on each
haplotype (Figure 3, Box 7a), while a read pair in which both mates consist
entirely of repeat sequence has multiple possible origins on haplotypes with
sufficiently long repeats (Figure 3, Box 7b). To generate a read pileup,
REViewer selects one alignment position at random for each read pair. This step
is repeated a specified number of times (10,000 by default) to generate multiple
pileups. The pileup with the most even coverage of each allele is selected for
visualization (Figure 3, Box 8).
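
The difference between a flanking read and an in-repeat read can be illustrated
with a toy model. The sketch below is not REViewer's implementation: it assumes
a haplotype is a plain string (left flank, repeated motif, right flank) and
treats every exact match of a read as a possible alignment position, whereas
REViewer works with graph alignments of read pairs. The function names and
sequences are made up for illustration.

```python
def build_haplotype(left_flank, motif, num_units, right_flank):
    """Spell out a haplotype as left flank + repeated motif + right flank."""
    return left_flank + motif * num_units + right_flank


def candidate_positions(read, haplotype):
    """Return every start offset at which the read matches the haplotype exactly."""
    positions = []
    start = haplotype.find(read)
    while start != -1:
        positions.append(start)
        # Continue one base further so overlapping matches are also found.
        start = haplotype.find(read, start + 1)
    return positions


if __name__ == "__main__":
    haplotype = build_haplotype("ACGTACGTTA", "CAG", 10, "TTGACCATGA")
    # A read from the flank has a single possible origin.
    print(candidate_positions("TGACCATG", haplotype))
    # A read made up of three repeat units fits at several offsets.
    print(candidate_positions("CAGCAGCAG", haplotype))
```

In this toy model a read spanning `k` repeat units has `n - k + 1` possible
origins on a haplotype with `n` units, which is why the ambiguity only arises
when the repeat is long enough relative to the read.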

![Workflow outline](images/workflow.png)

This algorithm is based on the idea that if a given locus is sequenced well and
each constituent repeat is genotyped correctly, then it is possible to
distribute the reads to achieve an even coverage of each haplotype. (Many reads
may nonetheless be assigned to the wrong haplotype of origin, especially when
the repeats are homozygous and the resulting haplotypes are identical.)
Conversely, if the size of a repeat is significantly overestimated or
underestimated, no assignment of reads will result in an even pileup, making
the genotyping error easy to notice.
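
The random-assignment step described above can be sketched as follows. This is
an illustrative simplification under stated assumptions, not REViewer's code:
each read is a fixed-length interval, each candidate placement is a
hypothetical `(haplotype_index, start)` tuple, and unevenness is scored as the
summed variance of per-base depth. How REViewer actually measures evenness is
not specified in this document.

```python
import random
from statistics import pvariance


def coverage(placements, hap_lengths, read_len):
    """Per-base depth on each haplotype for one choice of placements."""
    depth = [[0] * length for length in hap_lengths]
    for hap, start in placements:
        for pos in range(start, min(start + read_len, hap_lengths[hap])):
            depth[hap][pos] += 1
    return depth


def unevenness(depth):
    """Assumed score: sum of per-haplotype depth variances (0 is perfectly even)."""
    return sum(pvariance(hap_depth) for hap_depth in depth)


def most_even_pileup(candidates, hap_lengths, read_len, num_trials=10_000, seed=0):
    """Draw one placement per read at random, repeat, and keep the most even pileup."""
    rng = random.Random(seed)
    best, best_score = None, float("inf")
    for _ in range(num_trials):
        placements = [rng.choice(options) for options in candidates]
        score = unevenness(coverage(placements, hap_lengths, read_len))
        if score < best_score:
            best, best_score = placements, score
    return best, best_score


if __name__ == "__main__":
    # Four reads of length 2 that could each sit at either end of either haplotype.
    options = [(0, 0), (0, 2), (1, 0), (1, 2)]
    pileup, score = most_even_pileup([options] * 4, hap_lengths=[4, 4], read_len=2)
    print(pileup, score)
```

If a genotype is wrong, for example a haplotype is longer than the reads can
cover, no draw reaches a low score in this model, which mirrors the argument
that genotyping errors show up as uneven pileups.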

## Limitations

REViewer is a tool for assessing consistency of sequencing data with repeat
genotypes produced by ExpansionHunter. It provides a mechanism for reviewing the
evidence supporting a genotype call in clinical settings and for identifying
problematic corner cases to drive future development. The read pileup plots
generated by REViewer may contain inaccuracies: the repeats may not be phased
correctly (e.g., when repeats are located far apart from each other) and read
pairs consistent with both haplotypes will often be assigned to the incorrect
haplotype. Also, the current version of REViewer fully visualizes only repeats
whose span does not exceed the fragment length; longer repeats are capped at the
fragment length.
26 changes: 26 additions & 0 deletions docs/metrics.md
@@ -0,0 +1,26 @@
# Quality metrics

REViewer reports various summary measurements, called **quality metrics**, that
describe key properties of read pileups. Quality metrics make it possible to
automate assessment of large collections of STR genotype calls either (a) by
selecting a series of thresholds to stratify the value of each metric into
"good", "suspicious", and "bad" categories, or (b) by using more flexible
statistical / machine learning approaches.
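
Approach (a) can be sketched in a few lines. The function below is only an
illustration: the cutoff values are hypothetical, and it assumes a metric for
which lower values indicate a worse pileup. Suitable thresholds would need to
be chosen per metric and per dataset.

```python
def stratify(value, suspicious_below, bad_below):
    """Place a metric value into a "good", "suspicious", or "bad" category.

    Assumes lower values are worse, so bad_below must not exceed suspicious_below.
    """
    if bad_below > suspicious_below:
        raise ValueError("bad_below must be less than or equal to suspicious_below")
    if value < bad_below:
        return "bad"
    if value < suspicious_below:
        return "suspicious"
    return "good"


if __name__ == "__main__":
    # Hypothetical cutoffs, chosen only for illustration.
    for value in (14.8, 7.0, 2.0):
        print(value, stratify(value, suspicious_below=10.0, bad_below=5.0))
```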

This document describes the quality metrics reported by REViewer. All quality
metrics are stored in a tab-separated (TSV) file `<output prefix>.metrics.tsv`.
If you have suggestions for additional metrics, please consider [creating an
issue](https://github.com/Illumina/REViewer/issues).
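
A minimal sketch of loading such a file with Python's standard `csv` module is
shown below. The column layout of the metrics file is not described in this
document, so the helper (a hypothetical name) makes no assumption about it and
simply returns each non-empty line as a list of string fields.

```python
import csv


def read_metrics_rows(path):
    """Read a tab-separated file into a list of rows, skipping blank lines."""
    with open(path, newline="") as handle:
        return [row for row in csv.reader(handle, delimiter="\t") if row]
```

The caller is left to interpret the fields, for example by converting numeric
values with `float` once the layout of the file at hand is known.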

## Allele depth

In general, sequencing depth of a whole-genome sequencing sample is the average
number of reads that overlap any given genomic position in that sample. (For
example, a depth of 30x means that a base would be overlapped by 30 reads on
average.) The **allele depth** metric is an extension of this concept to STRs:
It reports the sequencing depth of each STR allele. The diagram below shows an
example of a well-genotyped repeat (left) where both STR alleles have the
expected sequencing depth and an example (right) where the size of the long
allele may be overestimated.

![Example of allele depth metric](images/allele-depth-example.png)
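
The notion of depth used here can be made concrete with a small calculation.
The sketch below is an assumption-laden illustration rather than REViewer's
actual computation: reads are modeled as half-open `(start, end)` intervals on
a single allele, and depth is the number of aligned bases inside a region
divided by the region length.

```python
def mean_depth(read_intervals, region_start, region_end):
    """Average number of reads overlapping each base of [region_start, region_end)."""
    length = region_end - region_start
    if length <= 0:
        raise ValueError("region must have positive length")
    covered_bases = sum(
        # Clip each read to the region; reads outside it contribute nothing.
        max(0, min(end, region_end) - max(start, region_start))
        for start, end in read_intervals
    )
    return covered_bases / length


if __name__ == "__main__":
    reads = [(0, 10), (5, 15), (10, 20)]
    print(mean_depth(reads, 0, 20))
    print(mean_depth(reads, 5, 15))
```

As a rough guide, on a diploid locus one would expect each allele to carry
about half of the sample's genome-wide depth, so an allele whose depth falls
well below that may have an overestimated size.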
