Add metrics docs (#41)

* Add metrics intro
Illumina · Jan 14, 2022 · 7cd3be5
1 parent d02c534

Showing 5 changed files with 99 additions and 46 deletions.
diff --git a/README.md b/README.md
@@ -5,15 +5,22 @@ tandem repeats. REViewer requires a BAMlet with graph-realigned reads generated
 by [ExpansionHunter](https://github.com/Illumina/ExpansionHunter) and the
 corresponding variant catalog.
 
+![Introductory example](docs/images/intro-example.png)
+
 ## License
 
 REViewer is provided under the terms and conditions of the [GPLv3 license](LICENSE.txt).
 It relies on several third party packages provided under other open source licenses,
 please see [COPYRIGHT.txt](COPYRIGHT.txt) for additional details.
 
-## Building
+## Installation
+
+The simplest way to obtain REViewer is to download the Linux binary
+corresponding to the latest release from the
+[Releases page](https://github.com/Illumina/REViewer/releases). The link to the
+binary is located in the *Assets* section.
 
-REViewer can be built from source using the standard CMake commands.
+REViewer can also be built from source with CMake.
 
 ```shell script
 cd REViewer
@@ -38,56 +45,33 @@ REViewer \
   --output-prefix <Prefix for the output files>
 ```
 
-Note that the BAMlet generated by ExpansionHunter (`--reads` parameter) must be sorted and indexed.
-
-## Examples of read pileups
-
-[This document](docs/examples.md) describes read pileups corresponding to correctly and incorrectly
-genotyped repeats. It is a good starting point for learning how to use the program.
+Note that the BAMlet generated by ExpansionHunter (`--reads` parameter) must be
+sorted and indexed.
 
-## Companion tool
+## Introductory guides
 
-[reviewer #2](https://github.com/broadinstitute/reviewer2) is a companion tool
-for REViewer developed by [Ben Weisburd](https://github.com/bw2) that provides a
-convenient way to assess large quantities of read pileups.
-
-## Overview of the method
+- [A blog post describing the method](https://www.illumina.com/science/genomics-research/reviewer-visualizing-alignments-short-reads-long-repeat.html)
+- [Examples of read pileups](docs/examples.md) corresponding to correctly and
+incorrectly genotyped repeats
 
-REViewer is designed to display alignments of reads generated by ExpansionHunter (Figure 3,
-boxes 1-3). These alignments are obtained by realigning reads originating in the target region
-to the corresponding sequence graph encoding one or more repeats located there5. REViewer then
-constructs putative haplotype sequences using genotypes produced by ExpansionHunter and then
-selects a pair of haplotypes that have the highest consistency with the read alignments (Figure 3,
-boxes 4-6). (This step is skipped for repeats on haploid chromosomes.) Next, REViewer determines
-the set of possible alignment positions for each read pair on each haplotype. For example, a read
-pair originating within a flanking sequence shared by both haplotypes has exactly one alignment
-position on each haplotype (Figure 3, Box 7a) while a read pair whose both mates are comprised
-of the repeat sequence has multiple possible origins on haplotypes with sufficiently long repeats
-(Figure 3, Box 7b). To generate a read pileup, REViewer selects one alignment position at random
-for each read pair. This step is repeated a specified number of times (10,000 by default) to
-generate multiple pileups. The pileup with the most even coverage of each allele is selected
-for visualization (Figure 3, Box 8).
+## Reference documentation
 
-![Workflow outline](docs/images/workflow.png)
+- [Overview of the method and its limitations](docs/method-overview.md)
+- [Description of the quality metrics reported by REViewer](docs/metrics.md)
 
-This algorithm is based on the idea that if a given locus is sequenced well and each constituent
-repeat is genotyped correctly, then it is possible to distribute the reads to achieve an even
-coverage of each haplotype. (Although many reads may not be assigned to the correct haplotype of
-origin, especially in cases when the repeats are homozygous, and the resulting haplotypes are
-identical.) Conversely, if the size of a repeat is significantly overestimated or underestimated,
-no assignment of reads will result in an even pileup making the genotyping error easy to notice.
+## Companion tools
 
-## Limitations
+- [FlipBook](https://github.com/broadinstitute/flipbook) is an image server for
+REViewer developed by [Ben Weisburd](https://github.com/bw2). It provides a
+convenient way to inspect large quantities of read pileups.
 
-REViewer is a tool for assessing consistency of sequencing data with repeat genotypes produced by 
-ExpansionHunter. It provides a mechanism for reviewing the evidence supporting a genotype call in
-clinical settings and to identify problematic corner cases to drive future development. The read
-pileup plots generated by REViewer may contain inaccuracies: The repeats may not be phased correctly
-(e.g., when repeats are located far apart from each other) and read pairs consistent with both
-haplotypes will often be assigned to the incorrect haplotype. Also, the current version of REViewer
-visualizes repeats whose span does not exceed the fragment length and longer repeats are capped at
-the fragment length.
+- [Review BAMs](https://gitlab.com/andreassh/review-bams) is a script that allows
+applying REViewer to a regular BAM file (by running ExpansionHunter in the
+background). It was developed by [Andreas Halman](https://gitlab.com/andreassh).
 
-## External links
+## Citation
 
-- [A blog post describing the method](https://www.illumina.com/science/genomics-research/reviewer-visualizing-alignments-short-reads-long-repeat.html)
+- Dolzhenko E, Weisburd B, Garikano K, Rajan Babu IS, and colleagues,
+[REViewer: Haplotype-resolved visualization of read alignments in and around
+tandem repeats](https://www.biorxiv.org/content/10.1101/2021.10.20.465046v1),
+bioRxiv, 2021
diff --git a/docs/images/allele-depth-example.png b/docs/images/allele-depth-example.png
diff --git a/docs/images/intro-example.png b/docs/images/intro-example.png
diff --git a/docs/method-overview.md b/docs/method-overview.md
@@ -0,0 +1,43 @@
+# Overview of the method
+
+REViewer is designed to display alignments of reads generated by ExpansionHunter
+(Figure 3, boxes 1-3). These alignments are obtained by realigning reads
+originating in the target region to the corresponding sequence graph encoding one
+or more repeats located there. REViewer then constructs putative haplotype
+sequences using genotypes produced by ExpansionHunter and then selects a pair of
+haplotypes that have the highest consistency with the read alignments (Figure 3,
+boxes 4-6). (This step is skipped for repeats on haploid chromosomes.) Next,
+REViewer determines the set of possible alignment positions for each read pair
+on each haplotype. For example, a read pair originating within a flanking
+sequence shared by both haplotypes has exactly one alignment position on each
+haplotype (Figure 3, Box 7a) while a read pair in which both mates consist
+of the repeat sequence has multiple possible origins on haplotypes with
+sufficiently long repeats (Figure 3, Box 7b). To generate a read pileup,
+REViewer selects one alignment position at random for each read pair. This step
+is repeated a specified number of times (10,000 by default) to generate multiple
+pileups. The pileup with the most even coverage of each allele is selected for
+visualization (Figure 3, Box 8).
+
+![Workflow outline](images/workflow.png)
+
+This algorithm is based on the idea that if a given locus is sequenced well and
+each constituent repeat is genotyped correctly, then it is possible to
+distribute the reads to achieve an even coverage of each haplotype. (Although
+many reads may not be assigned to the correct haplotype of origin, especially in
+cases when the repeats are homozygous, and the resulting haplotypes are
+identical.) Conversely, if the size of a repeat is significantly overestimated
+or underestimated, no assignment of reads will result in an even pileup, making
+the genotyping error easy to notice.
+
+## Limitations
+
+REViewer is a tool for assessing consistency of sequencing data with repeat
+genotypes produced by ExpansionHunter. It provides a mechanism for reviewing the
+evidence supporting a genotype call in clinical settings and for identifying
+problematic corner cases to drive future development. The read pileup plots
+generated by REViewer may contain inaccuracies: the repeats may not be phased
+correctly (e.g., when repeats are located far apart from each other) and read
+pairs consistent with both haplotypes will often be assigned to the incorrect
+haplotype. Also, the current version of REViewer visualizes only repeats whose
+span does not exceed the fragment length; longer repeats are capped at the
+fragment length.
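
The pileup-selection step described in `docs/method-overview.md` can be illustrated with a minimal Python sketch. This is not REViewer's implementation: it places single reads rather than read pairs, and the `unevenness` score (pooled variance of per-base depth across all haplotypes) is an assumption, since the text does not say how evenness is measured. All function and parameter names are hypothetical.

```python
import random


def coverage(placements, hap_lengths, read_len):
    """Per-base depth of each haplotype for one candidate pileup."""
    depth = [[0] * n for n in hap_lengths]
    for hap, start in placements:
        for pos in range(start, min(start + read_len, hap_lengths[hap])):
            depth[hap][pos] += 1
    return depth


def unevenness(depth):
    """Assumed score: variance of depth pooled over all haplotype bases.

    Lower means more even coverage. REViewer's actual criterion may differ.
    """
    flat = [d for hap_depth in depth for d in hap_depth]
    mean = sum(flat) / len(flat)
    return sum((d - mean) ** 2 for d in flat) / len(flat)


def select_pileup(candidates, hap_lengths, read_len, num_trials=10_000, seed=0):
    """Pick one random placement per read, repeat, and keep the most even pileup.

    candidates[i] lists the possible (haplotype index, start position)
    placements of read i.
    """
    rng = random.Random(seed)
    best_score, best = float("inf"), None
    for _ in range(num_trials):
        placements = [rng.choice(options) for options in candidates]
        score = unevenness(coverage(placements, hap_lengths, read_len))
        if score < best_score:
            best_score, best = score, placements
    return best, best_score
```

With two reads that each fit either of two identical haplotypes, the most even pileup places one read on each haplotype, even though neither read is known to originate there. This mirrors the caveat in the text that reads consistent with both haplotypes may be assigned to the wrong one.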
diff --git a/docs/metrics.md b/docs/metrics.md
@@ -0,0 +1,26 @@
+# Quality metrics
+
+REViewer reports various summary measurements, called **quality metrics**, that
+describe key properties of read pileups. Quality metrics make it possible to
+automate assessment of large collections of STR genotype calls either (a) by
+selecting a series of thresholds to stratify the value of each metric into
+"good", "suspicious", and "bad" categories, or (b) by using more flexible
+statistical / machine learning approaches.
+
+This document describes the quality metrics reported by REViewer. All quality
+metrics are stored in a tab-separated (TSV) file `<output prefix>.metrics.tsv`.
+If you have suggestions for additional metrics, please consider [creating an
+issue](https://github.com/Illumina/REViewer/issues).
+
+## Allele depth
+
+In general, sequencing depth of a whole-genome sequencing sample is the average
+number of reads that overlap any given genomic position in that sample. (For
+example, a depth of 30x means that a base would be overlapped by 30 reads on
+average.) The **allele depth** metric is an extension of this concept to STRs:
+It reports the sequencing depth of each STR allele. The diagram below shows an
+example of a well-genotyped repeat (left) where both STR alleles have the
+expected sequencing depth and an example (right) where the size of the long
+allele may be overestimated.
+
+![Example of allele depth metric](images/allele-depth-example.png)
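
Option (a) in `docs/metrics.md`, stratifying a metric with thresholds, might look like the following Python sketch. The column names `VariantId` and `AlleleDepth`, the `/`-separated per-allele depths, and the threshold ratios are all assumptions made for illustration; check them against an actual `<output prefix>.metrics.tsv` file before relying on them. The sketch also flags only depth that falls below expectation, which is a simplification.

```python
import csv
import io

# Categories ordered from best to worst.
ORDER = ["good", "suspicious", "bad"]


def classify_depth(allele_depth, expected_depth,
                   suspicious_ratio=0.6, bad_ratio=0.3):
    """Label one allele by its depth relative to the expected depth.

    The two ratio thresholds are hypothetical defaults, not recommendations.
    """
    ratio = allele_depth / expected_depth
    if ratio < bad_ratio:
        return "bad"
    if ratio < suspicious_ratio:
        return "suspicious"
    return "good"


def stratify(tsv_text, expected_depth):
    """Map each variant to the worst label among its alleles.

    Assumes a header row with VariantId and AlleleDepth columns, where
    AlleleDepth holds per-allele depths separated by '/'.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    results = {}
    for row in reader:
        depths = [float(x) for x in row["AlleleDepth"].split("/")]
        labels = [classify_depth(d, expected_depth) for d in depths]
        results[row["VariantId"]] = max(labels, key=ORDER.index)
    return results
```

A call such as `stratify(open("sample.metrics.tsv").read(), expected_depth=15.0)` would then return one label per variant; the file name and the expected depth here are placeholders.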