Showing 5 changed files with 99 additions and 46 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,43 @@ | ||
# Overview of the method | ||
|
||
REViewer is designed to display alignments of reads generated by ExpansionHunter | ||
(Figure 3, boxes 1-3). These alignments are obtained by realigning reads | ||
originating in the target region to the corresponding sequence graph encoding one | ||
or more repeats located there [5]. REViewer then constructs putative haplotype | ||
sequences using genotypes produced by ExpansionHunter and then selects a pair of | ||
haplotypes that have the highest consistency with the read alignments (Figure 3, | ||
boxes 4-6). (This step is skipped for repeats on haploid chromosomes.) Next, | ||
REViewer determines the set of possible alignment positions for each read pair | ||
on each haplotype. For example, a read pair originating within a flanking | ||
sequence shared by both haplotypes has exactly one alignment position on each | ||
haplotype (Figure 3, Box 7a), while a read pair whose mates both consist entirely | ||
of repeat sequence has multiple possible origins on haplotypes with | ||
sufficiently long repeats (Figure 3, Box 7b). To generate a read pileup, | ||
REViewer selects one alignment position at random for each read pair. This step | ||
is repeated a specified number of times (10,000 by default) to generate multiple | ||
pileups. The pileup with the most even coverage of each allele is selected for | ||
visualization (Figure 3, Box 8). | ||
|
||
![Workflow outline](images/workflow.png) | ||
|
||
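The pileup-selection step described above can be sketched roughly as follows. This is a simplified illustration, not REViewer's actual implementation: it places single reads rather than read pairs, and it scores evenness as the variance of per-base depth pooled across haplotypes, which is an assumption (the text does not say how evenness is measured). The names `coverage`, `unevenness`, and `select_pileup` are hypothetical.

```python
import random
from statistics import pvariance


def coverage(haplotype_length, starts, read_length):
    """Per-base depth of one haplotype for reads placed at the given starts."""
    depth = [0] * haplotype_length
    for start in starts:
        for pos in range(start, min(start + read_length, haplotype_length)):
            depth[pos] += 1
    return depth


def unevenness(depths_by_haplotype):
    """Assumed evenness score: variance of depth pooled over all haplotypes.

    Lower is more even; 0 means every base of every haplotype has equal depth.
    """
    pooled = [d for depth in depths_by_haplotype for d in depth]
    return pvariance(pooled)


def select_pileup(candidates, haplotype_lengths, read_length,
                  num_trials=10_000, seed=0):
    """Pick the most even of `num_trials` randomly generated pileups.

    `candidates` holds, for each read, a list of possible
    (haplotype_index, start_position) placements.
    Returns a (score, assignment) tuple.
    """
    rng = random.Random(seed)
    best = None
    for _ in range(num_trials):
        # Choose one placement at random for every read.
        assignment = [rng.choice(options) for options in candidates]
        depths = [
            coverage(length,
                     [start for hap, start in assignment if hap == index],
                     read_length)
            for index, length in enumerate(haplotype_lengths)
        ]
        score = unevenness(depths)
        if best is None or score < best[0]:
            best = (score, assignment)
    return best
```

A read that lies in a shared flank would have one candidate per haplotype, whereas a read consisting entirely of repeat sequence would have many candidates on a haplotype with a long repeat.
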
This algorithm is based on the idea that if a given locus is sequenced well and | ||
each constituent repeat is genotyped correctly, then it is possible to | ||
distribute the reads to achieve an even coverage of each haplotype. (Many reads | ||
may nevertheless be assigned to the wrong haplotype of origin, especially when | ||
the repeats are homozygous and the resulting haplotypes are identical.) | ||
Conversely, if the size of a repeat is significantly overestimated or | ||
underestimated, no assignment of reads will result in an even pileup, making | ||
the genotyping error easy to notice. | ||
|
||
## Limitations | ||
|
||
REViewer is a tool for assessing consistency of sequencing data with repeat | ||
genotypes produced by ExpansionHunter. It provides a mechanism for reviewing the | ||
evidence supporting a genotype call in clinical settings and for identifying | ||
problematic corner cases to drive future development. The read pileup plots | ||
generated by REViewer may contain inaccuracies: the repeats may not be phased | ||
correctly (e.g., when repeats are located far apart from each other), and read | ||
pairs consistent with both haplotypes will often be assigned to the incorrect | ||
haplotype. Also, the current version of REViewer fully visualizes only repeats | ||
whose span does not exceed the fragment length; longer repeats are capped at | ||
the fragment length. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
# Quality metrics | ||
|
||
REViewer reports various summary measurements, called **quality metrics**, that | ||
describe key properties of read pileups. Quality metrics make it possible to | ||
automate assessment of large collections of STR genotype calls either (a) by | ||
selecting a series of thresholds to stratify the value of each metric into | ||
"good", "suspicious", and "bad" categories, or (b) by using more flexible | ||
statistical / machine learning approaches. | ||
|
||
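As a minimal sketch of approach (a), the function below sorts a metric value into the three categories using two cut-offs. The function name, its parameters, and the threshold values are hypothetical and chosen only for illustration; suitable thresholds would need to be calibrated on real data, and the sketch assumes a metric for which higher values are better.

```python
def stratify(value, good_min, suspicious_min):
    """Label a metric value, assuming that higher values are better.

    `good_min` and `suspicious_min` are hypothetical, user-chosen thresholds
    with good_min >= suspicious_min.
    """
    if value >= good_min:
        return "good"
    if value >= suspicious_min:
        return "suspicious"
    return "bad"


# Example with made-up thresholds: values are labeled one call at a time.
labels = [stratify(v, good_min=20, suspicious_min=10) for v in (31, 14, 3)]
```
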
This document describes the quality metrics reported by REViewer. All quality | ||
metrics are stored in a tab-separated (TSV) file `<output prefix>.metrics.tsv`. | ||
If you have suggestions for additional metrics, please consider [creating an | ||
issue](https://github.com/Illumina/REViewer/issues). | ||
|
||
## Allele depth | ||
|
||
In general, sequencing depth of a whole-genome sequencing sample is the average | ||
number of reads that overlap any given genomic position in that sample. (For | ||
example, a depth of 30x means that a base would be overlapped by 30 reads on | ||
average.) The **allele depth** metric is an extension of this concept to STRs: | ||
It reports the sequencing depth of each STR allele. The diagram below shows an | ||
example of a well-genotyped repeat (left) where both STR alleles have the | ||
expected sequencing depth and an example (right) where the size of the long | ||
allele may be overestimated. | ||
|
||
![Example of allele depth metric](images/allele-depth-example.png) |
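
To make the idea concrete, the sketch below computes a depth for one allele as the mean number of reads covering each of its bases. This is one plausible reading of the definition above, not necessarily the exact formula REViewer uses; the function name and the half-open interval convention are assumptions.

```python
def allele_depth(read_intervals, allele_start, allele_end):
    """Mean number of reads covering each base of an allele.

    Reads and the allele are half-open intervals [start, end) on the same
    haplotype sequence (an assumed convention for this sketch).
    """
    allele_length = allele_end - allele_start
    if allele_length <= 0:
        raise ValueError("allele interval must be non-empty")
    covered_bases = 0
    for read_start, read_end in read_intervals:
        # Number of allele bases that this read overlaps (0 if disjoint).
        overlap = min(read_end, allele_end) - max(read_start, allele_start)
        covered_bases += max(0, overlap)
    return covered_bases / allele_length
```

Under this definition, an allele whose size is overestimated would have the same reads spread over a longer interval and so would report a lower depth than expected.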