
Outputs

Vincent Liu edited this page Apr 22, 2021 · 2 revisions

Depending on how you configure your CLI execution, you should expect to see these files in the output/final/ folder:

*.A.txt.gz
*.C.txt.gz
*.G.txt.gz
*.T.txt.gz
*.coverage.txt.gz
*.depthTable.txt
*_refAllele.txt
*.rds
*.signac.rds
*.variant_stats.tsv.gz
*.cell_heteroplasmic_df.tsv.gz
*.vmr_strand_plot.png

The *.{A,C,G,T}.txt.gz files are formatted as sparse matrices whose columns give, in order, the position, the cell, and then the forward and reverse strand counts of that letter for that cell and position. These files enumerate all of the sequenced alleles for all cells in the mitochondrial DNA and are the minimal units to be utilized from mgatk.

After the mitochondrial genotypes for each cell are determined, mgatk calls variants and computes some useful statistics for each variant, which are organized in *.variant_stats.tsv.gz. For variants confidently detected in at least three cells, the heteroplasmic ratio is computed for every cell that passes the minimum mean per-base mitochondrial coverage threshold (default 10) and stored in *.cell_heteroplasmic_df.tsv.gz, where rows are cells, columns are variants, and entries are heteroplasmic ratios. The strand correlation of these variants is also plotted against their variance-mean ratio in *.vmr_strand_plot.png, with recommended thresholds shown as dashed lines. These thresholds work well for most datasets, and we recommend considering only variants that pass them for downstream analysis.
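As a rough illustration of how the per-letter sparse matrices can be combined, the sketch below parses rows of (position, cell, forward count, reverse count) and computes the fraction of reads supporting one letter at one position in one cell. It assumes comma-separated rows with no header, and the function names `read_allele_counts` and `allele_fraction` are hypothetical helpers, not part of mgatk. The calculation is a plain read fraction and is not necessarily how mgatk itself derives its heteroplasmic ratios.

```python
import csv
import io

BASES = "ACGT"


def read_allele_counts(handle):
    """Parse rows of (position, cell, forward, reverse) into a dict.

    Assumption: comma-separated values, no header line.
    """
    counts = {}
    for position, cell, forward, reverse in csv.reader(handle):
        counts[(int(position), cell)] = (int(forward), int(reverse))
    return counts


def allele_fraction(counts_by_base, position, cell, base):
    """Fraction of reads (both strands) supporting `base` at one cell/position.

    Returns None when the cell has no reads at that position.
    """
    per_base = {
        b: sum(counts_by_base[b].get((position, cell), (0, 0))) for b in BASES
    }
    depth = sum(per_base.values())
    if depth == 0:
        return None
    return per_base[base] / depth


# Toy in-memory data standing in for the A and G files (C and T left empty).
a_counts = read_allele_counts(io.StringIO("100,cellA,3,2\n100,cellB,0,1\n"))
g_counts = read_allele_counts(io.StringIO("100,cellA,2,3\n"))
counts_by_base = {"A": a_counts, "C": {}, "G": g_counts, "T": {}}

fraction_g = allele_fraction(counts_by_base, 100, "cellA", "G")
```

For real output, the same parser could be fed a handle from `gzip.open(path, "rt")` for each of the four files; check a few rows first to confirm the delimiter and column order match the assumption above.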

For convenience, the tool also emits the mean depth per cell in the *.depthTable.txt file. This is computed as (total bases accounted for) / (length of mtDNA contig). Additionally, the *.coverage.txt.gz file provides a sparse matrix representation of the per-cell, per-position coverage.
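The depth formula above can be reproduced from the coverage matrix with a few lines. This is a minimal sketch that assumes the coverage rows are comma-separated (position, cell, coverage) triplets with no header; `mean_depth_per_cell` is a hypothetical helper name, and the contig length of 10 is a toy value chosen only to keep the arithmetic readable.

```python
import csv
import io
from collections import defaultdict


def mean_depth_per_cell(handle, contig_length):
    """Return {cell: total covered bases / contig length}.

    Assumption: each row is (position, cell, coverage), comma-separated.
    """
    totals = defaultdict(int)
    for _position, cell, coverage in csv.reader(handle):
        totals[cell] += int(coverage)
    return {cell: total / contig_length for cell, total in totals.items()}


# Toy coverage data for two cells on a 10-base contig.
toy_coverage = io.StringIO("1,cellA,4\n2,cellA,6\n1,cellB,5\n")
depths = mean_depth_per_cell(toy_coverage, contig_length=10)
```

With real data, `contig_length` should be the length of the mitochondrial contig in the reference you aligned against, and the result can be compared with the values in *.depthTable.txt as a sanity check.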

To orient these abundances in the context of potential mutations, the *_refAllele.txt file shows the reference alleles for the contig used in alignment/processing. This file is independent of your source data and purely a function of the chosen reference.
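One way to use the reference allele file is to look up the reference base at a position and treat the remaining letters as candidate alternate alleles. The sketch below assumes each line holds a position and a base separated by whitespace; `read_ref_alleles` and `candidate_alt_bases` are hypothetical helper names used for illustration only.

```python
import io

BASES = ("A", "C", "G", "T")


def read_ref_alleles(handle):
    """Parse lines of 'position base' into {position: base}.

    Assumption: whitespace-separated fields, one position per line.
    """
    ref = {}
    for line in handle:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        ref[int(fields[0])] = fields[1].upper()
    return ref


def candidate_alt_bases(ref, position):
    """Letters other than the reference base at `position`."""
    return [base for base in BASES if base != ref[position]]


# Toy reference covering three positions.
toy_ref = read_ref_alleles(io.StringIO("1\tG\n2\tA\n3\tT\n"))
alts = candidate_alt_bases(toy_ref, 2)
```

Counts for the reference letter at a position then come from the matching *.{A,C,G,T}.txt.gz file, and counts for the other three letters indicate potential variants.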

Finally, two .rds files that synthesize these outputs are emitted automatically. The *.signac.rds file contains an S3 object that can be rapidly integrated into the Signac R package (see vignettes here: https://satijalab.org/signac/). The other *.rds file is a RangedSummarizedExperiment that summarizes the same data in a slightly different S4 object. Either of these can be rapidly integrated into existing scATAC-seq workflows, depending on your analysis method of choice.


