Outputs

Caleb Lareau edited this page Aug 10, 2020 · 2 revisions

Depending on how you configure your CLI run, you should expect to see these files in the output/final/ folder:

*.A.txt.gz
*.C.txt.gz
*.G.txt.gz
*.T.txt.gz
*.coverage.txt.gz
*.depthTable.txt
*_refAllele.txt
*.rds
*.signac.rds
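
A quick way to check a finished run against this list is to glob the output folder. The sketch below assumes only the file-name patterns listed above; the helper name `missing_outputs` is hypothetical and not part of mgatk.

```python
import fnmatch

# Glob patterns for the files listed above. Which ones appear depends on
# how the CLI run was configured.
EXPECTED_PATTERNS = [
    "*.A.txt.gz", "*.C.txt.gz", "*.G.txt.gz", "*.T.txt.gz",
    "*.coverage.txt.gz", "*.depthTable.txt", "*_refAllele.txt",
    "*.rds", "*.signac.rds",
]


def missing_outputs(filenames):
    """Return the expected patterns that no file name in `filenames` matches."""
    return [
        pattern
        for pattern in EXPECTED_PATTERNS
        if not any(fnmatch.fnmatchcase(name, pattern) for name in filenames)
    ]
```

For example, `missing_outputs(os.listdir("output/final"))` returns an empty list when every pattern is present. Note that `*.rds` also matches the `*.signac.rds` file, so it only confirms that at least one .rds file exists.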

In order, the *.{A,C,G,T}.txt.gz files are formatted as sparse matrices, listing the position, the cell, and then the forward- and reverse-strand counts of that base for that cell and position. These files enumerate all of the sequenced alleles for all cells across the mitochondrial DNA and are the minimal set of outputs to use from mgatk.
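
As a minimal sketch of reading one of these sparse files outside of R, the function below sums forward- and reverse-strand counts per (position, cell). It assumes comma-separated rows with the position in column 0 and the cell barcode in column 1; the strand-count column indices (`fwd_col`, `rev_col`) are assumptions based on the description above, so check them against your own files. The function name is hypothetical.

```python
import csv
import gzip
from collections import defaultdict


def read_allele_counts(path, fwd_col=2, rev_col=3):
    """Sum strand counts in one base file (e.g. *.A.txt.gz).

    Assumed row layout: position, cell barcode, ..., with the forward- and
    reverse-strand counts at `fwd_col` and `rev_col`. If your files carry
    extra columns (for example, quality values), adjust these indices.
    Returns {(position, barcode): forward + reverse count}.
    """
    counts = defaultdict(int)
    with gzip.open(path, "rt", newline="") as handle:
        for row in csv.reader(handle):
            if not row:
                continue  # skip blank lines
            key = (int(row[0]), row[1])
            counts[key] += int(row[fwd_col]) + int(row[rev_col])
    return dict(counts)
```

Because the files are sparse, any (position, cell) pair missing from the result had no reads with that base.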

For convenience, the tool also emits the mean per-cell depth in the *.depthTable.txt file. This is computed as (total bases accounted for) / (length of the mtDNA contig). Additionally, the *.coverage.txt.gz file provides a sparse matrix representation of the per-cell, per-position coverage.
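
The depth formula can be sketched from the coverage file: sum each cell's coverage over all positions and divide by the contig length. This assumes *.coverage.txt.gz rows parse to (position, barcode, coverage) and uses 16,569 bp (the human rCRS length) as a default contig length; both are assumptions, and the result is meant to approximate the depthTable values rather than reproduce mgatk's own code. Function names are hypothetical.

```python
import csv
import gzip
from collections import defaultdict

# Assumption: human mtDNA (rCRS) length; pass your own contig length otherwise.
HUMAN_MTDNA_LENGTH = 16569


def read_coverage_rows(path):
    """Yield (position, barcode, coverage) from an assumed comma-separated file."""
    with gzip.open(path, "rt", newline="") as handle:
        for row in csv.reader(handle):
            if row:
                yield int(row[0]), row[1], int(row[2])


def mean_depth_per_cell(coverage_rows, contig_length=HUMAN_MTDNA_LENGTH):
    """Mean per-cell depth = (total bases covered for the cell) / contig_length.

    Positions absent from the sparse file implicitly contribute zero coverage,
    so they are already accounted for by dividing by the full contig length.
    """
    totals = defaultdict(int)
    for _position, barcode, coverage in coverage_rows:
        totals[barcode] += coverage
    return {barcode: total / contig_length for barcode, total in totals.items()}
```

A typical call would be `mean_depth_per_cell(read_coverage_rows("output/final/sample.coverage.txt.gz"))`, where the file name is only illustrative.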

To orient these abundances in the context of potential mutations, the *_refAllele.txt file lists the reference alleles for the contig used in alignment/processing. This file is independent of your source data and purely a function of the chosen reference.
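
To illustrate how the reference file orients the per-base counts, the sketch below computes, for one cell and position, the fraction of reads that do not match the reference base. This is a simple summary chosen for illustration, not mgatk's variant-calling method. It assumes *_refAllele.txt holds whitespace-separated "position base" pairs, one per line, and that per-base counts are already in dicts keyed by (position, barcode); the function names are hypothetical.

```python
def read_ref_alleles(lines):
    """Parse *_refAllele.txt lines into {position: base} (assumed format)."""
    ref = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            ref[int(fields[0])] = fields[1].upper()
    return ref


def non_reference_fraction(ref, base_counts, position, barcode):
    """Fraction of reads at (position, barcode) not matching the reference base.

    `base_counts` maps each base letter to {(position, barcode): count}, e.g.
    one dict per *.{A,C,G,T}.txt.gz file. Returns None when the cell has no
    reads at that position or the reference base is not A/C/G/T (e.g. 'N').
    """
    per_base = {b: base_counts.get(b, {}).get((position, barcode), 0) for b in "ACGT"}
    total = sum(per_base.values())
    ref_base = ref.get(position)
    if total == 0 or ref_base not in per_base:
        return None
    return (total - per_base[ref_base]) / total
```

With a reference base of G and 3 G reads plus 1 A read at a position, the function returns 0.25 for that cell.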

Finally, two .rds files that synthesize these outputs are emitted automatically. The *.signac.rds file contains an S3 object that can be readily integrated into the Signac R package (see the vignettes here: https://satijalab.org/signac/). The other *.rds file is a RangedSummarizedExperiment, an S4 object that summarizes the same data in a slightly different structure. Either can be integrated into existing scATAC-seq workflows, depending on your analysis method of choice.
