TPMCalculator quantifies mRNA abundance directly from alignments by parsing BAM files. It takes as input the same GTF file used to generate the alignments and one or more BAM files containing either single-end or paired-end sequencing reads. For each sample, TPMCalculator writes four output files reporting the TPM values and raw read counts for genes, transcripts, exons, and introns, respectively.
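As a point of reference, the standard definition of TPM can be sketched in a few lines of Python. This is a minimal illustration of the general formula (read count divided by feature length, rescaled so that all values sum to one million), not TPMCalculator's own code. The function name `tpm`, the feature names, and the counts and lengths are hypothetical, and the way TPMCalculator measures feature length may differ.

```python
def tpm(counts, lengths):
    """Return TPM per feature from raw read counts and feature lengths (bp).

    Standard definition: rate_i = count_i / length_i,
    TPM_i = rate_i / sum(rates) * 1e6.
    """
    rates = {name: counts[name] / lengths[name] for name in counts}
    total = sum(rates.values())
    if total == 0:
        # No reads anywhere: report zero for every feature.
        return {name: 0.0 for name in counts}
    return {name: rate / total * 1e6 for name, rate in rates.items()}


# Hypothetical counts and lengths for three features.
example_counts = {"geneA": 10, "geneB": 20, "geneC": 0}
example_lengths = {"geneA": 1000, "geneB": 1000, "geneC": 500}
example_tpm = tpm(example_counts, example_lengths)
```

Because the values are rescaled within each sample, the TPM values of one sample always sum to one million, which is what makes them comparable across samples.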
TPMCalculator builds the gene model used for quantification from the GTF file provided by the user. It applies two transformations to the genomic coordinates, producing for each gene a set of regions that includes the exons and “pure” intron regions, as shown in Figure S1. The first transformation merges the overlapping exons of all alternatively spliced forms of a gene, yielding a single gene model with unique exons and introns that includes the sequence of all exonic regions. The second transformation creates a list of pure intron regions that replaces the introns generated by the first transformation. Only the intron regions are modified in this step, so that they are not overlapped by exons of other genes. Reporting TPM values for these unique introns allows further identification of alternative splicing events such as intron retention. Additionally, a set of non-overlapping gene features (exons and introns) is generated and used for TPM calculation.
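The two transformations can be illustrated with plain interval arithmetic. The sketch below is an assumption-laden illustration, not TPMCalculator's implementation: the helper names (`merge_intervals`, `introns_between`, `subtract_intervals`) and all coordinates are hypothetical, coordinates are taken to be 1-based and inclusive as in GTF, and adjacent exons are merged together with overlapping ones.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent (start, end) intervals, inclusive coordinates."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def introns_between(exons):
    """Return the gaps between consecutive merged exons."""
    return [(a_end + 1, b_start - 1)
            for (_, a_end), (b_start, _) in zip(exons, exons[1:])]


def subtract_intervals(regions, blockers):
    """Remove every position covered by `blockers` from `regions`."""
    blockers = merge_intervals(blockers)
    result = []
    for start, end in regions:
        cursor = start
        for b_start, b_end in blockers:
            if b_end < cursor:
                continue  # blocker lies entirely before the remaining region
            if b_start > end:
                break  # blocker lies entirely after the region
            if b_start > cursor:
                result.append((cursor, b_start - 1))
            cursor = max(cursor, b_end + 1)
        if cursor <= end:
            result.append((cursor, end))
    return result


# First transformation: collapse the exons of two hypothetical isoforms.
isoform_exons = [(100, 200), (400, 500),               # isoform 1
                 (150, 250), (400, 450), (700, 800)]   # isoform 2
gene_exons = merge_intervals(isoform_exons)
gene_introns = introns_between(gene_exons)

# Second transformation: drop intron positions covered by exons of other genes.
other_gene_exons = [(300, 350), (600, 720)]
pure_introns = subtract_intervals(gene_introns, other_gene_exons)
```

With these made-up coordinates, the merged model has exons (100, 250), (400, 500), and (700, 800), and the pure introns are (251, 299), (351, 399), and (501, 599). The exons are left unchanged by the second step, matching the description that only intron regions are modified.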
To validate our software, we calculated the Pearson correlation coefficient between TPM, FPKM, and DESeq2 normalized expression values using RNA-Seq data from 1155 samples of the TCGA-BRCA project.
Additionally, the correlation coefficient was calculated for the raw read counts reported by TPMCalculator and HTSeq.
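For readers who want to run a similar comparison on their own data, the Pearson correlation coefficient between two vectors of per-gene values can be computed as in the sketch below. The function name `pearson` and the numbers are hypothetical placeholders, not values from the TCGA-BRCA analysis, and the validation itself may have used different tooling.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up raw read counts for the same five genes from two tools.
counts_tool_a = [120, 0, 35, 980, 15]
counts_tool_b = [118, 1, 33, 1002, 14]
r = pearson(counts_tool_a, counts_tool_b)
```

A value of `r` close to 1 indicates that the two tools rank and scale the genes almost identically.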
TPMCalculator reduces the compute time and resource requirements of RNA-Seq pipelines by eliminating multiple steps. For example, TPMCalculator processes BAM files of 7.0 GB in ~20 minutes while requiring only 4 GB of RAM.
For a more detailed description and installation guidelines, see https://github.com/ncbi/TPMCalculator/wiki/TPMCalculator
Roberto Vera Alvarez Email: [email protected]
Lorinc Pongor Email: [email protected]
Leonardo Mariño-Ramírez Email: [email protected]
David Landsman Email: [email protected]
This software is a "United States Government Work" under the terms of the United States Copyright Act. It was written as part of the authors' official duties as United States Government employees and thus cannot be copyrighted. This software is freely available to the public for use. The National Library of Medicine and the U.S. Government have not placed any restriction on its use or reproduction.
Although all reasonable efforts have been taken to ensure the accuracy and reliability of the software and data, the NLM and the U.S. Government do not and cannot warrant the performance or results that may be obtained by using this software or data. The NLM and the U.S. Government disclaim all warranties, express or implied, including warranties of performance, merchantability or fitness for any particular purpose.
Please cite NCBI in any work or product based on this material.