Long-reads-based Alternative Termination Estimation and Recognition (LATER)

hilgers-lab/LATER

LATER

Long-read Analysis of transcription Termination Estimation and Recognition


Installation

install.packages("devtools")
devtools::install_github("hilgers-lab/LATER", build = TRUE, build_vignettes = TRUE)

The vignette contains examples and guidance on interpreting the results of the analysis:

library(LATER)
vignette("LATER")

Usage

LATER estimates transcriptional biases in alternative polyadenylation (APA) from long-read sequencing data.

Input data:

  • Genome alignment BAM files, produced with minimap2 and then sorted and indexed with samtools, using these parameters:
      minimap2 -ax splice -u f annotation/genome.fa long_read.fastq.gz | samtools sort -@ 4 -o output.bam -
      samtools index output.bam
  • Reference annotation in GTF format. Example file here

Database creation

First, a database of 5'-3' isoforms is created from the provided reference annotation. Combinations are computed from the isoform sets, and TSSs and PA sites are merged within a window (the tss.window and tes.window arguments). The output is a data frame that classifies genes by their TSS and PA site status.

annot_path <- system.file("exdata/dm6.annot.gtf.gz", package="LATER")
refAnnotation <- rtracklayer::import.gff(annot_path)
linksDatabase <- prepareLinkDatabase(refAnnotation, tss.window=50, tes.window=150)
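
To illustrate what merging sites within a window can look like, here is a minimal base R sketch. It is not LATER's implementation: the helper `merge_sites()` and the toy coordinates are hypothetical, strand and chromosome are ignored, and chaining neighbouring sites whose gap is at most `window` bases is only one possible merging rule.

```r
# Hypothetical helper (not part of LATER): assign sorted positions to clusters,
# starting a new cluster whenever the gap to the previous site exceeds `window`.
merge_sites <- function(positions, window) {
  positions <- sort(unique(positions))
  cluster <- cumsum(c(TRUE, diff(positions) > window))
  data.frame(position = positions, cluster = cluster)
}

# Toy TSS coordinates on a single chromosome and strand (made-up values)
tss <- c(1000, 1020, 1045, 5000, 5200)
merged_tss <- merge_sites(tss, window = 50)
merged_tss
```

In this toy example the first three positions fall into one cluster, while the last two remain separate because they are more than 50 bases apart.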

Counting links

For accurate quantification, we developed a counter for long-read sequencing data. Reads aligned to the genome are trimmed to their most 5' and 3' ends while keeping the read identity. Only reads that map to both a TSS and a PA site in the reference are considered for the analysis. Reads are then summarized in counts per million for further processing.

bamPath <- system.file("exdata/testBam.bam", package = 'LATER')
countData <- countLinks(bamPath, linksDatabase)
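
The counting logic described above can be sketched in base R on a toy table. This is an illustration under assumptions, not the internals of `countLinks()`: the `reads` data frame, its column names, and the choice to normalize by the number of retained reads are all hypothetical.

```r
# Toy read assignments (made-up): NA means the read end matched no reference site
reads <- data.frame(
  read_id = paste0("read", 1:6),
  tss_id  = c("TSS1", "TSS1", "TSS2", NA, "TSS2", "TSS1"),
  pa_id   = c("PA1", "PA2", "PA2", "PA1", NA, "PA1"),
  stringsAsFactors = FALSE
)

# Keep only reads assigned to both a TSS and a PA site
full_length <- reads[!is.na(reads$tss_id) & !is.na(reads$pa_id), ]

# Count reads per TSS-PA combination
link_counts <- as.data.frame(
  table(tss_id = full_length$tss_id, pa_id = full_length$pa_id),
  responseName = "count"
)

# Counts per million, assuming normalization by the retained read total
link_counts$cpm <- link_counts$count / sum(link_counts$count) * 1e6
link_counts
```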

Estimating promoter dominance

Promoter dominance estimates are calculated as described in Alfonso-Gonzalez et al. (2022). The function outputs, for each promoter, its bias in the expression of a given 3' end of the gene.

promoterContributionEstimates <- calculatePromoterDominance(countData, linksDatabase$pairsDatabase)
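
One plausible reading of a per-promoter bias for a given 3' end is the fraction of that 3' end's expression contributed by each TSS. The sketch below computes that fraction on made-up values. The exact definition used by `calculatePromoterDominance()` and in the cited paper may differ, and the `links` data frame and its columns are hypothetical.

```r
# Toy normalized counts per TSS-PA link for one gene (made-up values)
links <- data.frame(
  gene   = "geneA",
  tss_id = c("TSS1", "TSS2", "TSS1", "TSS2"),
  pa_id  = c("PA1", "PA1", "PA2", "PA2"),
  cpm    = c(80, 20, 10, 90)
)

# Total expression of each 3' end within its gene
pa_total <- ave(links$cpm, links$gene, links$pa_id, FUN = sum)

# Assumed dominance: share of the 3' end's expression coming from each TSS
links$dominance <- links$cpm / pa_total
links
```

Under this assumption, the dominance values for all TSSs of one 3' end sum to 1.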

Estimating transcriptional biases

Transcriptional biases are estimated from the joint frequencies of TSS-PA site combinations per gene. Coupling events per gene are assessed by multinomial testing with a chi-square test. Fisher's exact test (fisher.test()) is also available by setting method="fisher".

biasGenes <- estimateTranscriptionalBias(countData, linksDatabase$pairsDatabase, method="fisher")
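
As a rough illustration of testing joint TSS-PA frequencies, the following sketch applies base R's `chisq.test()` and `fisher.test()` to a toy contingency table for a single gene. The counts are made up, and this test of independence is only an approximation of the idea, not the procedure inside `estimateTranscriptionalBias()`.

```r
# Toy read counts for one gene: rows are TSSs, columns are PA sites (made-up)
tab <- matrix(
  c(40,  5,
     8, 35),
  nrow = 2, byrow = TRUE,
  dimnames = list(TSS = c("TSS1", "TSS2"), PA = c("PA1", "PA2"))
)

# Chi-square test of independence between TSS choice and PA site choice
chisq_res <- chisq.test(tab)

# Fisher's exact test on the same table
fisher_res <- fisher.test(tab)

c(chisq = chisq_res$p.value, fisher = fisher_res$p.value)
```

A small p-value here would suggest that PA site usage depends on which TSS the read started from, which is the kind of coupling the section describes.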

Release

Initial Release 0.1.0

Release date: 20th Dec 2022. This release corresponds to the version of LATER used in the Alfonso-Gonzalez et al. manuscript.

Contact

Developer: Carlos Alfonso-Gonzalez. For questions or feedback, you can contact:

[email protected]
