Releases: greenelab/BioBombe

Accepted Manuscript - Genome Biology

08 Apr 16:57
c297790

The repository stores the full analysis pipeline and results for the bioRxiv preprint at https://doi.org/10.1101/573782

Abstract

Background

Unsupervised compression algorithms applied to gene expression data extract latent or hidden signals representing technical and biological sources of variation. However, these algorithms require a user to select a biologically appropriate latent space dimensionality. In practice, most researchers fit a single algorithm and latent dimensionality. We sought to determine the extent to which selecting only one fit limits the biological features captured in the latent representations and, consequently, limits what can be discovered with subsequent analyses.

Results

We compress gene expression data from three large datasets consisting of adult normal tissue, adult cancer tissue, and pediatric cancer tissue. We train many different models across a large range of latent space dimensionalities and observe various performance differences. We identify more curated pathway gene sets significantly associated with individual dimensions in denoising autoencoder and variational autoencoder models trained using an intermediate number of latent dimensions. Combining compressed features across algorithms and dimensionalities captures the most pathway-associated representations. When trained with different latent dimensionalities, models learn strongly associated and generalizable biological representations including sex, neuroblastoma MYCN amplification, and cell types. Stronger signals, such as tumor type, are best captured in models trained at lower dimensionalities, while more subtle signals, such as pathway activity, are best identified in models trained with more latent dimensions.

Conclusions

There is no single best latent dimensionality or compression algorithm for analyzing gene expression data. Instead, using features derived from different compression models across multiple latent space dimensionalities enhances biological representations.
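
The idea in the abstract, fitting a model at several latent dimensionalities and then combining the resulting features, can be illustrated with a minimal sketch. This is not the BioBombe pipeline itself: it uses plain PCA (via SVD in NumPy) as a stand-in for the compression algorithms, and the function names, the random input matrix, and the list of dimensionalities are all hypothetical choices made for illustration.

```python
import numpy as np


def pca_compress(X, k):
    """Project samples onto the top-k principal components (PCA via SVD)."""
    Xc = X - X.mean(axis=0)  # center each gene (column)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T  # samples x k latent features


def sweep_dimensionalities(X, ks):
    """Fit one model per latent dimensionality; return {k: features}."""
    return {k: pca_compress(X, k) for k in ks}


def combine_features(features_by_k):
    """Concatenate the latent features from every model column-wise."""
    return np.hstack([features_by_k[k] for k in sorted(features_by_k)])


# Hypothetical data: 50 samples x 200 genes of random values.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 200))

ks = [2, 4, 8]  # hypothetical latent dimensionalities to sweep
features_by_k = sweep_dimensionalities(X, ks)
combined = combine_features(features_by_k)
print(combined.shape)  # (50, 14): 2 + 4 + 8 combined features
```

In a real analysis, each entry in the sweep would presumably come from a different algorithm as well as a different dimensionality, and the combined features would then be tested for association with gene sets or sample labels.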

Updating Signature Analysis

25 Sep 11:58
843923e

Updating analyses in response to reviewer comments.

See:
#184
#185
#186

Also adding website: https://greenelab.github.io/BioBombe/

BioBombe Analysis Version 1.1

08 Mar 15:21
07e36f8

We add the gene expression signature analysis module and make various documentation changes.

BioBombe Analysis

26 Jan 17:58
c4b55cf

The release archives the scripts and modules used to analyze a sequential compression approach to interpret three gene expression datasets.