Skip to content

Commit

Permalink
updated README & minor docs changes
Browse files Browse the repository at this point in the history
  • Loading branch information
jr-leary7 committed Nov 30, 2023
1 parent d9483be commit fe965d4
Show file tree
Hide file tree
Showing 7 changed files with 14 additions and 32 deletions.
Binary file modified .DS_Store
Binary file not shown.
2 changes: 1 addition & 1 deletion R/plotModelCoefs.R
Original file line number Diff line number Diff line change
Expand Up @@ -120,7 +120,7 @@ plotModelCoefs <- function(test.dyn.res = NULL,
}
# convert coefficient summary to grob
coef_sumy_grob <- gridExtra::tableGrob(coef_sumy,
cols = c("Interval", "Coefficient"),
cols = c("Interval", "Slope"),
theme = gridExtra::ttheme_minimal(base_size = 11,
core = list(fg_params = list(hjust = 0, x = 0.05)),
colhead = list(fg_params = list(hjust = 0, x = 0.05))),
Expand Down
4 changes: 2 additions & 2 deletions R/testDynamic.R
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
#'
#' @name testDynamic
#' @author Jack Leary
#' @description This function tests whether a NB \code{marge} model is better than a null (intercept-only) NB GLM using the Likelihood Ratio Test. In effect, the test tells us whether a gene's expression changes (in any way) over pseudotime.
#' @description This function tests whether a NB \code{marge} model is better than a null (intercept-only) model using the Likelihood Ratio Test. In effect, the test tells us whether a gene's expression changes (in any way) over pseudotime.
#' @import glm2
#' @import magrittr
#' @importFrom Matrix t
Expand Down Expand Up @@ -381,7 +381,7 @@ testDynamic <- function(expr.mat = NULL,
total_time <- end_time - start_time
total_time_units <- attributes(total_time)$units
total_time_numeric <- as.numeric(total_time)
time_message <- paste0("\nscLANE testing completed for ",
time_message <- paste0("scLANE testing completed for ",
length(genes),
" genes across ",
n_lineages,
Expand Down
38 changes: 10 additions & 28 deletions README.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,9 @@ knitr::opts_chunk$set(warning = FALSE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%",
fig.retina = TRUE)
fig.retina = TRUE,
fig.width = 5,
fig.height = 3)
```

# `scLANE`
Expand All @@ -43,18 +45,16 @@ remotes::install_github("jr-leary7/scLANE")

## Model structure

The `scLANE` package enables users to accurately determine differential expression of genes over pseudotime or latent time, and to characterize gene's dynamics using interpretable model coefficients. `scLANE` builds upon the [`marge` modeling framework](https://github.com/JakubStats/marge), allowing users to characterize their trajectory's effects on mRNA expression using negative binomial [GLMs](https://en.wikipedia.org/wiki/Generalized_linear_model), [GEEs](https://en.wikipedia.org/wiki/Generalized_estimating_equation), or [GLMMs](https://en.wikipedia.org/wiki/Generalized_linear_mixed_model), depending on the experimental design & biological questions of interest. This modeling framework is an extension of the well-known [Multivariate Adapative Regression Splines (MARS)](https://en.wikipedia.org/wiki/Multivariate_adaptive_regression_spline) method, which uses truncated power basis splines to build nonlinear models using a [generalized cross validation (GCV) criterion](https://doi.org/10.48550/arXiv.1706.02495).
The `scLANE` package enables users to accurately determine differential expression of genes over pseudotime or latent time, and to characterize gene's dynamics using interpretable model coefficients. `scLANE` builds upon the `marge` modeling framework([GitHub](https://github.com/JakubStats/marge), [paper](https://doi.org/10.1080/10618600.2017.1360780)), allowing users to characterize their trajectory's effects on gene expression using negative binomial GLMs, GEEs, or GLMMs depending on the experimental design & biological questions of interest. This modeling framework is an extension of the [Multivariate Adapative Regression Splines (MARS)](https://en.wikipedia.org/wiki/Multivariate_adaptive_regression_spline) method, which builds nonlinear models out of piecewise linear components. `scLANE` is agnostic with respect to the ordering estimation method used, and can be implemented downstream of any pseudotime or RNA velocity method.

A quickstart guide on how to use `scLANE` with simulated data continues below, and a more detailed vignette showcasing its performance on real data can be found [here](https://jr-leary7.github.io/quarto-site/tutorials/scLANE_Trajectory_DE.html).

# Usage

Our method relies on a relatively simple test in order to define whether a given gene is differentially expressed (or "dynamic") over the provided trajectory. While the exact structure of the test differs by model backend, the concept is the same: the spline-based NB GLM / GEE / GLMM is treated as the alternate model, and a null model is fit using the corresponding model backend. If the GLM backend is used, then the null model is simply an intercept-only NB GLM; the GEE backend fits an intercept-only model with the same working correlation structure as the alternate model, and if the GLMM backend is used then the null model is an intercept-only model with random intercepts for each subject. The alternate hypothesis is thus that at least one of the estimated coefficients is significantly different from zero. We predict a given gene to be dynamic if the adjusted *p*-value of the test is less than an *a priori* threshold; the default threshold is $\alpha = 0.01$, and the default adjustment method is [the Holm correction](https://en.wikipedia.org/wiki/Holm–Bonferroni_method).
Our method relies on a relatively simple test in order to define whether a given gene is differentially expressed (or "dynamic") over the provided trajectory. While the exact structure of the test differs by model mode, the concept is the same: the spline-based NB GLM / GEE / GLMM is treated as the alternate model, and a null model is fit using the corresponding model mode. If the GLM mode is used, then the null model is simply an intercept-only NB GLM; the GEE mode fits an intercept-only model with the same working correlation structure as the alternate model, and if the GLMM mode is used then the null model is an intercept-only model with random intercepts for each subject. The alternate hypothesis is that at least one of the estimated coefficients is significantly different from zero. We predict a given gene to be dynamic if the adjusted *p*-value of the test is less than the default $\alpha = 0.01$ threshold, and classify it as static otherwise.

## Libraries

First we'll need to load a couple packages.

```{r libraries, results='hide', message=FALSE}
library(dplyr)
library(scLANE)
Expand Down Expand Up @@ -119,7 +119,7 @@ scLANE_models_glm <- testDynamic(sim_data,
verbose = FALSE)
```

After the function finishes running, we use `getResultsDE()` to generate a sorted table of DE test results, with one row for each gene & lineage. The GLM mode uses a simple likelihood ratio test to compare the null & alternate models, with the test statistic assumed to be [asymptotically Chi-squared distributed](https://en.wikipedia.org/wiki/Likelihood-ratio_test).
After the function finishes running, we use `getResultsDE()` to generate a sorted table of DE test results, with one row for each gene & lineage. The GLM mode uses a simple likelihood ratio test to compare the null & alternate models, with the test statistic assumed to be asymptotically Chi-squared distributed.

```{r glm-results}
scLANE_res_glm <- getResultsDE(scLANE_models_glm)
Expand All @@ -132,7 +132,7 @@ select(scLANE_res_glm, Gene, Lineage, Test_Stat, P_Val, P_Val_Adj, Gene_Dynamic_

### GEE mode

The function call is essentially the same when using the GLM mode, with the exception of needing to provide a sorted vector of subject IDs & a desired correlation structure, the default being [the AR1 structure](https://en.wikipedia.org/wiki/Autoregressive_model). We also need to flip the `is.gee` flag in order to indicate that we'd like to fit estimating equations models (instead of mixed models). Since fitting GEEs is more computationally complex than fitting GLMs, DE testing with the GEE mode takes a bit longer. Using more cores and / or running the tests on an HPC cluster speeds things up considerably.
The function call is essentially the same when using the GLM mode, with the exception of needing to provide a sorted vector of subject IDs & a desired correlation structure. We also need to flip the `is.gee` flag in order to indicate that we'd like to fit estimating equations models (instead of mixed models). Since fitting GEEs is more computationally complex than fitting GLMs, DE testing with the GEE mode takes a bit longer. Using more cores and / or running the tests on an HPC cluster speeds things up considerably.

```{r gee-models}
scLANE_models_gee <- testDynamic(sim_data,
Expand All @@ -146,7 +146,7 @@ scLANE_models_gee <- testDynamic(sim_data,
verbose = FALSE)
```

We again generate the table of DE test results. The variance of the estimated coefficients is determined using [the sandwich estimator](https://online.stat.psu.edu/stat504/lesson/12/12.3), and a Wald test is used to compare the null & alternate models.
We again generate the table of DE test results. The variance of the estimated coefficients is determined using the sandwich estimator, and a Wald test is used to compare the null & alternate models.

```{r gee-results}
scLANE_res_gee <- getResultsDE(scLANE_models_gee)
Expand Down Expand Up @@ -176,7 +176,7 @@ scLANE_models_glmm <- testDynamic(sim_data,

**Note:** The GLMM mode is still under development, as we are working on further reducing runtime and increasing the odds of the underlying optimization process converging successfully. As such, updates will be frequent and functionality / results may shift slightly.

Like the GLM mode, the GLMM mode uses a likelihood ratio test to compare the null & alternate models. We fit the two nested models using maximum likelihood estimation instead of [REML](https://en.wikipedia.org/wiki/Restricted_maximum_likelihood) in order to perform this test; the null model in this case is a negative binomial GLMM with a random intercept for each subject.
Like the GLM mode, the GLMM mode uses a likelihood ratio test to compare the null & alternate models.

```{r glmm-results}
scLANE_res_glmm <- getResultsDE(scLANE_models_glmm)
Expand All @@ -191,7 +191,7 @@ select(scLANE_res_glmm, Gene, Lineage, Test_Stat, P_Val, P_Val_Adj, Gene_Dynamic

### Model comparison

We can use the `plotModels()` to visually compare different types of models. It takes as input the results from `testDynamic()`, as well as a few specifications for which models & lineages should be plotted. While more complex visualizations can be created from our model output, this function gives us a good first glance at which models fit the underlying trend the best. Here we show the output generated using the GLM mode, split by model type. The intercept-only model shows the null hypothesis against which the scLANE model is compared using the likelihood ratio test and the GLM displays the inadequacy of monotonic modeling architectures for nonlinear dynamics. A GAM shows essentially the same trend as the `scLANE` model, though the fitted trend from `scLANE` is more interpretable & has a narrower confidence interval.
We can use the `plotModels()` to visually compare different types of models. It takes as input the results from `testDynamic()`, as well as a few specifications for which models & lineages should be plotted. While more complex visualizations can be created from our model output, this function gives us a good first glance at which models fit the underlying trend the best. Here we show the output generated using the GLM mode, split by model type. The intercept-only model shows the null hypothesis against which the scLANE model is compared using the likelihood ratio test and the GLM displays the inadequacy of monotonic modeling architectures for nonlinear dynamics. A GAM shows essentially the same trend as the `scLANE` model, though the fitted trend from `scLANE` is more interpretable.

```{r plot-models-glm}
plotModels(scLANE_models_glm,
Expand All @@ -205,22 +205,6 @@ plotModels(scLANE_models_glm,
plot.scLANE = TRUE)
```

Model comparison using the GEE mode is similar, with the only change being that we now provide a vector of subject IDs.

```{r plot-models-gee}
plotModels(scLANE_models_gee,
gene = scLANE_res_gee$Gene[1],
is.gee = TRUE,
id.vec = sim_data$subject,
pt = order_df,
expr.mat = sim_data,
size.factor.offset = cell_offset,
plot.null = TRUE,
plot.glm = TRUE,
plot.gam = TRUE,
plot.scLANE = TRUE)
```

When plotting the models generated using the GLMM mode, we split by lineage & color the points by subject ID instead of by lineage. The gene in question highlights the utility of the scLANE model, since the gene dynamics differ significantly by subject.

```{r plot-models-glmm, fig.width=9, fig.height=9}
Expand All @@ -231,8 +215,6 @@ plotModels(scLANE_models_glmm,
size.factor.offset = cell_offset,
id.vec = sim_data$subject,
is.glmm = TRUE,
plot.null = FALSE,
plot.glm = TRUE,
plot.gam = TRUE,
plot.scLANE = TRUE)
```
Expand Down
Binary file modified inst/.DS_Store
Binary file not shown.
Binary file modified inst/rmarkdown/.DS_Store
Binary file not shown.
2 changes: 1 addition & 1 deletion man/testDynamic.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

0 comments on commit fe965d4

Please sign in to comment.