Importing DGE data into EnrichmentBrowser #23

grabearummc · 2020-08-21T15:02:53Z

Hi guys. I'm starting to work with EnrichmentBrowser, and I'm running into some issues. I'm looking at porting in DGE data from different sources (DESeq2, limma, and edgeR). My biggest problem is getting these objects into the proper format (SummarizedExperiment).

After exploring this repo some more, I noticed your changelog has a bullet for new import functions for deseq/limma/edgeR. This would be super helpful, so I don't have to write my own functions. So, how stable are they right now? Is the master branch "ok" to use in its current form?

lgeistlinger · 2020-08-21T15:08:16Z

Thanks for contacting us. I'd say you are good to go with the master branch, and you'll find documentation and working examples when using ?import after installation. See also the documentation of the import function here: https://bioconductor.org/packages/devel/bioc/manuals/EnrichmentBrowser/man/EnrichmentBrowser.pdf

I conducted a number of tests that were looking good, but you would be def among the first users, so I'd be interested in how this works for you and whether we need to extend the functionality further.

grabearummc · 2020-08-21T15:13:20Z

Alright, thank you that's great! I'll use this issue as a forum to discuss.

grabearummc · 2020-08-21T15:36:32Z

Consider adding functionality to EnrichmentBrowser::idMap so that it automatically validates/converts ENSEMBL IDs from id.version to id (e.g. ENSG00000002919.14 to ENSG00000002919). Try to preserve id.version by adding another column to rowData. This is really more of an issue with AnnotationDbi, but it couldn't hurt.

gsub("\\..*", "", row.names(ens_table))
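A minimal base R sketch of that suggestion, keeping the versioned ID in a separate column before stripping it. The toy table, the second gene ID, and the column name `ENSEMBL.VERSION` are made up for illustration; this is not how idMap is implemented.

```r
# Toy table whose row names are versioned ENSEMBL gene IDs.
# The first ID is the example from this comment; the second is a placeholder.
ens_table <- data.frame(
    baseMean = c(120.5, 33.1),
    row.names = c("ENSG00000002919.14", "ENSG00000000001.2")
)

# Keep the full versioned ID in its own column before stripping it.
# "ENSEMBL.VERSION" is a hypothetical column name.
ens_table$ENSEMBL.VERSION <- rownames(ens_table)

# Remove the first dot and everything after it
stripped <- sub("\\..*$", "", rownames(ens_table))

# Row names must stay unique; stop if stripping collapsed two IDs into one
stopifnot(!anyDuplicated(stripped))
rownames(ens_table) <- stripped

ens_table
```

The uniqueness check matters because two versions of the same gene would otherwise collide once the suffix is gone.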

lgeistlinger · 2020-08-21T16:21:31Z

let's make this another issue ("ENSEMBL ID version conversion"), and let's use the current issue for focusing on functionality of EnrichmentBrowser::import -- renaming the issue for clarity

vivek-verma202 · 2020-08-28T15:51:55Z

@lgeistlinger, thank you for the update (esp. import)!

# I used apeglm for the res:
res <- lfcShrink(dds, coef="FM_1_vs_0", type="apeglm")
se <- import(dds, res, from = "DESeq2")
Error in .importFromDESeq2(obj, res) : 
  all(rnames %in% colnames(res)) is not TRUE
colnames(res)
[1] "baseMean"       "log2FoldChange" "lfcSE"         
[4] "pvalue"         "padj"  

# repeated without "apeglm"
res <- results(dds)
se <- import(dds, res, from = "DESeq2")
Error in .importFromDESeq2(obj, res) : 
Supported experimental designs include binary group comparisons 
with an optional blocking variable for paired samples / sample batches

design(dds)
~ age + batch + condition
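The first error above comes from a check on colnames(res). Which columns the importer expects is not shown in this thread; assuming it expects the columns of a default Wald-test results() table, a quick comparison in base R shows what the apeglm-shrunken table lacks:

```r
# Columns printed by colnames(res) after lfcShrink(type = "apeglm") in the transcript above
shrunk.cols <- c("baseMean", "log2FoldChange", "lfcSE", "pvalue", "padj")

# Columns of a default DESeq2 Wald-test results() table
# (assumed here to be what the importer looks for)
default.cols <- c("baseMean", "log2FoldChange", "lfcSE", "stat", "pvalue", "padj")

# Columns present in the default table but absent after shrinkage
missing.cols <- setdiff(default.cols, shrunk.cols)
missing.cols
```

On a real object the same comparison would be `setdiff(colnames(results(dds)), colnames(res))`.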

I have 4 questions for you:

1. Is LFC shrinkage needed / recommended with EnrichmentBrowser?
2. I have to use age (scaled, continuous) and batch (binary) as covariates when analyzing my condition of interest (binary). How can I use this information with EnrichmentBrowser to avoid false positives?
3. To circumvent the covariate problem, I was thinking of using a ranked gene list with pi scores, res$pi <- res$log2FoldChange*(-log(res$pvalue)). Can I do this for downstream topology-based methods in EnrichmentBrowser, and if yes, how?
4. Could you suggest a better way to rank / score genes? I am not sure how ties should be broken in case of non-unique scores.
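For reference, the pi score from question 3 and one simple way of breaking ties can be sketched in base R as follows. The table is toy data with made-up values, and breaking ties by table order is only one possible choice.

```r
# Toy DE results; all values are made up for illustration
res <- data.frame(
    log2FoldChange = c(2.0, -1.5, 0.2, 2.0),
    pvalue = c(0.001, 0.001, 0.5, 0.001),
    row.names = c("geneA", "geneB", "geneC", "geneD")
)

# Pi score as written in question 3 (natural log of the p-value)
res$pi <- res$log2FoldChange * (-log(res$pvalue))

# Rank from most up- to most down-regulated; ties.method = "first"
# breaks ties between equal scores by their order in the table
res$rank <- rank(-res$pi, ties.method = "first")
res[order(res$rank), ]
```

Note that a p-value of exactly 0 yields an infinite score, so such values would need to be capped before ranking.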

lgeistlinger · 2020-08-28T15:59:01Z

Thanks for testing + raising these issues @vivek-verma202!
I am currently sprinting towards a deadline for a grant proposal on Sunday, but I am looking into this + will get back to you on Monday.

vivek-verma202 · 2020-08-28T16:01:46Z

Thanks, good luck with your grant application!

grabearummc · 2020-08-31T22:07:01Z

Alright here is an update for you @lgeistlinger.

I've been using import and I have a few suggestions. For importing limma data, I think there needs to be a way to use data from the limma-trend approach. Check page 72 of the limma user guide.

Instead of using voom:

EnrichmentBrowser/R/import.R, lines 120 to 132 in 4357b80:

#'   # (3) import from voom/limma (RNA-seq count data)
#'   # (3a) create the expression data object
#'   library(limma)
#'   keep <- filterByExpr(counts, rdesign)
#'   el <- voom(counts[keep,], rdesign)
#'
#'   # (3b) obtain differential expression results
#'   fit <- lmFit(el, rdesign)
#'   fit <- eBayes(fit, robust = TRUE)
#'   res <- topTable(fit, coef = 2, number = nrow(counts), sort.by = "none")
#'
#'   # (3c) import
#'   se <- import(el, res)

You would use the limma-trend method:

#'   # (5) import from trend/limma (RNA-seq count data)
#'   # (5a) create the expression data object
#'   library(limma)
#'   keep <- filterByExpr(counts, rdesign)
#'   logCPM <- edgeR::cpm(counts[keep,], log = TRUE, prior.count = 3)
#'
#'   # (5b) obtain differential expression results
#'   fit <- lmFit(logCPM, rdesign)
#'   fit <- eBayes(fit, trend = TRUE)
#'   res <- topTable(fit, coef = ncol(rdesign), number = nrow(logCPM), sort.by = "none")
#'
#'   # (5c) import???
#'   se <- import(el, res)

vivek-verma202 · 2020-09-01T13:58:47Z

Hi @lgeistlinger, hope your grant application went well.

I was wondering if you had a chance to look into the "limitations" of current import()?
As far as my question 1 is concerned, some reading made me realize that LFC shrinkage (apeglm) is not necessary esp. if I have already filtered out poorly expressed genes. Hence, was planning to ditch it (unless you've something to add to it).

Hope to hear from you, soon.

lgeistlinger · 2020-09-01T16:59:58Z

Hi @vivek-verma202,

Thanks for your patience.

The application has been a sprint involving daily 12 h shifts for the last 7 days
straight - but I am pretty happy with the product. Thanks for asking.

Concerning your questions:

lfcShrinkage:

According to the DESeq2 vignette, shrinkage is only relevant for visualization and ranking of genes when insufficiently expressed genes have not been filtered out first.
In the EnrichmentBrowser, I typically use edgeR::filterByExpr (but you can easily apply customized thresholds as well) for that as the very first step of processing the count matrix, i.e. before normalization, DE analysis, and GS analysis.
Thus: if you've filtered before, then yes, shrinkage is not needed.
That said, I think the error that you encounter in your first post is mainly a result of your result object (res) not having the same number of rows (genes) as your dds. I suspect rows/genes have been removed by lfcShrink?

covariates:

design ~ batch + condition should work right out of the box (can you confirm?).
Supported experimental designs currently only include binary group comparisons with an optional blocking variable for paired samples / sample batches.
Adding additional covariates is currently not supported; I will take this as a feature request and invest some time to implement it.
Note, however, that only regression GS methods such as camera and roast support extended experimental designs and are able to make use of this information.
Something that you could do right away:

design ~ age + condition, if you form a number of discrete age groups here.

(inspecting resulting DE measures against design ~ condition only would btw
give you an indication whether there are indeed age-specific effects).
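Forming discrete age groups could look like the following base R sketch. The ages, the choice of three groups, and the group labels are all made up for illustration; the resulting factor would take the place of the continuous age variable in the design.

```r
# Hypothetical sample ages; values are made up for illustration
age <- c(23, 35, 41, 52, 58, 64, 29, 47, 70)

# Split at the tertiles so that each group holds roughly a third of the samples
breaks <- quantile(age, probs = c(0, 1/3, 2/3, 1))

# include.lowest = TRUE keeps the youngest sample in the first group
age.group <- cut(age, breaks = breaks, include.lowest = TRUE,
                 labels = c("young", "middle", "old"))

table(age.group)
```

Quantile-based breaks keep the groups balanced; fixed cut points (e.g. decades) are an equally valid choice if they are more meaningful for the study.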

Downstream topology-based methods:

If you are interested in those, or in general methods that only work on the DE
results, such as eg also ora, then you can actually ignore my remarks on
experimental design above and

i) make use of the sig.stat argument that can be supplied to EnrichmentBrowser::sbea
and EnrichmentBrowser::nbea.

• beta: Log2 fold change significance level. Defaults to 1 (2-fold).

• sig.stat: decides which statistic is used for determining
significant DE genes. Options are:

        • 'p' (Default): genes with adjusted p-value below
              alpha.

        • 'fc': genes with abs(log2(fold change)) above beta

        • '&': p & fc (logical AND)

        • '|': p | fc (logical OR)

        • 'xxp': top xx % of genes sorted by adjusted p-value

        • 'xxfc' top xx % of genes sorted by absolute log2 fold
                 change.

Example: if you wanted to conduct a SPIA analysis with genes rendered differentially expressed (DE genes) that have an absolute log2 fold change > 1 and an adjusted p-value < 0.1, you would call:

spia.res <- nbea("spia", se, gs, grn, alpha = 0.1, beta = 1, sig.stat = '&')

ii) If not otherwise specified (eg via the sig.stat argument), the default point
of reference for the EnrichmentBrowser functions when it comes to gene DE measures is the ADJ.PVAL column in the rowData slot of your SummarizedExperiment (ie. your dds after using import).

se <- import(dds, res)

You can always override / abuse this column by eg setting it

rowData(se)[my.DE.genes.of.interest, "ADJ.PVAL"] <- 0.04
rowData(se)[notDE.notInteresting.genes, "ADJ.PVAL"] <- 0.06

as per default, sbea and nbea methods that work only on a list of DE genes
will take genes with ADJ.PVAL < alpha (default alpha=0.05).

Note: the xxp or xxfc options of the sig.stat argument above are
an alternative to that, allowing you to select a certain percentage of genes
that you find interesting based on log2 fold change or p-value (and can
also be tweaked accordingly as above).
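To make the 'p', 'fc', '&', and '|' options of sig.stat concrete, here is a base R sketch of the selection logic as described in the option list above. It is an illustration on toy data, not the package's implementation; the column name FC for the log2 fold change is an assumption, and ADJ.PVAL follows the convention mentioned above.

```r
# Toy DE table; values are made up for illustration
de <- data.frame(
    FC = c(2.1, -1.4, 0.3, -0.2),          # log2 fold change (assumed column name)
    ADJ.PVAL = c(0.01, 0.20, 0.03, 0.60),  # adjusted p-value
    row.names = c("g1", "g2", "g3", "g4")
)
alpha <- 0.1  # adjusted p-value threshold
beta <- 1     # absolute log2 fold change threshold

by.p <- de$ADJ.PVAL < alpha
by.fc <- abs(de$FC) > beta

# Genes selected under each of the four sig.stat options
selected <- list(
    "p" = rownames(de)[by.p],
    "fc" = rownames(de)[by.fc],
    "&" = rownames(de)[by.p & by.fc],
    "|" = rownames(de)[by.p | by.fc]
)
selected
```

With these thresholds only g1 passes both criteria, while g2 passes on fold change alone and g3 on p-value alone.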

When it comes to ranking, the raw p-value typically has better granularity than the adjusted p-value.
Accordingly:
```
res <- DESeq2::results(dds)
sorting.df <- res[,c("pvalue","log2FoldChange")]
sorting.df[,2] <- -abs(sorting.df[,2])
ind <- do.call(order, as.data.frame(sorting.df))
```
would give you a very fine-grained order by nominal p-value, and for genes
with equal nominal p-value it would sort by absolute fold change in decreasing order.
You could eg use that ordering after importing via:
```
se <- se[ind,] 
```
If that still doesn't result in a satisfying ranking, you could incorporate
additional measures such as the test statistic by modifying the sorting.df line above to:
```
sorting.df <- res[,c("pvalue","log2FoldChange", "stat")]
```
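The multi-key ordering can be tried out on a toy table without running DESeq2. In this sketch the values are made up, and negating the absolute test statistic (so that stronger statistics come first) is an assumption on top of what is written above.

```r
# Toy results table standing in for DESeq2::results(dds); values are made up
res <- data.frame(
    pvalue = c(0.01, 0.01, 0.20, 0.01),
    log2FoldChange = c(-0.5, 2.0, 3.0, -2.0),
    stat = c(-2.6, 2.9, 1.3, -2.7),
    row.names = c("g1", "g2", "g3", "g4")
)

# Sort keys in order of priority; absolute values are negated so that
# larger effects come first under order()'s ascending sort
sorting.df <- data.frame(
    pvalue = res$pvalue,
    neg.abs.lfc = -abs(res$log2FoldChange),
    neg.abs.stat = -abs(res$stat)
)

# order() uses each further key only to break ties in the previous ones
ind <- do.call(order, as.list(sorting.df))
rownames(res)[ind]
```

Here g2 and g4 tie on both p-value and absolute fold change, so the test statistic decides their order.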

Hope that helps and don't hesitate to inquire further if some things are unclear.

vivek-verma202 · 2020-09-01T19:18:59Z

Thank you so much, @lgeistlinger !
Here are the updates:

design = ~ batch + condition worked!
(Results were not very different with or without age; thanks to you, age is no longer a covariate of interest in my analysis.)
I could not proceed without normalization:

res <- sbea(method = "gsea", se = se, gs = go.bp.gs, browse = T)
Error in .reorderAssays(se, assay) : 
  Expression dataset (se) does not contain an assay named "norm"

Normalization worked but excluded ~91 % of genes:

se <- normalize(se, norm.method = "vst")
Excluding 8118 genes not satisfying min.cpm threshold

The results were from DESeq2 and not from raw read counts. Considering the vignette, I am not sure if normalization is needed. Also, the DESeq2 output already has normalization factors.

How can I make use of DESeq2's normalization factors to normalize the count slot and create "norm" slot in an SE?
(I think, it's an important consideration for the import() function.)

vivek-verma202 · 2020-09-01T20:16:49Z

I came up with a workaround. I'm not sure if it's correct, so please let me know your opinion:

se@assays@data@listData[["norm"]] <- se@assays@data@listData[["counts"]]/se@assays@data@listData[["normalizationFactors"]]

It looks like the counts were normalized:
hist(log(se@assays@data@listData[["counts"]]))

hist(log(se@assays@data@listData[["counts"]]))
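The idea behind this workaround, dividing counts by per-sample scaling factors, can be checked on a toy matrix in base R. The sketch below uses one size factor per sample, estimated with the median-of-ratios approach that DESeq2 describes, rather than a gene-by-sample matrix of normalization factors; the counts are made up, and the code is not taken from either package.

```r
# Toy count matrix (genes x samples); sample s2 is sequenced twice as deep as s1
counts <- matrix(c(10, 20, 100, 200, 50, 100), ncol = 2, byrow = TRUE,
                 dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))

# Median-of-ratios size factors: per-sample median of the counts
# relative to the per-gene geometric mean across samples
log.geo.means <- rowMeans(log(counts))
size.factors <- apply(counts, 2, function(x) {
    exp(median((log(x) - log.geo.means)[is.finite(log.geo.means)]))
})

# Divide each column (sample) by its size factor
norm <- sweep(counts, 2, size.factors, FUN = "/")
norm
```

After scaling, both samples show the same values for every gene, which is the expected outcome when depth is the only difference. The result is still on the count scale; it is not variance-stabilized.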

lgeistlinger · 2020-09-01T20:41:19Z

Hi @vivek-verma202 -

se <- normalize(se, norm.method = "vst")

is the right thing to do here. I'll add an argument to normalize to switch off filtering.

There are some helpful accessor functions such as assay, colData, and rowData for a SummarizedExperiment.
See also the graphical overview of a SummarizedExperiment and how to access its main parts here.

A fast approximation of what EnrichmentBrowser::normalize() with norm.method = "vst" does:

assay(se, "norm") <- edgeR::cpm(assay(se, "counts"))

But I agree, that's something that import should support. I'll make a dedicated issue.

lgeistlinger · 2020-09-23T15:57:07Z

Closing this and will continue working on the specific issues derived from the discussion here (#28 and #29). Feel free to reopen + thanks a lot for testing and feedback.

lgeistlinger changed the title ~~How stable is the master branch?~~ Importing DGE data into EnrichmentBrowser Aug 21, 2020

lgeistlinger added the question label Aug 21, 2020

grabear mentioned this issue Aug 21, 2020

ENSEMBL ID version conversion #24

Closed

lgeistlinger mentioned this issue Sep 1, 2020

Import from limma-trend approach #26

Closed

This was referenced Sep 1, 2020

How can I make use of DESeq2's normalization factors during import? #27

Closed

import: experimental designs with more than one covariate #29

Open

lgeistlinger closed this as completed Sep 23, 2020
