Differential expression #22

Open
DzenisKoca opened this issue Aug 19, 2022 · 5 comments
Comments

@DzenisKoca

Hello,

First of all, thank you for this tool. ALRA seems really convincing and runs very fast. I wanted to ask a few questions regarding its use. The data I am using contains 6 samples from 6 different mice, 3 of which are KO for a certain gene. I want to compare the 3 KO to the 3 WT.

  1. Should ALRA be run on data that was normalized using Seurat's SCTransform() function?
  2. If I am using ALRA on a few samples that I want to integrate using Seurat, is it OK to run ALRA on each sample individually, before integration? I am using Seurat's SCTransform integration pipeline.
  3. After integration, can I use the data imputed by ALRA to perform differential expression analysis?
  4. After integration, should I use some tool to remove batch effects between samples (the 3 KO and the 3 WT individually)?
@Rohit-Satyam

Rohit-Satyam commented Jul 6, 2023

Hi @JunZhao1990 @linqiaozhi @rcannood @inoue0426, I have a similar query, so I am not opening a new issue. Could you please answer this? @DzenisKoca, did you figure this out?

@DzenisKoca
Author

Hello @Rohit-Satyam ,

Well I figured it out partially.

1/2. I needed to integrate the data, and since integration via the SCTransform pipeline could not be performed using the alra assay, I didn't use the SCTransform function. Instead, I ran ALRA on each sample before integration; then, after performing PCA, I integrated the data using harmony. I tried this method while reanalyzing a couple of publicly available datasets, and the results were satisfying. The outcome was comparable to, if not better than, what was published previously.

  3. I preferred not to run differential expression on imputed data, since I found no benchmark of this.

  4. I didn't delve into this any further.

I hope this helps.

@Rohit-Satyam

Rohit-Satyam commented Jul 12, 2023

Hi @DzenisKoca. Thanks for your response. Yes, I went through the other ALRA issues, where running SCT on imputed data was discouraged because of the assumptions SCT makes about the data. So I am sticking to log normalization. However, I have a few more questions. When I run ScoreJackStraw to determine the number of PCs for downstream analysis, I get a plot like this, with all p-values equal to zero:

[Images: Screenshot 2023-07-12 022135, Screenshot 2023-07-12 022228]

Do you know what might be causing this?
The code I used was

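library(Seurat)
library(SeuratWrappers)  # assumed source of RunALRA(); older Seurat releases shipped it directly
library(dplyr)           # provides %>%

## n = normal; t = drug treatment; 1 and 2 are time points T1 and T2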
sample.list <- list(n1=n1,t1=t1,n2=n2,t2=t2)

## I intend to use alra imputed matrix for integration
sample.list <- lapply(X = sample.list, FUN = function(x) {
  x <- NormalizeData(x)
  x <- RunALRA(x, assay="RNA",slot="data")
  x <- FindVariableFeatures(x, nfeatures = 2000,selection.method = "vst")
})

## Malaria Cell Atlas. Don't want to perform imputation
mca.seurat <- mca.seurat %>% NormalizeData() %>% FindVariableFeatures(nfeatures = 2000, selection.method = "vst")
## Append with [[ ]] so the Seurat object is stored as a single, named list element
sample.list[["mca"]] <- mca.seurat

saveRDS(sample.list,"sample.list.rds")
features <- SelectIntegrationFeatures(object.list = sample.list)
plasmodium.anchors <- FindIntegrationAnchors(object.list = sample.list, anchor.features = features)  
plasmodium.combined <- IntegrateData(anchorset = plasmodium.anchors)

DefaultAssay(plasmodium.combined) <- "integrated"

# Run the standard workflow for visualization and clustering
plasmodium.combined <- ScaleData(plasmodium.combined, verbose = TRUE, vars.to.regress = "percent.mt")
plasmodium.combined <- RunPCA(plasmodium.combined, verbose = TRUE)
plasmodium.combined <- JackStraw(object = plasmodium.combined, reduction = "pca", dims = 50, num.replicate = 100, prop.freq = 0.1, verbose = TRUE)
plasmodium.combined <- ScoreJackStraw(object = plasmodium.combined, dims = 1:50, reduction = "pca")
JackStrawPlot(object = plasmodium.combined, dims = 1:50, reduction = "pca")
ElbowPlot(plasmodium.combined, ndims = 50)
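
When a single object such as the Malaria Cell Atlas Seurat object is appended to a named list, double-bracket assignment stores it as one named element. Single-bracket assignment generally expects a list on the right-hand side, so the object would have to be wrapped in `list()` first. A minimal base-R sketch, assuming a toy S4 class (`ToyObject` is hypothetical and only stands in for a Seurat object):

```r
library(methods)

# Hypothetical stand-in for a Seurat object (Seurat objects are also S4)
setClass("ToyObject", representation(label = "character"))
mca.toy <- new("ToyObject", label = "mca")

# Placeholder list standing in for the list of per-sample objects
toy.list <- list(n1 = "a", t1 = "b")

# [[ ]] stores the object itself and names the new element in one step
toy.list[["mca"]] <- mca.toy

print(names(toy.list))
print(is(toy.list[["mca"]], "ToyObject"))
```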

@linqiaozhi In your paper you used the Jackstraw Plot to decide number of PCs:

After imputation with each method, the number of PCs to retain for each was chosen by the jackstraw method as implemented in Seurat. PCs with an assigned p-value of 1 × 10⁻⁵ or smaller were retained.

I hope you can shed some light as well.
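
The retention rule quoted above (keep PCs with a p-value of 1 × 10⁻⁵ or smaller) can be sketched as a small filter. This is only an illustration, not the paper's code: `select_pcs` is a hypothetical helper, the toy table contains made-up values, and the two-column layout (`PC`, `Score`) is an assumption about what `JS(object = plasmodium.combined[["pca"]], slot = "overall")` returns after `ScoreJackStraw()`.

```r
# Hypothetical helper: keep PCs whose overall JackStraw p-value is <= cutoff
select_pcs <- function(scores, cutoff = 1e-5) {
  scores <- as.data.frame(scores)
  scores$PC[scores$Score <= cutoff]
}

# With a real object, the table is assumed to come from:
#   scores <- JS(object = plasmodium.combined[["pca"]], slot = "overall")

# Toy table standing in for real output (values are made up for illustration)
toy <- data.frame(PC = 1:5, Score = c(0, 1e-20, 1e-6, 3e-4, 0.2))
kept <- select_pcs(toy)
print(kept)
```

Note that if every score is 0, as in the plot described above, this rule keeps every PC, so the cutoff alone cannot separate informative PCs from the rest.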

@DzenisKoca
Author

Hello,

As can be seen here, the authors didn't investigate whether ALRA should be run on integrated data or before integration. I have not found the answer to this question yet. As suggested by this thread, I ran the integration pipeline with harmony (on ALRA-imputed data), and the results I obtained were satisfying.

I am not sure what is happening with JackStraw; I have not encountered a similar issue yet.

@Rohit-Satyam

Rohit-Satyam commented Jul 13, 2023

Hi @DzenisKoca

Yes, I couldn't find any study in which the recommended way of running ALRA was explored properly. But it makes sense biologically to run ALRA imputation separately on each dataset when you have normal and drug-treated single cells. One can then perform integration in Seurat using something like this:

features <- SelectIntegrationFeatures(object.list = sample.list, assay = c("alra","alra","alra","alra","RNA"))
plasmodium.anchors <- FindIntegrationAnchors(object.list = sample.list, anchor.features = features,assay = c("alra","alra","alra","alra","RNA"))  
plasmodium.combined <- IntegrateData(anchorset = plasmodium.anchors)

Even when the assay argument isn't provided, the function will automatically take data from the default assay (if you ran RunALRA, the default will be "alra"). Also, are you suggesting that the IntegrateData function does not consider the "alra" data slot?
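
The per-object assay vector used above can also be derived from the list names instead of being typed by hand, which keeps it aligned if samples are added or reordered. A minimal sketch, assuming the list is named n1, t1, n2, t2 and mca as in the earlier code (`sample.names` and `assay.per.object` are hypothetical names):

```r
# Names of the objects in the list; with the real list this would be names(sample.list)
sample.names <- c("n1", "t1", "n2", "t2", "mca")

# Imputed samples use the "alra" assay; the non-imputed reference uses "RNA"
assay.per.object <- ifelse(sample.names == "mca", "RNA", "alra")
names(assay.per.object) <- sample.names
print(assay.per.object)

# The vector would then be passed on, for example:
#   SelectIntegrationFeatures(object.list = sample.list, assay = unname(assay.per.object))
```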

Though the benchmarking paper ranks harmony among the top tools for integration, on our malaria dataset we observed that it over-corrects (this was also observed in another study published here). So I am a little hesitant to use it here.
