I was trying to use the FindConservedMarkers() function to identify differentially expressed genes across multiple samples, but I kept running into an error.
I have seen some ongoing discussions about which count data should be used for DE testing, especially since I used the SCTransform method for normalization.
The official tutorial suggests running the PrepSCTFindMarkers() function and then using the SCT assay to identify DEGs. However, another tutorial warns of a serious caveat: the SCT assay keeps only the top 2,000-3,000 highly variable genes across datasets.
If that is true, genes with similar expression levels across cells (i.e. low dispersion/variation) would be ignored in the downstream analysis. Given the research question we are tackling, we are certain that these non-highly-variable genes should not be excluded.
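You can check whether this caveat applies by looking at the object directly. The sketch below is a minimal example. It assumes Seurat/SeuratObject v5 and uses Seurat's bundled pbmc_small demo object in place of real data. The names n_rna, n_sct and n_hvg are illustrative only. The sketch compares three numbers: the genes stored in the RNA assay, the genes stored in the SCT assay, and the SCT variable features.

```r
# Minimal sketch (assumes Seurat and SeuratObject v5 are installed).
# pbmc_small is Seurat's bundled demo object, standing in for real data.
library(Seurat)

obj <- SCTransform(pbmc_small, verbose = FALSE)  # also sets DefaultAssay to "SCT"

n_rna <- nrow(obj[["RNA"]])             # genes in the raw RNA assay
n_sct <- nrow(obj[["SCT"]])             # genes with SCT-corrected values
n_hvg <- length(VariableFeatures(obj))  # SCT highly variable genes only

print(c(RNA = n_rna, SCT = n_sct, variable = n_hvg))
```

On a real dataset, an SCT gene count much larger than the number of variable features would mean that the SCT assay is not restricted to the highly variable genes.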
My solution was therefore to use the raw "RNA" assay for DE testing instead.
I set the slot option in FindConservedMarkers() to the "counts" layer (SeuratObject v5).
Some tutorials say that raw counts should be used, while others say that normalized values can be used as well. This seems rather controversial, but it is not the main point here.
Because I normalized with SCTransform, the DefaultAssay was set to SCT rather than RNA, and NormalizeData() was never run on the RNA assay. As a result, the RNA assay contains only one layer, "counts" (see below).
# Set the assay to RNA for differential expression testing
DefaultAssay(seurat_obj) <- "RNA"
# JoinLayers() to combine the expression data of different samples
seurat_obj <- JoinLayers(
  seurat_obj,
  assay = "RNA",
  layers = "counts"
)
# Find conserved markers
FindConservedMarkers(
  object = seurat_obj,
  ident.1 = 30,
  ident.2 = 1,
  grouping.var = "orig.ident",
  assay = "RNA",
  slot = "counts"
)
What the SeuratObject looks like:
> seurat_obj
An object of class Seurat
35056 features across 18912 samples within 3 assays
Active assay: RNA (11702 features, 0 variable features)
1 layer present: counts
2 other assays present: SCT, SCT.integrated
3 dimensional reductions calculated: pca, CCAintegration, umap.integrated
However, an error message appeared after running the FindConservedMarkers() step:
Testing group [[[sample_name_hidden_here]]]: (30) vs (1)
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'rowSums': subscript out of bounds
In addition: Warning message:
Layer ‘data’ is empty
The error message suggests that FindConservedMarkers() is still trying to pull the expression data for DE testing from the "data" layer instead of the requested "counts" layer.
In other words, the "slot" option appears to be ignored.
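One way to confirm which layers are actually present is to list them. The sketch below assumes SeuratObject v5. It rebuilds the bundled pbmc_small demo object from its raw counts (the name obj is hypothetical), so that its RNA assay holds only a "counts" layer, as in the object above.

```r
library(Seurat)

# Rebuild the demo object from raw counts only. In SeuratObject v5,
# CreateSeuratObject() creates a v5 assay by default, and NormalizeData()
# is never run here, so no "data" layer is created.
counts <- LayerData(pbmc_small, assay = "RNA", layer = "counts")
obj <- CreateSeuratObject(counts = counts)

# List the layers stored in the RNA assay to see whether "data" exists.
rna_layers <- Layers(obj[["RNA"]])
print(rna_layers)
```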
Also, in my analysis some clusters have no cells, or only very few cells, in some of the samples. At first I thought this could also explain the error (because no count data could be drawn from the raw matrix). However, the same error persisted when I repeated the analysis on another dataset in which every cluster contains cells from every sample.
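Before ruling out small clusters, it can help to tabulate cells per cluster and sample. The sketch below uses the pbmc_small demo object, whose "groups" column stands in for orig.ident. The threshold of 3 assumes the default value of FindConservedMarkers()' min.cells.group argument. Per that argument's documentation, a sample is skipped when an identity class has too few cells in it. The variable names are illustrative.

```r
library(Seurat)

obj <- pbmc_small  # demo object; "groups" plays the role of orig.ident

# Cells per identity class (rows) in each sample/group (columns).
cells_per_group <- table(Idents(obj), obj$groups)
print(cells_per_group)

# The two identity classes being compared (ident.1 = 30 and ident.2 = 1 in the issue).
idents_tested <- levels(Idents(obj))[1:2]

# Flag sample/identity combinations below the assumed min.cells.group default.
min_cells_group <- 3
too_small <- cells_per_group[idents_tested, , drop = FALSE] < min_cells_group
print(too_small)
```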
I also tried passing a 'layer' argument instead of 'slot' (I'm not sure whether additional arguments are passed on to the functions called internally), but that did not work either.
I have spent some time trying to work out what is actually going on, but haven't found a practical solution yet.
I'll use the SCT assay for the time being. However, my research question relies on all the genes in the genome (not just the highly variable ones), so this problem still needs to be solved.
Please don't hesitate to let me know if there are any misunderstandings in my reasoning.
Thank you so much ^^
Another temporary solution I could think of is to LogNormalize the raw counts, which creates the "data" layer. This does work, but it feels odd to use SCT normalization for all the other analyses (such as clustering) and LogNormalized data only for DE testing.
Moreover, this only works around the problem: the function still ignores which layer I ask it to draw the expression data from.
I also checked whether the FindAllMarkers() function has the same problem, and it seems that it does.
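The LogNormalize workaround described above can be sketched as follows. This assumes Seurat v5 and the metap package, which FindConservedMarkers() uses to combine p-values across groups. It uses the pbmc_small demo object rebuilt from raw counts, with its "groups" column standing in for orig.ident. The object and variable names are hypothetical.

```r
library(Seurat)

# Demo stand-in: pbmc_small rebuilt from raw counts, keeping its "groups"
# column (playing the role of orig.ident) and its cluster identities.
counts <- LayerData(pbmc_small, assay = "RNA", layer = "counts")
obj <- CreateSeuratObject(counts = counts, meta.data = pbmc_small[["groups"]])
Idents(obj) <- Idents(pbmc_small)

# Log-normalize the RNA assay; this writes the "data" layer that
# FindConservedMarkers() reads by default (slot = "data").
obj <- NormalizeData(obj, assay = "RNA",
                     normalization.method = "LogNormalize", verbose = FALSE)

# Compare the first two clusters, requiring markers to be conserved across groups.
clusters <- levels(Idents(obj))
markers <- FindConservedMarkers(obj, ident.1 = clusters[1], ident.2 = clusters[2],
                                grouping.var = "groups", assay = "RNA",
                                verbose = FALSE)
head(markers)
```

The SCT assay is left untouched here, so clustering can still rely on SCT while only the DE step reads the log-normalized RNA values.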
Hi - thank you for reporting this issue! It seems that we do not support running FindMarkers, which FindConservedMarkers calls internally, on the counts slot. Until we fix this and improve flexibility, one workaround, although very hacky, would be to copy your object and add a fake "data" slot containing your raw counts.
This would allow it to run on the unnormalized values, although it is definitely not a recommended practice in normal cases. Sorry you are encountering this issue and we will be flagging this for fixes soon!
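The copy-and-fake-layer workaround above might look roughly like this. This is a sketch, not the maintainers' own snippet. It assumes SeuratObject v5's LayerData() replacement function and the metap package, and uses the pbmc_small demo object. The names obj_hack and markers_raw are hypothetical.

```r
library(Seurat)

counts <- LayerData(pbmc_small, assay = "RNA", layer = "counts")
obj <- CreateSeuratObject(counts = counts, meta.data = pbmc_small[["groups"]])
Idents(obj) <- Idents(pbmc_small)

# R copies S4 objects on modification, so obj itself stays untouched.
obj_hack <- obj
# Put the raw counts into a fake "data" layer so FindMarkers() can find it.
LayerData(obj_hack, assay = "RNA", layer = "data") <-
  LayerData(obj_hack, assay = "RNA", layer = "counts")

clusters <- levels(Idents(obj_hack))
markers_raw <- FindConservedMarkers(obj_hack, ident.1 = clusters[1],
                                    ident.2 = clusters[2], grouping.var = "groups",
                                    assay = "RNA", verbose = FALSE)
```

Seurat generally treats the "data" layer as log-normalized when computing avg_log2FC. The fold changes reported from this hack are therefore likely not interpretable, which fits the warning that it is not a recommended practice.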
Also, in case it is helpful: by default, SCTransform now calculates residuals for all features rather than only the variable genes. This is controlled by the residual.features argument (see the function's documentation): residuals are computed for all features as long as it is left at NULL.
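If you want to stay on the SCT assay for DE testing, a minimal sketch could look like the following. It assumes Seurat v5 and the metap package, and uses pbmc_small with "groups" standing in for orig.ident. residual.features = NULL is the default and is written out only for clarity. return.only.var.genes = FALSE is an added assumption that also keeps the stored residuals (scale.data) for all genes, not just the variable ones.

```r
library(Seurat)

obj <- pbmc_small  # demo object; "groups" plays the role of orig.ident

# residual.features = NULL computes residuals for all genes;
# return.only.var.genes = FALSE also keeps them all in the SCT scale.data layer.
obj <- SCTransform(obj, residual.features = NULL,
                   return.only.var.genes = FALSE, verbose = FALSE)

# Harmonize corrected counts across SCT models before DE testing; with a
# single model, as here, there is little for it to do.
obj <- PrepSCTFindMarkers(obj, assay = "SCT", verbose = FALSE)

clusters <- levels(Idents(obj))
markers_sct <- FindConservedMarkers(obj, ident.1 = clusters[1], ident.2 = clusters[2],
                                    grouping.var = "groups", assay = "SCT",
                                    verbose = FALSE)
```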
Best,
Jason.