Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FindConservedMarkers function seems to ignore the "slot" argument #9365

Open
jasonleongbio opened this issue Oct 5, 2024 · 3 comments
Open
Labels
bug Something isn't working

Comments

@jasonleongbio
Copy link

I was trying to use the FindConservedMarkers() function to identify differentially expressed genes across multiple samples, but error messages persisted.

I saw some ongoing discussions about which count data should be used for the DE testing, especially as I used the SCTransform method for normalization.
While the official tutorial suggested using the PrepSCTIntegration() function based on the SCT assay for identifying DEGs, a severe caveat is discussed in another tutorial that the SCT assay keeps only the top 2000-3000 highly variable genes across datasets.
That means, genes that tend to show similar expression levels and do not have high dispersions/variations will be ignored by the subsequent analysis. However, based on the research question that we are tackling, we think we should not exclude these non-highly-variable genes from the analysis for sure.

So a solution is to try using the raw "RNA" assay for the DE testing instead.
I set the slot option in the FindConservedMarkers() function to the "counts" layer (SeuratObjectV5).

  • Some tutorials suggest the raw counts should be used while some others suggest that the normalized counts can also be used. I think this is rather controversial but this is not the main point here.
  • After performing the SCTransform normalization, the DefaultAssay has been set to SCT rather than RNA. That means, the RNA assay only contains one layer called "counts" (see below).
# Set the assay to RNA for differential expression testing
DefaultAssay(seurat_obj) <- "RNA"

# JoinLayer() to combine the expression data of different samples
seurat_obj <- JoinLayers(
    seurat_obj,
    assay = "RNA", 
    layers = "counts", 
)

# Find conserved markers
FindConservedMarkers(
    object = seurat_obj,
    ident.1 = 30,
    ident.2 = 1,
    grouping.var = "orig.ident",
    assay = "RNA",
    slot = "counts",
)

What the SeuratObject looks like:

> seurat_obj
An object of class Seurat 
35056 features across 18912 samples within 3 assays 
Active assay: RNA (11702 features, 0 variable features)
 1 layer present: counts
 2 other assays present: SCT, SCT.integrated
 3 dimensional reductions calculated: pca, CCAintegration, umap.integrated

However, an error message appeared after running the FindConservedMarkers() step:

Testing group [[[sample_name_hidden_here]]]: (30) vs (1)
Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 'rowSums': subscript out of bounds
In addition: Warning message:
Layer ‘data’ is empty 

The error message suggested that the FindConservedMarkers() function was still trying to extract the data for DE testing from the "data" layer instead of the desired "counts" layer.
In other words, it is likely that the "slot" option has been ignored.

  • To add, in my analysis, some clusters do not have cells or only very few cells in some of the clusters. At first I thought that could also be the reason for this error message (because no count data could be drawn from the raw matrix), but the same error message persisted as I repeated this analysis with another set of data where all clusters have non-zero cell counts in each cluster.
  • I've also tried specifying a 'layer' option instead of 'slot' option (not sure if the additional options will be relayed to other functions or not), but it didn't work either.

So I have been checking what was actually going on for some time, but haven't found any practical solution yet.
I'll try with the SCT slot for the time being, but as my research question relies on all the genes (and not the highly variable genes) in the genome, it seems that this problem has to be solved anyway.
Please don't hesitate to let me know if there are any misunderstandings in my reasoning.
Thank you so much ^^

Best,
Jason.

SessionInfo
sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: aarch64-apple-darwin20
Running under: macOS Sonoma 14.6.1

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Asia/Tokyo
tzcode source: internal

attached base packages:
[1] grid      tools     stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ComplexHeatmap_2.20.0 dittoSeq_1.16.0       jsonlite_1.8.8        mongolite_2.8.0       anndata_0.7.5.6       scCustomize_2.1.2     SeuratDisk_0.0.0.9021
 [8] Seurat_5.1.0          SeuratObject_5.0.2    sp_2.1-4              Matrix_1.7-0          plotly_4.10.4         colorspace_2.1-1      RColorBrewer_1.1-3   
[15] viridis_0.6.5         viridisLite_0.4.2     cowplot_1.1.3         hms_1.1.3             glue_1.7.0            lubridate_1.9.3       forcats_1.0.0        
[22] purrr_1.0.2           tibble_3.2.1          ggplot2_3.5.1         tidyverse_2.0.0       stringr_1.5.1         dplyr_1.1.4           tidyr_1.3.1          
[29] readr_2.1.5           renv_1.0.7           

loaded via a namespace (and not attached):
  [1] matrixStats_1.3.0           spatstat.sparse_3.1-0       httr_1.4.7                  doParallel_1.0.17           numDeriv_2016.8-1.1        
  [6] sctransform_0.4.1           utf8_1.2.4                  R6_2.5.1                    lazyeval_0.2.2              uwot_0.2.2                 
 [11] sn_2.1.1                    GetoptLong_1.0.5            withr_3.0.1                 gridExtra_2.3               progressr_0.14.0           
 [16] cli_3.6.3                   Biobase_2.64.0              textshaping_0.4.0           spatstat.explore_3.3-1      fastDummies_1.7.3          
 [21] sandwich_3.1-0              labeling_0.4.3              prismatic_1.1.2             mvtnorm_1.2-5               spatstat.data_3.1-2        
 [26] ggridges_0.5.6              pbapply_1.7-2               askpass_1.2.0               systemfonts_1.1.0           parallelly_1.38.0          
 [31] plotrix_3.8-4               rstudioapi_0.16.0           generics_0.1.3              shape_1.4.6.1               ica_1.0-3                  
 [36] spatstat.random_3.3-1       vroom_1.6.5                 ggbeeswarm_0.7.2            fansi_1.0.6                 S4Vectors_0.42.1           
 [41] abind_1.4-5                 lifecycle_1.0.4             multcomp_1.4-26             snakecase_0.11.1            mathjaxr_1.6-0             
 [46] SummarizedExperiment_1.34.0 SparseArray_1.4.8           Rtsne_0.17                  paletteer_1.6.0             promises_1.3.0             
 [51] crayon_1.5.3                miniUI_0.1.1.1              lattice_0.22-6              pillar_1.9.0                GenomicRanges_1.56.1       
 [56] rjson_0.2.21                future.apply_1.11.2         codetools_0.2-20            leiden_0.4.3.1              mutoss_0.1-13              
 [61] spatstat.univar_3.0-0       data.table_1.15.4           vctrs_0.6.5                 png_0.1-8                   spam_2.10-0                
 [66] Rdpack_2.6.1                gtable_0.3.5                assertthat_0.2.1            rematch2_2.1.2              rbibutils_2.2.16           
 [71] S4Arrays_1.4.1              mime_0.12                   survival_3.7-0              SingleCellExperiment_1.26.0 pheatmap_1.0.12            
 [76] iterators_1.0.14            fitdistrplus_1.2-1          TH.data_1.1-2               ROCR_1.0-11                 nlme_3.1-165               
 [81] bit64_4.0.5                 RcppAnnoy_0.0.22            GenomeInfoDb_1.40.1         irlba_2.3.5.1               vipor_0.4.7                
 [86] KernSmooth_2.23-24          BiocGenerics_0.50.0         ggrastr_1.0.2               mnormt_2.1.1                tidyselect_1.2.1           
 [91] bit_4.0.5                   compiler_4.4.1              hdf5r_1.3.11                TFisher_0.2.0               DelayedArray_0.30.1        
 [96] scales_1.3.0                lmtest_0.9-40               digest_0.6.36               goftest_1.2-3               spatstat.utils_3.0-5       
[101] XVector_0.44.0              htmltools_0.5.8.1           pkgconfig_2.0.3             MatrixGenerics_1.16.0       fastmap_1.2.0              
[106] rlang_1.1.4                 GlobalOptions_0.1.2         htmlwidgets_1.6.4           UCSC.utils_1.0.0            shiny_1.9.1                
[111] farver_2.1.2                zoo_1.8-12                  magrittr_2.0.3              GenomeInfoDbData_1.2.12     dotCall64_1.1-1            
[116] patchwork_1.2.0             munsell_0.5.1               Rcpp_1.0.13                 reticulate_1.38.0           stringi_1.8.4              
[121] zlibbioc_1.50.0             MASS_7.3-61                 plyr_1.8.9                  parallel_4.4.1              listenv_0.9.1              
[126] ggrepel_0.9.5               deldir_2.0-4                splines_4.4.1               tensor_1.5                  multtest_2.60.0            
[131] circlize_0.4.16             qqconf_1.3.2                igraph_2.0.3                spatstat.geom_3.3-2         RcppHNSW_0.6.0             
[136] reshape2_1.4.4              stats4_4.4.1                metap_1.11                  ggprism_1.0.5               tzdb_0.4.0                 
[141] foreach_1.5.2               httpuv_1.6.15               RANN_2.6.1                  openssl_2.2.0               polyclip_1.10-7            
[146] future_1.34.0               clue_0.3-65                 scattermore_1.2             janitor_2.2.0               xtable_1.8-4               
[151] RSpectra_0.16-2             later_1.3.2                 ragg_1.3.2                  beeswarm_0.4.0              IRanges_2.38.1             
[156] cluster_2.1.6               timechange_0.3.0            globals_0.16.3             

@jasonleongbio jasonleongbio added the bug Something isn't working label Oct 5, 2024
@jasonleongbio
Copy link
Author

Another temporary solution that I could think of is to LogNormalize the raw counts (which will create the "data" layer). This indeed worked but it just sounds a bit weird to me that I used SCT normalization for all the remaining analysis (such as clustering) but used the LogNormalized data for DE testing.
However, this is just a roundabout to the problem because which layer to draw the expression data for DE testing is still ignored by the function.
I also checked whether the FindAllMarkers() functoin faces the same problem, and it seems that this issue is also present in this function as well.

# Log-normalize the raw counts
seurat_obj[["RNA.LogNormalized"]] <- seurat_obj[["RNA"]]
DefaultAssay(seurat_obj) <- "RNA.LogNormalized"

seurat_obj <- NormalizeData(
    seurat_obj, 
    normalization.method = "LogNormalize", 
    scale.factor = 10000
)

common_markers_of_cluster <- FindConservedMarkers(
    object = seurat_obj,
    ident.1 = 30,
    ident.2 = 1,
    grouping.var = "orig.ident",
    assay = "RNA.LogNormalized",
    #slot = "counts",
    #layer = "counts"
)

@zskylarli
Copy link
Contributor

Hi - thank you for reporting this issue! It seems like we do not support FindMarkers using the counts slot, which FindConservedMarkers calls internally. Until we fix this issue and improve flexibility, one workaround, although very hacky, would be to copy your object, and add a fake data slot with your raw counts, like so:

seurat_obj[['RNA']]$data <- seurat_obj[['RNA']]$counts

This would allow it to run on the unnormalized values, although it is definitely not a recommended practice in normal cases. Sorry you are encountering this issue and we will be flagging this for fixes soon!

@zskylarli
Copy link
Contributor

To add, by default SCTransform now calculates residuals for all features instead of just the variable genes, as decided by the residual.features argument in this function docstring as long as it is set to NULL if that is helpful!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants