MotifScan fails when there is a gene with more than one motif #343

Open
jmodlis opened this issue Dec 16, 2024 · 0 comments

Thank you for your development of this package! I'm running into an issue when using MotifScan with EnsDb.Mmusculus.v79 and mm10. I get the error message below.

Error message

> [1] "Matching motifs..."
> [1] "Getting putative TF target genes..."
> Error in `$<-.data.frame`(`*tmp*`, "n_targets", value = c(Arnt = 5672L,  : 
>  replacement has 828 rows, data has 879

Stepping through the source code line by line, I find that the error is
triggered in the last line in the block of code below.

  colnames(tf_match) <- motif_df$motif_name ###

    # only keep genes that are in the Seurat object and in the given EnsDb:
    gene_list <- rownames(seurat_obj)
    gene_list <- gene_list[gene_list %in% rownames(tf_match)]
    tf_match <- tf_match[gene_list,]

    # get list of target genes for each TF:
    print('Getting putative TF target genes...')
    tfs <- motif_df$motif_name ###
    tf_targets <- list()
    n_targets <- list()
    for(cur_tf in tfs){
        tf_targets[[cur_tf]] <- names(tf_match[,cur_tf][tf_match[,cur_tf]])
        n_targets[[cur_tf]] <- length(tf_targets[[cur_tf]] )
    }
    n_targets <- unlist(n_targets)

    # add number of target genes to motif_df
    motif_df$n_targets <- n_targets

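The error occurs because some motif names are shared by more than one
motif ID, so colnames(tf_match) contains duplicates. The loop stores its
results in a list keyed by motif name, so repeated names overwrite each
other, and n_targets presumably ends up with one entry per unique name
(828) rather than one per row of motif_df (879). Changing the lines
marked with ### above to use motif_ID instead of motif_name allows me
to run the code without error.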

colnames(tf_match) <- motif_df$motif_ID ###
tfs <- motif_df$motif_ID ###
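
To illustrate the mechanism, here is a minimal base-R sketch of the same loop on toy data. The motif IDs, gene names, and the helper count_targets() are made up for illustration and are not part of the package; only the loop logic mirrors the block quoted above.

```r
# Toy stand-in for motif_df: three motifs, two of which share a motif_name.
# IDs and names are hypothetical.
motif_df <- data.frame(
  motif_ID   = c("M0001", "M0002", "M0003"),
  motif_name = c("Arnt", "Arnt", "Sox2"),
  stringsAsFactors = FALSE
)

# Toy stand-in for tf_match: a genes x motifs logical matrix.
tf_match <- matrix(
  c(TRUE,  FALSE, TRUE,
    FALSE, TRUE,  TRUE,
    TRUE,  TRUE,  FALSE),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), NULL)
)

# Hypothetical helper that repeats the loop from the issue for a given
# choice of column keys (motif_name or motif_ID).
count_targets <- function(keys) {
  colnames(tf_match) <- keys   # modifies a local copy only
  n_targets <- list()
  for (cur_tf in keys) {
    hits <- tf_match[, cur_tf]                    # first matching column
    n_targets[[cur_tf]] <- length(names(hits[hits]))
  }
  unlist(n_targets)
}

by_name <- count_targets(motif_df$motif_name)  # duplicate key is overwritten
by_id   <- count_targets(motif_df$motif_ID)    # one entry per motif

# Number of repeated names; this is the gap between rows and entries.
n_dup <- sum(duplicated(motif_df$motif_name))

# Assigning the shorter vector reproduces the same kind of error.
err <- tryCatch({
  motif_df$n_targets <- by_name
  NULL
}, error = function(e) conditionMessage(e))

# Keying by motif_ID gives a vector that matches the number of rows.
motif_df$n_targets <- by_id
```

Running sum(duplicated(motif_df$motif_name)) on the real motif_df should show whether duplicated names account for the 879 vs. 828 gap.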

However, since this is mouse data, a problem remains: gene_name and motif_name are mostly human genes. I'm working on a fix for that and will post a separate issue.

For additional context, I previously ran WGCNA on the Seurat object and generated the PFM set this way:

library(JASPAR2024)
library(TFBSTools)
jaspar <- JASPAR2024::JASPAR2024()
sq24 <- RSQLite::dbConnect(RSQLite::SQLite(), db(jaspar))
pfm_core <- getMatrixSet(sq24,
              opts = list(collection = "CORE", 
                          tax_group = 'vertebrates', 
                          all_versions = FALSE))

R session info

r$> sessionInfo()
R version 4.3.3 (2024-02-29)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Rocky Linux 9.4 (Blue Onyx)

Matrix products: default
BLAS/LAPACK: /datastore/nextgenout5/share/labs/bioinformatics/projects/Serody-Meier-20240314_scRNASeqTcells/scripts/conda/envs/sc-rna-seq/lib/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: America/New_York
tzcode source: system (glibc)

attached base packages:
[1] grid      stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] biomaRt_2.58.0                     viridis_0.6.5                      viridisLite_0.4.2                  patchwork_1.2.0                   
 [5] gridExtra_2.3                      GGally_2.2.1                       edgeR_4.0.2                        limma_3.58.1                      
 [9] DESeq2_1.42.0                      ggridges_0.5.6                     pals_1.8                           circlize_0.4.16                   
[13] ComplexHeatmap_2.18.0              SingleCellExperiment_1.24.0        xgboost_2.1.3.1                    BSgenome.Mmusculus.UCSC.mm10_1.4.3
[17] BSgenome_1.70.1                    rtracklayer_1.62.0                 BiocIO_1.12.0                      Biostrings_2.70.1                 
[21] XVector_0.42.0                     EnsDb.Mmusculus.v79_2.99.0         ensembldb_2.26.0                   AnnotationFilter_1.26.0           
[25] GenomicFeatures_1.54.1             AnnotationDbi_1.64.1               TFBSTools_1.40.0                   motifmatchr_1.24.0                
[29] JASPAR2024_0.99.6                  cowplot_1.1.3                      hdWGCNA_0.4.00                     GeneOverlap_1.38.0                
[33] UCell_2.6.2                        tidygraph_1.3.1                    ggraph_2.2.1                       igraph_2.1.2                      
[37] WGCNA_1.73                         fastcluster_1.2.6                  dynamicTreeCut_1.63-1              ggrepel_0.9.5                     
[41] ggalluvial_0.12.5                  harmony_1.2.0                      Rcpp_1.0.12                        Seurat_5.0.3                      
[45] SeuratObject_5.0.1                 sp_2.1-3                           SingleR_2.4.0                      SummarizedExperiment_1.32.0       
[49] Biobase_2.62.0                     GenomicRanges_1.54.1               GenomeInfoDb_1.38.1                IRanges_2.36.0                    
[53] S4Vectors_0.40.2                   MatrixGenerics_1.14.0              matrixStats_1.2.0                  dplyr_1.1.4                       
[57] AnnotationHub_3.10.0               BiocFileCache_2.10.1               dbplyr_2.4.0                       BiocGenerics_0.48.1               
[61] ProjecTILs_3.3.1                   data.table_1.15.2                  ggplot2_3.5.0                      rmarkdown_2.26                    
[65] kableExtra_1.4.0                   knitr_1.45                         docopt_0.7.1                       pander_0.6.5                      

loaded via a namespace (and not attached):
  [1] dichromat_2.0-0.1             R.methodsS3_1.8.2             progress_1.2.3                nnet_7.3-19                   poweRlaw_0.70.6              
  [6] goftest_1.2-3                 vctrs_0.6.5                   spatstat.random_3.2-3         shape_1.4.6.1                 digest_0.6.35                
 [11] png_0.1-8                     proxy_0.4-27                  deldir_2.0-4                  parallelly_1.37.1             MASS_7.3-60                  
 [16] reshape2_1.4.4                httpuv_1.6.15                 foreach_1.5.2                 withr_3.0.0                   xfun_0.43                    
 [21] survival_3.5-8                memoise_2.0.1                 systemfonts_1.1.0             GlobalOptions_0.1.2           zoo_1.8-12                   
 [26] gtools_3.9.5                  pbapply_1.7-2                 R.oo_1.26.0                   prettyunits_1.2.0             Formula_1.2-5                
 [31] KEGGREST_1.42.0               promises_1.2.1                httr_1.4.7                    restfulr_0.0.15               globals_0.16.3               
 [36] fitdistrplus_1.1-11           rstudioapi_0.16.0             miniUI_0.1.1.1                generics_0.1.3                base64enc_0.1-3              
 [41] curl_5.1.0                    zlibbioc_1.48.0               ScaledMatrix_1.10.0           polyclip_1.10-6               GenomeInfoDbData_1.2.11      
 [46] SparseArray_1.2.2             interactiveDisplayBase_1.40.0 xtable_1.8-4                  stringr_1.5.1                 pracma_2.4.4                 
 [51] doParallel_1.0.17             evaluate_0.23                 S4Arrays_1.2.0                preprocessCore_1.64.0         hms_1.1.3                    
 [56] irlba_2.3.5.1                 colorspace_2.1-0              filelock_1.0.3                ROCR_1.0-11                   reticulate_1.35.0            
 [61] spatstat.data_3.0-4           magrittr_2.0.3                lmtest_0.9-40                 readr_2.1.5                   later_1.3.2                  
 [66] lattice_0.22-6                mapproj_1.2.11                spatstat.geom_3.2-9           future.apply_1.11.2           scattermore_1.2              
 [71] XML_3.99-0.16                 RcppAnnoy_0.0.22              Hmisc_5.1-1                   pillar_1.9.0                  nlme_3.1-164                 
 [76] iterators_1.0.14              caTools_1.18.2                compiler_4.3.3                beachmat_2.18.0               RSpectra_0.16-1              
 [81] stringi_1.8.3                 tensor_1.5                    GenomicAlignments_1.38.0      plyr_1.8.9                    crayon_1.5.2                 
 [86] abind_1.4-5                   locfit_1.5-9.9                graphlayouts_1.2.1            bit_4.0.5                     scGate_1.6.0                 
 [91] codetools_0.2-20              BiocSingular_1.18.0           GetoptLong_1.0.5              plotly_4.10.4                 mime_0.12                    
 [96] splines_4.3.3                 fastDummies_1.7.3             sparseMatrixStats_1.14.0      blob_1.2.4                    utf8_1.2.4                   
[101] clue_0.3-65                   BiocVersion_3.18.1            seqLogo_1.68.0                fs_1.6.3                      listenv_0.9.1                
[106] checkmate_2.3.0               DelayedMatrixStats_1.24.0     tibble_3.2.1                  Matrix_1.6-5                  statmod_1.5.0                
[111] tzdb_0.4.0                    svglite_2.1.3                 tweenr_2.0.3                  pkgconfig_2.0.3               pheatmap_1.0.12              
[116] tools_4.3.3                   cachem_1.0.8                  RSQLite_2.3.4                 DBI_1.2.2                     impute_1.76.0                
[121] fastmap_1.1.1                 scales_1.3.0                  ica_1.0-3                     Rsamtools_2.18.0              ggstats_0.5.1                
[126] BiocManager_1.30.22           dotCall64_1.1-1               RANN_2.6.1                    rpart_4.1.23                  farver_2.1.1                 
[131] yaml_2.3.8                    foreign_0.8-86                cli_3.6.2                     purrr_1.0.2                   tester_0.2.0                 
[136] leiden_0.4.3.1                lifecycle_1.0.4               uwot_0.1.16                   backports_1.4.1               BiocParallel_1.36.0          
[141] annotate_1.80.0               gtable_0.3.4                  rjson_0.2.21                  progressr_0.14.0              parallel_4.3.3               
[146] jsonlite_1.8.8                RcppHNSW_0.6.0                bitops_1.0-7                  bit64_4.0.5                   Rtsne_0.17                   
[151] spatstat.utils_3.1-1          BiocNeighbors_1.20.0          CNEr_1.38.0                   R.utils_2.12.3                lazyeval_0.2.2               
[156] shiny_1.8.1                   htmltools_0.5.8               GO.db_3.18.0                  sctransform_0.4.1             rappdirs_0.3.3               
[161] glue_1.7.0                    TFMPvalue_0.0.9               spam_2.10-0                   RCurl_1.98-1.14               R6_2.5.1                     
[166] tidyr_1.3.1                   gplots_3.1.3.1                STACAS_2.2.2                  cluster_2.1.6                 DirichletMultinomial_1.44.0  
[171] ProtGenerics_1.34.0           DelayedArray_0.28.0           tidyselect_1.2.0              maps_3.4.2                    htmlTable_2.4.1              
[176] ggforce_0.4.2                 xml2_1.3.6                    future_1.33.2                 rsvd_1.0.5                    munsell_0.5.1                
[181] KernSmooth_2.23-22            htmlwidgets_1.6.4             RColorBrewer_1.1-3            rlang_1.1.3                   spatstat.sparse_3.0-3        
[186] spatstat.explore_3.2-6        fansi_1.0.6                  

Thanks!
Jen
