Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segmentation fault in multithreading setup #50

Open
aksarkar opened this issue Mar 26, 2024 · 8 comments
Open

Segmentation fault in multithreading setup #50

aksarkar opened this issue Mar 26, 2024 · 8 comments
Labels
bug Something isn't working

Comments

@aksarkar
Copy link
Member

The following example leads to a segfault

library(fastTopics)
data(pbmc_facs)
for (nc in c(1, 2)) {
  set.seed(0)
  temp <- fit_poisson_nmf(pbmc_facs$counts, k=6, init='random', numiter=1, method='em', verbose='detailed', control=list(nc=nc))
}

when run like this

$ OPENBLAS_NUM_THREADS=1 Rscript temp.R 
Fitting rank-6 Poisson NMF to 3774 x 16791 sparse matrix.
Running 1 EM updates, without extrapolation (fastTopics 0.6-117).
iter loglik(PoisNMF) loglik(multinom) res(KKT)  |F-F'|  |L-L'| nz(F) nz(L) beta
   1 -9.57230314e+06 -9.554666952e+06 1.68e+03 1.3e+02 9.5e-01 0.987 1.000 0.00
Fitting rank-6 Poisson NMF to 3774 x 16791 sparse matrix.
Running 1 EM updates, without extrapolation (fastTopics 0.6-117).
iter loglik(PoisNMF) loglik(multinom) res(KKT)  |F-F'|  |L-L'| nz(F) nz(L) beta

 *** caught segfault ***
address 0x600000005f7, cause 'memory not mapped'

Traceback:
 1: pnmfem_update_factors_sparse_parallel_rcpp(X, L, F, i - 1, numiter)
 2: pnmfem_update_loadings(X, F, L, i, numiter, nc)
 3: update_loadings_poisson_nmf(X, fit$F, fit$L, update.loadings,     method, control)
 4: update_poisson_nmf(X, fit, update.factors, update.loadings, method,     control)
 5: fit_poisson_nmf_main_loop(X, fit, numiter, update.factors, update.loadings,     method, control, verbose)
 6: fit_poisson_nmf(pbmc_facs$counts, k = 6, init = "random", numiter = 1,     method = "em", verbose = "detailed", control = list(nc = nc))
An irrecoverable exception occurred. R is aborting now ...
Segmentation fault

The same occurs for any other choice of method. Interestingly, the following example does not:

library(fastTopics)
data(pbmc_facs)
for (method in c('em', 'scd', 'ccd')) {
  for (nc in c(2, 1)) {
    set.seed(0)
    temp <- fit_poisson_nmf(pbmc_facs$counts, k=6, init='random', numiter=1, method='em', verbose='detailed', control=list(nc=nc))
  }
}
R version 4.2.3 (2023-03-15)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Amazon Linux 2

Matrix products: default
BLAS/LAPACK: /home/ec2-user/mambaforge3/envs/fasttopics-dev/lib/libopenblasp-r0.3.26.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] fastTopics_0.6-117

loaded via a namespace (and not attached):
 [1] progress_1.2.3     tidyselect_1.2.0   ashr_2.2-63        purrr_1.0.2       
 [5] pbapply_1.7-2      splines_4.2.3      lattice_0.22-6     colorspace_2.1-0  
 [9] vctrs_0.6.5        generics_0.1.3     viridisLite_0.4.2  htmltools_0.5.7   
[13] MCMCpack_1.7-0     utf8_1.2.4         survival_3.5-8     plotly_4.10.4     
[17] rlang_1.1.3        mixsqp_0.3-54      pillar_1.9.0       glue_1.7.0        
[21] uwot_0.1.16        lifecycle_1.0.4    MatrixModels_0.5-3 munsell_0.5.0     
[25] gtable_0.3.4       htmlwidgets_1.6.4  coda_0.19-4.1      fastmap_1.1.1     
[29] SparseM_1.81       invgamma_1.1       quantreg_5.97      irlba_2.3.5.1     
[33] parallel_4.2.3     fansi_1.0.6        Rcpp_1.0.12        scales_1.3.0      
[37] RcppParallel_5.1.6 jsonlite_1.8.8     truncnorm_1.0-9    mcmc_0.9-8        
[41] hms_1.1.3          ggplot2_3.5.0      digest_0.6.35      Rtsne_0.17        
[45] dplyr_1.1.4        ggrepel_0.9.5      cowplot_1.1.3      grid_4.2.3        
[49] quadprog_1.5-8     tools_4.2.3        cli_3.6.2          magrittr_2.0.3    
[53] lazyeval_0.2.2     tibble_3.2.1       crayon_1.5.2       tidyr_1.3.1       
[57] pkgconfig_2.0.3    MASS_7.3-60.0.1    Matrix_1.6-5       prettyunits_1.2.0 
[61] SQUAREM_2021.1     data.table_1.15.2  httr_1.4.7         R6_2.5.1          
[65] compiler_4.2.3    
@pcarbo
Copy link
Member

pcarbo commented Mar 27, 2024

@aksarkar I tried your first example with the latest version of fastTopics (0.6-172), and did not get a segmentation fault (on my MacBook Pro Apple M2, R 4.3.3).

Segfaults are difficult to debug.

@pcarbo
Copy link
Member

pcarbo commented Mar 28, 2024

Okay I was able to generate a segfault with the following code:

library(Matrix)
library(fastTopics)
set.seed(1)
X <- simulate_count_data(80,100,k = 3,sparse = TRUE)$X
X <- X[,colSums(X > 0) > 0]
fit0 <- fit_poisson_nmf(X,k = 3,numiter = 10,method = "em",
                        init.method = "random")
fit_scd <- fit_poisson_nmf(X,fit0 = fit0,numiter = 10,method = "scd",
                           control = list(extrapolate = TRUE,nc = 2,
                                          nc.blas = 1),
                           verbose = "detailed")

If I set sparse = FALSE in the above code, the segfault does away.

Also, then rerunning this code in the same session with sparse = TRUE does not produce the segfault.

At this stage, I'm not sure how to debug this — it could be an issue caused by a recent update to one of the Rcpp packages, for example.

@pcarbo
Copy link
Member

pcarbo commented Mar 28, 2024

@aksarkar If you get any more potentially useful information about this bug, please post here.

@pcarbo pcarbo added the bug Something isn't working label Mar 28, 2024
@aksarkar
Copy link
Member Author

@pcarbo RcppCore/RcppParallel#110 appears to be relevant

@pcarbo
Copy link
Member

pcarbo commented Mar 29, 2024

Interesting discussion, but what I find odd is that the segfault only occurred for me when the input matrix was sparse.

@aksarkar
Copy link
Member Author

If I build fastTopics with -DRCPP_PARALLEL_USE_TBB=0 then I don't get segfaults in either my example or yours.

I also get much faster performance with multiple threads in fit_poisson_nmf. I will do some more tests before recommending switching away from TBB.

@pcarbo
Copy link
Member

pcarbo commented Apr 1, 2024

My understanding is that RCPP_PARALLEL_USE_TBB=0 turns off multithreading (based on this), so I don't see how you can get faster performance with multiple cores when RCPP_PARALLEL_USE_TBB=0.

@aksarkar
Copy link
Member Author

aksarkar commented Apr 3, 2024

If RcppParallel does not use TBB, then it uses TinyThreads instead (https://github.com/RcppCore/RcppParallel/blob/5aa08f8b546d2fa99372d02d7b6a50344fb09ff3/inst/include/RcppParallel.h#L46). TinyThreads uses OS threads and relies on OS scheduling.

Another thing I noticed is that almost all of the compute time is spent in apparently serial code running on one core. I think addressing that will address the lack of performance gain I observed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants