refine takes long time to run for "small" datasets #222

Open
gaow opened this issue Apr 8, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@gaow
Member

gaow commented Apr 8, 2024

I have been using susie() with refine = TRUE for various analyses. I noticed that for smaller sample sizes it can take a very long time to run. For example, even with the simulated data shown in ?susieR::susie,

     library(susieR)
     set.seed(1)
     n = 1000
     p = 1000
     beta = rep(0,p)
     beta[1:4] = 1
     X = matrix(rnorm(n*p),nrow = n,ncol = p)
     X = scale(X,center = TRUE,scale = TRUE)
     y = drop(X %*% beta + rnorm(n))
     st = proc.time()
     res1 = susie(X,y,L = 10, refine=TRUE)
     proc.time() - st

It takes more than two minutes,

         user   system  elapsed
     2208.592 2871.796  130.205

but without refine it takes about two seconds. @zouyuxin perhaps we should evaluate and improve the behavior of refine -- did you notice this when you developed that feature?
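To separate the cost of refinement from the cost of the base fit, the two calls can be timed side by side. The following is a minimal sketch, not a benchmark: it reuses the simulation above with smaller n and p (an assumption made only to keep the run short), and the timings will depend on the machine and the BLAS configuration. It also compares the final ELBO of the two fits via susie_get_objective() to see whether refinement changed the solution at all.

```r
library(susieR)

# Same simulation as above, but with smaller n and p (assumed values,
# chosen only so the sketch runs quickly).
set.seed(1)
n <- 500
p <- 500
beta <- rep(0, p)
beta[1:4] <- 1
X <- scale(matrix(rnorm(n * p), nrow = n, ncol = p),
           center = TRUE, scale = TRUE)
y <- drop(X %*% beta + rnorm(n))

# Time the base fit and the refined fit separately.
t_base   <- system.time(fit_base   <- susie(X, y, L = 10))
t_refine <- system.time(fit_refine <- susie(X, y, L = 10, refine = TRUE))

# Elapsed (wall-clock) seconds for each fit.
elapsed <- c(base = t_base[["elapsed"]], refine = t_refine[["elapsed"]])
print(elapsed)

# Final ELBO of each fit; a difference suggests refinement found a
# different solution, no difference suggests the extra fits changed nothing.
elbo <- c(base = susie_get_objective(fit_base),
          refine = susie_get_objective(fit_refine))
print(elbo)
```

If the two ELBO values agree, the extra running time bought no change in the fit for this dataset.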

@pcarbo
Member

pcarbo commented Apr 8, 2024

@gaow With refine = TRUE, susie is being called an additional 16 times, so this much longer running time isn't surprising. (However, it would be helpful if the refinement step provided more updates on its progress.)

One workaround would be to set max_iter to a smaller value.
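The max_iter workaround could look like the sketch below. The value 20 is an illustrative choice, not a recommendation from this thread, and the sketch assumes that the cap also applies to the fits run during refinement, as the comment above implies. The smaller n and p are likewise assumptions to keep the run short.

```r
library(susieR)

# Simulated data as in the original report, with smaller n and p
# (assumed values, chosen only so the sketch runs quickly).
set.seed(1)
n <- 500
p <- 500
beta <- rep(0, p)
beta[1:4] <- 1
X <- scale(matrix(rnorm(n * p), nrow = n, ncol = p),
           center = TRUE, scale = TRUE)
y <- drop(X %*% beta + rnorm(n))

# Cap the number of IBSS iterations per fit. max_iter = 20 is an
# illustrative value only; the susie() default is larger.
fit <- susie(X, y, L = 10, refine = TRUE, max_iter = 20)

# Check that the returned fit still converged under the cap.
print(fit$converged)
print(fit$niter)
```

If fit$converged is FALSE, the cap was too tight for this dataset and should be raised.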

@gaow
Member Author

gaow commented Apr 8, 2024

Thanks @pcarbo

> One workaround would be to set max_iter to a smaller value.

You mean in the "refine" code? I think most of the time SuSiE converges in fewer than 20 iterations anyway. It's the 16 additional calls that seem excessive. In many other examples, especially with larger sample sizes, there are far fewer than 16 calls. I wonder if there is a way to fundamentally improve it ...
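One way to check the iteration claim on a given dataset is to inspect the fit object directly. This is a minimal sketch with assumed, smaller n and p. It also prints the number of credible sets, on the assumption (not confirmed in this thread) that the number of refinement fits grows with the number of credible sets found.

```r
library(susieR)

# Simulated data as in the original report, with smaller n and p
# (assumed values, chosen only so the sketch runs quickly).
set.seed(1)
n <- 500
p <- 500
beta <- rep(0, p)
beta[1:4] <- 1
X <- scale(matrix(rnorm(n * p), nrow = n, ncol = p),
           center = TRUE, scale = TRUE)
y <- drop(X %*% beta + rnorm(n))

# Base fit without refinement.
fit <- susie(X, y, L = 10)

# Number of IBSS iterations used, and whether the fit converged.
print(fit$niter)
print(fit$converged)

# Number of credible sets reported. Assumption: refinement cost scales
# with this count; this is not stated by the maintainers above.
n_cs <- length(fit$sets$cs)
print(n_cs)
```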

pcarbo added the enhancement label Apr 8, 2024
@pcarbo
Member

pcarbo commented Apr 8, 2024

Yes, there is quite possibly room for improvement in the refinement step, but I don't have any clever ideas at the moment. Suggestions are welcome.
