Is it possible to test the hypothesis that the significant variants are in a set of SNPs? #196

garyzhubc · 2023-07-11T21:00:35Z

Is it possible to test the hypothesis that the significant variants are in a set of SNPs?

garyzhubc · 2023-07-18T19:52:23Z

Not just sets from the susie_get_cs()

stephens999 · 2023-07-18T21:18:26Z

Sorry, I don't think we understand the question.

garyzhubc · 2023-07-18T22:08:54Z

Can I test a hypothesis like "beta_2 != 0, beta_3 != 0, beta_1 = 0, beta_4 = 0"?

stephens999 · 2023-07-18T23:33:29Z

Perhaps the simplest way would be to sample from the (approximate) posterior distribution on beta, and then use that sample to compute the probability of any particular combination of zero/non-zero beta values.
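A minimal sketch of that idea, using a toy matrix in place of real posterior draws so that it runs without a fitted model. The helper name `event_prob` is hypothetical (it is not part of susieR); it assumes a variables x samples matrix such as the `b` component returned by `susie_get_posterior_samples()`.

```r
# Hypothetical helper (not part of susieR): estimate the posterior probability
# that at least one effect inside `inside` is nonzero and that every effect
# outside `inside` is exactly zero.
# `b` is a variables x samples matrix of sampled effects.
event_prob <- function(b, inside) {
  outside      <- setdiff(seq_len(nrow(b)), inside)
  any_inside   <- colSums(b[inside, , drop = FALSE] != 0) > 0
  none_outside <- colSums(b[outside, , drop = FALSE] != 0) == 0
  mean(any_inside & none_outside)
}

# Toy matrix standing in for posterior samples: 4 variables, 5 draws.
b_toy <- matrix(0, nrow = 4, ncol = 5)
b_toy[2, 1]       <- 0.5      # draw 1: only variable 2 is nonzero
b_toy[3, 2]       <- -1       # draw 2: only variable 3 is nonzero
b_toy[c(1, 2), 3] <- c(2, 1)  # draw 3: variable 1 (outside the set) is nonzero too
b_toy[4, 4]       <- 1        # draw 4: only variable 4 (outside the set) is nonzero
                              # draw 5: all effects are zero

p_23 <- event_prob(b_toy, inside = 2:3)
print(p_23)
```

With real draws, the same call would presumably look like `event_prob(samp$b, inside = 2:3)`, where `samp` is the list returned by `susie_get_posterior_samples()`.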

garyzhubc · 2023-07-19T18:25:00Z

Something like this (test the hypothesis that significant variants are in the middle)?

posterior<-unlist(lapply(1:1000, function(i) (max(samp$b[40:60,i])>0 | min(samp$b[40:60,i])<0) & (samp$b[1,1:39] == 0 & samp$b[1,61:101] == 0)))

result in:

> mean(posterior)
[1] 0.9756098

stephens999 · 2023-07-20T02:38:07Z

I think the code doesn't look quite right, but something like that, yes

garyzhubc · 2023-07-20T02:53:59Z

I made an edit on the code. Could you tell me why it doesn't look right?

stephens999 · 2023-07-20T11:51:46Z

you have samp$b[1,1:39]
but samp$b[40:60,i]

so the indices of b don't look consistent through your code

garyzhubc · 2023-07-20T14:36:23Z

Thanks for spotting this mistake. I'm now using:

> posterior<-unlist(lapply(1:num_samples, function(i) (max(samp$b[40:60,i])>0 | min(samp$b[40:60,i])<0) & all(c(samp$b[1:39,i] == 0, samp$b[61:101,i] == 0))))
> mean(posterior)
[1] 0
> posterior<-unlist(lapply(1:num_samples, function(i) (max(samp$b[1:39,i])>0 | min(samp$b[1:39,i])<0) & all(samp$b[40:101,i] == 0)))
> mean(posterior)
[1] 0
> posterior<-unlist(lapply(1:num_samples, function(i) (max(samp$b[61:101,i])>0 | min(samp$b[61:101,i])<0) & all(samp$b[1:60,i] == 0)))
> mean(posterior)
[1] 0

which I believe is consistent. However, it looks like these all have very small probabilities. Do you recommend instead of testing all(samp$b[1:60,i] == 0) try something like all(abs(samp$b[1:60,i]) < epsilon), or do you think what I'm doing is okay?

garyzhubc · 2023-10-10T21:17:47Z

Can anyone respond to this issue please? @pcarbo

pcarbo · 2023-10-11T14:10:32Z

The issue of working with very small probabilities is a common issue and there are some ways to help with this. If you can share with us a reproducible example illustrating exactly what you are trying to do, I might be able to help you.

As a general piece of advice, I recommend starting with an example that is as simple as possible, e.g., an example with exactly 4 variables X.

garyzhubc · 2023-10-11T20:36:00Z

An example with four variables b1, b2, b3, b4: suppose I want to calculate P(b1 = 0 and b4 = 0 and (b2 != 0 or b3 != 0)) as the probability that the causal variant is in {b2, b3}. Should I run the program below to get that probability?

samp<-susie_get_posterior_samples(res_, num_samples)
posterior23<-unlist(lapply(1:num_samples, function(i) (max(samp$b[2:3,i])>0 | min(samp$b[2:3,i])<0) & all(c(samp$b[1,i] == 0, samp$b[4,i] == 0))))
mean(posterior23)

If I rank posterior1, posterior2, posterior3, posterior12, posterior23, posterior13, posterior123 and select the one with probability greater than 0.95, will I get the same credible set as the one returned by the following call (default coverage = 0.95)?

susie_get_cs(res_)

pcarbo · 2023-10-12T02:05:02Z

@garyzhubc This is not a reproducible example because some variables in your code (e.g., res_) are not defined.

stephens999 · 2023-10-12T13:17:09Z

> However, it looks like these all have very small probabilities. Do you recommend instead of testing all(samp$b[1:60,i] == 0) try something like all(abs(samp$b[1:60,i]) < epsilon), or do you think what I'm doing is okay?

I think what you are doing looks OK, and you are just getting the answer that there is a very small probability of the event you are looking at. You could also try the epsilon approach you suggested.
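The epsilon variant could be sketched as follows. The helper `event_prob_eps` and the threshold value are illustrative assumptions, not part of susieR; `eps` is on the effect-size scale, so a sensible value depends on how the data are scaled. Here an effect counts as nonzero when `abs(b) > eps`, so `eps = 0` reproduces the exact-zero test.

```r
# Hypothetical helper: same event as before, but effects with |b| <= eps
# are treated as zero. `b` is a variables x samples matrix of sampled effects.
event_prob_eps <- function(b, inside, eps = 0) {
  nonzero      <- abs(b) > eps
  outside      <- setdiff(seq_len(nrow(b)), inside)
  any_inside   <- colSums(nonzero[inside, , drop = FALSE]) > 0
  none_outside <- colSums(nonzero[outside, , drop = FALSE]) == 0
  mean(any_inside & none_outside)
}

# Toy draws: 4 variables, 4 samples.
b_toy <- matrix(0, nrow = 4, ncol = 4)
b_toy[2, ]  <- c(0.8, 1.1, 0.9, 0)  # variable 2 is nonzero in draws 1 to 3
b_toy[1, 2] <- 1e-4                 # tiny effect outside the set in draw 2
b_toy[4, 3] <- 0.5                  # sizeable effect outside the set in draw 3

p_exact <- event_prob_eps(b_toy, inside = 2:3, eps = 0)
p_tol   <- event_prob_eps(b_toy, inside = 2:3, eps = 1e-3)
print(c(exact = p_exact, tolerant = p_tol))
```

In this toy example the tolerance only changes the answer for the draw whose outside effect is tiny; a draw with a sizeable effect outside the set is still rejected.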

garyzhubc · 2023-10-21T21:29:32Z

I also tried using PIP directly instead of sampling.

prod(1-res$pip[1:34])*(1-prod(1-res$pip[35:68]))*prod(1-res$pip[69:101])

This still gives a probability of zero. See #203 (comment)
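For reference, the PIP expression above multiplies per-variable probabilities, which treats inclusion as independent across variables; that is an approximation rather than something the thread confirms. A product of many small factors can also underflow to exactly zero in floating point. A minimal sketch of the same quantity computed in log space, with a hypothetical helper `log_event_prob_pip` and made-up PIP values:

```r
# Hypothetical helper: log of
#   prod(1 - pip[outside]) * (1 - prod(1 - pip[inside]))
# computed with log1p() to avoid underflow. Assumes independent inclusion
# across variables, as the direct product does.
log_event_prob_pip <- function(pip, inside) {
  outside          <- setdiff(seq_along(pip), inside)
  log_none_outside <- sum(log1p(-pip[outside]))      # log prod(1 - pip)
  log_none_inside  <- sum(log1p(-pip[inside]))
  log_any_inside   <- log1p(-exp(log_none_inside))   # log(1 - prod(1 - pip))
  log_none_outside + log_any_inside
}

# Small example where the direct product is still representable.
pip_toy <- c(0.01, 0.9, 0.5, 0.02)
lp_toy  <- log_event_prob_pip(pip_toy, inside = 2:3)
print(exp(lp_toy))

# Extreme example: 500 variables outside the set with PIP 0.999.
pip_ext <- c(rep(0.999, 500), 0.9)
direct  <- prod(1 - pip_ext[1:500]) * (1 - prod(1 - pip_ext[501]))
lp_ext  <- log_event_prob_pip(pip_ext, inside = 501)
print(c(direct = direct, log_prob = lp_ext))
```

A log-probability of `-Inf` means some variable outside the set has a PIP of exactly 1, in which case the event has probability zero under this approximation regardless of how it is computed.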

garyzhubc · 2023-10-21T21:40:43Z

> @garyzhubc This is not a reproducible example because some variables in your code (e.g., res_) are not defined.

I can do the same with this example from https://stephenslab.github.io/susieR/articles/sparse_susie_eval.html:

create_sparsity_mat = function(sparsity, n, p) {
  nonzero          <- round(n*p*(1-sparsity))
  nonzero.idx      <- sample(n*p, nonzero)
  mat              <- numeric(n*p)
  mat[nonzero.idx] <- 1
  mat              <- matrix(mat, nrow=n, ncol=p)
}
n <- 1000
p <- 1000
beta <- rep(0,p)
beta[c(1,300,400,1000)] <- 10 
X.dense  <- create_sparsity_mat(0.99,n,p)
X.sparse <- as(X.dense,"CsparseMatrix")
y <- c(X.dense %*% beta + rnorm(n))
susie.sparse <- susie(X.sparse,y)

Using Monte Carlo samples from the posterior:

num_samples<-10000
samp<-susie_get_posterior_samples(susie.sparse, num_samples)
posterior<-unlist(lapply(1:num_samples, function(i) (max(samp$b[341:680,i])>0 | min(samp$b[341:680,i])<0) & all(c(samp$b[1:340,i] == 0, samp$b[681:1000,i] == 0))))
mean(posterior)

Using PIP:

prod(1-susie.sparse$pip[1:340])*(1-prod(1-susie.sparse$pip[341:680]))*prod(1-susie.sparse$pip[681:1000])

Both give a probability of zero.

pcarbo · 2023-10-23T17:52:33Z

@garyzhubc I think the issue is that in your example all the inclusion probabilities are either 1 or very, very small, so it may be tricky to use a naive Monte Carlo sampling approach:

hist(log10(susie.sparse$alpha),n = 64)

One idea that comes to mind is importance sampling, but you might want to start with an example where the probabilities are less extreme.
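Two points may help when reading the zero result. First, the simulated nonzero effects are at variables 1, 300, 400 and 1000, and three of those lie outside 341:680, so a probability near zero is what one would expect if the fit recovers those signals. Second, with 10,000 draws, an event whose probability is well below 1e-4 will usually not appear in any draw.

As a sampling-free alternative (a different technique from the importance sampling mentioned above), here is a sketch that works directly from `alpha`. It assumes the approximate posterior treats the single effects as independent and that each effect selects exactly one variable with the probabilities in its row of `alpha`. The helper name `log_prob_all_in_set` and the toy matrix are hypothetical.

```r
# Hypothetical helper: log-probability that every single effect selects a
# variable inside `inside`, given an L x p matrix `alpha` whose rows sum to 1.
# Assumes the effects are independent of one another.
log_prob_all_in_set <- function(alpha, inside) {
  sum(log(rowSums(alpha[, inside, drop = FALSE])))
}

# Toy alpha with L = 2 effects and p = 4 variables (made-up values).
alpha_toy <- rbind(c(0.10, 0.60, 0.30, 0.00),
                   c(0.25, 0.25, 0.25, 0.25))

lp <- log_prob_all_in_set(alpha_toy, inside = 2:3)
print(exp(lp))
```

Effects that the fit has effectively switched off (for example, those with an estimated prior variance of zero) would presumably need to be dropped from `alpha` before calling the helper; that step is an assumption and is not shown here.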

garyzhubc changed the title ~~Is it possible to test the probability that the significant variants are in a group of SNPs?~~ Is it possible to test the probability that the significant variants are in a set of SNPs? Jul 18, 2023

garyzhubc changed the title ~~Is it possible to test the probability that the significant variants are in a set of SNPs?~~ Is it possible to test the hypothesis that the significant variants are in a set of SNPs? Jul 18, 2023
