-
Notifications
You must be signed in to change notification settings - Fork 11
Example 3
This example illustrates the impact of two definitions of se
on the RSS results.
Here we consider two definitions below. The "simple" version is the standard error of the single-SNP effect estimate, which is often directly provided in the GWAS summary statistics database. The "rigorous" version is used in theoretical derivation (see Section 2.4 of RSS paper), and it requires some basic calculations based on the available summary statistics.
In practice, we find these two definitions differ negligibly, mainly because
- currently most GWAS have large sample size (
Nsnp
) and small effect sizes (betahat
); see Table 1 of RSS paper; - the published summary data are often limited to two digits to further limit the possibility of identifiability (e.g. GIANT).
Hence, we speculate that using these two definitions of se
exchangeably would not produce severely different results. In this example, we verify our guess by simulations.
We use the same dataset in Example 1 for illustration. Please contact us if you have trouble downloading the dataset example1.mat
.
To reproduce results of Example 3, please use example3.m
.
Before running example3.m
, please make sure the MCMC subroutines of RSS are installed. See instructions here.
Step 1. Define two types of se
.
We let se_1
and se_2
denote the "simple" and "rigorous" version respectively.
se_1 = se; % the simple version
se_2 = sqrt((betahat.^2) ./ Nsnp + se.^2); % the rigorous version
Before running MCMC, we first look at the difference between these two versions of se
. Below is the five-number summary of the absolute difference between se_1
and se_2
.
>> abs_diff = abs(se_1 - se_2);
>> disp(prctile(log10(abs_diff), 0:25:100)); % require stat toolbox
-12.0442 -4.6448 -3.9987 -3.4803 -1.3246
To make this example as "hard" as possible for rss
, we do not round se_1
and se_2
to 2 significant digits.
Step 2. Fit RSS-BVSR using these two versions of se
.
[betasam_1, gammasam_1, hsam_1, logpisam_1, Naccept_1] = rss_bvsr(betahat, se_1, R, Nsnp, Ndraw, Nburn, Nthin);
[betasam_2, gammasam_2, hsam_2, logpisam_2, Naccept_2] = rss_bvsr(betahat, se_2, R, Nsnp, Ndraw, Nburn, Nthin);
Step 3. Compare the posterior output.
We can look at the posterior means of beta
, and the posterior distributions of h
, log(pi)
and PVE based on se_1
(blue) and se_2
(orange).
The PVE estimate (with 95% credible interval) is 0.1932, [0.1166, 0.2869] when using se_1
, and it is 0.1896, [0.1162, 0.2765] when using se_2
.
The simulations in Section 2.3 of the RSS paper (results shown in Supplement) are essentially "replications" of the example above. To facilitate reproducible research, we make the simulated datasets publicly available (Scenario 2.1 datasets in rss_example1_simulations.tar.gz
).
After applying RSS methods to these simulated data, we obtain the following results. In the figures below, sigma_hat
corresponds to se_1
, and s_hat
corresponds to se_2
.