Parametric bootstrap simulations for global models #125
Unanswered
MengLu-flw
asked this question in
Q&A
Replies: 1 comment
-
@KLohse could you provide some insight here? |
Beta Was this translation helpful? Give feedback.
0 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
Hi all :^)
I would like to perform parametric bootstraps via ‘gimble simulate’ for two purposes (1) to compare two nested models DIV and IM_BA by ΔlnCL (the improvement in fit), (2) to obtain 95% confidence intervals (CIs) of the parameters inferred under the best fitting global model.
I am not sure how to simulate a dataset partitioned into windows that is analogous to the empirical data - i.e., how to define -w (the number of windows per replicate) and -n (the number of blocks per window).
For example, here is the ‘gimble info’ based on my empirical data:
I thought to simulate an analogous dataset without down-sampling, I would have:
-l 64 (block length 64, same as I defined for my empirical data)
-w (not sure about how to define this, but I thought if each window has recombination, then it makes biological sense to me to have the number of windows equal to the number of chromosomes in my empirical data)
-n (not sure about how to define this, but I thought I would have the total number of blocks to be simulated as “empirical total block numbers (X)/empirical number of sample-sets (X) =90,953,757/49=1,856,199.122”. And I thought it would be like w*n=1,856,199.)
--replicates 100, --kmax 2,2,2,2.
--samples_A, --samples_B and --mu will be the same as my empirical data.
--model, --Ne_A, --Ne_B, --Ne_AB, --T, --me will use the same values as estimated under corresponding models based on empirical data.
My question at this step:
(1) Is my interpretation of the -w and -n correct?
(2) If I need to do down-sampling, how should I do it? i.e., if I need to do 10% down-sampling for simulated data compared to the empirical data amount, shall I just decrease the number of windows (-w) for my simulations?
Despite I didn’t fully understand how to define -w and -n (apologies), I had a go with ‘gimble simulate’ with the command below:
The recombination rate that I used is an averaged rate for plants (https://doi.org/10.1098/rstb.2016.0455, Table 1), so it is a very crude estimate...
The command above took 30h:27m:00.704s to complete.
To get the 95% CIs for the DIV model, I re-fit this simulated data to a DIV model with the same parameter boundaries that I set for my empirical dataset.
Then, I queried this label to get a summary output of these 100 replicates.
gimble query -z Lina_Gstore.z -l optimize/DIV_294w.windowsum/DIV_DIVmodelCompare
I found that the range of 100 bootstraps does not cover the estimated value of DIV based on my empirical data.
For example, the ranges based on simulated data are:
Ne_A_B: 890,126.572 - 895,261.233
T: 1,025,487.446 - 1,032,241.796
Ne_A: 457,520.165 - 462,095.671
Ne_B: 300,091.436 - 302,540.218
While, the parameters estimated based on empirical data (also defined in the simulation) are:
Ne_A_B: 1,009,867
T: 763,249
Ne_A: 372,683
Ne_B: 251,619
My question: Is this ‘deviation’ caused by the recombination rate? Or have I done something wrong when simulating data?
Thank you so much! Very looking forward to your insightful reply!
Happy New Year :^)
Meng
Beta Was this translation helpful? Give feedback.
All reactions