Common patterns in human population simulation #705
hammer started this conversation in Discourse import
Replies: 2 comments
While browsing GitHub aimlessly tonight, I happened to notice that Stephanie Gogarten has recently been working on simulate_phenotypes, which is in R but may be useful for us. The vignette has an example use case.
(Posted by @eric-czech)
This is a summary of patterns I've seen in papers that simulate human genotypes primarily for the purpose of validating association models and kinship estimation methods.
The 3 most common approaches are:
To summarize, the purpose of the synthetic simulations is only to:
There is an exception or two, but essentially all of these synthetic methods simulate genotypes directly rather than simulating sequences with mutations. I'd love to hear how this contrasts with the work you often do @jeromekelleher, if you're up for it. I'm guessing that genotype simulation became more common in this domain because it's simpler and easier to scale. I couldn't say for sure, though.
Also, I wanted to emphasize how often realistic LD is discounted in simulations for human GWAS pipelines in favor of focusing on the more difficult problems with ancestry and relatedness inference. Hopefully this will help us in defining what useful test datasets should look like.
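For anyone who hasn't seen it, here is a rough sketch of what "simulating genotypes directly" can look like when LD is ignored. It is only an illustration and not taken from any particular paper: the function name `simulate_genotypes`, the uniform allele frequency range, and the Hardy-Weinberg assumption are all my own choices for the example.

```python
import numpy as np


def simulate_genotypes(n_samples, n_variants, maf_low=0.05, maf_high=0.5, seed=0):
    """Draw diploid genotypes (0/1/2 alternate allele counts) directly.

    Assumptions (illustrative only): variants are independent, so there
    is no LD, and each genotype follows Hardy-Weinberg proportions,
    i.e. Binomial(2, p) for a variant with allele frequency p.
    """
    rng = np.random.default_rng(seed)
    # One allele frequency per variant, uniform on [maf_low, maf_high).
    freqs = rng.uniform(maf_low, maf_high, size=n_variants)
    # Frequencies broadcast across samples: rows are samples, columns are variants.
    genotypes = rng.binomial(2, freqs, size=(n_samples, n_variants))
    return genotypes.astype(np.int8), freqs


genotypes, freqs = simulate_genotypes(n_samples=1000, n_variants=200)
```

No sequences or mutations are modeled here, which is what makes this kind of simulation cheap to scale. Population structure or relatedness would have to be layered on top, for example by drawing different allele frequencies per subpopulation.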
References
I'm basing this on commonalities in these papers: