Specify particular causal sites #121
Comments
Note that we could imagine doing something like this too for the population-specific case:

```python
import numpy as np


def sim_trait(ts, model, *, num_causal=None, causal_sites=None, population=None, alpha=None, random_seed=None):
    if num_causal is not None and causal_sites is not None:
        raise ValueError("Cannot specify both num_causal and causal_sites")
    if causal_sites is not None and population is not None:
        raise ValueError("Cannot specify both population and causal_sites")
    # More input validation
    rng = np.random.default_rng(random_seed)
    if population is not None:
        # something like ts.nodes_population[ts.mutations_node] == population
        # then choose num_causal from the matching sites (and the correct causal state, too)
        raise NotImplementedError("population-specific selection is only sketched here")
    else:
        if num_causal is not None:
            causal_sites = rng.choice(ts.num_sites, size=num_causal, replace=False)
            causal_sites.sort()
    assert causal_sites is not None
    # Run the simulation based on causal_sites
```
I guess we first need to define what population-specific means. Here, population-specific means that only mutations that arose in a specific population contribute to the phenotype of interest. You could also say that population-specific should mean that the effect size is population-specific. So depending on the population you are in, a mutation will have a different effect on your phenotype.
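To make the second interpretation concrete, here is a minimal sketch in which the same genotype yields a different genetic value depending on the population of the individual. The function name `genetic_value`, the `effect_sizes_by_population` mapping, and the additive model are all assumptions made for illustration; nothing here is an existing API of the project.

```python
def genetic_value(genotypes, effect_sizes_by_population, population):
    """Additive genetic value using the effect sizes that apply in `population`.

    genotypes: allele dosages at the causal sites (hypothetical input).
    effect_sizes_by_population: dict mapping population ID to a list of
        per-site effect sizes, one entry per causal site (hypothetical input).
    """
    betas = effect_sizes_by_population[population]
    if len(betas) != len(genotypes):
        raise ValueError("Need exactly one effect size per causal site")
    # Sum of dosage times effect size over all causal sites.
    return sum(g * b for g, b in zip(genotypes, betas))


# The same two causal sites have different effects in populations 0 and 1.
effects = {0: [0.5, -1.0], 1: [0.25, 2.0]}
value_in_pop0 = genetic_value([2, 1], effects, population=0)
value_in_pop1 = genetic_value([2, 1], effects, population=1)
```

Under the first interpretation, by contrast, the effect sizes would be shared and only the set of causal sites would depend on the population.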
Huh, yes. We can start with a documentation example showing how to do the "arose in a given population" interpretation, and think later about the other one. It's unclear to me what the model is then, though - isn't your interpretation that the environmental noise varies by population?
Made a new pull request in #124. I'm also not entirely sure what population-specific means, so shall we just focus on specifying causal sites at first? I thought this point was very good after reading the draft of the paper.
I think we agreed that for the examples I will add to the documentation we would limit ourselves to filtering variants that arose either within a specific population or within a specific time band, and then pass a subset of those IDs to
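A rough sketch of that filtering step, written against plain sequences so it does not depend on any particular library. The helper names `candidate_sites` and `choose_causal_sites` are hypothetical. The population and time of the node below each mutation are used as a proxy for where and when the mutation arose, which is an assumption: the branch above that node may span a migration event or a long time interval.

```python
import random


def candidate_sites(mutations_site, mutations_node, nodes_population, nodes_time,
                    *, population=None, time_band=None):
    """Return sorted unique site IDs with at least one mutation passing the filters.

    population: keep mutations whose node belongs to this population ID.
    time_band: (lower, upper) tuple; keep mutations whose node time t
        satisfies lower <= t < upper.
    """
    selected = set()
    for site, node in zip(mutations_site, mutations_node):
        if population is not None and nodes_population[node] != population:
            continue
        if time_band is not None:
            lower, upper = time_band
            if not (lower <= nodes_time[node] < upper):
                continue
        selected.add(int(site))
    return sorted(selected)


def choose_causal_sites(candidates, num_causal, random_seed=None):
    """Draw num_causal distinct site IDs from candidates and return them sorted."""
    rng = random.Random(random_seed)
    # random.sample raises ValueError if num_causal exceeds len(candidates).
    return sorted(rng.sample(candidates, num_causal))


# Toy tables: four mutations at sites 0, 1, 1 and 3, above nodes 2, 0, 3 and 1.
mutations_site = [0, 1, 1, 3]
mutations_node = [2, 0, 3, 1]
nodes_population = [0, 0, 1, 1]
nodes_time = [0.0, 0.0, 5.0, 10.0]

in_pop1 = candidate_sites(mutations_site, mutations_node, nodes_population,
                          nodes_time, population=1)
in_band = candidate_sites(mutations_site, mutations_node, nodes_population,
                          nodes_time, time_band=(4, 8))
causal = choose_causal_sites([0, 1, 3], 2, random_seed=1)
```

With a tree sequence, the four inputs would presumably be `ts.mutations_site`, `ts.mutations_node`, `ts.nodes_population` and `ts.nodes_time`, assuming a tskit version that exposes these array attributes.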
I think we do need to provide some mechanism for specifying particular causal sites, or we lose a lot of flexibility and much of the richness that simulating based on ARGs provides. For example, we may want to simulate multiple causal sites that arose on a single ancestral haplotype, or restrict to mutations that occurred within a given population.
One approach might be:
Here, causal sites would need to be a sorted list of site IDs, which I guess is OK?
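If the sorted-list requirement is adopted, the input check could look roughly like the following. This is only a sketch: the name `validate_causal_sites` and the choice to reject duplicates and out-of-range IDs are assumptions, not an agreed design.

```python
def validate_causal_sites(causal_sites, num_sites):
    """Check that causal_sites is a non-empty, strictly increasing list of valid site IDs.

    num_sites would typically be ts.num_sites. Returns the IDs as a list of ints.
    """
    sites = [int(s) for s in causal_sites]
    if len(sites) == 0:
        raise ValueError("causal_sites must contain at least one site ID")
    if any(s < 0 or s >= num_sites for s in sites):
        raise ValueError("causal_sites contains an ID outside [0, num_sites)")
    # Strictly increasing means sorted with no duplicates.
    if any(a >= b for a, b in zip(sites, sites[1:])):
        raise ValueError("causal_sites must be sorted and free of duplicates")
    return sites


checked = validate_causal_sites([0, 2, 5], num_sites=6)
```

An alternative would be to sort and de-duplicate silently inside the function, which is friendlier to callers but hides mistakes such as accidentally repeated IDs.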
It would probably be good to cook up a few examples demonstrating this, so that we can prove to ourselves that it does provide the flexibility we want.
Earlier discussions: #53