-
Notifications
You must be signed in to change notification settings - Fork 88
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
population-specific recombination rates? #2127
Comments
I think this general strategy would work for the DTWF. It would be much more difficult (or indeed not make sense at all as you point out) in the Hudson coalescent because we have a global Fenwick tree for recombinations. See previous related discussions: |
Hi @gtsambos,
It isn't specifically supported in SLiM, in the sense that there is no specific facility for setting one recombination rate for one subpopulation and a different rate for a different subpopulation. (I'm curious why this is desirable – what does it mean biologically? Is one population subject to some environmental chemical or stressor that increases their recombination rate? Who wants to model this and why?) But it is straightforward to do in SLiM even though there is no specific support. You would (a) set the recombination rate to the highest rate that you would want any individual to experience, and then (b) write a |
Thanks for this detailed reply @bhaller, and sorry that I'm coming back to it so late! It's great that there is more 'manual' way to do this in SLiM -- I might ping the people who asked us about this, just in case this is a viable option for them.
It's not something I'm super well versed on myself, but my understanding is that there's quite a bit of observed evidence for the evolution of recombination rates over time, both within and across species. I'm not sure how well-understood the biological causes of these changes are, except perhaps in rare cases like PRDM9, and ofc in these cases it's more that population membership is just a rough proxy for some relevant causal allele. However, allowing different simulated 'populations' to use different recombination rates may be a simple first step that helps us model this more realistically (regardless of what the underlying 'cause' is) ✨ |
so basically -- if it's observed that recombination rates vary between sets of organisms, then having some way of modelling that will make the simulated patterns of variation a bit more realistic. And that's regardless of what the downstream application of those simulations is. |
I see. Well, the same approach in SLiM would allow you to actually model the evolution of recombination rates, without the approximation of just making the rate different in one subpopulation than in another subpopulation. You could specifically check for a particular mutation in a given individual, and if the mutation is present, use a different recombination rate to generate a gamete coming from that individual. Indeed, you could model a whole set of QTLs that together determine the phenotype of the recombination rate for a given individual's gamete genesis. All of that sort of thing is actually quite trivial to do in SLiM. :-> |
I think this does the trick for a simple three population split model with no migration and different recombination rates: import msprime
import tskit
split_time = 200
demography = msprime.Demography()
demography.add_population(name="A", initial_size=10_000)
demography.add_population(name="B", initial_size=5_000)
demography.add_population(name="C", initial_size=1_000)
demography.add_population_split(time=split_time, derived=["A", "B"], ancestral="C")
L = 100_000
seed = 1236
tsA = msprime.sim_ancestry(
{"A": 3},
demography=demography,
recombination_rate=1e-8,
sequence_length=L,
random_seed=seed,
end_time=split_time,
)
tsB = msprime.sim_ancestry(
{"B": 3},
demography=demography,
recombination_rate=5e-8,
sequence_length=L,
# NOTE probably important we use different seeds here. For real
# sims just leave out the seed entirely, just setting it here
# for reproducibility.
random_seed=seed + 1,
end_time=split_time,
)
print(tsA.draw_text())
print(tsB.draw_text())
tablesA = tsA.dump_tables()
tablesB = tsB.dump_tables()
tablesB.edges.child += tsA.num_nodes
tablesB.edges.parent += tsA.num_nodes
tablesB.nodes.individual += tsA.num_individuals
individuals_dict = tablesB.individuals.asdict()
del individuals_dict["metadata_schema"]
tablesA.individuals.append_columns(**individuals_dict)
nodes_dict = tablesB.nodes.asdict()
del nodes_dict["metadata_schema"]
tablesA.nodes.append_columns(**nodes_dict)
edges_dict = tablesB.edges.asdict()
del edges_dict["metadata_schema"]
tablesA.edges.append_columns(**edges_dict)
tablesA.sort()
tablesA.build_index()
ts_joint = tablesA.tree_sequence()
tsC = msprime.sim_ancestry(
initial_state=ts_joint,
demography=demography,
recombination_rate=2.5e-8,
random_seed=seed + 2,
)
print(tsC) We should probably improve the API in tskit for "appending" a table to another, but this looks OK otherwise? |
nice, thanks Jerome! I didn't realise it was possible to modify columns of (lone) tables with stuff like this:
and yes, I agree that the bit at the end where the tables are converted in and out of dicts is likely to be the most confusing. It would also be nice to have the nodes relabelled so they conform to the usual time ordering. I'm happy to look at doing that properly, but hopefully this gives Milan and Marion something to start with in the meantime. |
also, users doing this^ requires them to have a fairly deep understanding of the table structures |
Could run simplify on it, that would rejigger things in the expected way.
Yes, we should have some functionality to make this easier either here or in tskit. |
This is perfect, a big thank you @jeromekelleher and @gtsambos |
Milan and Marion have inquired (in a private communication) about implementing population-specific recombination rates in msprime. I now have a few more technical thoughts about it, and I'll stick them here so there's a record for the future.
(I note also that it's not possible to do this in SLiM at present -- @bhaller, am I right here? Have you thought about this before, and if so, are there any more caveats we should be aware of?)
To see how recombination is currently implemented, have a look at the
dtwf_recombine
method inalgorithms.py
(here), and also at thedtwf_generate_breakpoint
method it references.As far as I can tell, here are the essential steps:
dtwf_generate_breakpoint
uses the recombination map to generate physical position for the breakpoints (here and at a few later stages too). This is continually updated, and functions as something like a "distance to the next recombination breakpoint".while
loop (here) to place recombination breakpoints inside these segments (here) or between them (here)Here's how we might extend this to multiple populations:
k
, we'd havek[i]
for each populationi
.while
loop, check which population segmentx
is from. Ifx.population == i
, we'd use and updatek[i]
in place ofk
.This might still have some little problems -- for instance, what if we need to insert a recombination breakpoint between two segments from different populations? However, I think this shouldn't be too big a deal unless you have lots of migration and admixture (in which case, this model of recombination is a bit silly anyway)
The text was updated successfully, but these errors were encountered: