-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tutorial: explain how to represent ARGs #43
Comments
Reminder to myself: I wrote a presentation about this for the PopSim workshop which could be adapted for a first pass on this. Some good examples in there which would be a good starting point. |
Could you possibly post the presentation, or a link to it, in this issue @jeromekelleher ? |
Hello, just adding a suggestion as discussed with @hyanwong . It would be handy to see how to extract the number of recombination events from the ARG representation, and if possible split these into those that are undetectable / change only the branch lengths / are detectable with the right mutations. This would be useful for assessing the performance of ARG inference methods. |
Here it is: arg-ts-presentation.pdf This was the presentation I gave at the PopSim consortium meeting in Aussois, Oct 2019, so some of the information will be a little out of date. |
Hi @a-ignatieva, welcome! 👋 I'm sure we can figure out how to compute the things you're interested in. Here's an example: ts = msprime.sim_ancestry(
2, recombination_rate=0.1, sequence_length=8,
record_full_arg=True, random_seed=1234)
print(ts.draw_text())
num_re_nodes = 0
for node in ts.nodes():
if node.flags == msprime.NODE_IS_RE_EVENT:
num_re_nodes += 1
# Divide by 2 because each RE event has *two* nodes
print("num RE events = ", num_re_nodes / 2)
print(ts.tables.nodes) giving
So I guess the RE event in nodes 14/15 is undetectable because the trees below both they just change the branch length, but the event in 8/9 is detectable because the trees change "above" these nodes? (I'm just trying to get a precise idea of what the condition is so we can write down an efficient way to detecting the events in question) |
I think here the only recombination changing the unrooted topology is 4/5, while 8/9, 12/13 and 14/15 keep the unrooted topologies the same but change the branch lengths (and the third type would be where neither the topology nor the branch lengths change for local trees around the breakpoint). Is there anything ready-made in tskit for "gluing" these tree sequences into an ARG? Just wondering, as sometimes it's nice to see what's going on by looking at the graph (though easy to do by hand for small examples). |
It would be nice to have a way to "convert" (or perhaps simply display) the tree-sequence-with-recombination-nodes as a graph. This is not dissimilar to the animations I did with record_full_arg simulations (below) Incorporating this into tskit itself would mean solving tskit-dev/tskit#1209 so that tskit had an idea of what a recombination node meant, rather than it being only a specific extension provided by msprime. We might, however, be able to throw together some code that would work on an msprime-produced TS. For larger args, to avoid lots of crossing lines, this might require some sort of graph layout package. Not only is that rather a burdensome dependency, but we have had problems locating a suitable one (see discussions at tskit-dev/tskit#621). @a-ignatieva - do you use any specific viz packages for looking at ARG graphs? SPR.mov |
Also, have you tried just feeding the edges into e.g. graphviz? I'm not sure if that would work: perhaps you can experiment? You can get the parent and child node IDs for each edge very easily:
|
I just use graphviz (dot), as KwARG doesn't care about the y-positions of the nodes. |
Hmmm. So, one way we can get at topology changes is to look at the tree ranks in the simplified tree sequence sts = ts.simplify()
for tree in sts.trees():
print(tree.draw_text())
print(tree.interval, tree.rank()) giving
So, we have two distinct topologies, which change at position 4, and this corresponds to the RE node 4/5. Would this definition work for you @a-ignatieva? If so, I'm sure we can work backwards from knowing where the RE event occured along the sequence to pin-pointing it to the event itself. |
Yes, that works I think! Though I guess is it complicated when we have two recombination events with the same breakpoint? Eg it looks like 12/13 could also have a breakpoint at position 4 here. |
Yes, exactly. You're right that things get more complicated if there are multiple recombinations at the same genome position, but I think that's something we assume is negligible in most models. It depends on exactly what you're looking for - if we just want a count of the detectable recombinations, then I think the number of distinct topologies is as good a definition as you'll get (?). Pinpointing the actual events that give detectable topology changes will make use do a bit more work, but I'm sure we could figure it out. |
I'm happy with just having a count, so this works for me! Thanks. |
When you are plotting out using Graphviz, or whatever, do you somehow merge the 2 corresponding recombination nodes into a single graphviz node, @a-ignatieva ? Or do you have them as 2 separate ones, and have the recombination node below? Have you got it working, with an image you could post here? |
This is a bit tidier, merging pairs of recombination nodes together: from graphviz import Digraph
dot = Digraph(strict=False)
# Add sample nodes
with dot.subgraph() as s:
s.attr(rank = 'same')
for n in ts.samples():
s.node(str(n), **{'shape': 'doublecircle'})
# Internal nodes
itr = iter(range(ts.num_samples, ts.num_nodes))
for i in itr:
if ts.node(i).flags == msprime.NODE_IS_RE_EVENT:
# Only add one of the recombination nodes and make the other one invisible
with dot.subgraph() as s:
s.attr(rank = 'same')
s.node(str(i), **{'shape': 'rect'}, label = str(i) + "/" + str(i+1))
s.node(str(i + 1), **{'style': 'invis'})
next(itr, None)
else:
dot.node(str(i), **{'shape': 'circle'})
# Add invisible edges to fix the ranks
dot.edge(str(i), str(i-1), **{'style': 'invis'})
# Add edges
check = [0,0]
for e in ts.edges():
ch = e.child
pa = e.parent
# Preventing duplicate edges
if check == [pa, ch]:
continue
check = [pa, ch]
# Check which edge to draw if there is a recombination
if e.parent >= 1 and ts.node(pa).time == ts.node(pa-1).time and ts.node(pa).flags == ts.node(pa-1).flags == msprime.NODE_IS_RE_EVENT:
ch = pa = -1
elif e.child >= 1 and ts.node(ch).time == ts.node(ch-1).time and ts.node(ch).flags == ts.node(ch-1).flags == msprime.NODE_IS_RE_EVENT:
ch -= 1
lab = ''
if ch >= 0 and ts.node(ch).flags == msprime.NODE_IS_RE_EVENT:
lab = "[" + str(int(e.left)) + "," + str(int(e.right)) + ")"
if(ch >= 0 and pa >= 0):
dot.edge(str(pa), str(ch), label=lab)
dot.render('pic', format='png', view=True) |
Oh yes, really nice. Thanks @a-ignatieva . We should have something like that in the ARG tut I think. |
Yes, this is really nice, thanks @a-ignatieva. |
@a-ignatieva: you might be interested in my recent PR: tskit-dev/tskit#1296. This should allow us to (easily) import a tree sequence into the Python graph software NetworkX. Not sure if this actually helps with viz, but I guess it might be useful for other things? I'd be interested to see how it copes with msprime's |
A simple example from graphviz and a more complex one of reading it back into networkx are now at https://tskit.dev/tutorials/viz.html#graph-representations. The ARG tutorial contains the code for networkx conversion (https://tskit.dev/tutorials/args.html#sec-args-other-analysis) so I think we can close this issue. |
We can simulate some ARGs in msprime, explain how we can losslessly represent ARGs and get all the information you could want.
The text was updated successfully, but these errors were encountered: