Removing isolated unary spans #3045
petrelharp started this conversation in Show and tell
Nodes in a tree sequence can be unary over some of their span (i.e., have only one child in the marginal tree). Some of these unary spans are adjacent to a span on which the node is coalescent (i.e., not unary); others are isolated, meaning they are not adjacent to any coalescent span. One can use `simplify` to remove all unary spans, and more fine-grained control is discussed in #2886. But what if you want to remove just the isolated unary spans?

The best way to do this would be to modify the `simplify` algorithm; currently the `keep_unary` option lets you keep or discard all unary spans, but it could be modified to keep only the non-isolated ones. But here's another way to do it, with ingredients that might be useful for other things.

The way this works is: (1) iterate over the trees, writing down the isolated unary spans for each node; (2) iterate over the trees again, this time writing down, for each node, its isolated unary spans and which node each should be remapped to; and (3) iterate over all the edges and do the remapping.
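To make "isolated" in step (1) concrete, here is a minimal sketch in plain Python. It does not use the tskit API: it assumes that, for a single node, we have already collected a sorted list of `(left, right, num_children)` tuples, one per stretch of genome on which the node appears in the marginal trees. That representation and the function name `isolated_unary_spans` are made up for illustration.

```python
def isolated_unary_spans(segments):
    """Return the maximal unary spans of one node that touch no coalescent span.

    segments: sorted, non-overlapping (left, right, num_children) tuples for
    a single node (a hypothetical representation, not a tskit structure).
    Gaps between segments mean the node is absent from the trees there.
    """
    spans = []
    n = len(segments)
    i = 0
    while i < n:
        if segments[i][2] != 1:
            i += 1
            continue
        # Extend to the right while the next segment is unary and abuts this one.
        j = i
        while (
            j + 1 < n
            and segments[j + 1][2] == 1
            and segments[j + 1][0] == segments[j][1]
        ):
            j += 1
        start, end = segments[i][0], segments[j][1]
        # A neighbour only counts if it is coalescent and shares the breakpoint.
        left_coal = (
            i > 0 and segments[i - 1][2] >= 2 and segments[i - 1][1] == start
        )
        right_coal = (
            j + 1 < n and segments[j + 1][2] >= 2 and segments[j + 1][0] == end
        )
        if not (left_coal or right_coal):
            spans.append((start, end))
        i = j + 1
    return spans


# Toy input: unary on [0, 10) and [20, 30) next to a coalescent span,
# unary on [40, 60) with nothing coalescent touching it.
segments = [
    (0, 10, 1),
    (10, 20, 2),
    (20, 30, 1),
    (40, 50, 1),
    (50, 60, 1),
    (70, 80, 2),
]
isolated = isolated_unary_spans(segments)
```

In this toy input only `[40, 60)` should come out as isolated: the two other unary stretches each share a breakpoint with the coalescent span `[10, 20)`, and the coalescent span at `[70, 80)` is separated from `[40, 60)` by a gap.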
So, in step 1 we find the spans over which each node will be removed, and in step 2 we find which node should replace it on each of those spans. For instance, suppose we found that a node `n` was isolated and unary on `[a, b)`; then on the second pass we might record that on `[a, x)` its most recent not-scheduled-to-be-removed ancestor is `p1`, and on `[x, b)` it is `p2`.

The code does not currently remap mutations, but that could be done in a similar way (like the code for `node_map`, but we'd have to traverse down the tree).

Here's a script that will read in a .trees file and write out a new .trees file with these removed.
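Separately from that script, here is a toy sketch of the idea behind steps (2) and (3), again in plain Python rather than the tskit API. Everything in it is an assumption made for illustration: edges are `(left, right, parent, child)` tuples, a single tree is a `{child: parent}` dict, and `node_map` maps each removed node to sorted `(left, right, new_parent)` triples. The sketch also assumes that the portions of edges *into* a removed node are simply dropped on the removed span.

```python
def nearest_retained_ancestor(parent, node, removed):
    """Walk up one tree from `node` to the first ancestor not in `removed`.

    parent: {child: parent} for a single marginal tree (hypothetical form).
    Returns -1 if the walk runs off the root.
    """
    u = parent.get(node, -1)
    while u != -1 and u in removed:
        u = parent.get(u, -1)
    return u


def remap_edges(edges, node_map):
    """Rewrite edges so removed nodes are bypassed on their removed spans.

    edges: iterable of (left, right, parent, child).
    node_map: {node: [(left, right, new_parent), ...]}, sorted and
    non-overlapping, giving the spans on which `node` is removed.
    """
    out = []
    for left, right, parent, child in edges:
        # Cut out the parts of this edge where the child itself is removed.
        pieces = [(left, right)]
        for a, b, _ in node_map.get(child, []):
            pieces = [
                (lo, hi)
                for l, r in pieces
                for lo, hi in ((l, min(r, a)), (max(l, b), r))
                if lo < hi
            ]
        # On what remains, swap in the new parent wherever the parent is removed.
        for l, r in pieces:
            pos = l
            for a, b, new_parent in node_map.get(parent, []):
                lo, hi = max(pos, a), min(r, b)
                if lo >= hi:
                    continue
                if pos < lo:
                    out.append((pos, lo, parent, child))
                out.append((lo, hi, new_parent, child))
                pos = hi
            if pos < r:
                out.append((pos, r, parent, child))
    return out


# Step 2 on one tree: nodes 5 and 6 are scheduled for removal here.
tree_parent = {2: 5, 5: 6, 6: 8}
ancestor = nearest_retained_ancestor(tree_parent, 5, removed={5, 6})

# Step 3: node 5 plays the role of n with [a, x, b) = [10, 20, 30),
# p1 = 8 and p2 = 9.
node_map = {5: [(10, 20, 8), (20, 30, 9)]}
edges = [
    (10, 30, 5, 2),
    (10, 20, 8, 5),
    (20, 30, 9, 5),
    (0, 40, 8, 3),
]
new_edges = remap_edges(edges, node_map)
```

Here the edge from node 2 up to node 5 gets split at 20 and re-pointed at 8 and 9, the two edges into node 5 disappear, and the unrelated edge is left alone. A real implementation would still need to sort and squash the resulting edges before they form a valid edge table.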