-
Notifications
You must be signed in to change notification settings - Fork 26
phylogenetic graphs
See supertrees using tree ranking page for some broader discussion of the role of a phylogenetic network. This page discusses how to construct a "phylogenetic graph" or "phylogenetic network" used in step 5 "build a phylogenetic graph for each subproblem" of a pipeline for producing a supertree is here..
The input is a ranked set of trees with a full tree from the taxonomy as the lowest ranked input.
There is a proposal to remap tips to the deepest uncontested taxon that they examplify. So tips of the tree may be associated with higher taxa.
This page assumes that a this proposal or something similar is adpoted to prune any tip that was mapped by a curator to a higher taxon which is contested.
Thus we assume that, at this point in the pipeline, the inputs that are higher taxa are uncontested higher taxa.
Because we know that each of the tips mapped to a higher taxon is a case of a taxon that is going to be in the final answer, we can unproblematically create a set of inputs with no complexity of mapping to taxonomic groups by replacing such labels with polytomies include the terminal taxa. In fact, (because we know that the last input will bring in all of the leaves), it is safe to remap each leaf of the higher ranked trees as follows.
- create a list of all of the tips in the phylo inputs that are non-terminal taxa.
- partially sort the list such that, if any of these higher taxa are nested (based on the taxonomy) the shallower taxa come first.
- walk through this list remapping by: * A. find the union of terminal taxa used in the phylo (non taxonomy) trees. * B. if the set from A is empty, then choose one terminal taxon (arbitrarily) to stand in for the higher taxon * C. if the set from A is not empty, replace the tip with a polytomy of the set of terminals, but make an annotation on the branch that will indicate that this tree is not providing support for this grouping. Later when support for different branches are tabulated these expansions should not be included because thes are cases in which we use the taxonomy to assume that the group is monophyletic, even though the phylo input just has one tip.
Imagine that the score for a supertree, s, is a tuple:
( |c(s, N)| , |c(s, N-1)|, |c(s, N-2)|, ... |c(s, 0)| )
where c(s,N) is the set of groups in the highest ranked input tree N, that are displayed on the supertree.
|c(s,x)|
indicates the size of the sets.
We would like to take these inputs and find the set of trees that has the highest score attainable if scores are computed lexicographically. So it is better to display "4 groupings from tree 4, and none from trees 0-3" than it is to display "3 groupings from tree 4, and all of the groupings from the lower ranked inputs".
The set of trees may be very large, so we would like to have a compact structure that represents the entire set. As discussed here, MTH conjectures that a phylogenetic graph will be able to accomplish this. This document will try to describe the rules for such graphs.
The summary of the supertree strategy is: walk through ranked list of trees. For each tree, walk through each grouping. For each grouping decide if it is compatible with the current phylogenetic graph or not. If it is, update the graph. If the information from the grouping is not redundant with previously constructed information, then this will change the structure of the graph. At minimum it will entail making note of the association between the graph edge and the input edge.
If the grouping is not compatible, make note of that split as an undisplayed split, but do not alter the graph.
- The edges are directed.
- there are no directed cycles.
- There are 2 sorts of adjectives types of edges: looped vs unlooped. dangling vs rooted. So there are four types of edges. All edges to terminals are unlooped. * unlooped edge type: These are the edges that are most like a directed edge in a phylogenetic tree. If the set of trees can be displayed in a consensus tree, then each edge makes a group of phylogenetic statements on the full set of leaves. These edges make similar statements about all of the leaves in their connected component of the graph. * looped edges: these edges create "bubbles" in the graph. They can be used to make phylogenetic statements about a subset of the full set of leaves. * dangling edges: if a connected component of the graph has multiple roots then each edge that whose ancestors include only a proper subset of all of roots is considered to be a dangling edge. In some sense, a dangling edge is a type of looped edge.
Every node in the graph has a unique set of taxa descended from it with the exception of the hybrid nodes created to close a the terminal end of a loop. These nodes may have a set of descendants that is identical to their child node.
- because of the tree ranking, the graph will be start as a phylogenetic tree identical to the highest ranked tree.
Consider an example with 3 ranked inputs. The highest priority tree is:
The second ranked tree is:
and the tree with the lowest priority is: