Skip to content

phylogenetic graphs

Mark T. Holder edited this page Feb 14, 2015 · 12 revisions

Phylogenetic graph for the finding a supertree based on ranks

See supertrees using tree ranking page for some broader discussion of the role of a phylogenetic network. This page discusses how to construct a "phylogenetic graph" or "phylogenetic network" used in step 5 "build a phylogenetic graph for each subproblem" of a pipeline for producing a supertree is here..

input

The input is a ranked set of trees with a full tree from the taxonomy as the lowest ranked input.

Comments on mapping of tips

There is a proposal to remap tips to the deepest uncontested taxon that they examplify. So tips of the tree may be associated with higher taxa.

This page assumes that a this proposal or something similar is adpoted to prune any tip that was mapped by a curator to a higher taxon which is contested.

Thus we assume that, at this point in the pipeline, the inputs that are higher taxa are uncontested higher taxa.

preprocessing step: replace higher taxa with polytomies of terminal taxa.

Because we know that each of the tips mapped to a higher taxon is a case of a taxon that is going to be in the final answer, we can unproblematically create a set of inputs with no complexity of mapping to taxonomic groups by replacing such labels with polytomies include the terminal taxa. In fact, (because we know that the last input will bring in all of the leaves), it is safe to remap each leaf of the higher ranked trees as follows.

  1. create a list of all of the tips in the phylo inputs that are non-terminal taxa.
  2. partially sort the list such that, if any of these higher taxa are nested (based on the taxonomy) the shallower taxa come first.
  3. walk through this list remapping by: * A. find the union of terminal taxa used in the phylo (non taxonomy) trees. * B. if the set from A is empty, then choose one terminal taxon (arbitrarily) to stand in for the higher taxon * C. if the set from A is not empty, replace the tip with a polytomy of the set of terminals, but make an annotation on the branch that will indicate that this tree is not providing support for this grouping. Later when support for different branches are tabulated these expansions should not be included because thes are cases in which we use the taxonomy to assume that the group is monophyletic, even though the phylo input just has one tip.

the goal

Imagine that the score for a supertree, s, is a tuple:

( |c(s, N)| , |c(s, N-1)|, |c(s, N-2)|, ... |c(s, 0)| )  

where c(s,N) is the set of groups in the highest ranked input tree N, that are displayed on the supertree. |c(s,x)| indicates the size of the sets.

We would like to take these inputs and find the set of trees that has the highest score attainable if scores are computed lexicographically. So it is better to display "4 groupings from tree 4, and none from trees 0-3" than it is to display "3 groupings from tree 4, and all of the groupings from the lower ranked inputs".

The set of trees may be very large, so we would like to have a compact structure that represents the entire set. As discussed here, MTH conjectures that a phylogenetic graph will be able to accomplish this. This document will try to describe the rules for such graphs.

The summary of the supertree strategy is: walk through ranked list of trees. For each tree, walk through each grouping. For each grouping decide if it is compatible with the current phylogenetic graph or not. If it is, update the graph. If the information from the grouping is not redundant with previously constructed information, then this will change the structure of the graph. At minimum it will entail making note of the association between the graph edge and the input edge.

If the grouping is not compatible, make note of that split as an undisplayed split, but do not alter the graph.

Properties

  1. The edges are directed.
  2. there are no directed cycles.
  3. There are 3 types of edges: * unlooped edge type: These are the edges that are most like a directed edge in a phylogenetic tree. If the set of trees can be displayed in a consensus tree, then each edge makes a group of phylogenetic statements on the full set of leaves. These edges make similar statements about all of the leaves in their connected component of the graph. * looped edges: these edges create "bubbles" in the graph. They can be used to make phylogenetic statements about a subset of the full set of leaves. * dangling edges: if a connected component of the graph has multiple roots then each edge that is has only one root as its ancestor is a dangling edge.

Some properties of these graphs when they are used in the suggested pipeline.

  1. because of the tree ranking, the graph will be start as a phylogenetic tree identical to the highest ranked tree.
  2. the lowest ranked taxonomy or the structure of the inputs created by the divide-and-conquer stage will assure that all of the dangling edges will be converted to looped edges or unlooped edges at the end of the procedure. (MTH: I think that if we can't guarantee this, then we should be subdividing more until we can).
Clone this wiki locally