-
Notifications
You must be signed in to change notification settings - Fork 26
supertree via ranked trees on a taxo scaffold
This page describes an alternative supertree method summarized by:
- taxonomic groups that are not contested by any input must be in the final supertree,
- after honoring that constraint, we attempt to find supertree which maximizes the "max weight of input tree edges displayed" objective function using tree-based ranks to supply the weights.
This doc builds off of the supertrees using tree ranking doc; so that is considered background reading.
It seems that procedure for finding an optimal to the "max weight of input tree edges displayed" criterion may not scale well
- even when using our crude tree-based ranks.
MTH recently realized that:
- the set of optimal phylogenetic graphs (during the step of adding each new tree) may require the use of a set of phylogenetic graphs rather than just one. (each member of the set would need to be built upon until it was obvious that it was sub-optimal),
- the presence of an taxonomic group that is not contested by any input is not sufficient to guarantee that the group will be present the final trees. This complicates the requirements for the "divide" step of the divide-and-conquer method.
- the order which you choose to add group from one tree affects the output graph. So an exact method would need to follow up each possibility.
A high priority desirable property for a method for producing the final tree is that it should make it easy to explain why groups are displayed and how someone from the community of systematic biologists could fix a spuriuos grouping by correcting an input tree or adding another input tree.
Making the constraint that "taxonomic groups that are not contested by any input must be in the final supertree":
- is very easy to explain,
- makes an obvious fix apparent (if you know genus X is not monophyletic, please add a tree that you know of which contests its monophyly).
- makes the construction of the tree easier.
The inputs are the same as the other pipeline.
Any tip not mapped to OTT cannot be used (as before). The method described below may not require the pruning of repeated or nested labels. However, nested taxa being treated as tips of the same tree should almost certainly be treated as a curation error.
short term: continure starting with our treemachine-processed trees from draft synthesis v2. These inputs have redundantly-mapped, unmapped, and nesting mapping pruned.
As before, we think that we can place any "taxonomy-only" terminal taxa as a last step in the pipeline. So an early pre-processing step is to produce a much reduce taxonomy (it is about 1% of the total number of tips) by pruning off any tips that are not represented by any phylogenetic input).
more below
(those pruned from step 2)
Just because it easier to navigate and serve a tree...
Following this page:
-
T
will be the taxonomy. -
t
will be a input tree. -
L(t)
is the label set of the treet
-
L(t, n)
for a noden
will be the label set of the subtree oft
that descends fromn
-
L(M)
for a nodeM
in the taxonomy will be the labels under this node. -
E(t)
is the set of edges in treet
-
I
is the set of input trees
self-explanatory.
Here we map each node of an input to a node of the taxonomy and we create "path pairings".
The mapping of source tree node n
to taxonomic node M
is accomplished by:
- if
n
is a leaf, thenL(t,n)
is guaranteed to be a taxon. SoM
is defined to be the node of the taxonomy (not necessarily a leaf) for whichL(t,n) = L(M)
. (Note we'll need a rule for handling Gregg's paradox; with some extra bookeepping to make sure that we realize that many artifactual nodes-of-outdegree=1 are produces by pruning in step 2. Using the taxonomic node furthest to the root in the case of multiple nodes that match this definition should work.) - if
n
is not a leaf, then theM
is the least-inclusive taxon (LIT) which includes all members ofL(t,n)
.
This implies that every edge in the input tree can be mapped to a path in the taxonomy. The data structure that stores this mapping is a "PathPair":
class PathPair {
Node * inputTreeChildNode;
Node * inputTreeParentNode;
Node * taxoChildNode;
Node * taxoParentNode;
bool edgeIsTrivialInInputTree; // true if inputTreeChildNode is a tip
};
Note that inputTreeParentNode
will always be the parent of inputTreeChildNode
(because in the source tree, the "path"
is just an edge).
taxoChildNode
may be identical to taxoParentNode