Calculates distances between trees in large collections, embeds the distances, and plots the embeddings with dynamic interfaces.

AndreaRubbi/Pear-EBI

Phylogeny Embedding & Approximate Representation

Goldman Group - European Bioinformatics Institute

PEAR can:

  1. Compute the distance matrix given a set of phylogenetic trees;
  2. Embed and represent the distance matrix in 2D or 3D.

See also the autogenerated documentation and PyPI.

PEAR usage

PEAR is both a Python program and a library. It can be installed with python -m pip install pear_ebi or downloaded from GitHub. PEAR currently runs only on Linux.

PEAR as a python library

Once installed, PEAR can be used to load Newick trees in Python and represent them in embedded spaces. We recommend using it in Jupyter Notebook or JupyterLab, as these tools allow more interaction with the graphs. On these platforms, the user can interact with widgets that modify several parameters of the plots. For specific uses and applications, see the examples.

PEAR as a program

Run pear_ebi --help to see the complete list of arguments and flags.

Simple usage

pear_ebi examples_tree_sets/beast_trees/beast_run1.trees -m hashrf_RF

This command calculates the unweighted Robinson-Foulds distances between the trees in the file "beast_run1.trees", which contains 1001 phylogenetic trees.

The flag -m indicates the method used to compute the dissimilarity between phylogenetic trees. In this case, HashRF is used.
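To make the distance concrete, here is a minimal pure-Python sketch of the unweighted Robinson-Foulds distance between two Newick trees, counting the non-trivial leaf bipartitions found in one tree but not the other. It is not HashRF and not PEAR's implementation: the function names (parse_newick, splits, robinson_foulds) are hypothetical, the parser handles topology only, and both trees are assumed to share the same leaf labels.

```python
def parse_newick(text):
    """Parse a Newick string into nested tuples of leaf names (topology only)."""
    text = text.strip().rstrip(";")
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # consume "("
            children.append(parse_node())
            while text[pos] == ",":
                pos += 1  # consume ","
                children.append(parse_node())
            pos += 1  # consume ")"
        # Read an optional label and ":length"; branch lengths are discarded.
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        label = text[start:pos].split(":")[0].strip()
        return tuple(children) if children else label

    return parse_node()


def leaf_sets(node, out):
    """Return the leaves under `node`, appending each internal clade to `out`."""
    if isinstance(node, str):
        return frozenset([node])
    leaves = frozenset().union(*(leaf_sets(child, out) for child in node))
    out.append(leaves)
    return leaves


def splits(tree):
    """Non-trivial bipartitions, each stored as the side without a reference leaf."""
    clades = []
    all_leaves = leaf_sets(tree, clades)
    ref = min(all_leaves)
    result = set()
    for clade in clades:
        side = clade if ref not in clade else all_leaves - clade
        if 2 <= len(side) <= len(all_leaves) - 2:
            result.add(side)
    return result


def robinson_foulds(newick_a, newick_b):
    """Unweighted RF distance: size of the symmetric difference of the splits."""
    return len(splits(parse_newick(newick_a)) ^ splits(parse_newick(newick_b)))


print(robinson_foulds("((A,B),(C,D));", "((A,C),(B,D));"))  # 2
```

Applying robinson_foulds to every pair of trees in a set gives a symmetric distance matrix of the kind described above. This naive pairwise approach is only meant for small illustrations.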

To embed these distances in a lower-dimensional space, we can use PCoA (MDS) or t-SNE:

pear_ebi examples_tree_sets/beast_trees/beast_run1.trees -m hashrf_RF --pcoa 2

This embeds the distance matrix in 2 dimensions. Using the flag -quality, one can assess the correlation between the distances in the N-dimensional space and those in the embedding.
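The sketch below illustrates what a PCoA embedding and a correlation-based quality check involve. It is an illustration under stated assumptions, not PEAR's code: the function names are hypothetical, the eigenvectors come from a simple power iteration (which assumes the leading eigenvalues are positive, something not guaranteed for non-Euclidean distances), and Pearson correlation is assumed because the text above only says "correlation".

```python
import math


def pcoa(dist, n_dims=2, n_iter=500):
    """Classical MDS (PCoA) of a symmetric distance matrix given as nested lists."""
    n = len(dist)
    # Double-centre the squared distances to obtain the Gram matrix B.
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]

    coords = [[0.0] * n_dims for _ in range(n)]
    for k in range(n_dims):
        # Power iteration from a fixed start vector (deterministic output).
        start_norm = math.sqrt(sum((i + 1) ** 2 for i in range(n)))
        vec = [(i + 1) / start_norm for i in range(n)]
        for _ in range(n_iter):
            nxt = [sum(b[i][j] * vec[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm < 1e-12:
                break
            vec = [x / norm for x in nxt]
        # The Rayleigh quotient gives the eigenvalue for the unit vector `vec`.
        eigval = sum(vec[i] * b[i][j] * vec[j] for i in range(n) for j in range(n))
        scale = math.sqrt(max(eigval, 0.0))
        for i in range(n):
            coords[i][k] = scale * vec[i]
            # Deflate B so the next pass finds the next eigenvector.
            for j in range(n):
                b[i][j] -= eigval * vec[i] * vec[j]
    return coords


def embedding_quality(dist, coords):
    """Pearson correlation between original and embedded pairwise distances."""
    orig, emb = [], []
    for i in range(len(dist)):
        for j in range(i + 1, len(dist)):
            orig.append(dist[i][j])
            emb.append(math.dist(coords[i], coords[j]))
    mean_o, mean_e = sum(orig) / len(orig), sum(emb) / len(emb)
    cov = sum((o - mean_o) * (e - mean_e) for o, e in zip(orig, emb))
    var_o = sum((o - mean_o) ** 2 for o in orig)
    var_e = sum((e - mean_e) ** 2 for e in emb)
    if var_o == 0 or var_e == 0:
        raise ValueError("correlation is undefined for constant distances")
    return cov / math.sqrt(var_o * var_e)


# Toy example: four points on a rectangle, so two dimensions suffice.
points = [(0, 0), (3, 0), (0, 1), (3, 1)]
dist = [[math.dist(p, q) for q in points] for p in points]
coords = pcoa(dist, n_dims=2)
print(embedding_quality(dist, coords))
```

A value close to 1 indicates that the embedding preserves the original pairwise distances well. Embedding the same matrix with n_dims=1 loses information and should yield a lower value.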

pear_ebi examples_tree_sets/beast_trees/beast_run1.trees -m hashrf_RF --pcoa 2 --plot

The flag --plot tells PEAR to plot the embeddings and show them. If an embedding method is specified, the plots are produced anyway. Plotting does not require specifying the number of dimensions: the embeddings are plotted in 2 dimensions if the distances are embedded in 2 dimensions, and in both 2 and 3 dimensions in any other case.

One can specify any number of files containing trees. Moreover, it is possible to specify a single directory using --dir, and possibly a pattern using --pattern, in order to select multiple files.
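As an illustration of selecting files by directory and pattern, the following sketch collects matching files with pathlib. It assumes the pattern is a shell-style glob; the README does not say how PEAR interprets --pattern, and select_tree_files is a hypothetical helper, not part of PEAR.

```python
from pathlib import Path


def select_tree_files(directory, pattern="*"):
    """Return the sorted files in `directory` whose names match a glob pattern."""
    # Assumption: shell-style glob matching; PEAR's own matching may differ.
    return sorted(p for p in Path(directory).glob(pattern) if p.is_file())


if __name__ == "__main__":
    # Lists the tree files from the example directory, if it exists locally.
    for path in select_tree_files("examples_tree_sets/beast_trees", "*.trees"):
        print(path)
```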

Tree Set

It's possible to compute the distance matrix and re-use it in subsequent runs of PEAR by specifying the distance matrix file with the flag -d. Additionally, it's possible to define the name of the output file (-o).

If additional metadata is available, it can be supplied as a .csv file containing a dataframe of compatible shape.
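One plausible reading of "compatible shape" is a table with a header row and exactly one row per tree, in the same order as the trees. The sketch below checks that condition before the metadata is used. This interpretation is an assumption, and load_metadata is a hypothetical helper rather than a PEAR function.

```python
import csv


def load_metadata(csv_path, n_trees):
    """Read a metadata CSV and check that it has one row per tree."""
    with open(csv_path, newline="") as handle:
        rows = list(csv.DictReader(handle))
    # Assumption: "compatible shape" means one metadata row per tree.
    if len(rows) != n_trees:
        raise ValueError(
            f"expected {n_trees} metadata rows, found {len(rows)}"
        )
    return rows
```

For the example file above, n_trees would be 1001.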

Config file

A standard TOML config file can be used for specific embeddings of multiple sets of trees. Example TOML files are provided in the examples folder.

Using the config file allows one to use all the features of PEAR, including additional embedding methods and plot designs. The config file can also be used to specify lists of indexes of interesting trees in the sets, in order to highlight them in the final plots.

Interactive mode

pear_ebi -i launches the program in interactive mode. Once started, the program guides you through its usage with an intuitive interface.

Tutorials and Examples

Follow this link for a complete set of basic and advanced guides and tutorials on using PEAR on the command line and as a Python library.


Licensing

This project is released under the terms of the MIT Open Source License. View LICENSE.txt for more information.
