Performance test

Vinh Tran edited this page Apr 12, 2019 · 10 revisions

We checked the performance of PhyloProfile with increasing data size.

In brief, both the time required to import and plot the full data (Figure 1) and the RAM usage (Figure 2) scale linearly with the size of the data matrix. The time to plot the first 30 genes (the default setting; cf. point 2 below) is independent of the data size. The phylogenetic profile of a moderately sized data set comprising 200 genes and 200 species (40,000 cells) takes about 10 seconds to display, on both the standalone version and the online version.

In detail, we assessed the performance of a locally installed version of PhyloProfile on a MacBook Pro with a 2.8 GHz Intel Core i7 CPU and 8 GB of RAM. The test data were the phylogenetic profiles of 1,605 microsporidian proteins across 489 species, which gives a full data matrix of 784,845 cells. Loading these data takes about 70 seconds, and plotting the entire matrix takes about 180 seconds. We then reduced the data matrix stepwise by considering either fewer genes (Fig. 1a) or fewer taxa (Fig. 1b), and measured the time to upload and plot the data at each step.
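The linear scaling can be used for a rough back-of-the-envelope estimate of running time for other matrix sizes. The sketch below is only an illustration: it assumes that time is strictly proportional to the number of cells (a line through the origin, with no fixed start-up cost), and the function names `matrix_cells` and `estimate_seconds` are hypothetical helpers, not part of PhyloProfile. The reference values are the approximate timings reported above for the test machine; other hardware will differ.

```python
# Reference measurement reported above (full matrix on the local test machine).
REF_GENES = 1605
REF_TAXA = 489
REF_LOAD_SECONDS = 70.0   # approximate time to load the full matrix
REF_PLOT_SECONDS = 180.0  # approximate time to plot the full matrix


def matrix_cells(genes: int, taxa: int) -> int:
    """Number of cells in a genes x taxa profile matrix."""
    return genes * taxa


def estimate_seconds(genes: int, taxa: int, ref_seconds: float) -> float:
    """Scale a reference time proportionally to the matrix size.

    Assumption: time is proportional to the number of cells, with no
    fixed overhead. This is a simplification of the measured trend.
    """
    ref_cells = matrix_cells(REF_GENES, REF_TAXA)
    return ref_seconds * matrix_cells(genes, taxa) / ref_cells


if __name__ == "__main__":
    genes, taxa = 200, 200
    print(f"cells: {matrix_cells(genes, taxa)}")
    print(f"estimated load time: {estimate_seconds(genes, taxa, REF_LOAD_SECONDS):.1f} s")
    print(f"estimated plot time: {estimate_seconds(genes, taxa, REF_PLOT_SECONDS):.1f} s")
```

For a matrix of 200 genes and 200 species, this proportional estimate gives a plotting time of roughly 9 seconds, which is in the same range as the approximately 10 seconds quoted in the summary.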

Figure 1. The running time of PhyloProfile for uploading (yellow) and plotting phylogenetic profiles of all (green) or the first 30 genes (red) scales linearly with data size. (a) Running time as a function of number of genes analyzed. (b) Running time as a function of number of taxa analyzed.

The results indicate that PhyloProfile allows reasonably quick interactive exploration of data sets comprising up to a few hundred genes and taxa. We trust that this will be sufficient for the vast majority of applications, as we expect a typical user to be interested in exploring the phylogenetic profiles of gene sets representing, e.g., one or a few KEGG pathways. However, the analysis of substantially larger data sets is also possible, and the option to extract subsets of interest via the customized profile option allows users to streamline and speed up the analysis.

Figure 2. RAM usage during data display increases linearly as the data matrix grows. (a) RAM usage as a function of number of genes analyzed, and (b) as a function of the number of taxa analyzed.


The online version of PhyloProfile currently runs on the shinyapps.io webserver, which is provided as a service to the community by RStudio Inc. The performance of the online version is comparable to that of the standalone version with respect to the speed of data upload and plotting of the profiles. However, we would like to emphasize that the online version is meant for small to moderately sized analyses. We could upload and plot data up to a matrix size of 200,000 cells; with larger data sets, the server starts disconnecting. For regular use of PhyloProfile with larger data sets, we encourage users to download and install PhyloProfile locally, which is straightforward even for uninitiated users, and we provide an installation guide online. We are moving the online version of PhyloProfile to our internal webserver in order to overcome the limitations of the shinyapps.io server. A new performance test will be done in the near future.
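As a quick sanity check before uploading, the matrix size can be compared against the largest size that worked online in this test. The snippet below is a minimal sketch under that assumption: `recommend_version` is a hypothetical helper, not a PhyloProfile function, and the 200,000-cell threshold reflects only the shinyapps.io hosting described here, so it may not hold for other servers.

```python
# Largest matrix reported above to upload and plot on the online version.
ONLINE_CELL_LIMIT = 200_000


def recommend_version(genes: int, taxa: int, limit: int = ONLINE_CELL_LIMIT) -> str:
    """Suggest 'online' or 'local' based on the number of matrix cells.

    Assumption: matrices up to `limit` cells work online, larger ones
    should be analyzed with a local installation.
    """
    cells = genes * taxa
    return "online" if cells <= limit else "local"


if __name__ == "__main__":
    for genes, taxa in [(200, 200), (1605, 489)]:
        print(f"{genes} genes x {taxa} taxa: {recommend_version(genes, taxa)}")
```

With these values, the 200 x 200 example falls within the online limit, whereas the full test matrix of 1,605 proteins across 489 species exceeds it and is better handled locally.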