Performance test

trvinh edited this page Jan 25, 2018 · 22 revisions

We checked the performance of PhyloProfile using several datasets.

We started from a phylogenetic profile of 1,605 Microsporidia proteins across 489 species, which results in a data matrix with 784,845 cells. We then analysed partitions of this data of increasing size and monitored the time needed to load and visualize them. In case 1, we kept the number of taxa constant and reduced the number of genes. In case 2, we kept the number of genes constant and reduced the number of taxa.
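The relationship between the two test cases and the matrix size can be made concrete with a small sketch. It is written in Python purely for illustration and is not part of PhyloProfile. The function names and the fractions used below are assumptions, since the exact partition sizes are not listed here.

```python
# Illustrative sketch only: names and fractions are hypothetical.
FULL_GENES = 1605  # proteins in the full profile (from the text)
FULL_TAXA = 489    # species in the full profile (from the text)


def matrix_cells(n_genes, n_taxa):
    """Number of cells in a genes x taxa profile matrix."""
    return n_genes * n_taxa


def partition_sizes(fractions):
    """For each fraction, return the matrix size for both test cases.

    Case 1 keeps all taxa and reduces the number of genes.
    Case 2 keeps all genes and reduces the number of taxa.
    """
    rows = []
    for fraction in fractions:
        genes = max(1, int(FULL_GENES * fraction))
        taxa = max(1, int(FULL_TAXA * fraction))
        rows.append({
            "fraction": fraction,
            "case1_cells": matrix_cells(genes, FULL_TAXA),
            "case2_cells": matrix_cells(FULL_GENES, taxa),
        })
    return rows


if __name__ == "__main__":
    # The fractions below are assumed for demonstration.
    for row in partition_sizes([0.25, 0.5, 0.75, 1.0]):
        print(row)
```

At a fraction of 1.0 both cases give the full matrix of 1605 × 489 cells. For smaller fractions the two cases produce matrices of similar size but different shape, which is what allows the effect of gene count and taxon count to be compared.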

The calculated running times for each data set are depicted in Figure 1.

The time required for both importing and plotting the full data scales linearly with the input data size. The time to plot the first 30 genes is independent of the data size.
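One way to check a linear-scaling claim like this on your own measurements is to fit a straight line to (matrix size, running time) pairs and inspect the coefficient of determination. The following Python sketch is only an illustration of that idea. The function name `fit_line` is hypothetical, and the numbers in the usage example are synthetic placeholders, not the timings behind Figure 1.

```python
def fit_line(sizes, times):
    """Ordinary least-squares fit of times = intercept + slope * sizes.

    Returns (slope, intercept, r_squared). An r_squared close to 1
    is consistent with linear scaling.
    """
    if len(sizes) != len(times) or len(sizes) < 2:
        raise ValueError("need at least two (size, time) pairs")

    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(times) / n

    sxx = sum((x - mean_x) ** 2 for x in sizes)
    if sxx == 0:
        raise ValueError("all sizes are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, times))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    ss_tot = sum((y - mean_y) ** 2 for y in times)
    ss_res = sum((y - (intercept + slope * x)) ** 2
                 for x, y in zip(sizes, times))
    r_squared = 1.0 if ss_tot == 0 else 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


if __name__ == "__main__":
    # Synthetic, perfectly linear placeholder data (NOT measured values).
    sizes = [100_000, 200_000, 400_000, 800_000]
    times = [1.0 + 0.00005 * s for s in sizes]
    print(fit_line(sizes, times))
```

For a size-independent step, such as plotting only the first 30 genes, the same fit would be expected to give a slope near zero.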

Figure 2 shows the RAM usage for the same test data sets used above.

Note that the same trend holds for the online version running on the shinyapps.io web server. We were able to read and plot data with a matrix size of up to 200,000 cells. However, reading and processing the input file was slightly slower (3-10 seconds), while the time to plot the data was comparable to that of the standalone version. For the matrix with 400,000 data points, the online version needed up to 50 seconds to read the data (1.5 times slower than the standalone version) and was unable to plot the full gene set: the app kept losing its connection to the server and trying to re-establish it until it disconnected for good. Plotting the first 30 genes, however, took the online tool only 4 seconds, the same as the standalone version.

We would like to stress that the online version runs on the Shiny server that is provided as a service to the community by RStudio Inc. We have no influence on this server, and thus the online version is meant only for small-scale analyses.