Tools to perform clustering of elements based on pairwise distance/similarity measurements. The input is a list of pairwise distances or similarities in the following format:
ElementX ElementY Distance/Similarity X-Y
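The input format above can be read with a few lines of standard C++. The sketch below is an illustration, not the program's actual parser. The names `PairTable` and `readPairs` are hypothetical. Silently skipping malformed lines and letting later duplicates overwrite earlier ones are both assumptions.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical container: (ElementX, ElementY) -> value, exactly as read.
using PairTable = std::map<std::pair<std::string, std::string>, double>;

// Reads lines of the form "ElementX ElementY value".
// Blank or malformed lines are skipped (an assumption; the real tool
// may report an error instead).
PairTable readPairs(std::istream& in) {
    PairTable table;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string x, y;
        double value;
        if (fields >> x >> y >> value) {
            table[{x, y}] = value;  // a later duplicate overwrites an earlier one
        }
    }
    return table;
}
```

Keeping both directions (X-Y and Y-X) as separate keys preserves the information needed to detect non-reciprocal values later.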
If the distances/similarities are not reciprocal (d(X-Y) != d(Y-X)), the program computes the harmonic mean of the two values and uses it for the clustering. The program supports several types of clustering:
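The symmetrization step amounts to replacing d(X-Y) and d(Y-X) with 2·d(X-Y)·d(Y-X) / (d(X-Y) + d(Y-X)). The following is a minimal sketch, assuming non-negative values. The names `harmonicMean` and `symmetrize` are hypothetical. Using a lone value as-is when only one direction is given is also an assumption.

```cpp
#include <map>
#include <string>
#include <utility>

using PairTable = std::map<std::pair<std::string, std::string>, double>;

// Harmonic mean of two values: 2xy / (x + y).
// Assumes non-negative inputs; returns 0 when both are 0 to avoid 0/0.
double harmonicMean(double x, double y) {
    if (x + y == 0.0) return 0.0;
    return 2.0 * x * y / (x + y);
}

// Builds a symmetric table keyed by the unordered pair (smaller name first).
// If only one direction is present, it is used as-is (an assumption).
PairTable symmetrize(const PairTable& raw) {
    PairTable sym;
    for (const auto& [key, dxy] : raw) {
        const auto& [x, y] = key;
        std::pair<std::string, std::string> canon =
            x < y ? key : std::make_pair(y, x);
        if (sym.count(canon)) continue;  // already handled from the other direction
        auto reverse = raw.find({y, x});
        sym[canon] = (reverse == raw.end()) ? dxy : harmonicMean(dxy, reverse->second);
    }
    return sym;
}
```

For example, d(A-B) = 1 and d(B-A) = 3 give 2·1·3 / 4 = 1.5. The harmonic mean is pulled toward the smaller of the two values.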
- Single-linkage hierarchical clustering, where the distance between two clusters is the smallest distance between any pair of their members; merging stops when the closest remaining pair of clusters is farther apart than a given cutoff.
- UPGMA clustering, a hierarchical clustering where the distance between two clusters is the average of all pairwise distances between their members.
- Complete-linkage clustering, where the distance between two clusters is the largest distance between any pair of elements taken one from each cluster.
- SPICKER clustering, where the element with the most neighbors within the cutoff is iteratively selected as a cluster center, and that element together with all its neighbors forms a cluster.
- K-means
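The three hierarchical methods differ only in how the distance between two clusters is computed, and SPICKER is a greedy center-selection loop. The sketch below is a naive O(n³) illustration of both ideas on a symmetric distance matrix. It is not the program's implementation, and K-means is not covered. `Matrix`, `Cluster`, `Linkage`, `agglomerate` and `spicker` are hypothetical names. Two further details are assumptions: SPICKER counts only not-yet-assigned neighbors, and complete linkage assumes non-negative distances.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // symmetric distances
using Cluster = std::vector<std::size_t>;         // member indices

enum class Linkage { Single, Complete, Average };  // Average = UPGMA

// Distance between two clusters under the chosen linkage rule.
double clusterDistance(const Matrix& d, const Cluster& a, const Cluster& b, Linkage link) {
    double best = (link == Linkage::Single) ? std::numeric_limits<double>::infinity() : 0.0;
    double sum = 0.0;
    for (std::size_t i : a)
        for (std::size_t j : b) {
            if (link == Linkage::Single) best = std::min(best, d[i][j]);
            else if (link == Linkage::Complete) best = std::max(best, d[i][j]);
            else sum += d[i][j];
        }
    if (link == Linkage::Average) return sum / static_cast<double>(a.size() * b.size());
    return best;
}

// Repeatedly merge the closest pair of clusters; stop once that distance exceeds cutoff.
std::vector<Cluster> agglomerate(const Matrix& d, double cutoff, Linkage link) {
    std::vector<Cluster> clusters;
    for (std::size_t i = 0; i < d.size(); ++i) clusters.push_back({i});
    while (clusters.size() > 1) {
        std::size_t bi = 0, bj = 1;
        double best = std::numeric_limits<double>::infinity();
        for (std::size_t i = 0; i < clusters.size(); ++i)
            for (std::size_t j = i + 1; j < clusters.size(); ++j) {
                double dij = clusterDistance(d, clusters[i], clusters[j], link);
                if (dij < best) { best = dij; bi = i; bj = j; }
            }
        if (best > cutoff) break;  // closest pair is too far apart: stop merging
        clusters[bi].insert(clusters[bi].end(), clusters[bj].begin(), clusters[bj].end());
        clusters.erase(clusters.begin() + static_cast<std::ptrdiff_t>(bj));  // bj > bi
    }
    return clusters;
}

// SPICKER-style loop: the unassigned element with the most unassigned neighbors
// within cutoff becomes a center; it and those neighbors form a cluster.
std::vector<Cluster> spicker(const Matrix& d, double cutoff) {
    std::vector<bool> assigned(d.size(), false);
    std::vector<Cluster> clusters;
    for (std::size_t left = d.size(); left > 0;) {
        std::size_t center = 0;
        long bestCount = -1;
        for (std::size_t i = 0; i < d.size(); ++i) {
            if (assigned[i]) continue;
            long count = 0;
            for (std::size_t j = 0; j < d.size(); ++j)
                if (j != i && !assigned[j] && d[i][j] <= cutoff) ++count;
            if (count > bestCount) { bestCount = count; center = i; }  // first max wins
        }
        Cluster c{center};
        assigned[center] = true;
        for (std::size_t j = 0; j < d.size(); ++j)
            if (!assigned[j] && d[center][j] <= cutoff) { c.push_back(j); assigned[j] = true; }
        left -= c.size();
        clusters.push_back(c);
    }
    return clusters;
}
```

With two tight pairs {0,1} and {2,3} whose cross distances are 5 and 6, a cutoff of 5.5 separates the pairs under complete linkage (max = 6). Under UPGMA the same cutoff merges them, because the average is exactly 5.5.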
The output is the list of clusters formed below the cutoff that have not yet been merged into a larger cluster. For each cluster, the program reports the clustroid element, the radius, the maximum distance between cluster members, and the list of members.
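This section does not define the reported statistics. The sketch below uses common definitions, all of them assumptions. The clustroid is the member with the smallest sum of distances to the other members. The radius is the largest distance from the clustroid to any member. The maximum distance is the largest pairwise distance within the cluster. `ClusterSummary` and `summarize` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

struct ClusterSummary {
    std::size_t clustroid;  // member minimizing the sum of intra-cluster distances
    double radius;          // assumed: max distance from clustroid to a member
    double maxDistance;     // largest pairwise distance inside the cluster
};

ClusterSummary summarize(const Matrix& d, const std::vector<std::size_t>& members) {
    assert(!members.empty());  // precondition: a cluster has at least one member
    ClusterSummary s{members.front(), 0.0, 0.0};
    double bestSum = std::numeric_limits<double>::infinity();
    for (std::size_t i : members) {
        double sum = 0.0;
        for (std::size_t j : members) {
            sum += d[i][j];
            s.maxDistance = std::max(s.maxDistance, d[i][j]);
        }
        if (sum < bestSum) { bestSum = sum; s.clustroid = i; }
    }
    for (std::size_t j : members) s.radius = std::max(s.radius, d[s.clustroid][j]);
    return s;
}
```

Unlike a centroid, a clustroid is always an actual element. That makes it the natural choice when only pairwise distances are available and no coordinates exist to average.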
To do:
- Allow generation of the full hierarchical clustering as a complete dendrogram.
- Refactor the output generation, which is currently all crammed into main.cpp.
- Refactor the clustering process.
- Optimize the SPICKER code (follow the comments in the code).