Expected behavior
DBSCAN is a clustering method that can identify outliers. I expect these outliers to be clearly indicated in some way. I also expect that outliers are treated properly in similarity measures.
Actual behavior
The method implemented in MDAnalysis makes the outlier group (label == -1) look like an actual cluster, because labels start from 0. (This doesn't ultimately matter, as encode_centroid_info drops the label information anyway.) See https://github.com/MDAnalysis/mdanalysis/blob/9bcf6f4c118e1ea137e8514bd60cbd1cd1972062/package/MDAnalysis/analysis/encore/clustering/ClusteringMethod.py#L300-L306
Also, calling the first frame in the cluster the centroid without mentioning this very clearly in the docs seems like a bad idea. It also gives the outlier group a centroid.
Finally, ClusterCollection does not keep the cluster labels. This makes it hard to look for special (i.e. negative) cluster labels.
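A minimal sketch of the behavior described above, in plain Python rather than the MDAnalysis code itself. The label list is made up, and both the shift-by-one step and the `first_members` helper are assumptions used only for illustration.

```python
# Hypothetical DBSCAN output for 8 frames; -1 marks noise (outliers).
labels = [0, 0, -1, 1, 1, -1, 0, 1]

# Assumed post-processing: shift the labels so that they start from 0.
# After this step the noise group is indistinguishable from a cluster.
shifted = [label + 1 for label in labels]


def first_members(cluster_labels):
    """Map each label to the index of the first frame carrying it."""
    first = {}
    for frame, label in enumerate(cluster_labels):
        first.setdefault(label, frame)
    return first


# The first frame of each group stands in as its "centroid",
# including the former noise group (now label 0).
representatives = first_members(shifted)
print(shifted)
print(representatives)
```

Here frame 2 ends up as the "centroid" of what was the noise group, even though noise points need not be close to each other.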
Current version of MDAnalysis
Which version are you using? (run python -c "import MDAnalysis as mda; print(mda.__version__)") 0.20.2-dev
Which version of Python (python -V)?
Which operating system?
Possible fix
Easy option
Don't alter DBSCAN's output
Add a warning and note in the docs that the "centroid" is the first frame of that cluster
Identify the first frame of the outlier group (which becomes its "centroid" in ClusterCollection) and add a warning that it's not a real cluster
More work option
Reconstruct the ClusterCollection class with a more intuitive interface
each cluster should not require a centroid
each cluster should retain its label from scikit-learn
should be able to label a "cluster" as outliers
it would be nice to link the cluster members to frames of the universes in the ensemble
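One possible shape for such an interface, as a rough sketch only. `LabelledCluster`, `group_by_label` and `NOISE_LABEL` are hypothetical names, not existing MDAnalysis API; frame indices are assumed to refer to the concatenated ensemble.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

NOISE_LABEL = -1  # label that DBSCAN conventionally assigns to noise points


@dataclass
class LabelledCluster:
    """Hypothetical cluster record that keeps its original label."""
    label: int
    frames: List[int] = field(default_factory=list)
    centroid: Optional[int] = None  # a centroid is optional, not required

    @property
    def is_outlier_group(self) -> bool:
        return self.label == NOISE_LABEL


def group_by_label(labels: List[int]) -> Dict[int, LabelledCluster]:
    """Group frame indices by their untouched cluster label."""
    clusters: Dict[int, LabelledCluster] = {}
    for frame, label in enumerate(labels):
        clusters.setdefault(label, LabelledCluster(label)).frames.append(frame)
    return clusters


clusters = group_by_label([0, 0, -1, 1, 1, -1, 0, 1])
for cluster in clusters.values():
    print(cluster.label, cluster.frames, cluster.is_outlier_group)
```

Keeping the label on each record makes it straightforward to filter on negative labels, and leaving `centroid` as `None` avoids inventing a representative for the outlier group.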
This also results in issues for ensemble similarity analysis. The outlier "cluster" is treated like a real cluster. Therefore, if a conformation in trajectory A is in the outlier cluster and a conformation in trajectory B is in the outlier cluster, it is treated as a point of similarity -- in reality these conformations should be unrelated.
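The effect can be illustrated with a toy calculation. This is a simplified sketch, not the MDAnalysis similarity code: the labels are made up, the helper names are hypothetical, and a plain Jensen-Shannon divergence over cluster populations stands in for the real measure. Treating each outlier as its own singleton group is just one possible remedy.

```python
import math
from collections import Counter


def populations(labels):
    """Fraction of frames that fall in each cluster."""
    counts = Counter(labels)
    return {key: n / len(labels) for key, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) of two distributions."""
    div = 0.0
    for key in set(p) | set(q):
        pk, qk = p.get(key, 0.0), q.get(key, 0.0)
        mk = 0.5 * (pk + qk)
        if pk > 0:
            div += 0.5 * pk * math.log(pk / mk)
        if qk > 0:
            div += 0.5 * qk * math.log(qk / mk)
    return div


def isolate_noise(labels, name):
    """Give every noise frame (-1) its own unique singleton label."""
    return [(name, i) if lab == -1 else lab for i, lab in enumerate(labels)]


# Two hypothetical trajectories that share no real cluster.
labels_a = [0, 0, -1, -1]
labels_b = [1, 1, -1, -1]

# Noise treated as one shared cluster: the trajectories look partly similar.
js_shared = js_divergence(populations(labels_a), populations(labels_b))

# Noise frames kept apart: the divergence reaches its maximum, ln(2).
js_isolated = js_divergence(
    populations(isolate_noise(labels_a, "A")),
    populations(isolate_noise(labels_b, "B")),
)
print(js_shared, js_isolated)
```

With the shared noise "cluster" the divergence is half of its maximum, ln(2), although the two trajectories have no real cluster in common.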