plots_YAML_500.md


Comparison of Configurations I, II and III for YAML-500 data.

  • Configuration I: description and keywords.
  • Configuration II: all fields.
  • Configuration III: weighted fields.

Easy to interpret

Ball_Hall

  • The mean, over all clusters, of each cluster's mean dispersion (the mean squared distance of its points to the cluster barycenter).
  • The range is [0,+∞).
  • The difference between two successive slopes is to be maximized.
  • Best results: TT (wrt. optimal cluster number selection), LSA25 (wrt. global behavior: mean dispersion minimization), hca, conf II.
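
As a minimal sketch of the definition and of the selection rule above, the following computes the Ball-Hall value for a toy clustering and picks the cluster number with the largest difference between successive slopes. The function names `ball_hall` and `max_diff_k`, the toy points and the index values per cluster number are all made up for illustration; they are not the YAML-500 results. The slope rule is read here as the second difference of the index curve, which is an assumption.

```python
from collections import defaultdict

def ball_hall(points, labels):
    """Mean over clusters of the mean squared distance to the barycenter."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    total = 0.0
    for members in clusters.values():
        center = [sum(coords) / len(members) for coords in zip(*members)]
        dispersion = sum(
            sum((x - c) ** 2 for x, c in zip(point, center)) for point in members
        )
        total += dispersion / len(members)
    return total / len(clusters)

def max_diff_k(index_by_k):
    """Return the k with the largest difference between successive slopes.

    Assumes index_by_k maps consecutive cluster numbers to index values.
    """
    ks = sorted(index_by_k)
    best_k, best_gap = None, float("-inf")
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        slope_before = index_by_k[k] - index_by_k[prev_k]
        slope_after = index_by_k[next_k] - index_by_k[k]
        gap = slope_after - slope_before
        if gap > best_gap:
            best_k, best_gap = k, gap
    return best_k

# Toy data (hypothetical): two tight clusters far apart.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
labels = [0, 0, 1, 1]
bh_value = ball_hall(points, labels)  # 1.0

# Hypothetical index values for k = 2..5; the curve flattens after k = 3.
chosen_k = max_diff_k({2: 10.0, 3: 4.0, 4: 3.0, 5: 2.5})  # 3
```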

C_Index

  • Shows where the sum of the within-cluster distances between pairs of points lies between its minimal and maximal possible values, i.e. the fraction (S_W − S_min) / (S_max − S_min).
  • The range is [0,1].
  • To be minimized.
  • Best results: TT or LSA25, hca, conf II.
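
The C-index can be sketched as follows, assuming Euclidean distances and the usual definition in which S_min and S_max are the sums of the N_W smallest and N_W largest pairwise distances, with N_W the number of within-cluster pairs. The function name `c_index` and the toy data are illustrative only.

```python
import math
from itertools import combinations

def c_index(points, labels):
    """(S_W - S_min) / (S_max - S_min) over pairwise Euclidean distances.

    Assumes at least one within-cluster pair and S_max > S_min.
    """
    pairs = list(combinations(range(len(points)), 2))
    distances = sorted(math.dist(points[i], points[j]) for i, j in pairs)
    within = [
        math.dist(points[i], points[j]) for i, j in pairs if labels[i] == labels[j]
    ]
    n_within = len(within)
    s_w = sum(within)
    s_min = sum(distances[:n_within])   # sum of the smallest distances
    s_max = sum(distances[-n_within:])  # sum of the largest distances
    return (s_w - s_min) / (s_max - s_min)

# Toy data (hypothetical): the natural split versus a deliberately bad one.
points = [(0, 0), (0, 1), (5, 0), (5, 1)]
good = c_index(points, [0, 0, 1, 1])  # 0.0, the best possible value
bad = c_index(points, [0, 1, 0, 1])   # close to 1
```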

Calinski-Harabasz

  • Is proportional to the quotient of the between-group dispersion and pooled within-cluster dispersion.
  • The range is [0,+∞).
  • To be maximized.
  • Best results: LSA25, kmedoids (hca comparable from a certain cluster number size), conf II.
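
A small sketch of the quotient described above, assuming the common normalization (BGSS / (K − 1)) / (WGSS / (N − K)), where BGSS is the between-group dispersion and WGSS the pooled within-cluster dispersion. The helper names and the toy data are hypothetical.

```python
from collections import defaultdict

def sq_dist(p, q):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def barycenter(members):
    """Coordinate-wise mean of a list of points."""
    return [sum(coords) / len(members) for coords in zip(*members)]

def calinski_harabasz(points, labels):
    """(BGSS / (K - 1)) / (WGSS / (N - K)); needs 1 < K < N."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    n, k = len(points), len(clusters)
    overall = barycenter(points)
    bgss = wgss = 0.0
    for members in clusters.values():
        center = barycenter(members)
        bgss += len(members) * sq_dist(center, overall)
        wgss += sum(sq_dist(p, center) for p in members)
    return (bgss / (k - 1)) / (wgss / (n - k))

# Toy data (hypothetical): BGSS = 100, WGSS = 4, K = 2, N = 4.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
ch_value = calinski_harabasz(points, [0, 0, 1, 1])  # 50.0
```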

McClain_Rao

  • Is the quotient of the mean within-cluster distance and the mean between-cluster distance.
  • The range is [0,+∞).
  • To be minimized.
  • Best results: LSA25; kmeans / kmedoids / hca are comparable in LSA25; conf I and II are similar.
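
The quotient can be sketched directly from pairwise distances, assuming Euclidean distance and that both within-cluster and between-cluster pairs exist. The function name `mcclain_rao` and the toy data are illustrative.

```python
import math
from itertools import combinations

def mcclain_rao(points, labels):
    """Mean within-cluster pair distance over mean between-cluster pair distance."""
    within, between = [], []
    for i, j in combinations(range(len(points)), 2):
        d = math.dist(points[i], points[j])
        if labels[i] == labels[j]:
            within.append(d)
        else:
            between.append(d)
    return (sum(within) / len(within)) / (sum(between) / len(between))

# Toy data (hypothetical): compact, well separated clusters give a small value.
points = [(0, 0), (0, 1), (5, 0), (5, 1)]
mr_value = mcclain_rao(points, [0, 0, 1, 1])
```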

Ratkowsky_Lance

  • Is based on the mean, over the variables of the data, of the quotients of the between-group dispersion to the total sum of squares (TSS).
  • The range is [0,1], since each quotient lies in [0,1].
  • To be maximized.
  • Best results: TT or LSA25, hca, conf I and II are similar.
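
A sketch of one common form of this index, assumed here to be sqrt(c̄² / K), where c̄² is the mean over variables of BGSS_j / TSS_j and K is the number of clusters. The function name `ratkowsky_lance` and the toy data are hypothetical, and the code assumes no variable is constant (TSS_j > 0).

```python
import math
from collections import defaultdict

def ratkowsky_lance(points, labels):
    """sqrt(mean_j(BGSS_j / TSS_j) / K), computed variable by variable."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    k = len(clusters)
    ratios = []
    for j, column in enumerate(zip(*points)):
        mean_j = sum(column) / len(column)
        tss_j = sum((x - mean_j) ** 2 for x in column)
        bgss_j = sum(
            len(members) * (sum(p[j] for p in members) / len(members) - mean_j) ** 2
            for members in clusters.values()
        )
        ratios.append(bgss_j / tss_j)
    return math.sqrt(sum(ratios) / len(ratios) / k)

# Toy data (hypothetical): the first variable separates the clusters fully
# (ratio 1), the second not at all (ratio 0).
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
rl_value = ratkowsky_lance(points, [0, 0, 1, 1])  # 0.5
```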

Trace_W

  • Is simply the pooled within-cluster dispersion.
  • The range is [0,+∞).
  • The difference between two successive slopes is to be maximized.
  • Best results: LSA/LSA25 (wrt. global behavior: pooled within-cluster dispersion AND wrt. optimal cluster number selection), hca, conf II.
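
The pooled within-cluster dispersion is the sum of squared distances of all points to the barycenter of their own cluster. A minimal sketch, with the hypothetical function name `trace_w` and made-up toy data:

```python
from collections import defaultdict

def trace_w(points, labels):
    """Sum of squared distances of the points to their cluster barycenter."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    total = 0.0
    for members in clusters.values():
        center = [sum(coords) / len(members) for coords in zip(*members)]
        total += sum(
            sum((x - c) ** 2 for x, c in zip(point, center)) for point in members
        )
    return total

# Toy data (hypothetical).
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
two_clusters = trace_w(points, [0, 0, 1, 1])  # 4.0
one_cluster = trace_w(points, [0, 0, 0, 0])   # 104.0, the total sum of squares
```

With a single cluster the value equals the total dispersion, and a good split into more clusters lowers it, which is why the index is read through the change in slope rather than through its raw minimum.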

Wemmert_Gancarski

  • Is based on quotients of distances between the points and the barycenters of all the clusters.
  • The range is [0,1].
  • To be maximized.
  • Best results: LSA25, kmeans or kmedoids (hca comparable wrt. global behavior), conf II.
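
A sketch under the following assumed definition: for each point, take the quotient of its distance to its own barycenter and its distance to the nearest other barycenter; per cluster, J_k = max(0, 1 − mean quotient); the index is the size-weighted mean of the J_k. The function name `wemmert_gancarski` and the toy data are illustrative, and at least two clusters are required.

```python
import math
from collections import defaultdict

def wemmert_gancarski(points, labels):
    """Size-weighted mean of max(0, 1 - mean distance quotient) per cluster."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    centers = {
        label: tuple(sum(coords) / len(members) for coords in zip(*members))
        for label, members in clusters.items()
    }
    weighted = 0.0
    for label, members in clusters.items():
        quotients = []
        for point in members:
            own = math.dist(point, centers[label])
            nearest_other = min(
                math.dist(point, center)
                for other, center in centers.items()
                if other != label
            )
            quotients.append(own / nearest_other)
        j_k = max(0.0, 1.0 - sum(quotients) / len(quotients))
        weighted += len(members) * j_k
    return weighted / len(points)

# Toy data (hypothetical): every point is much closer to its own barycenter.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
wg_value = wemmert_gancarski(points, [0, 0, 1, 1])
```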

Hard to interpret

Davies-Bouldin

  • Penalizes pairs of clusters whose barycenters are “close” to each other while their points lie far from those barycenters.
  • The range is [0,+∞).
  • To be minimized.
  • Best results: LSA25, kmedoids / hca (hca shows more stable behavior in all models), conf II.
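
A sketch of the usual Davies-Bouldin construction, assuming Euclidean distances: δ_k is the mean distance of the points of cluster k to its barycenter, and for each cluster the worst quotient (δ_k + δ_k') / distance between barycenters is taken, then averaged over clusters. The function name and toy data are hypothetical.

```python
import math
from collections import defaultdict

def davies_bouldin(points, labels):
    """Mean over clusters of the worst (spread_a + spread_b) / separation."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    centers, spread = {}, {}
    for label, members in clusters.items():
        centers[label] = tuple(sum(c) / len(members) for c in zip(*members))
        spread[label] = sum(
            math.dist(p, centers[label]) for p in members
        ) / len(members)
    worst = [
        max(
            (spread[a] + spread[b]) / math.dist(centers[a], centers[b])
            for b in clusters
            if b != a
        )
        for a in clusters
    ]
    return sum(worst) / len(worst)

# Toy data (hypothetical): spread 1 in each cluster, barycenters 10 apart.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
db_value = davies_bouldin(points, [0, 0, 1, 1])
```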

Ray_Turi

  • Is a quotient between two quantities: the mean of the squared distances from all the points to the barycenter of their cluster and the minimum of the squared distances between the cluster barycenters.
  • The range is [0,+∞).
  • To be minimized.
  • Best results: LSA25, hca, conf II.
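
The two quantities in the quotient can be computed as in the sketch below. The function name `ray_turi` and the toy data are made up for illustration, and at least two clusters are assumed.

```python
from collections import defaultdict
from itertools import combinations

def sq_dist(p, q):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def ray_turi(points, labels):
    """Mean squared distance to own barycenter over min squared barycenter gap."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    centers = {
        label: tuple(sum(c) / len(members) for c in zip(*members))
        for label, members in clusters.items()
    }
    wgss = sum(sq_dist(p, centers[label]) for p, label in zip(points, labels))
    min_separation = min(
        sq_dist(a, b) for a, b in combinations(centers.values(), 2)
    )
    return (wgss / len(points)) / min_separation

# Toy data (hypothetical): mean squared distance 1, barycenters 10 apart.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
rt_value = ray_turi(points, [0, 0, 1, 1])  # 0.01
```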

Xie_Beni

  • Is the quotient between the mean pooled within-cluster dispersion and the minimum, over all pairs of clusters, of the minimal squared distance between points of the two different clusters.
  • The range is [0,+∞).
  • To be minimized.
  • Best results: BVSM (all models are very close in hca, in particular LSA and BVSM), hca, conf II.
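
A sketch of the crisp (non-fuzzy) form assumed here: the mean squared distance of the points to their own barycenter, divided by the smallest squared distance between two points that belong to different clusters. The function name `xie_beni` and the toy data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def sq_dist(p, q):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def xie_beni(points, labels):
    """Mean within-cluster dispersion over min squared cross-cluster distance."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    centers = {
        label: tuple(sum(c) / len(members) for c in zip(*members))
        for label, members in clusters.items()
    }
    wgss = sum(sq_dist(p, centers[label]) for p, label in zip(points, labels))
    min_cross = min(
        sq_dist(points[i], points[j])
        for i, j in combinations(range(len(points)), 2)
        if labels[i] != labels[j]
    )
    return (wgss / len(points)) / min_cross

# Toy data (hypothetical): the closest points of different clusters are 3 apart.
points = [(0, 0), (2, 0), (5, 0), (7, 0)]
xb_value = xie_beni(points, [0, 0, 1, 1])
```

Unlike Ray_Turi, the denominator uses the closest pair of points across clusters rather than the closest pair of barycenters, so a single stray point near another cluster is enough to inflate the value.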

Silhouette

  • Operates with quantities that only depend on the average distances between a given observation and other observations inside its own and also inside the nearest cluster.
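  • The range is [-1,1]; negative values indicate observations that are on average closer to a neighboring cluster than to their own.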
  • To be maximized.
  • Best results: LSA25, hca, conf II (calculations will be redone later with another package).
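
A sketch of the silhouette computation, assuming Euclidean distances and the overall mean of the per-observation widths (some packages average per cluster first, so values may differ slightly). The function name `silhouette`, the singleton convention and the toy data are assumptions for illustration.

```python
import math
from collections import defaultdict

def silhouette(points, labels):
    """Mean of (b - a) / max(a, b) over all observations."""
    clusters = defaultdict(list)
    for index, label in enumerate(labels):
        clusters[label].append(index)
    widths = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            widths.append(0.0)  # assumed convention for singleton clusters
            continue
        # a: mean distance to the other observations of the own cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the observations of another cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        widths.append((b - a) / max(a, b))
    return sum(widths) / len(widths)

# Toy data (hypothetical): the natural split versus a deliberately bad one.
points = [(0, 0), (0, 1), (5, 0), (5, 1)]
good = silhouette(points, [0, 0, 1, 1])  # positive, about 0.8
bad = silhouette(points, [0, 1, 0, 1])   # negative
```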

Dunn

  • Is the quotient of the smallest distance between points belonging to different clusters and the largest distance between points within the same cluster.
  • The range is [0,+∞).
  • To be maximized.
  • Best results: BVSM (all models, in particular BVSM and LSA, are relatively close and also lie in a quite small range compared with the value range [0,+∞)), hca, conf II.
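
A sketch of the basic Dunn quotient, assuming Euclidean distances, single-linkage separation between clusters and the maximal diameter within clusters (other variants of the index exist). The function name `dunn` and the toy data are hypothetical.

```python
import math
from itertools import combinations

def dunn(points, labels):
    """Smallest cross-cluster distance over largest within-cluster distance."""
    within, between = [], []
    for i, j in combinations(range(len(points)), 2):
        d = math.dist(points[i], points[j])
        if labels[i] == labels[j]:
            within.append(d)
        else:
            between.append(d)
    return min(between) / max(within)

# Toy data (hypothetical): clusters of diameter 1 that are 5 apart.
points = [(0, 0), (0, 1), (5, 0), (5, 1)]
dunn_value = dunn(points, [0, 0, 1, 1])  # 5.0
```

Because both the numerator and the denominator are determined by a single pair of points, the value is sensitive to outliers, which is one reason the index is hard to interpret.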