Name		Name	Last commit message	Last commit date
parent directory ..
README.md		README.md
img1.png		img1.png
img2.png		img2.png
kmeans.py		kmeans.py
output.png		output.png

README.md

KMeans

KMeans is an unsupervised algorithm whose goal is to partite a dataset into a fixed number of clusters.

It does this by iteratively assigning each data point to the cluster with the closest mean (centroid) and then updating the cluster centroid to the mean of all data points assigned to it.

The algorithm runs until the centroids do not change after the update, or once it reaches the maximum number of iterations.

source	source

📍 The full implementation is available here!

Demo

Import sklearn in order to create a random dataset.

from sklearn.datasets import make_blobs

import matplotlib.pyplot as plt
import seaborn as sns

Create the dataset via make_blobs. This created 1000 samples, having three approximate clusters and two features.

X, y = make_blobs(n_samples=1000, centers=3, n_features=2,random_state=0)

Make inference over all the available data.

kmeans = KMeans(n_clusters=3, random_state=42)
y_pred = kmeans.fit_predict(X)

Notice that most of the times we do not know the real number of clusters, but this is just a toy example.

fig, axs = plt.subplots(1,2,figsize=(10,4))

sns.scatterplot(x=X[:,0],y=X[:,1],hue=y_pred, ax=axs[0], palette="pastel")
axs[0].set_title("KMeans")
axs[0].set_xlabel("x1")
axs[0].set_ylabel("x2")

sns.scatterplot(x=X[:,0],y=X[:,1],hue=y, ax=axs[1], palette="pastel")
axs[1].set_title("True clusters")
axs[1].set_xlabel("x1")
axs[1].set_ylabel("x2")

plt.show()

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

kmeans

kmeans

README.md

KMeans

Demo

Files

kmeans

Directory actions

More options

Directory actions

More options

Latest commit

History

kmeans

Folders and files

parent directory

README.md

KMeans

Demo