This repository contains a Python implementation of the K-Means clustering algorithm. K-Means is a widely used unsupervised machine learning algorithm that partitions a dataset into a specified number (k) of clusters based on similarity. This implementation uses Python 3 and NumPy for the numerical operations.
kmeans.py
This file contains the main implementation of the KMeansModel class, which encapsulates the K-Means clustering logic.
utils.py
This file provides utility functions used in the K-Means implementation. These functions include normalization, resizing, and centroid generation.
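As a rough illustration of what normalization and centroid generation helpers can look like, here is a minimal sketch. It assumes min-max scaling and random sampling of data points as starting centroids; the function names min_max_normalize and random_centroids and the seed argument are hypothetical, and the actual functions in utils.py may work differently.

```python
import numpy as np


def min_max_normalize(X):
    """Rescale each column to [0, 1]; constant columns map to 0 (assumed scheme)."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # avoid division by zero for constant columns
    return (X - col_min) / col_range


def random_centroids(X, n_clusters, seed=0):
    """Pick n_clusters distinct rows of X as starting centroids (assumed scheme)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=n_clusters, replace=False)
    return X[idx]
```

Normalizing first keeps features with large numeric ranges from dominating the distance computation that K-Means relies on.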
main.ipynb
A Jupyter Notebook that demonstrates how to use the K-Means implementation. It serves as a visual guide and provides insights into the clustering process.
from kmeans import KMeansModel
import pandas as pd
# Load your data into a pandas DataFrame (replace this with your own data)
data = pd.read_csv("your_data.csv")
# Instantiate the KMeansModel
model = KMeansModel()
# Cluster the data
centroids, clusters = model.cluster(data, n_clusters=4, n_iter=10)
n_clusters: The number of clusters the algorithm partitions the dataset into.
n_iter: The number of iterations the algorithm goes through to iteratively update cluster assignments and centroids.
Adjusting these parameters lets you control the granularity of the clustering and how far the algorithm converges. A larger n_clusters yields finer-grained clusters, and a larger n_iter gives the centroids more updates to settle, at the cost of more computation.
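To illustrate where n_clusters and n_iter enter the algorithm, here is a minimal NumPy sketch of the standard K-Means loop (Lloyd's algorithm). It is not the code in kmeans.py: the function name kmeans_sketch, the seed argument, the early-stopping check, and the returned label array are assumptions made for illustration.

```python
import numpy as np


def assign(X, centroids):
    """Return, for each point, the index of its nearest centroid."""
    # Squared Euclidean distance from every point to every centroid.
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def kmeans_sketch(X, n_clusters, n_iter, seed=0):
    """Hypothetical stand-in for a K-Means routine; not the repository's API."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from n_clusters distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=n_clusters, replace=False)]

    for _ in range(n_iter):
        labels = assign(X, centroids)
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = centroids.copy()
        for k in range(n_clusters):
            members = X[labels == k]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                new_centroids[k] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving, so further iterations change nothing
        centroids = new_centroids

    # Final assignment against the centroids that are returned.
    return centroids, assign(X, centroids)
```

In this sketch n_iter is an upper bound: the loop stops early once the centroids stop moving, so a generous value costs little on data that converges quickly.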