Computer ScienceAI & Machine LearningMedium

K-Means Clustering

Also known as:Lloyd's AlgorithmK-Means Algorithm

K-means clustering is an unsupervised machine learning algorithm that partitions n data points into k clusters by iteratively assigning each point to the nearest cluster centroid and recalculating centroids until convergence. The objective is to minimise the within-cluster sum of squared distances (inertia). It is widely used for customer segmentation, image compression, anomaly detection, and data summarisation.

Key Formula

Inertia J = sum over clusters j of sum over points x in cluster Cⱼ of ||x - μⱼ||²

LaTeX: J = \sum_{j=1}^{k} \sum_{\mathbf{x}_i \in C_j} \|\mathbf{x}_i - \boldsymbol{\mu}_j\|^2

SymbolMeaningUnit
JWithin-cluster sum of squared distances (inertia)squared distance units
kNumber of clusterscount
C_jSet of points assigned to cluster jdimensionless
\boldsymbol{\mu}_jCentroid (mean) of cluster jsame as data
\mathbf{x}_iData point isame as data

Worked Example

Problem

Given 6 one-dimensional points: {2, 4, 10, 12, 3, 11}. Run one iteration of K-means with k=2 and initial centroids μ₁=2, μ₂=10.

Solution

Step 1 — Assignment: Distances from μ₁=2: |2-2|=0, |4-2|=2, |10-2|=8, |12-2|=10, |3-2|=1, |11-2|=9. Distances from μ₂=10: |2-10|=8, |4-10|=6, |10-10|=0, |12-10|=2, |3-10|=7, |11-10|=1. Cluster 1 = {2, 4, 3}, Cluster 2 = {10, 12, 11}. Step 2 — Update centroids: μ₁ = (2+4+3)/3 = 3. μ₂ = (10+12+11)/3 = 11. Step 3 — New inertia: C₁: (2-3)²+(4-3)²+(3-3)² = 1+1+0 = 2. C₂: (10-11)²+(12-11)²+(11-11)² = 1+1+0 = 2. Total J = 4.

Answer

After one iteration: Cluster 1 = {2, 3, 4} with centroid 3; Cluster 2 = {10, 11, 12} with centroid 11; Inertia = 4

K-Means Algorithm Steps

StepActionGoalNotes
1. InitialiseChoose k centroids randomly (or K-means++)Starting positionsK-means++ reduces bad starts
2. AssignEach point → nearest centroidForm clustersEuclidean distance default
3. UpdateRecompute centroids as cluster meansMove centresInertia decreases
4. RepeatRepeat steps 2–3 until convergenceStable clustersTypically < 100 iterations
5. EvaluateCompute inertia or silhouette scoreQuality checkElbow method for choosing k

Interactive Tools

Scikit-learn K-Means

Open Tool

Naftali Harris K-Means Visualisation

Open Tool

Brilliant.org Clustering

Open Tool
Animation showing K-means clustering converging over successive iterations

Wikimedia Commons, CC BY-SA

Related Terms

The term "k-means" was coined by James MacQueen in 1967, with "k" representing the number of clusters and "means" referring to the arithmetic mean used to compute centroids. An equivalent algorithm was independently proposed by Forgy in 1965.

clusteringunsupervised-learningk-meanscentroidsegmentation