K-means clustering is an unsupervised machine learning algorithm that partitions n data points into k clusters by iteratively assigning each point to the nearest cluster centroid and recalculating centroids until convergence. The objective is to minimise the within-cluster sum of squared distances (inertia). It is widely used for customer segmentation, image compression, anomaly detection, and data summarisation.
Inertia J = sum over clusters j of sum over points x in cluster Cⱼ of ||x - μⱼ||²
LaTeX: J = \sum_{j=1}^{k} \sum_{\mathbf{x}_i \in C_j} \|\mathbf{x}_i - \boldsymbol{\mu}_j\|^2
| Symbol | Meaning | Unit |
|---|---|---|
| J | Within-cluster sum of squared distances (inertia) | squared distance units |
| k | Number of clusters | count |
| C_j | Set of points assigned to cluster j | dimensionless |
| \boldsymbol{\mu}_j | Centroid (mean) of cluster j | same as data |
| \mathbf{x}_i | Data point i | same as data |
Problem
Given 6 one-dimensional points: {2, 4, 10, 12, 3, 11}. Run one iteration of K-means with k=2 and initial centroids μ₁=2, μ₂=10.
Solution
Step 1 — Assignment: Distances from μ₁=2: |2-2|=0, |4-2|=2, |10-2|=8, |12-2|=10, |3-2|=1, |11-2|=9. Distances from μ₂=10: |2-10|=8, |4-10|=6, |10-10|=0, |12-10|=2, |3-10|=7, |11-10|=1. Cluster 1 = {2, 4, 3}, Cluster 2 = {10, 12, 11}. Step 2 — Update centroids: μ₁ = (2+4+3)/3 = 3. μ₂ = (10+12+11)/3 = 11. Step 3 — New inertia: C₁: (2-3)²+(4-3)²+(3-3)² = 1+1+0 = 2. C₂: (10-11)²+(12-11)²+(11-11)² = 1+1+0 = 2. Total J = 4.
Answer
After one iteration: Cluster 1 = {2, 3, 4} with centroid 3; Cluster 2 = {10, 11, 12} with centroid 11; Inertia = 4
| Step | Action | Goal | Notes |
|---|---|---|---|
| 1. Initialise | Choose k centroids randomly (or K-means++) | Starting positions | K-means++ reduces bad starts |
| 2. Assign | Each point → nearest centroid | Form clusters | Euclidean distance default |
| 3. Update | Recompute centroids as cluster means | Move centres | Inertia decreases |
| 4. Repeat | Repeat steps 2–3 until convergence | Stable clusters | Typically < 100 iterations |
| 5. Evaluate | Compute inertia or silhouette score | Quality check | Elbow method for choosing k |
Wikimedia Commons, CC BY-SA
Feature engineering is the process of using domain knowledge to select, transform, or create input variables (features) from raw data to improve the performance of machine learning models. It bridges raw data and predictive algorithms by producing representations that algorithms can learn from more effectively. Techniques include normalization, one-hot encoding, polynomial feature creation, and dimensionality reduction.
Natural language processing (NLP) is a field of artificial intelligence focused on enabling computers to understand, interpret, and generate human language in a useful way. It combines computational linguistics with machine learning and deep learning to process text and speech data. Core tasks include tokenisation, named entity recognition, sentiment analysis, machine translation, and question answering.
A support vector machine (SVM) is a supervised learning algorithm that finds the optimal hyperplane separating two classes by maximising the margin between the nearest data points of each class, called support vectors. For non-linearly separable data, the kernel trick implicitly maps inputs into a higher-dimensional space where a linear separator exists. SVMs are effective in high-dimensional spaces and are used for classification, regression (SVR), and outlier detection.
The term "k-means" was coined by James MacQueen in 1967, with "k" representing the number of clusters and "means" referring to the arithmetic mean used to compute centroids. An equivalent algorithm was independently proposed by Forgy in 1965.