Regularization (ML)

Also known as: Weight Decay, Shrinkage, Penalised Regression

Regularization in machine learning refers to techniques that add a penalty term to the loss function to discourage model complexity, thereby reducing overfitting and improving generalisation to unseen data. The two most common forms are L1 (Lasso) regularization, which promotes sparsity by penalising the absolute values of weights, and L2 (Ridge) regularization, which penalises the squared values, shrinking all weights toward zero. Regularization is a fundamental concept in statistical learning theory, closely tied to the bias–variance trade-off.

Key Formula

Regularised loss = Data loss + λ × Ω(w), where Ω(w) = ||w||₁ (L1) or ||w||₂² (L2)

LaTeX: L_{\text{reg}} = L_{\text{data}} + \lambda \Omega(\mathbf{w})

Symbol	Meaning	Unit
L_{\text{reg}}	Total regularised loss function	dimensionless
L_{\text{data}}	Data fidelity loss (e.g. cross-entropy, MSE)	dimensionless
\lambda	Regularisation strength hyperparameter (λ > 0)	dimensionless
\Omega(\mathbf{w})	Regularisation penalty: ||w||₁ for L1, ||w||₂² for L2	dimensionless
\mathbf{w}	Model weight vector	dimensionless
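
As a minimal sketch of the formula above, the following Python computes the regularised loss for either penalty. The function names `penalty` and `regularised_loss`, the `kind` argument, and the example values are illustrative assumptions, not part of any particular library.

```python
def penalty(weights, kind="l2"):
    """Return Omega(w): the L1 norm or the squared L2 norm of the weights."""
    if kind == "l1":
        return sum(abs(w) for w in weights)
    if kind == "l2":
        return sum(w * w for w in weights)
    raise ValueError(f"unknown penalty kind: {kind!r}")


def regularised_loss(data_loss, weights, lam, kind="l2"):
    """Return L_reg = L_data + lambda * Omega(w)."""
    if lam < 0:
        raise ValueError("lam must be non-negative")
    return data_loss + lam * penalty(weights, kind)


# Hypothetical values chosen only for illustration.
example_weights = [1.0, -2.0]
l1_loss = regularised_loss(1.0, example_weights, lam=0.5, kind="l1")  # 1.0 + 0.5 * 3.0
l2_loss = regularised_loss(1.0, example_weights, lam=0.5, kind="l2")  # 1.0 + 0.5 * 5.0
print(l1_loss, l2_loss)
```

With the same weights and λ, the L2 penalty is larger here because squaring amplifies weights whose magnitude exceeds 1.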

Worked Example

Problem

A linear regression model has weights w = [3, −2, 0.5] and a data loss of 4.2. Calculate the L2-regularised loss with λ = 0.1.

Solution

Step 1. Compute the L2 penalty: ||w||₂² = 3² + (−2)² + 0.5² = 9 + 4 + 0.25 = 13.25.
Step 2. Multiply by λ: 0.1 × 13.25 = 1.325.
Step 3. Add to the data loss: L_reg = 4.2 + 1.325 = 5.525.

Answer

L2-regularised loss = 5.525
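
The same arithmetic can be checked in a few lines of Python. The sketch below reuses the weights and data loss from the problem statement; the extra λ values in the loop are assumptions added only to show how the penalty's contribution grows with the regularisation strength.

```python
import math

weights = [3.0, -2.0, 0.5]
data_loss = 4.2

# Step 1: squared L2 norm of the weights (9 + 4 + 0.25).
l2_penalty = sum(w ** 2 for w in weights)


def l2_regularised_loss(lam):
    """Steps 2 and 3: scale the penalty by lambda and add the data loss."""
    return data_loss + lam * l2_penalty


# lam = 0.1 is the value from the worked example; the others are illustrative.
for lam in (0.0, 0.01, 0.1, 1.0):
    print(f"lambda={lam:<5} L_reg={l2_regularised_loss(lam):.4f}")

# Floating-point addition is inexact, so compare with a tolerance.
assert math.isclose(l2_regularised_loss(0.1), 5.525)
```

At λ = 0 the loss reduces to the data loss alone, and at λ = 1 the penalty dominates it.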

L1 vs L2 vs Elastic Net Regularization Comparison

Property	L1 (Lasso)	L2 (Ridge)	Elastic Net
Penalty term	||w||₁ = Σ|wᵢ|	||w||₂² = Σwᵢ²	α||w||₁ + (1-α)||w||₂²
Weight behaviour	Drives many weights to exactly 0	Shrinks weights toward 0	Sparse + shrinkage
Feature selection	Yes (automatic)	No	Partial
Correlated features	Selects one arbitrarily	Distributes evenly	Better handling
Solution uniqueness	Not always unique	Always unique (λ > 0)	Unique when α < 1
Use case	High-dim sparse data	Multicollinearity	General-purpose
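
The "Weight behaviour" row can be illustrated with a simplified one-dimensional sketch. It assumes each weight can be treated independently, with an unregularised target value z, so that the problem reduces to minimising 0.5 × (w − z)² + λ × Ω(w) per weight. Under that assumption the minimisers have closed forms: soft-thresholding for L1 and proportional shrinkage for L2. The function names and example values are hypothetical.

```python
import math


def soft_threshold(z, t):
    """Shrink z toward zero by t, returning zero when |z| <= t."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def shrink_l1(z, lam):
    # Minimiser of 0.5 * (w - z)**2 + lam * |w|
    return soft_threshold(z, lam)


def shrink_l2(z, lam):
    # Minimiser of 0.5 * (w - z)**2 + lam * w**2
    return z / (1.0 + 2.0 * lam)


def shrink_elastic_net(z, lam, alpha):
    # Minimiser of 0.5 * (w - z)**2 + lam * (alpha * |w| + (1 - alpha) * w**2)
    return soft_threshold(z, lam * alpha) / (1.0 + 2.0 * lam * (1.0 - alpha))


# Hypothetical unregularised weights and regularisation strength.
unregularised = [3.0, -2.0, 0.5]
lam = 1.0

l1_weights = [shrink_l1(z, lam) for z in unregularised]
l2_weights = [shrink_l2(z, lam) for z in unregularised]
en_weights = [shrink_elastic_net(z, lam, alpha=0.5) for z in unregularised]

print("L1:         ", l1_weights)
print("L2:         ", l2_weights)
print("Elastic Net:", en_weights)
```

In this sketch the smallest weight is set to zero by the L1 and Elastic Net penalties, while the L2 penalty scales every weight down without zeroing any of them.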

Interactive Tools

Scikit-learn Regularization

Desmos Regularization Explorer

Brilliant.org Overfitting and Regularization

Related Terms

Computer Science

Feature Engineering

Feature engineering is the process of using domain knowledge to select, transform, or create input variables (features) from raw data to improve the performance of machine learning models. It bridges raw data and predictive algorithms by producing representations that algorithms can learn from more effectively. Techniques include normalization, one-hot encoding, polynomial feature creation, and dimensionality reduction.

Computer Science

Decision Tree (ML)

A decision tree is a supervised machine learning model that splits data into branches based on feature values, forming a tree structure where each internal node represents a feature test, each branch represents an outcome, and each leaf node holds a prediction. Trees are trained by choosing splits that maximise information gain or minimise Gini impurity at each step. They are highly interpretable and serve as the building block for ensemble methods like random forests and gradient boosting.

Computer Science

Support Vector Machine

A support vector machine (SVM) is a supervised learning algorithm that finds the optimal hyperplane separating two classes by maximising the margin between the nearest data points of each class, called support vectors. For non-linearly separable data, the kernel trick implicitly maps inputs into a higher-dimensional space where a linear separator exists. SVMs are effective in high-dimensional spaces and are used for classification, regression (SVR), and outlier detection.

The statistical concept of regularization traces to Andrey Tikhonov's 1963 work on ill-posed problems; Tikhonov regularization corresponds to the L2 penalty. In machine learning, the term was popularised by Robert Tibshirani's Lasso (1996) and by widespread adoption in the 2000s. The word derives from the Latin regularis (conforming to a rule).

regularization, overfitting, bias-variance-tradeoff, lasso, ridge, supervised-learning