Support Vector Machine

Also known as: SVM, Support Vector Classifier, SVC

A support vector machine (SVM) is a supervised learning algorithm that finds the optimal hyperplane separating two classes by maximising the margin, that is, the distance between the hyperplane and the nearest data points of each class. Those nearest points are called support vectors. For data that are not linearly separable, the kernel trick implicitly maps inputs into a higher-dimensional space where a linear separator may exist. SVMs are effective in high-dimensional spaces and are used for classification, regression (SVR), and outlier detection.
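The kernel trick mentioned above can be illustrated with a small check. The sketch below assumes 2-D inputs and a homogeneous degree-2 polynomial kernel, K(x,z) = (x·z)², and compares it with the dot product under an explicit feature map. The names `poly2_kernel` and `phi` and the sample points are hypothetical, chosen only for illustration.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel: K(x, z) = (x . z)^2."""
    return dot(x, z) ** 2


def phi(x):
    """Explicit 3-D feature map for 2-D inputs that matches poly2_kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


# Illustrative points (assumed, not from the text).
x, z = (1.0, 2.0), (3.0, -1.0)

k_implicit = poly2_kernel(x, z)      # computed in the original 2-D space
k_explicit = dot(phi(x), phi(z))     # computed in the mapped 3-D space

print(k_implicit, k_explicit)
```

Both values agree up to floating-point rounding, which is the point of the trick: the kernel returns the inner product in the mapped space without ever constructing the mapped vectors.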

Key Formula

Minimise (1/2)||w||² subject to yᵢ(w·xᵢ + b) ≥ 1 for all i

LaTeX: \min_{\mathbf{w},b} \frac{1}{2}\|\mathbf{w}\|^2 \quad \text{s.t. } y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \geq 1

Symbol            Meaning                                   Unit
\mathbf{w}        Weight vector (normal to hyperplane)      dimensionless
b                 Bias term (intercept)                     dimensionless
\|\mathbf{w}\|    Euclidean norm of w; margin = 2/||w||     dimensionless
y_i               Class label of sample i (+1 or −1)        dimensionless
\mathbf{x}_i      Feature vector of sample i                dimensionless
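As a worked example of the formula and symbols above, the following sketch checks the constraint yᵢ(w·xᵢ + b) ≥ 1 for a candidate hyperplane on a toy dataset and computes the margin 2/||w||. The four points and the values of `w` and `b` are assumptions made up for this illustration.

```python
import math

# Hypothetical candidate hyperplane and toy data (assumed for illustration).
w = (0.5, 0.5)
b = -1.0
samples = [
    ((2.0, 2.0), 1),
    ((3.0, 3.0), 1),
    ((0.0, 0.0), -1),
    ((-1.0, 0.0), -1),
]


def functional_margin(x, y):
    """Left-hand side of the constraint: y * (w . x + b)."""
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)


margins = [functional_margin(x, y) for x, y in samples]

# The hyperplane is feasible if every sample satisfies the constraint.
feasible = all(m >= 1.0 for m in margins)

# Samples that meet the constraint with equality lie on the margin boundary.
support_vectors = [x for (x, _), m in zip(samples, margins) if math.isclose(m, 1.0)]

# Geometric width of the margin, 2 / ||w||.
margin_width = 2.0 / math.sqrt(sum(wi * wi for wi in w))

print(margins, feasible, support_vectors, margin_width)
```

Here the points (2, 2) and (0, 0) satisfy the constraint with equality, so they are the support vectors for this hyperplane, and the margin width equals the distance between them.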

SVM Kernel Functions and Their Applications

Kernel            Formula                      Best For                                   Hyperparameters
Linear            K(x,z) = x·z                 Linearly separable, text classification    C
Polynomial        K(x,z) = (x·z + c)^d         Image recognition, low-dim data            C, d, c
RBF / Gaussian    K(x,z) = exp(-γ||x-z||²)     General-purpose, non-linear                C, γ
Sigmoid           K(x,z) = tanh(αx·z + c)      Neural network-like behaviour              C, α, c
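The four kernel formulas in the table can be written directly as functions. This is a minimal sketch using only the standard library. The default values for `d`, `c`, `gamma`, and `alpha` are arbitrary assumptions, and `gram_matrix` is a hypothetical helper added for illustration.

```python
import math


def dot(x, z):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, z))


def linear_kernel(x, z):
    """K(x, z) = x . z"""
    return dot(x, z)


def polynomial_kernel(x, z, d=2, c=1.0):
    """K(x, z) = (x . z + c)^d; d and c defaults are illustrative."""
    return (dot(x, z) + c) ** d


def rbf_kernel(x, z, gamma=0.5):
    """K(x, z) = exp(-gamma * ||x - z||^2); gamma default is illustrative."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def sigmoid_kernel(x, z, alpha=0.1, c=0.0):
    """K(x, z) = tanh(alpha * x . z + c); defaults are illustrative."""
    return math.tanh(alpha * dot(x, z) + c)


def gram_matrix(kernel, points):
    """Pairwise kernel values K[i][j] = kernel(points[i], points[j])."""
    return [[kernel(p, q) for q in points] for p in points]


# Illustrative points (assumed, not from the text).
points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
K = gram_matrix(rbf_kernel, points)
for row in K:
    print(row)
```

For the RBF kernel the diagonal of the Gram matrix is always 1, since each point is at distance zero from itself, and the off-diagonal entries shrink as points move apart or as γ grows.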

Interactive Tools

Scikit-learn SVM Documentation

Brilliant.org SVM

Desmos SVM Visualiser

Figure: Support vector machine diagram showing the maximum-margin hyperplane and support vectors (Wikimedia Commons, CC BY-SA).

Related Terms

Computer Science

Feature Engineering

Feature engineering is the process of using domain knowledge to select, transform, or create input variables (features) from raw data to improve the performance of machine learning models. It bridges raw data and predictive algorithms by producing representations that algorithms can learn from more effectively. Techniques include normalization, one-hot encoding, polynomial feature creation, and dimensionality reduction.

Computer Science

Regularization (ML)

Regularization in machine learning refers to techniques that add a penalty term to the loss function to discourage model complexity, thereby reducing overfitting and improving generalisation to unseen data. The two most common forms are L1 (Lasso) regularization, which promotes sparsity by penalising the absolute values of weights, and L2 (Ridge) regularization, which penalises the squared values, shrinking all weights toward zero. Regularization is a fundamental concept in statistical learning theory, closely tied to the bias–variance trade-off.

Computer Science

Decision Tree (ML)

A decision tree is a supervised machine learning model that splits data into branches based on feature values, forming a tree structure where each internal node represents a feature test, each branch represents an outcome, and each leaf node holds a prediction. Trees are trained by choosing splits that maximise information gain or minimise Gini impurity at each step. They are highly interpretable and serve as the building block for ensemble methods like random forests and gradient boosting.

The support vector machine was developed by Vladimir Vapnik and Alexey Chervonenkis in 1963, with the modern soft-margin version introduced by Corinna Cortes and Vladimir Vapnik in 1995. "Support vectors" are the critical training points that "support" (define) the margin boundary.
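The soft-margin idea can be sketched in code. In its common hinge-loss form, the soft-margin objective is (1/2)||w||² + C Σᵢ max(0, 1 − yᵢ(w·xᵢ + b)), where C is the hyperparameter listed in the kernel table. The example below minimises it with plain full-batch subgradient descent. This is a simplification chosen for readability, not the quadratic-programming approach of the original papers and not necessarily what any library does. The function names, toy data, learning rate, and epoch count are all assumptions.

```python
def train_linear_svm(samples, C=1.0, lr=0.01, epochs=5000):
    """Subgradient descent on 0.5 * ||w||^2 + C * sum of hinge losses."""
    dim = len(samples[0][0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the 0.5 * ||w||^2 term
        grad_b = 0.0
        for x, y in samples:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score < 1.0:  # margin violated, so the hinge term is active
                for j in range(dim):
                    grad_w[j] -= C * y * x[j]
                grad_b -= C * y
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Classify by the sign of w . x + b."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Toy, linearly separable data (assumed for illustration).
samples = [
    ((2.0, 2.0), 1),
    ((3.0, 3.0), 1),
    ((0.0, 0.0), -1),
    ((-1.0, 0.0), -1),
]

w, b = train_linear_svm(samples)
predictions = [predict(w, b, x) for x, _ in samples]
print(w, b, predictions)
```

A larger C penalises margin violations more heavily and pushes the result toward the hard-margin solution, while a smaller C tolerates more violations in exchange for a wider margin.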

Tags: svm, classification, kernel-methods, supervised-learning, margin-maximisation