
Cross-Validation

Also known as: k-Fold Validation, Rotation Estimation

Cross-validation is a statistical technique for estimating how well a machine learning model will generalize to an independent dataset. The most common form, k-fold cross-validation, partitions the training data into k subsets (folds) of approximately equal size. The model is trained on k−1 folds and evaluated on the remaining fold. This is repeated k times so that each fold serves as the validation set exactly once, and the k scores are then averaged. Every observation is used for both training and validation. As a result, cross-validation gives a more reliable performance estimate than a single train-test split, and it is widely used for selecting hyperparameters and comparing models.
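The k-fold procedure can be sketched in plain Python. This is a minimal illustration, not scikit-learn's implementation. Several choices here are hypothetical and exist only to make the mechanics visible:

- the helper names `k_fold_splits` and `cross_validate`;
- shuffling once with a fixed `seed`;
- the toy threshold classifier (`fit_threshold`, `accuracy`), which assumes class 1 has larger feature values and that every training split contains both classes.

The sketch shows that each sample lands in exactly one validation fold and that the k fold scores are averaged.

```python
import random
from statistics import mean


def k_fold_splits(n, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold CV over n samples.

    Indices are shuffled once, then cut into k contiguous folds whose sizes
    differ by at most one, so every sample is validated exactly once.
    """
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    base, extra = divmod(n, k)  # the first `extra` folds get one extra sample
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        val = idx[start:start + size]
        train = idx[:start] + idx[start + size:]  # the other k-1 folds
        yield train, val
        start += size


def cross_validate(fit, score, X, y, k=5, seed=0):
    """Train on k-1 folds, score on the held-out fold; return (CV_k, fold scores)."""
    scores = []
    for train, val in k_fold_splits(len(X), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        scores.append(score(model, [X[i] for i in val], [y[i] for i in val]))
    return mean(scores), scores  # CV_k is the plain average of the k scores


# Hypothetical toy model: a 1-D threshold halfway between the class means.
# Assumes class 1 has larger values and each training split has both classes.
def fit_threshold(xs, ys):
    m0 = mean(x for x, t in zip(xs, ys) if t == 0)
    m1 = mean(x for x, t in zip(xs, ys) if t == 1)
    return (m0 + m1) / 2


def accuracy(threshold, xs, ys):
    preds = [1 if x > threshold else 0 for x in xs]
    return sum(p == t for p, t in zip(preds, ys)) / len(ys)


if __name__ == "__main__":
    X = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
    cv, fold_scores = cross_validate(fit_threshold, accuracy, X, y, k=5)
    print("fold accuracies:", fold_scores)
    print("CV_5 accuracy:", cv)
```

The model is passed in as `fit` and `score` functions, which keeps the loop independent of any particular model. Comparing models or hyperparameter settings then amounts to calling `cross_validate` once per candidate and keeping the one with the best CV_k.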

Key Formula

Cross-Validation Score = (1/k) × (Score_1 + Score_2 + … + Score_k)

LaTeX: CV_k = \frac{1}{k} \sum_{i=1}^{k} \text{Score}_i

Symbol | Meaning | Unit
CV_k | Mean cross-validation score across all folds | Depends on the metric (e.g., % for accuracy)
k | Number of folds | Count
Score_i | Model performance score on fold i (e.g., accuracy) | Depends on the metric
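The worked example also reports the spread of the fold scores. It uses the population convention, dividing by k rather than k − 1. Under that convention, and with the same symbols as the key formula, the standard deviation (written σ_CV here, a label introduced only for this line) is:

LaTeX: \sigma_{CV} = \sqrt{\frac{1}{k} \sum_{i=1}^{k} \left(\text{Score}_i - CV_k\right)^2}

Replacing 1/k with 1/(k − 1) gives the sample standard deviation, which some reports use instead.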

Worked Example

Problem

A classifier is evaluated using 5-fold cross-validation. The accuracy scores on each fold are: 82%, 85%, 79%, 88%, and 86%. Calculate the mean CV accuracy and standard deviation.

Solution

Step 1: Sum the fold scores: 82 + 85 + 79 + 88 + 86 = 420

Step 2: Mean CV accuracy: CV_5 = 420 / 5 = 84%

Step 3: Compute the variance. The deviations from the mean are −2, 1, −5, 4, 2, and their squares are 4, 1, 25, 16, 4. Population variance = (4 + 1 + 25 + 16 + 4) / 5 = 50 / 5 = 10

Step 4: Standard deviation: σ = √10 ≈ 3.16 percentage points

Answer

Mean CV accuracy = 84% ± 3.16 percentage points (population standard deviation across the 5 folds)
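Python's standard `statistics` module can check this arithmetic. The sketch uses only standard-library functions:

- `mean` gives CV_5;
- `pstdev` reproduces the population standard deviation from Step 4;
- `stdev` shows the sample version, which divides by k − 1 instead.

```python
from statistics import mean, pstdev, stdev

fold_acc = [82, 85, 79, 88, 86]  # fold accuracies (%) from the worked example

cv_k = mean(fold_acc)     # (1/k) * sum of the fold scores
sigma = pstdev(fold_acc)  # population SD: squared deviations divided by k
s = stdev(fold_acc)       # sample SD: squared deviations divided by k - 1

print(f"CV_5 = {cv_k}%, population SD = {sigma:.2f}, sample SD = {s:.2f}")
# prints: CV_5 = 84%, population SD = 3.16, sample SD = 3.54
```

The two conventions differ noticeably when k is small (3.16 vs 3.54 here), so it helps to state which one a reported "±" uses.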

Cross-Validation Strategies Compared

Strategy | k | Computational cost | Best used when
Hold-out (single train/test split) | N/A | Very low | Very large datasets
k-Fold | 5 | Moderate | Standard practice
k-Fold | 10 | Moderate–high | Recommended default
Leave-One-Out (LOOCV) | n (number of samples) | Very high | Small datasets
Stratified k-Fold | 5–10 | Moderate | Imbalanced class distributions
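For the stratified row of the table, one simple way to preserve class proportions is as follows:

1. Shuffle each class separately.
2. Deal each class's indices round-robin across the k folds.

The function `stratified_folds` is a hypothetical sketch of that idea, not necessarily the algorithm behind scikit-learn's `StratifiedKFold`. It returns the validation indices for each fold. The training set for fold i is every index not in that fold.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Return k lists of validation indices with roughly equal class mix.

    Each class is shuffled on its own and dealt round-robin across folds;
    dealing continues where the previous class stopped, so total fold
    sizes also stay within one of each other.
    """
    by_class = defaultdict(list)
    for i, lab in enumerate(labels):
        by_class[lab].append(i)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    offset = 0
    for lab in sorted(by_class):
        idx = by_class[lab]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[(offset + j) % k].append(i)
        offset += len(idx)  # next class starts on the following fold
    return folds


if __name__ == "__main__":
    labels = [0] * 90 + [1] * 10  # imbalanced: 10% minority class
    for n, fold in enumerate(stratified_folds(labels, k=5)):
        minority = sum(labels[i] for i in fold)
        print(f"fold {n}: size {len(fold)}, minority samples {minority}")
```

With 90 samples of class 0, 10 of class 1 and k = 5, every fold receives 18 and 2 samples respectively. An unstratified shuffle can leave some folds with few or no minority samples. Leave-one-out needs no special code: it is plain k-fold with k equal to n, so each validation fold holds a single sample.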

Interactive Tools

Scikit-learn Cross-Validation

Python implementation of k-fold and stratified cross-validation


Khan Academy — Statistics

Statistical foundations needed to understand cross-validation


Brilliant.org — Machine Learning

Interactive problem sets covering model evaluation and validation

Figure: k-fold cross-validation, showing the training and validation splits across folds (image: Wikimedia Commons, CC BY-SA).

Related Terms

"Cross-validation" combines "cross" (from Old English "cros," meaning to pass over or intersect) and "validation" (from Latin "validus," strong or effective). The technique was formalized in statistics in the 1970s, notably in work by Seymour Geisser and David M. Allen, and was later widely adopted in machine learning.

cross-validation · model-evaluation · k-fold · generalization · hyperparameter-tuning