Variance measures the average squared deviation of a random variable from its mean, quantifying how spread out the values in a distribution are. A low variance indicates values cluster tightly around the mean; a high variance indicates they are widely dispersed. Variance is the square of the standard deviation and is fundamental to ANOVA, regression analysis, and portfolio theory.
σ² = E[(X − μ)²] = E[X²] − (E[X])²
LaTeX: \sigma^2 = E[(X - \mu)^2] = E[X^2] - (E[X])^2
| Symbol | Meaning | Unit |
|---|---|---|
| σ² | population variance | square of the unit of X |
| X | random variable | same as X |
| μ | population mean E[X] | same as X |
| E[·] | expected value operator | unitless |
Problem
Five students scored: 60, 70, 80, 90, 100. Calculate the population variance.
Solution
Step 1: Find the mean: μ = (60 + 70 + 80 + 90 + 100) / 5 = 400 / 5 = 80. Step 2: Compute squared deviations: (60−80)² = 400 (70−80)² = 100 (80−80)² = 0 (90−80)² = 100 (100−80)² = 400 Step 3: Sum: 400 + 100 + 0 + 100 + 400 = 1000. Step 4: Divide by N = 5: σ² = 1000 / 5 = 200.
Answer
Population variance σ² = 200 (marks squared)
| Property | Population Variance | Sample Variance |
|---|---|---|
| Symbol | σ² | s² |
| Formula | Σ(xᵢ − μ)² / N | Σ(xᵢ − x̄)² / (n−1) |
| Denominator | N (full population) | n−1 (Bessel's correction) |
| Purpose | Describes entire population | Estimates population from sample |
| Bias | Unbiased for population | Unbiased estimator of σ² |
Wikimedia Commons, CC BY-SA
Standard deviation is the square root of the variance and measures the average distance of data points from the mean in the original units of measurement. It is the most widely used measure of statistical dispersion because, unlike variance, it is expressed in the same units as the data. A small standard deviation indicates data clustered near the mean; a large one indicates wide spread.
The expected value (or expectation) of a random variable is the probability-weighted average of all possible values it can take. It represents the long-run average outcome if the experiment were repeated many times. Expected value is central to decision theory, gambling, insurance, and financial risk analysis.
The mean (arithmetic mean) is the sum of all values in a dataset divided by the number of values, and represents the central or typical value. It is the most commonly used measure of central tendency and is sensitive to extreme values (outliers). The mean is used extensively in data analysis, quality control, and as the foundation for more advanced statistical measures such as variance and standard deviation.
From Latin variare (to change, alter). The term "variance" in its statistical sense was introduced by Ronald A. Fisher in his 1918 paper "The Correlation Between Relatives on the Supposition of Mendelian Inheritance."