Standard deviation is the square root of the variance and measures the typical distance of data points from the mean, in the original units of measurement. Strictly, it is the root-mean-square deviation from the mean rather than the plain average distance. It is the most widely used measure of statistical dispersion because, unlike variance, it is expressed in the same units as the data. A small standard deviation indicates data clustered near the mean; a large one indicates wide spread.
σ = √[ Σ(xᵢ − μ)² / N ]
LaTeX: \sigma = \sqrt{\frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}}
| Symbol | Meaning | Unit |
|---|---|---|
| σ | population standard deviation | same as xᵢ |
| xᵢ | each individual data value | same as data |
| μ | population mean | same as data |
| N | total number of values in the population | unitless |
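The formula above can be translated almost term by term into code. The following is a minimal sketch, not a reference implementation; the function name `population_std` and the sample values are illustrative assumptions, not part of the definition.

```python
import math


def population_std(values):
    """Population standard deviation: sqrt(sum((x_i - mu)^2) / N)."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    mu = sum(values) / n  # population mean
    squared_deviations = [(x - mu) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / n)


# Illustrative data: mean is 5, squared deviations sum to 32, variance is 4.
print(population_std([2, 4, 4, 4, 5, 5, 7, 9]))  # prints 2.0
```

Dividing by N treats the values as the entire population, matching the symbol table above.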
Problem
A machine fills bottles with a target of 500 mL. Five measurements are: 498, 501, 499, 502, 500 mL. Find the standard deviation.
Solution
Step 1: Mean μ = (498 + 501 + 499 + 502 + 500) / 5 = 2500 / 5 = 500 mL.
Step 2: Squared deviations: (498 − 500)² = 4, (501 − 500)² = 1, (499 − 500)² = 1, (502 − 500)² = 4, (500 − 500)² = 0.
Step 3: Sum = 4 + 1 + 1 + 4 + 0 = 10.
Step 4: Variance = 10 / 5 = 2 mL².
Step 5: σ = √2 ≈ 1.414 mL.
Answer
Standard deviation σ ≈ 1.41 mL
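The worked answer can be checked with Python's standard `statistics` module. This sketch also shows the sample standard deviation (dividing by N − 1) for comparison, on the assumption that the five bottles might instead be treated as a sample from the machine's full output; the variable names are illustrative.

```python
import statistics

measurements_ml = [498, 501, 499, 502, 500]

# pstdev divides by N (population formula, as in the worked solution).
population_sd = statistics.pstdev(measurements_ml)

# stdev divides by N - 1 (sample formula), shown only for comparison.
sample_sd = statistics.stdev(measurements_ml)

print(f"population: {population_sd:.3f} mL")  # population: 1.414 mL
print(f"sample:     {sample_sd:.3f} mL")      # sample:     1.581 mL
```

The population figure matches the answer above; the sample figure is slightly larger because the sum of squared deviations (10) is divided by 4 instead of 5.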
| Interval | Proportion of Data (normal distribution) | Example (μ=500, σ=10) | Interpretation |
|---|---|---|---|
| μ ± 1σ | 68.27% | 490 to 510 | Typical range |
| μ ± 2σ | 95.45% | 480 to 520 | Wide normal range |
| μ ± 3σ | 99.73% | 470 to 530 | Almost all data |
| Beyond ±3σ | 0.27% | < 470 or > 530 | Outliers / rare events |
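The percentages in the table hold for normally distributed data and can be reproduced from the error function: the proportion within k standard deviations of the mean is erf(k / √2). A short sketch, with `proportion_within` as an illustrative helper name:

```python
import math


def proportion_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"mu ± {k} sigma: {proportion_within(k) * 100:.2f}%")
# mu ± 1 sigma: 68.27%
# mu ± 2 sigma: 95.45%
# mu ± 3 sigma: 99.73%
```

For data that are not approximately normal, these proportions should not be assumed to apply.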
Variance measures the average squared deviation of a random variable from its mean, quantifying how spread out the values in a distribution are. A low variance indicates values cluster tightly around the mean; a high variance indicates they are widely dispersed. Variance is the square of the standard deviation and is fundamental to ANOVA, regression analysis, and portfolio theory.
The mean (arithmetic mean) is the sum of all values in a dataset divided by the number of values, and represents the central or typical value. It is the most commonly used measure of central tendency and is sensitive to extreme values (outliers). The mean is used extensively in data analysis, quality control, and as the foundation for more advanced statistical measures such as variance and standard deviation.
The normal distribution is a continuous, symmetric, bell-shaped probability distribution characterised by its mean (μ) and standard deviation (σ). It is the most important distribution in statistics because many natural phenomena, such as heights, measurement errors, and test scores, follow or approximate it. The Central Limit Theorem states that the mean of a large sample of independent observations from any distribution with finite variance is approximately normally distributed.
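The Central Limit Theorem can be illustrated with a small simulation. The sketch below is an assumption-laden example, not a proof: it draws from an exponential distribution (mean 1, standard deviation 1, strongly right-skewed), and the sample size, number of samples, and seed are arbitrary choices.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 50     # observations per sample (arbitrary choice)
NUM_SAMPLES = 2000   # number of sample means to collect (arbitrary choice)

# Each entry is the mean of one sample from a skewed, non-normal distribution.
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(SAMPLE_SIZE)])
    for _ in range(NUM_SAMPLES)
]

centre = statistics.fmean(sample_means)
spread = statistics.pstdev(sample_means)

# Share of sample means within one standard deviation of their centre.
within_one_sd = sum(abs(m - centre) <= spread for m in sample_means) / NUM_SAMPLES

print(f"mean of sample means: {centre:.3f}")        # expected near 1
print(f"std of sample means:  {spread:.3f}")        # expected near 1/sqrt(50), about 0.141
print(f"within one std:       {within_one_sd:.3f}")  # expected near 0.68 if roughly normal
```

Although the underlying data are far from normal, the sample means cluster symmetrically around the population mean with a spread close to σ/√n.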
The term "standard deviation" was introduced by Karl Pearson in 1894 in his paper "On the dissection of asymmetrical frequency curves." The word "standard" conveys the notion of a canonical or reference measure of spread; "deviation" comes from the Latin deviare (to turn aside from the way).