The Central Limit Theorem (CLT) states that the sampling distribution of the mean of n independent, identically distributed observations approaches a normal distribution as n increases, regardless of the shape of the underlying population distribution, provided the population has a finite mean and variance. A common rule of thumb treats the normal approximation as adequate when n ≥ 30, although heavily skewed or heavy-tailed populations can require larger samples. The CLT is the theoretical foundation for Z-tests, t-tests, confidence intervals, and much of classical inferential statistics.
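X̄ ~ N(μ, σ²/n) approximately, for large n; formally, √n (X̄ − μ)/σ → N(0, 1) in distribution as n → ∞
LaTeX: \bar{X} \sim N\!\left(\mu,\, \frac{\sigma^2}{n}\right) \text{ approximately, for large } n; \quad \frac{\sqrt{n}\,(\bar{X} - \mu)}{\sigma} \xrightarrow{d} N(0, 1) \text{ as } n \to \infty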
| Symbol | Meaning | Unit |
|---|---|---|
| \bar{X} | Sample mean (random variable) | same as population |
| \mu | Population mean | same as population |
| \sigma^2 | Population variance | squared units |
| n | Sample size | count |
Problem
A population is uniformly distributed on [0, 10] with μ = 5 and σ² = 100/12 ≈ 8.33. If samples of n = 36 are repeatedly drawn, what is the distribution of the sample mean?
Solution
Step 1: Population: Uniform[0,10], μ = 5, σ² = 8.33, σ = 2.887.
Step 2: By CLT, X̄ ~ N(μ, σ²/n) = N(5, 8.33/36) = N(5, 0.231), approximately.
Step 3: Standard error = √0.231 ≈ 0.481.
Step 4: Standardise the bounds: z = (5.5 − 5)/0.481 ≈ 1.04 and z = (4.5 − 5)/0.481 ≈ −1.04, so P(4.5 < X̄ < 5.5) = P(−1.04 < Z < 1.04) ≈ 0.702.
Answer
X̄ ~ N(5, 0.231); SE = 0.481; approximately 70.2% of sample means fall within [4.5, 5.5]
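The worked example above can be checked numerically. The following is a minimal sketch, assuming only the Python standard library; the function names, the seed, and the number of repetitions are illustrative choices, not part of the original problem. It compares the CLT approximation with a direct simulation of sample means drawn from Uniform[0, 10].

```python
import random
from statistics import NormalDist, fmean

MU = 5.0        # mean of Uniform[0, 10]
VAR = 100 / 12  # variance of Uniform[0, 10]
N = 36          # sample size from the problem


def clt_probability(lo, hi, mu=MU, var=VAR, n=N):
    """P(lo < sample mean < hi) under the CLT normal approximation."""
    se = (var / n) ** 0.5  # standard error of the mean
    approx = NormalDist(mu, se)
    return approx.cdf(hi) - approx.cdf(lo)


def simulated_probability(lo, hi, n=N, reps=20_000, seed=0):
    """Fraction of simulated sample means from Uniform[0, 10] inside (lo, hi)."""
    rng = random.Random(seed)  # fixed seed (arbitrary) for repeatability
    hits = 0
    for _ in range(reps):
        xbar = fmean(rng.uniform(0, 10) for _ in range(n))
        if lo < xbar < hi:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    print(f"CLT approximation: {clt_probability(4.5, 5.5):.3f}")
    print(f"Simulation:        {simulated_probability(4.5, 5.5):.3f}")
```

Both figures should land close to 0.70. The small gap from the hand calculation comes from rounding z to 1.04 before using a table.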
| Population Shape | n = 5 | n = 10 | n = 30 | n = 50 |
|---|---|---|---|---|
| Normal | Normal | Normal | Normal | Normal |
| Slightly skewed | Skewed | Near normal | Normal | Normal |
| Moderately skewed | Skewed | Slightly skewed | Approximately normal | Normal |
| Heavily skewed | Heavily skewed | Skewed | Near normal | Approximately normal |
| Bimodal | Bimodal | Irregular | Near normal | Normal |
| Uniform | Near normal | Approximately normal | Normal | Normal |
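The qualitative entries in the table can be explored with a small simulation. This is a sketch under stated assumptions: the exponential distribution stands in for a "heavily skewed" population, sample skewness is used as a rough one-number measure of departure from normality, and all names, seeds, and repetition counts are illustrative.

```python
import random
from statistics import fmean


def skewness(values):
    """Moment-based sample skewness (close to 0 for a symmetric distribution)."""
    m = fmean(values)
    m2 = fmean((v - m) ** 2 for v in values)
    m3 = fmean((v - m) ** 3 for v in values)
    return m3 / m2 ** 1.5


def sample_mean_skewness(draw, n, reps=5_000, seed=1):
    """Skewness of `reps` simulated sample means, each based on n draws."""
    rng = random.Random(seed)
    means = [fmean(draw(rng) for _ in range(n)) for _ in range(reps)]
    return skewness(means)


# Illustrative populations (assumed for this sketch, not taken from the table's source).
POPULATIONS = {
    "uniform": lambda rng: rng.uniform(0, 10),
    "exponential (heavily skewed)": lambda rng: rng.expovariate(1.0),
}

if __name__ == "__main__":
    for name, draw in POPULATIONS.items():
        for n in (5, 10, 30, 50):
            print(f"{name:30s} n={n:2d} skewness={sample_mean_skewness(draw, n):+.3f}")
```

For the exponential population the skewness of the sample mean shrinks roughly like 2/√n, so it is still visible at n = 30. Skewness does not capture every departure from normality (for example, tail weight), so this is a partial check only.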
A confidence interval (CI) is a range of plausible values for an unknown population parameter, constructed from sample data so that the procedure captures the true parameter with a specified probability (the confidence level, e.g., 95%). Crucially, the confidence level refers to the long-run success rate of the procedure — not the probability that a particular interval contains the parameter. Confidence intervals are used throughout science, medicine, and engineering to quantify estimation uncertainty.
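The "long-run success rate" reading can be illustrated by simulation. The sketch below assumes a z-interval with known population standard deviation, x̄ ± z·σ/√n, and reuses a Uniform[0, 10] population with n = 36 purely as an example; the function names and the seed are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def z_interval(sample, sigma, level=0.95):
    """Two-sided z-interval for the mean, assuming sigma is known."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    xbar = fmean(sample)
    half_width = z * sigma / len(sample) ** 0.5
    return xbar - half_width, xbar + half_width


def coverage(mu, sigma, draw, n, reps=5_000, level=0.95, seed=2):
    """Fraction of simulated intervals that contain the true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        lo, hi = z_interval([draw(rng) for _ in range(n)], sigma, level)
        hits += lo <= mu <= hi
    return hits / reps


if __name__ == "__main__":
    sigma = (100 / 12) ** 0.5  # standard deviation of Uniform[0, 10]
    rate = coverage(5.0, sigma, lambda rng: rng.uniform(0, 10), n=36)
    print(f"Observed coverage of nominal 95% intervals: {rate:.3f}")
```

Each individual interval either contains μ = 5 or it does not. The 95% figure describes how often the procedure succeeds across repeated samples, which is what the observed coverage estimates.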
A Z-score (also called a standard score) measures how many standard deviations a data point is from the mean of its distribution. It standardises values from different distributions, enabling direct comparison by placing them on a common scale. Z-scores are widely used in quality control, hypothesis testing, and the construction of standard normal tables.
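As a brief illustration, a Z-score is computed as z = (x − μ)/σ. The sketch below uses made-up exam figures (hypothetical values, not from any dataset) to show how two scores on different scales become comparable.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Number of standard deviations by which x lies above (+) or below (-) mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


if __name__ == "__main__":
    # Hypothetical scores on two exams with different scales.
    z_a = z_score(82, mu=70, sigma=8)      # 1.5
    z_b = z_score(620, mu=500, sigma=100)  # 1.2
    print(f"Exam A: z = {z_a:.2f}, Exam B: z = {z_b:.2f}")
    # If the scores are roughly normal, the standard normal CDF gives the
    # share of the population scoring below each value.
    print(f"Share below A: {NormalDist().cdf(z_a):.3f}")
    print(f"Share below B: {NormalDist().cdf(z_b):.3f}")
```

Under these assumed figures the score of 82 is the stronger result in relative terms, even though 620 is the larger raw number.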
The Law of Large Numbers (LLN) states that as the number of independent, identically distributed trials of a random experiment increases, the sample mean converges to the true population mean (expected value), provided that mean is finite. There are two forms: the Weak LLN (convergence in probability, first proved for binary trials by Jacob Bernoulli) and the Strong LLN (almost sure convergence, first proved in a special case by Émile Borel and later generalised by Andrey Kolmogorov). The LLN is the mathematical justification for empirical estimation of probabilities and the stability of statistical averages in the long run.
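The convergence described by the LLN can be observed by tracking a running mean. This is a minimal sketch using fair six-sided die rolls (expected value 3.5) as an assumed example; the function name, the seed, and the trial count are illustrative.

```python
import random


def running_means(draw, n_trials, seed=3):
    """Return the sample mean after each of n_trials independent draws."""
    rng = random.Random(seed)
    total = 0.0
    means = []
    for i in range(1, n_trials + 1):
        total += draw(rng)
        means.append(total / i)
    return means


if __name__ == "__main__":
    means = running_means(lambda rng: rng.randint(1, 6), 100_000)
    for k in (10, 100, 1_000, 10_000, 100_000):
        print(f"after {k:>7,d} rolls: mean = {means[k - 1]:.4f}")
```

The early running means can wander noticeably, while the later ones settle near 3.5. A single simulated path illustrates the behaviour but does not distinguish the weak form from the strong form.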
The Central Limit Theorem was first proven for binomial distributions by Abraham de Moivre in 1733. Pierre-Simon Laplace generalised the result in his Théorie analytique des probabilités (1812). The term "Central Limit Theorem" (from German "Zentraler Grenzwertsatz") was coined by Georg Pólya in 1920, with "central" reflecting its pivotal role in probability theory.