Hypothesis testing is a formal statistical procedure for making decisions about a population parameter based on sample data, by evaluating evidence against a null hypothesis (H₀) in favour of an alternative hypothesis (H₁). A test statistic is computed from the sample and either compared with a critical value or converted to a p-value; if the result is statistically significant (p < α, where α is the significance level chosen in advance), the null hypothesis is rejected. It underpins scientific research, clinical trials, quality assurance, and data-driven decision-making across quantitative disciplines.
Problem
A manufacturer claims its bottles contain exactly 500 mL. A quality-control inspector measures a sample of 36 bottles and finds a sample mean of x̄ = 497 mL; the population standard deviation is taken as known, σ = 9 mL. Test at α = 0.05 (two-tailed) whether the mean fill differs from 500 mL.
Solution
Step 1: H₀: μ = 500; H₁: μ ≠ 500; α = 0.05.
Step 2: Z = (497 − 500) / (9 / √36) = −3 / 1.5 = −2.00.
Step 3: Critical values: ±Z₀.₀₂₅ = ±1.96.
Step 4: |−2.00| = 2.00 > 1.96 → reject H₀.
Answer
Z = −2.00; reject H₀ at α = 0.05. The sample gives evidence that the mean fill differs from 500 mL.
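The arithmetic in the worked example can be checked with a short Python sketch. It assumes a one-sample z-test with known σ, as in the problem statement; the function name `z_test_two_tailed` and its parameters are illustrative choices, not part of any library.

```python
from math import sqrt
from statistics import NormalDist

def z_test_two_tailed(sample_mean, mu0, sigma, n, alpha=0.05):
    """Return (z, critical value, reject?) for a two-tailed one-sample z-test.

    Assumes the population standard deviation sigma is known.
    """
    standard_error = sigma / sqrt(n)               # 9 / sqrt(36) = 1.5
    z = (sample_mean - mu0) / standard_error       # standardised distance from mu0
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # upper alpha/2 quantile
    return z, z_crit, abs(z) > z_crit

z, z_crit, reject = z_test_two_tailed(497, 500, 9, 36)
print(f"Z = {z:.2f}, critical value = ±{z_crit:.2f}, reject H0: {reject}")
# prints: Z = -2.00, critical value = ±1.96, reject H0: True
```

The critical value is taken from the standard normal quantile function rather than a printed table, so other values of α can be tried by changing one argument.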
| Component | Symbol/Term | Description | Decision Rule |
|---|---|---|---|
| Null hypothesis | H₀ | Default claim, assumed true | Reject if p < α |
| Alternative hypothesis | H₁ or Hₐ | Claim the test seeks evidence for | Supported if H₀ is rejected |
| Significance level | α | Threshold probability (e.g., 0.05) | Chosen before the test |
| Test statistic | Z, t, χ² | Standardised sample measure | Compared to critical value |
| p-value | p | Probability of a result at least as extreme as the one observed, if H₀ is true | Reject H₀ if p < α |
| Type I error | α | False positive (reject true H₀) | Controlled by setting α |
The p-value is the probability of obtaining a test statistic at least as extreme as the one observed, assuming the null hypothesis is true. A small p-value (typically p < 0.05) indicates that the observed data would be unlikely under H₀, providing evidence to reject it; it does not measure the probability that the null hypothesis is true. Correct interpretation of p-values is essential to avoid common statistical fallacies in research and data analysis.
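As an illustration of this definition, the sketch below converts a z statistic into a p-value using the standard normal distribution. It assumes the test statistic is standard normal under H₀; the helper `p_value_from_z` and its `alternative` argument are hypothetical names chosen for this example.

```python
from statistics import NormalDist

def p_value_from_z(z, alternative="two-sided"):
    """Probability, under H0, of a z statistic at least as extreme as `z`."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "two-sided":
        # Both tails count as "at least as extreme".
        return 2 * (1 - std_normal.cdf(abs(z)))
    if alternative == "greater":
        return 1 - std_normal.cdf(z)
    if alternative == "less":
        return std_normal.cdf(z)
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")

p = p_value_from_z(-2.00)
print(f"two-sided p = {p:.4f}")  # about 0.0455
```

For a statistic of Z = −2.00 the two-sided p-value is about 0.0455, which is below α = 0.05. That number describes how unusual the data are under H₀, not how likely H₀ is.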
The t-distribution (Student's t-distribution) is a continuous probability distribution that arises when estimating the mean of a normally distributed population from a small sample with unknown population standard deviation. It has heavier tails than the normal distribution, reflecting the extra uncertainty from estimating the standard deviation; as the degrees of freedom increase toward infinity, it converges to the standard normal distribution. It is the foundation of t-tests and is central to small-sample statistical inference.
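The heavier tails and the convergence to the normal distribution can be seen numerically. The following sketch evaluates the standard t density formula directly with the Python standard library; `t_pdf` is a helper written for this illustration, and the evaluation point x = 3 and the degrees of freedom shown are arbitrary choices.

```python
from math import exp, lgamma, log, pi
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t-distribution with `df` degrees of freedom."""
    # Work in log space so large df does not overflow the gamma function.
    log_norm = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_norm - (df + 1) / 2 * log(1 + x * x / df))

normal_pdf = NormalDist().pdf  # standard normal density

# Density three units from the centre: larger for small df (heavier tail).
for df in (2, 10, 100, 10_000):
    print(f"df={df:>6}: t density at 3 = {t_pdf(3.0, df):.5f}")
print(f"   normal: density at 3 = {normal_pdf(3.0):.5f}")
```

With few degrees of freedom the density far from the centre is several times larger than the normal density, and the gap shrinks as the degrees of freedom grow.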
A confidence interval (CI) is a range of plausible values for an unknown population parameter, constructed from sample data so that the procedure captures the true parameter with a specified probability (the confidence level, e.g., 95%). Crucially, the confidence level refers to the long-run success rate of the procedure, not the probability that a particular interval contains the parameter. Confidence intervals are used throughout science, medicine, and engineering to quantify estimation uncertainty.
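A minimal sketch of both ideas, assuming a normal population with known σ (the same setting as the bottle example, x̄ = 497, σ = 9, n = 36): it computes one 95% interval and then estimates the long-run coverage by simulation. The function names, the number of trials, and the random seed are illustrative assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def z_confidence_interval(sample_mean, sigma, n, level=0.95):
    """Interval sample_mean ± z * sigma / sqrt(n), with sigma assumed known."""
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z_crit * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin

low, high = z_confidence_interval(497, 9, 36)
print(f"95% CI: ({low:.2f}, {high:.2f}) mL")  # prints: 95% CI: (494.06, 499.94) mL

def coverage(true_mean, sigma, n, trials=5000, level=0.95, seed=0):
    """Fraction of simulated intervals that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        lo, hi = z_confidence_interval(fmean(sample), sigma, n, level)
        hits += lo <= true_mean <= hi
    return hits / trials

print(f"empirical coverage: {coverage(500, 9, 36):.3f}")
```

The single interval excludes 500 mL, which agrees with rejecting H₀ at α = 0.05 in a two-tailed z-test. The simulated coverage should come out close to 0.95, which is the sense in which the confidence level describes the procedure rather than any one interval.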
The formal framework of hypothesis testing was developed in two strands: by Ronald Fisher (significance testing, 1925) and by Jerzy Neyman and Egon Pearson (decision-theoretic approach, 1933). The term "null hypothesis" was coined by Fisher, from Latin "nullus" (none/nothing), denoting the hypothesis of no effect.