The p-value is the probability of obtaining a test statistic at least as extreme as the one observed, assuming the null hypothesis is true. A small p-value (typically p < 0.05) indicates that the observed data would be unlikely under H₀, providing evidence to reject it; it does not measure the probability that the null hypothesis is true. Correct interpretation of p-values is essential to avoid common statistical fallacies in research and data analysis.
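p = P(T ≥ t_obs | H₀)
LaTeX: p = P(T \geq t_{\text{obs}} \mid H_0)
This is the upper-tailed form. For a lower-tailed test the inequality reverses, and for a two-tailed test with a symmetric null distribution p = P(|T| ≥ |t_obs| | H₀).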
| Symbol | Meaning | Unit |
|---|---|---|
| T | Test statistic (random variable) | dimensionless |
| t_{obs} | Observed value of the test statistic | dimensionless |
| H_0 | Null hypothesis | N/A |
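The definition can be checked numerically by simulating the test statistic under H₀ and counting how often it is at least as large as the observed value. The sketch below is illustrative only: it assumes T follows a standard normal distribution under H₀, and the names `simulated_p_value` and `standard_normal_statistic` are hypothetical helpers, not part of any library.

```python
import random

def simulated_p_value(t_obs, draw_statistic, n_sims=100_000, seed=0):
    """Estimate P(T >= t_obs | H0) by drawing T repeatedly under H0."""
    rng = random.Random(seed)
    # Count simulated statistics at least as extreme as the observed one.
    hits = sum(draw_statistic(rng) >= t_obs for _ in range(n_sims))
    return hits / n_sims

# Assumed null distribution for illustration: T ~ N(0, 1).
def standard_normal_statistic(rng):
    return rng.gauss(0.0, 1.0)

p_hat = simulated_p_value(2.30, standard_normal_statistic)
print(f"estimated p-value: {p_hat:.4f}")
```

The estimate carries Monte Carlo error, so it should only be expected to land near the exact tail probability, not to match it digit for digit.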
Problem
A Z-test yields Z = 2.30 for a one-tailed test (H₁: μ > μ₀). What is the p-value and what conclusion is drawn at α = 0.05?
Solution
Step 1: p = P(Z ≥ 2.30) for the standard normal distribution.
Step 2: From Z-tables, P(Z < 2.30) = 0.9893.
Step 3: p = 1 − 0.9893 = 0.0107.
Step 4: Since p = 0.0107 < α = 0.05, reject H₀.
Answer
p = 0.0107; reject H₀ — statistically significant result at the 5% level
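The table lookup in the solution can be reproduced in code. This is a minimal sketch using `statistics.NormalDist` from the Python standard library (Python 3.8+) in place of a printed Z-table; the variable names are chosen here for illustration.

```python
from statistics import NormalDist

z_obs = 2.30   # observed Z statistic from the problem
alpha = 0.05   # significance level from the problem

# Upper-tail probability P(Z >= z_obs) under the standard normal distribution.
p_value = 1.0 - NormalDist().cdf(z_obs)
reject_h0 = p_value < alpha

print(f"p = {p_value:.4f}, reject H0: {reject_h0}")
```

Rounded to four decimal places, the computed value agrees with the 0.0107 obtained from the table.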
| p-value Range | Evidence Against H₀ | Typical Decision | Common Usage |
|---|---|---|---|
| p > 0.10 | Little to none | Fail to reject H₀ | Social sciences |
| 0.05 < p ≤ 0.10 | Marginal | Borderline, context-dependent | Exploratory research |
| 0.01 < p ≤ 0.05 | Moderate | Reject H₀ | Standard threshold |
| 0.001 < p ≤ 0.01 | Strong | Reject H₀ | Medical research |
| p ≤ 0.001 | Very strong | Reject H₀ | Physics, large studies |
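The ranges in the table can be expressed as a simple lookup. The function name `evidence_label` is hypothetical, and the cut-offs are just the conventional ones listed above; they are guidelines, not fixed rules.

```python
def evidence_label(p):
    """Map a p-value to the conventional evidence label from the table."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a p-value must lie in [0, 1]")
    if p <= 0.001:
        return "Very strong"
    if p <= 0.01:
        return "Strong"
    if p <= 0.05:
        return "Moderate"
    if p <= 0.10:
        return "Marginal"
    return "Little to none"

print(evidence_label(0.0107))
```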
Hypothesis testing is a formal statistical procedure for making decisions about a population parameter based on sample data, by evaluating evidence against a null hypothesis (H₀) in favour of an alternative hypothesis (H₁). A test statistic is computed and compared to a critical value or converted to a p-value; if the result is statistically significant (p < α), the null hypothesis is rejected. It underpins scientific research, clinical trials, quality assurance, and data-driven decision-making across all quantitative disciplines.
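As a rough sketch of the procedure, the function below runs an upper-tailed one-sample z-test (H₀: μ = μ₀ against H₁: μ > μ₀). It assumes the population standard deviation σ is known and that the sample mean is approximately normal. The function name `one_sample_z_test` and the data are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist, fmean

def one_sample_z_test(sample, mu0, sigma, alpha=0.05):
    """Upper-tailed z-test of H0: mu = mu0 against H1: mu > mu0 (sigma known)."""
    n = len(sample)
    # Test statistic: standardised distance of the sample mean from mu0.
    z = (fmean(sample) - mu0) / (sigma / sqrt(n))
    # Convert the statistic to a p-value via the standard normal upper tail.
    p_value = 1.0 - NormalDist().cdf(z)
    return z, p_value, p_value < alpha

# Hypothetical measurements, invented for illustration only.
sample = [10.4, 10.9, 10.1, 11.2, 10.7, 10.5, 11.0, 10.6]
z, p_value, reject = one_sample_z_test(sample, mu0=10.0, sigma=1.0)
print(f"z = {z:.2f}, p = {p_value:.4f}, reject H0: {reject}")
```

When σ is unknown and estimated from the sample, a t-test would normally replace the z-test; that case is not covered by this sketch.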
A confidence interval (CI) is a range of plausible values for an unknown population parameter, constructed from sample data so that the procedure captures the true parameter with a specified probability (the confidence level, e.g., 95%). Crucially, the confidence level refers to the long-run success rate of the procedure — not the probability that a particular interval contains the parameter. Confidence intervals are used throughout science, medicine, and engineering to quantify estimation uncertainty.
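The long-run interpretation can be illustrated by simulation: build many intervals from fresh samples and count how often they contain the true mean. The sketch below assumes normally distributed data with known σ, so the interval is x̄ ± z·σ/√n. The parameter values and the name `coverage_rate` are arbitrary choices made for this example.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def coverage_rate(mu=5.0, sigma=2.0, n=30, level=0.95, n_trials=5_000, seed=1):
    """Fraction of z-intervals (sigma known) that contain the true mean mu."""
    rng = random.Random(seed)
    # Two-sided critical value, about 1.96 for a 95% level.
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z_crit * sigma / sqrt(n)
    covered = 0
    for _ in range(n_trials):
        xbar = fmean([rng.gauss(mu, sigma) for _ in range(n)])
        if xbar - half_width <= mu <= xbar + half_width:
            covered += 1
    return covered / n_trials

print(f"empirical coverage: {coverage_rate():.3f}")
```

Each individual interval either contains μ or does not; only the proportion across repeated samples is expected to approach the confidence level.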
A Z-score (also called a standard score) measures how many standard deviations a data point is from the mean of its distribution. It standardises values from different distributions, enabling direct comparison by placing them on a common scale. Z-scores are widely used in quality control, hypothesis testing, and the construction of standard normal tables.
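In symbols, z = (x − μ) / σ, where μ and σ are the mean and standard deviation of the distribution. A minimal sketch follows, treating the data set itself as the whole population (hence `pstdev`); the function name `z_scores` and the data are illustrative.

```python
from statistics import fmean, pstdev

def z_scores(values):
    """Standardise values using their own mean and population standard deviation."""
    mu = fmean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("z-scores are undefined when all values are equal")
    return [(x - mu) / sigma for x in values]

scores = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical data
print([round(z, 2) for z in z_scores(scores)])
```

If the values are a sample from a larger population, `statistics.stdev` (the sample standard deviation) may be the more appropriate divisor.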
The p-value (probability value) was popularised by Ronald Fisher in his 1925 book "Statistical Methods for Research Workers"; Karl Pearson had already used the quantity, written P, in his 1900 chi-squared test. Fisher intended it as an informal measure of evidence, not the formal decision rule it later became through Neyman–Pearson theory.