A Z-score (also called a standard score) measures how many standard deviations a data point is from the mean of its distribution. It standardises values from different distributions, enabling direct comparison by placing them on a common scale. Z-scores are widely used in quality control, hypothesis testing, and the construction of standard normal tables.
z = (x − μ) / σ
LaTeX: z = \dfrac{x - \mu}{\sigma}
| Symbol | Meaning | Unit |
|---|---|---|
| z | Standard score (Z-score) | dimensionless |
| x | Observed data value | same as data |
| μ | Population mean | same as data |
| σ | Population standard deviation | same as data |
Problem
A student scores 78 on an exam where the class mean is 70 and the standard deviation is 8. What is the Z-score?
Solution
Step 1: Identify the values: x = 78, μ = 70, σ = 8.
Step 2: Apply the formula: z = (78 − 70) / 8.
Step 3: Simplify: z = 8 / 8 = 1.00.
Answer
Z-score = 1.00 (the student scored exactly 1 standard deviation above the mean)
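The same calculation can be sketched in a few lines of Python. The function name `z_score` and its parameter names are illustrative choices, not part of any standard library; the numbers are those of the worked example.

```python
def z_score(x: float, mean: float, std_dev: float) -> float:
    """Return how many standard deviations x lies above (+) or below (-) the mean."""
    if std_dev <= 0:
        # The formula divides by the standard deviation, so it must be positive.
        raise ValueError("standard deviation must be positive")
    return (x - mean) / std_dev


# Values from the worked example: x = 78, mean = 70, standard deviation = 8
exam_z = z_score(78, 70, 8)
print(f"z = {exam_z:.2f}")  # prints "z = 1.00"
```

A score below the mean gives a negative result; for instance, `z_score(62, 70, 8)` evaluates to -1.0.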
| Z-score | Percentile | Interpretation | Area below z |
|---|---|---|---|
| -2.00 | 2.28% | Far below average | 0.0228 |
| -1.00 | 15.87% | Below average | 0.1587 |
| 0.00 | 50.00% | Average | 0.5000 |
| 1.00 | 84.13% | Above average | 0.8413 |
| 2.00 | 97.72% | Well above average | 0.9772 |
| 3.00 | 99.87% | Exceptional | 0.9987 |
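The areas in the table can be reproduced from the standard normal cumulative distribution function, which can be written in terms of the error function as Φ(z) = ½ [1 + erf(z / √2)]. The sketch below assumes the data are normally distributed; the helper name `standard_normal_cdf` is illustrative.

```python
import math


def standard_normal_cdf(z: float) -> float:
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Recompute the rows of the table above.
for z in (-2.0, -1.0, 0.0, 1.0, 2.0, 3.0):
    area = standard_normal_cdf(z)
    print(f"z = {z:+.2f}  area below = {area:.4f}  percentile = {area:.2%}")
```

The percentile column is simply the area expressed as a percentage, so the two columns always carry the same information.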
Standard deviation is the square root of the variance and measures the typical distance of data points from the mean (formally, the root-mean-square deviation) in the original units of measurement. It is the most widely used measure of statistical dispersion because, unlike variance, it is expressed in the same units as the data. A small standard deviation indicates data clustered near the mean; a large one indicates wide spread.
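As a rough illustration, the population standard deviation can be computed by hand and checked against Python's `statistics` module. The `scores` list is made-up data chosen so that the arithmetic is easy to follow.

```python
import math
import statistics

scores = [62, 66, 70, 74, 78]  # hypothetical exam scores, for illustration only

# Step by step: mean, then mean squared deviation (variance), then its square root.
mean = statistics.fmean(scores)
variance = sum((x - mean) ** 2 for x in scores) / len(scores)
std_dev = math.sqrt(variance)

print(f"mean = {mean:.2f}")          # prints "mean = 70.00"
print(f"variance = {variance:.2f}")  # prints "variance = 32.00"
print(f"std dev = {std_dev:.2f}")    # prints "std dev = 5.66"

# The standard library's population standard deviation gives the same value.
library_std_dev = statistics.pstdev(scores)
```

The variance is in squared score units, while the standard deviation is back in score units, which is why the latter is easier to interpret.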
The normal distribution is a continuous, symmetric, bell-shaped probability distribution characterised by its mean (μ) and standard deviation (σ). It is often regarded as the most important distribution in statistics because many natural phenomena (heights, measurement errors, test scores) follow or approximate it. The Central Limit Theorem states that the mean of a large sample of independent observations from any distribution with finite variance is approximately normally distributed.
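A small simulation can make the Central Limit Theorem more concrete. The sketch below assumes an exponential distribution with rate 1 (mean 1, standard deviation 1, strongly skewed, finite variance); the sample size, number of repetitions, and seed are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 50    # observations per sample
NUM_SAMPLES = 2000  # number of repeated samples

# Draw many samples from the skewed exponential distribution and record each sample mean.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(SAMPLE_SIZE)])
    for _ in range(NUM_SAMPLES)
]

centre = statistics.fmean(sample_means)
spread = statistics.pstdev(sample_means)
predicted_spread = 1.0 / SAMPLE_SIZE ** 0.5  # sigma / sqrt(n), with sigma = 1

# For a normal distribution, roughly 68% of values lie within one standard deviation.
within_one_sd = sum(
    abs(m - 1.0) <= predicted_spread for m in sample_means
) / NUM_SAMPLES

print(f"mean of sample means: {centre:.3f} (theory: 1.000)")
print(f"std dev of sample means: {spread:.3f} (theory: {predicted_spread:.3f})")
print(f"fraction within one std dev: {within_one_sd:.3f} (normal: about 0.683)")
```

Although individual exponential draws are far from bell-shaped, the sample means should cluster symmetrically around 1 with a spread close to σ/√n.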
A confidence interval (CI) is a range of plausible values for an unknown population parameter, constructed from sample data so that the procedure captures the true parameter with a specified probability (the confidence level, e.g., 95%). Crucially, the confidence level refers to the long-run success rate of the procedure — not the probability that a particular interval contains the parameter. Confidence intervals are used throughout science, medicine, and engineering to quantify estimation uncertainty.
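One common case, shown here as a sketch rather than a general recipe, is a z-based interval for a population mean when the population standard deviation σ is assumed known: sample mean ± z* · σ / √n, where z* is the standard normal critical value. The function name and the sample figures (mean 70, σ = 8, n = 64) are hypothetical.

```python
import math
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Two-sided z-interval for a mean, assuming sigma is known."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    # Critical value z*: leaves (1 - confidence) / 2 in each tail (about 1.96 for 95%).
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z_crit * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Hypothetical sample: mean 70 from n = 64 observations, known sigma = 8
lower, upper = z_confidence_interval(70, 8, 64)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")  # prints "95% CI: (68.04, 71.96)"
```

When σ is not known and must be estimated from a small sample, a t-based interval is the usual alternative; that case is not covered by this sketch.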
The letter "Z" is sometimes said to derive from the German word "Zahl" (number). The standardisation concept was formalised by Francis Galton and Karl Pearson in the late 19th century as part of the development of modern statistics.