The normal distribution is a continuous, symmetric, bell-shaped probability distribution characterised by its mean (μ) and standard deviation (σ). It is the most important distribution in statistics because many real-world quantities, such as heights, measurement errors and test scores, follow or approximate it. The Central Limit Theorem states that the mean of a sufficiently large sample drawn from any distribution with finite variance is approximately normally distributed.
f(x) = (1 / (σ√(2π))) × e^(−(x−μ)² / (2σ²))
LaTeX: f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}
| Symbol | Meaning | Unit |
|---|---|---|
| x | value of the random variable | unit of the measured quantity (e.g. cm) |
| μ | mean (centre of the distribution) | same as x |
| σ | standard deviation | same as x |
| e | Euler's number ≈ 2.71828 | unitless |
| π | pi ≈ 3.14159 | unitless |
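As a sketch of how the formula maps to code, the Python function below evaluates f(x) term by term. `normal_pdf` is a hypothetical helper name, not part of any library; the standard library's `statistics.NormalDist` (Python 3.8+) computes the same density and is used here only as a cross-check.

```python
import math
from statistics import NormalDist


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density f(x) = 1 / (sigma * sqrt(2*pi)) * exp(-((x - mu) / sigma)**2 / 2)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma                                     # standardised distance from the mean
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))   # normalising constant
    return coefficient * math.exp(-0.5 * z * z)


# Density at the mean of N(165, 7^2), the height example used in the problem
manual = normal_pdf(165.0, 165.0, 7.0)
library = NormalDist(mu=165.0, sigma=7.0).pdf(165.0)
print(f"{manual:.6f} {library:.6f}")
```

Note that f(x) is a density, not a probability: its value has units of 1/x, and probabilities come from areas under the curve.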
Problem
The heights of adult men in India are normally distributed with μ = 165 cm and σ = 7 cm. What percentage of men are taller than 172 cm?
Solution
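Step 1: Standardise: z = (x − μ) / σ = (172 − 165) / 7 = 1.0.
Step 2: Write the target as a complement, P(Z > 1.0) = 1 − P(Z ≤ 1.0), because the standard normal table lists cumulative probabilities.
Step 3: Read P(Z ≤ 1.0) = 0.8413 from the table.
Step 4: P(Z > 1.0) = 1 − 0.8413 = 0.1587.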
Answer
Approximately 15.87% of men are taller than 172 cm.
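The same calculation can be checked with Python's `statistics.NormalDist`, assuming Python 3.8 or later. This sketch mirrors the hand calculation: standardise, evaluate the standard normal CDF (the table lookup), then take the complement.

```python
from statistics import NormalDist

mu, sigma = 165.0, 7.0   # cm, from the problem statement
x = 172.0                # threshold height in cm

z = (x - mu) / sigma                 # standardise
p_below = NormalDist().cdf(z)        # P(Z <= z), what the table lookup gives
p_above = 1.0 - p_below              # complement: P(Z > z)

print(f"z = {z:.2f}, P(X > {x:g}) = {p_above:.4f} ({p_above:.2%})")
# z = 1.00, P(X > 172) = 0.1587 (15.87%)
```

Equivalently, `NormalDist(mu, sigma).cdf(x)` works on the original scale and skips the explicit standardisation.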
| Range | Interval | % of Data | Practical Meaning |
|---|---|---|---|
| 1σ | μ ± σ | 68.27% | Majority of typical outcomes |
| 2σ | μ ± 2σ | 95.45% | Almost all normal outcomes |
| 3σ | μ ± 3σ | 99.73% | Near-total coverage |
| Beyond 3σ | Tails | 0.27% | Rare / extreme events |
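The percentages in the table can be reproduced from the error function: for a normal variable, P(|X − μ| ≤ kσ) = erf(k/√2), independent of μ and σ. The helper `coverage_within` is a hypothetical name used only for this sketch.

```python
import math


def coverage_within(k: float) -> float:
    """Probability that a normal variable lies within k standard deviations of its mean.

    For Z ~ N(0, 1), P(|Z| <= k) = erf(k / sqrt(2)); mu and sigma drop out.
    """
    return math.erf(k / math.sqrt(2.0))


for k in (1, 2, 3):
    inside = coverage_within(k)
    print(f"within {k} sigma: {inside:.2%} inside, {1 - inside:.2%} in the tails")
# within 1 sigma: 68.27% inside, 31.73% in the tails
# within 2 sigma: 95.45% inside, 4.55% in the tails
# within 3 sigma: 99.73% inside, 0.27% in the tails
```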
A probability distribution is a mathematical function that describes how likely each possible outcome of a random variable is. For a discrete variable it assigns a probability to each value, and these probabilities sum to 1; for a continuous variable it is described by a probability density function whose integral over all values equals 1, so probabilities attach to ranges of values rather than to single points. Probability distributions are foundational in statistics and are used in fields ranging from insurance and finance to physics and machine learning.
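To illustrate that the total probability equals 1, the sketch below sums a fair die's probabilities exactly and numerically integrates the normal density from the height example with the midpoint rule over μ ± 10σ, where the mass outside that range is negligible. The die, the function names and the step count are illustrative assumptions.

```python
import math
from fractions import Fraction

# Discrete case: a fair six-sided die (illustrative); probabilities sum to exactly 1
die = {face: Fraction(1, 6) for face in range(1, 7)}
die_total = sum(die.values())


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def integrate_density(mu: float, sigma: float, n: int = 100_000, width: float = 10.0) -> float:
    """Midpoint-rule approximation of the density's integral over mu +/- width * sigma."""
    a = mu - width * sigma
    h = 2.0 * width * sigma / n
    return h * sum(normal_pdf(a + (i + 0.5) * h, mu, sigma) for i in range(n))


# Continuous case: single points have probability 0, but the density integrates to 1
normal_total = integrate_density(165.0, 7.0)
print(die_total, f"{normal_total:.6f}")
```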
Standard deviation is the square root of the variance and measures the typical distance of data points from the mean (strictly, the root-mean-square deviation) in the original units of measurement. It is the most widely used measure of statistical dispersion because, unlike the variance, it is expressed in the same units as the data. A small standard deviation indicates that data cluster near the mean; a large one indicates a wide spread.
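A minimal sketch of the definition, using a small illustrative dataset that is not from the source: the population variance is the average squared deviation, and its square root returns to the data's units. The mean absolute deviation is printed alongside to show that the standard deviation is a root-mean-square distance rather than a plain average distance.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values, not from the source

mean = sum(data) / len(data)
# Population variance: average squared deviation from the mean
variance = sum((x - mean) ** 2 for x in data) / len(data)
sd = math.sqrt(variance)  # back in the data's original units
# Mean absolute deviation, for comparison: the average unsquared distance
mad = sum(abs(x - mean) for x in data) / len(data)

print(mean, variance, sd, mad)   # 5.0 4.0 2.0 1.5
print(statistics.pstdev(data))   # library cross-check (population SD)
```

This uses the population formula (divide by n). `statistics.stdev` divides by n − 1 instead, giving the sample standard deviation, about 2.14 for this data.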
The mean (arithmetic mean) is the sum of all values in a dataset divided by the number of values, and represents the central or typical value. It is the most commonly used measure of central tendency and is sensitive to extreme values (outliers). The mean is used extensively in data analysis, quality control, and as the foundation for more advanced statistical measures such as variance and standard deviation.
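The sketch below computes the arithmetic mean directly from its definition and contrasts it with the median to show sensitivity to outliers. The data and the `arithmetic_mean` helper are illustrative assumptions.

```python
import statistics
from typing import Sequence


def arithmetic_mean(values: Sequence[float]) -> float:
    """Sum of the values divided by how many there are."""
    if not values:
        raise ValueError("mean of an empty dataset is undefined")
    return sum(values) / len(values)


scores = [10, 12, 11, 13, 12]   # illustrative data, not from the source
with_outlier = scores + [100]   # the same data plus one extreme value

print(arithmetic_mean(scores), statistics.median(scores))
print(arithmetic_mean(with_outlier), statistics.median(with_outlier))
```

A single extreme value moves the mean from 11.6 to about 26.3, while the median stays at 12.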
It is called "normal" because it was considered the standard, or typical, distribution of measurement errors. The term was popularised by Karl Pearson in 1894; the distribution had been studied earlier by de Moivre (1733) and by Gauss, after whom it is also called the Gaussian distribution.