The mean (arithmetic mean) is the sum of all values in a dataset divided by the number of values, and represents the central or typical value. It is the most commonly used measure of central tendency and is sensitive to extreme values (outliers). The mean is used extensively in data analysis, quality control, and as the foundation for more advanced statistical measures such as variance and standard deviation.
x̄ = (x₁ + x₂ + ... + xₙ) / n
LaTeX: \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
| Symbol | Meaning | Unit |
|---|---|---|
| x̄ | sample mean | same as data |
| xᵢ | each data value | same as data |
| n | total number of data values | unitless |
Problem
Seven employees have annual salaries (in ₹ lakh): 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 25.0. Find the arithmetic mean salary.
Solution
Step 1: Sum all salaries: 3.5 + 4.0 + 4.5 + 5.0 + 5.5 + 6.0 + 25.0 = 53.5 lakh.
Step 2: Divide by n = 7: x̄ = 53.5 / 7 ≈ 7.64 lakh.
Note: The mean is pulled up by the outlier salary of ₹25 lakh; the median (₹5.0 lakh, the fourth of the seven sorted values) would better represent the typical employee.
Answer
Mean salary ≈ ₹7.64 lakh
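
The arithmetic in the worked example can be checked with a few lines of Python. This is a minimal sketch that uses only the standard library; the variable names are illustrative and not part of the original problem.

```python
from statistics import mean, median

# Salaries from the worked example, in ₹ lakh.
salaries = [3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 25.0]

# Arithmetic mean: sum of the values divided by their count.
manual_mean = sum(salaries) / len(salaries)

# The standard-library helpers should agree with the manual calculation.
library_mean = mean(salaries)
middle_salary = median(salaries)

print(f"mean   = {manual_mean:.2f} lakh")    # mean   = 7.64 lakh
print(f"median = {middle_salary:.1f} lakh")  # median = 5.0 lakh
```

Dropping the ₹25 lakh value from the list and rerunning shows how strongly a single outlier moves the mean while the median barely changes.
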
| Measure | Definition | Affected by Outliers | Best Used When |
|---|---|---|---|
| Mean | Sum / count | Yes (strongly) | Data is symmetric, no outliers |
| Median | Middle value | No | Skewed data or outliers present |
| Mode | Most frequent value | No | Categorical or multimodal data |
| Geometric Mean | nth root of product | Less sensitive | Ratios or exponential growth |
| Harmonic Mean | n / Σ(1/xᵢ) | Less sensitive | Rates and speeds |
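
The last two rows of the table can be illustrated with Python's `statistics` module (`geometric_mean` requires Python 3.8 or later). The growth factors and speeds below are made-up illustrative data, not values from this article.

```python
from statistics import geometric_mean, harmonic_mean

# Hypothetical yearly growth factors: +10%, +20%, +30%.
growth_factors = [1.10, 1.20, 1.30]

# Geometric mean: nth root of the product, i.e. the constant
# yearly factor that would give the same overall growth.
avg_growth = geometric_mean(growth_factors)

# Hypothetical speeds (km/h) over two legs of equal distance.
speeds = [40, 60]

# Harmonic mean: n / sum(1/x), the correct average speed for
# equal distances (the arithmetic mean, 50, would overstate it).
avg_speed = harmonic_mean(speeds)

print(f"average growth factor = {avg_growth:.4f}")
print(f"average speed = {avg_speed:.1f} km/h")  # average speed = 48.0 km/h
```
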
The median is the middle value of a dataset when values are arranged in ascending or descending order; it divides the distribution into two equal halves. For an even number of values, it is the average of the two central values. The median is a robust measure of central tendency, unaffected by extreme outliers, making it preferable to the mean for skewed distributions such as income or house prices.
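
The odd and even cases described above can be sketched as a small function. The name `median_of` and the sample lists are hypothetical and chosen only for illustration; in practice `statistics.median` from the standard library does the same job.

```python
def median_of(values):
    """Return the median of a non-empty sequence of numbers."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value.
        return ordered[mid]
    # Even count: average of the two central values.
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_case = median_of([7, 1, 3])      # sorted: 1, 3, 7    -> 3
even_case = median_of([7, 1, 3, 5])  # sorted: 1, 3, 5, 7 -> 4.0
```
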
The mode is the value that appears most frequently in a dataset, making it the only measure of central tendency applicable to nominal (categorical) data. A dataset can be unimodal (one mode), bimodal (two modes), multimodal (many modes), or have no mode if all values are equally frequent. The mode is particularly useful in market research, fashion, and any context where the most common category or value is of interest.
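
A minimal sketch of finding modes in categorical data, using `statistics.multimode` (Python 3.8 or later). The shirt-size list is invented for illustration. Note that when every value is equally frequent, `multimode` returns all of them, which is a different convention from the "no mode" wording above.

```python
from statistics import multimode

# Hypothetical categorical data: shirt sizes sold in one day.
shirt_sizes = ["M", "L", "M", "S", "L", "XL"]

# "M" and "L" both appear twice, so this dataset is bimodal.
# Results are listed in the order the values are first seen.
modes = multimode(shirt_sizes)  # ['M', 'L']

# All values equally frequent: every value is returned.
all_tied = multimode([1, 2, 3])  # [1, 2, 3]
```
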
Standard deviation is the square root of the variance and measures the typical distance of data points from the mean (strictly, the root-mean-square deviation rather than the simple average of distances), in the original units of measurement. It is the most widely used measure of statistical dispersion because, unlike variance, it is expressed in the same units as the data. A small standard deviation indicates data clustered near the mean; a large one indicates wide spread.
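
The definition can be traced step by step in Python. This is a sketch on made-up data; the distinction between the population form (divide by n) and the sample form (divide by n - 1) is standard but is an addition here, not something the text above specifies.

```python
from math import sqrt
from statistics import pstdev, stdev

# Hypothetical dataset, chosen so the numbers come out cleanly.
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
mean_value = sum(data) / n  # 5.0

# Variance: mean of the squared deviations from the mean.
population_variance = sum((x - mean_value) ** 2 for x in data) / n

# Standard deviation: square root of the variance,
# which brings the result back to the units of the data.
population_sd = sqrt(population_variance)

# Standard-library equivalents.
library_population_sd = pstdev(data)  # divides by n
library_sample_sd = stdev(data)       # divides by n - 1

print(f"population SD = {population_sd:.1f}")  # population SD = 2.0
```
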
From Old French meien, from Latin medianus ("middle"). Averaging is an ancient practice, but the formal use of the arithmetic mean in statistics was established by mathematicians such as Gauss and Laplace in the 18th and 19th centuries.