The chi-squared (χ²) distribution is the continuous probability distribution of the sum of squares of k independent standard normal random variables, where k is the degrees of freedom. It takes only non-negative values and is right-skewed, becoming more symmetric as k increases. It is fundamental to the chi-squared test for goodness of fit, tests of independence in contingency tables, and confidence intervals for variance.
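The definition can be checked by simulation. The sketch below is illustrative only: the function name `simulate_chi_squared`, the seed, and the sample size are assumptions. It sums k squared standard normal draws and compares the sample mean and variance with the theoretical values k and 2k, and the upper-tail fraction with the tabulated 5% critical value for k = 5.

```python
import random
import statistics

def simulate_chi_squared(k, n_samples, seed=0):
    """Draw n_samples values, each the sum of k squared standard normals."""
    rng = random.Random(seed)
    return [
        sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))
        for _ in range(n_samples)
    ]

k = 5
samples = simulate_chi_squared(k, n_samples=100_000)

sample_mean = statistics.fmean(samples)    # theory: k
sample_var = statistics.variance(samples)  # theory: 2k
# Fraction of draws above 11.070, the 5% critical value for k = 5
tail_fraction = sum(s > 11.070 for s in samples) / len(samples)

print(f"mean     = {sample_mean:.3f} (theory {k})")
print(f"variance = {sample_var:.3f} (theory {2 * k})")
print(f"P(X > 11.070) ~ {tail_fraction:.4f} (theory 0.05)")
```

The simulated values should land close to the theoretical ones, with small differences caused by sampling noise.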
χ² = Σ [(Oᵢ − Eᵢ)² / Eᵢ]
LaTeX: \chi^2 = \sum_{i=1}^{k} \dfrac{(O_i - E_i)^2}{E_i}
| Symbol | Meaning | Unit |
|---|---|---|
| O_i | Observed frequency in category i | count |
| E_i | Expected frequency in category i | count |
| k | Number of categories (the goodness-of-fit test then has k − 1 degrees of freedom) | dimensionless |
Problem
A die is rolled 60 times. Expected frequency per face = 10. Observed counts: 1→8, 2→12, 3→9, 4→11, 5→7, 6→13. Is the die fair at α = 0.05?
Solution
Step 1: Compute (O − E)²/E for each face:
Face 1: (8−10)²/10 = 0.40
Face 2: (12−10)²/10 = 0.40
Face 3: (9−10)²/10 = 0.10
Face 4: (11−10)²/10 = 0.10
Face 5: (7−10)²/10 = 0.90
Face 6: (13−10)²/10 = 0.90
Step 2: χ² = 0.40 + 0.40 + 0.10 + 0.10 + 0.90 + 0.90 = 2.80.
Step 3: df = 6 − 1 = 5; critical χ²₀.₀₅(5) = 11.07.
Step 4: 2.80 < 11.07 → fail to reject H₀.
Answer
χ² = 2.80, df = 5; no evidence that the die is unfair at α = 0.05
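The worked example can be reproduced in a few lines. This is a minimal sketch: the helper name `pearson_chi_squared` is an assumption made for illustration, and the critical value is hard-coded from the tabulated figure for df = 5 and α = 0.05 rather than computed.

```python
def pearson_chi_squared(observed, expected):
    """Return the sum of (O_i - E_i)^2 / E_i over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [8, 12, 9, 11, 7, 13]  # counts for faces 1..6
expected = [10] * 6               # 60 rolls / 6 faces

statistic = pearson_chi_squared(observed, expected)
df = len(observed) - 1
critical_value = 11.070           # tabulated value for df = 5, alpha = 0.05

reject_h0 = statistic > critical_value
print(f"chi2 = {statistic:.2f}, df = {df}, reject H0: {reject_h0}")
# prints: chi2 = 2.80, df = 5, reject H0: False
```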
| Degrees of Freedom | α = 0.10 | α = 0.05 | α = 0.01 |
|---|---|---|---|
| 1 | 2.706 | 3.841 | 6.635 |
| 2 | 4.605 | 5.991 | 9.210 |
| 5 | 9.236 | 11.070 | 15.086 |
| 10 | 15.987 | 18.307 | 23.209 |
| 20 | 28.412 | 31.410 | 37.566 |
| 30 | 40.256 | 43.773 | 50.892 |
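The tabulated critical values can be recomputed without a statistics library. The sketch below is one possible approach, not the method used to build the table. It evaluates the upper-tail probability for integer degrees of freedom with the recurrence Q(a + 1, x) = Q(a, x) + xᵃ e⁻ˣ / Γ(a + 1) for the regularized upper incomplete gamma function, then finds each critical value by bisection. The names `chi2_sf` and `chi2_critical` are assumptions made for this example.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for chi-squared with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half_x = x / 2.0
    if df % 2 == 0:
        a = 1.0
        q = math.exp(-half_x)             # closed form for df = 2
    else:
        a = 0.5
        q = math.erfc(math.sqrt(half_x))  # closed form for df = 1
    # Step up two degrees of freedom at a time until a == df / 2
    while a < df / 2.0:
        q += half_x ** a * math.exp(-half_x) / math.gamma(a + 1.0)
        a += 1.0
    return q

def chi2_critical(alpha, df, tol=1e-9):
    """Value c with P(X > c) = alpha, found by bisection on chi2_sf."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:        # grow the bracket until it contains c
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

for df in (1, 2, 5, 10, 20, 30):
    row = [chi2_critical(alpha, df) for alpha in (0.10, 0.05, 0.01)]
    print(df, " ".join(f"{value:7.3f}" for value in row))
```

The printed rows should agree with the table to three decimal places. For non-integer degrees of freedom or production use, a tested library routine is the safer choice.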
Hypothesis testing is a formal statistical procedure for making decisions about a population parameter based on sample data, by evaluating evidence against a null hypothesis (H₀) in favour of an alternative hypothesis (H₁). A test statistic is computed and compared to a critical value or converted to a p-value; if the result is statistically significant (p < α), the null hypothesis is rejected. It underpins scientific research, clinical trials, quality assurance, and data-driven decision-making across all quantitative disciplines.
The p-value is the probability of obtaining a test statistic at least as extreme as the one observed, assuming the null hypothesis is true. A small p-value (typically p < 0.05) indicates that the observed data would be unlikely under H₀, providing evidence to reject it; it does not measure the probability that the null hypothesis is true. Correct interpretation of p-values is essential to avoid common statistical fallacies in research and data analysis.
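The definition can be made concrete by simulation. The sketch below applies it to the die example (60 rolls, observed counts 8, 12, 9, 11, 7, 13): it rolls a fair die repeatedly under H₀ and reports the fraction of simulated experiments whose statistic is at least as large as the observed one. The function names, seed, and number of simulations are assumptions chosen for illustration. This is a Monte Carlo estimate, not the chi-squared approximation itself.

```python
import random

def pearson_statistic(counts, expected):
    """Sum of (O_i - E_i)^2 / E_i over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(counts, expected))

def simulated_p_value(observed, n_sims=20_000, seed=0):
    """Estimate P(statistic >= observed statistic) under a fair-die H0."""
    rng = random.Random(seed)
    n_rolls = sum(observed)
    n_faces = len(observed)
    expected = [n_rolls / n_faces] * n_faces
    observed_stat = pearson_statistic(observed, expected)

    at_least_as_extreme = 0
    for _ in range(n_sims):
        counts = [0] * n_faces
        for _ in range(n_rolls):
            counts[rng.randrange(n_faces)] += 1  # one fair roll
        # Small tolerance guards against float round-off when values tie
        if pearson_statistic(counts, expected) >= observed_stat - 1e-9:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_sims

p_value = simulated_p_value([8, 12, 9, 11, 7, 13])
print(f"simulated p-value ~ {p_value:.3f}")
```

The estimate should come out far above 0.05, which matches the conclusion that H₀ is not rejected for these counts.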
The t-distribution (Student's t-distribution) is a continuous probability distribution that arises when the mean of a normally distributed population is estimated from a small sample and the population standard deviation is unknown. It has heavier tails than the normal distribution, reflecting greater uncertainty; as the degrees of freedom increase toward infinity, it converges to the standard normal distribution. It is the foundation of t-tests and is central to small-sample statistical inference.
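The heavier tails and the convergence to the normal distribution can be illustrated by simulation. The sketch below uses the standard construction T = Z / √(V / df), where Z is standard normal and V is an independent chi-squared variable with df degrees of freedom. The function names, seed, sample size, and the cutoff of 2 are assumptions chosen for the example.

```python
import math
import random

def simulate_t(df, n_samples, seed=0):
    """Draw t-distributed values as Z / sqrt(V / df), with V chi-squared(df)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_samples):
        z = rng.gauss(0.0, 1.0)
        v = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
        draws.append(z / math.sqrt(v / df))
    return draws

def two_sided_tail(draws, cutoff):
    """Fraction of draws with absolute value above cutoff."""
    return sum(abs(t) > cutoff for t in draws) / len(draws)

cutoff = 2.0
normal_tail = math.erfc(cutoff / math.sqrt(2.0))  # exact P(|Z| > 2)
tail_df3 = two_sided_tail(simulate_t(3, 50_000), cutoff)
tail_df30 = two_sided_tail(simulate_t(30, 50_000), cutoff)

print(f"normal : P(|Z| > 2) = {normal_tail:.4f}")
print(f"t(3)   : P(|T| > 2) ~ {tail_df3:.4f}")
print(f"t(30)  : P(|T| > 2) ~ {tail_df30:.4f}")
```

With 3 degrees of freedom the tail fraction should be clearly larger than the normal value, while with 30 it should be much closer to it.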
From the Greek letter χ (chi). The distribution was derived by Ernst Abbe in 1863 and independently by Karl Pearson in 1900, who introduced it in the context of his goodness-of-fit test. The symbol χ² reflects that it arises from squared standard normal variables.