Correlation Coefficient

Also known as: Pearson's r, Pearson correlation, product-moment correlation coefficient

The Pearson correlation coefficient (r) is a dimensionless statistic that measures the strength and direction of the linear relationship between two continuous variables. It ranges from −1 (perfect negative correlation) to +1 (perfect positive correlation), with 0 indicating no linear association, although a nonlinear relationship may still be present. It is calculated as the covariance of the two variables divided by the product of their standard deviations. Correlation quantifies association but does not imply causation, a fundamental principle in statistical reasoning.
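The "covariance divided by the product of standard deviations" definition can be sketched directly in Python. This is a minimal illustration, not a reference implementation: the function name `pearson_from_covariance` and the three-point data sets are made up for the example, and the sample (n − 1) divisor is an assumption. The divisor cancels between numerator and denominator, so the population divisor would give the same r.

```python
import math

def pearson_from_covariance(x, y):
    """Pearson's r as sample covariance over the product of sample standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, with at least two pairs")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample covariance and standard deviations, all using the n - 1 divisor.
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("r is undefined when either variable is constant")
    return cov_xy / (sd_x * sd_y)

# Illustrative data (not from the article): y is an exact linear function of x.
r_up = pearson_from_covariance([1, 2, 3], [2, 4, 6])    # increasing line
r_down = pearson_from_covariance([1, 2, 3], [6, 4, 2])  # decreasing line
print(r_up, r_down)
```

The two calls land on the extremes of the range: an exact increasing line gives r = +1 and an exact decreasing line gives r = −1 (up to floating-point rounding).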

Key Formula

r = Σ[(xᵢ−x̄)(yᵢ−ȳ)] / √[Σ(xᵢ−x̄)² × Σ(yᵢ−ȳ)²]

LaTeX: r = \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2 \, \sum_{i=1}^{n}(y_i-\bar{y})^2}}

Symbol	Meaning	Unit
r	Pearson correlation coefficient	dimensionless
x_i, y_i	Individual data point values	same as data
\bar{x}, \bar{y}	Sample means of x and y	same as data
n	Number of data pairs	count

Worked Example

Problem

Two variables: x = [1, 2, 3, 4, 5], y = [2, 4, 5, 4, 5]. Calculate the Pearson correlation coefficient.

Solution

Step 1: x̄ = 3, ȳ = 4.
Step 2: (xᵢ−x̄): −2, −1, 0, 1, 2; (yᵢ−ȳ): −2, 0, 1, 0, 1.
Step 3: Products: 4, 0, 0, 0, 2 → Σ = 6.
Step 4: Σ(xᵢ−x̄)² = 4+1+0+1+4 = 10; Σ(yᵢ−ȳ)² = 4+0+1+0+1 = 6.
Step 5: r = 6 / √(10 × 6) = 6 / √60 = 6/7.746 ≈ 0.775.

Answer

r ≈ 0.775, a strong positive linear correlation (r ≥ 0.70)
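The arithmetic in the worked example can be checked with a short Python sketch that follows the same five steps. The data are the x and y values from the problem. The variable names are chosen for this illustration only.

```python
import math

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Step 1: sample means.
mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Step 2: deviations from the means.
dx = [a - mean_x for a in x]
dy = [b - mean_y for b in y]

# Step 3: sum of the products of paired deviations (numerator).
sum_products = sum(a * b for a, b in zip(dx, dy))

# Step 4: sums of squared deviations.
sum_sq_x = sum(a ** 2 for a in dx)
sum_sq_y = sum(b ** 2 for b in dy)

# Step 5: divide by the square root of the product of the two sums.
r = sum_products / math.sqrt(sum_sq_x * sum_sq_y)
print(round(r, 3))  # 0.775
```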

Interpretation of Pearson Correlation Coefficient Values

r Value Range	Strength	Direction	Example
−1.00	Perfect	Negative	Exact inverse relationship
−0.70 to −0.99	Strong	Negative	Study time vs errors
−0.30 to −0.69	Moderate	Negative	Stress vs sleep
−0.29 to 0.29	Weak/None	—	Shoe size vs IQ
0.30 to 0.69	Moderate	Positive	Height vs weight
0.70 to 0.99	Strong	Positive	Temperature vs ice cream sales
1.00	Perfect	Positive	Exact direct relationship
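The bands in the table can be turned into a small lookup, sketched below. The function name `describe_correlation` is hypothetical. Two choices in it are assumptions rather than part of the table: values that fall between two listed bands (for example 0.295) are assigned by simple thresholds at 0.30 and 0.70, and |r| = 1 is labelled "Perfect" in both directions, mirroring the −1.00 row. The cut-offs themselves are conventions and vary between fields.

```python
def describe_correlation(r):
    """Map r to a (strength, direction) pair using the table's bands.

    Direction is None in the weak/none band, where the table lists no direction.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    magnitude = abs(r)
    if magnitude == 1.0:
        strength = "Perfect"
    elif magnitude >= 0.70:
        strength = "Strong"
    elif magnitude >= 0.30:
        strength = "Moderate"
    else:
        return ("Weak/None", None)
    direction = "Positive" if r > 0 else "Negative"
    return (strength, direction)

print(describe_correlation(0.775))
print(describe_correlation(-0.5))
print(describe_correlation(0.1))
```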

Interactive Tools

Desmos — Correlation

Plot data, compute correlation, and visualise scatter plots interactively

Khan Academy — Correlation

Lessons on interpreting correlation and avoiding causation fallacies

Wolfram Alpha

Compute Pearson and Spearman correlations from data sets

Related Terms

Mathematics

Regression Analysis

Regression analysis is a set of statistical methods for estimating the relationship between a dependent variable (response) and one or more independent variables (predictors), typically by fitting a mathematical model that minimises prediction error. Simple linear regression fits a straight line through bivariate data using the method of least squares; multiple regression extends this to several predictors. It is used extensively in economics, biology, engineering, and machine learning for prediction, forecasting, and causal inference.
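As a rough illustration of simple linear regression by least squares, the sketch below fits a line using the standard closed-form slope and intercept. The function name `fit_line` is hypothetical, and the data reuse the x and y values from the worked example above. The slope's numerator is the same sum of products that appears in the numerator of r.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for y ≈ intercept + slope * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, with at least two pairs")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations and sum of squared x deviations.
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    if s_xx == 0:
        raise ValueError("slope is undefined when x is constant")
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(slope, intercept)
```

For these data the fitted slope is 6/10 = 0.6 and the intercept is 4 − 0.6 × 3 = 2.2, up to floating-point rounding.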

Mathematics

Z-score

A Z-score (also called a standard score) measures how many standard deviations a data point is from the mean of its distribution. It standardises values from different distributions, enabling direct comparison by placing them on a common scale. Z-scores are widely used in quality control, hypothesis testing, and the construction of standard normal tables.
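A minimal sketch of the calculation follows. The function name `z_scores` and the eight data values are illustrative, and dividing by the population standard deviation (divisor n) is an assumption. Use the sample standard deviation (divisor n − 1) when the data are a sample rather than the whole population.

```python
import math

def z_scores(values):
    """Standardise values: subtract the mean, divide by the population standard deviation."""
    n = len(values)
    if n == 0:
        raise ValueError("values must not be empty")
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sd == 0:
        raise ValueError("z-scores are undefined when all values are equal")
    return [(v - mean) / sd for v in values]

# Illustrative data with mean 5 and population standard deviation 2.
data = [2, 4, 4, 4, 5, 5, 7, 9]
z = z_scores(data)
print(z)
```

Here the value 9 lies two standard deviations above the mean, so its z-score is 2, and the two values equal to the mean get a z-score of 0.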

Mathematics

Central Limit Theorem

The Central Limit Theorem (CLT) states that the sampling distribution of the mean of independent, identically distributed observations approaches a normal distribution as the sample size n increases, regardless of the shape of the underlying population distribution, provided the population has a finite mean and variance. A common rule of thumb treats n ≥ 30 as large enough for the approximation, although strongly skewed or heavy-tailed populations can require larger samples. The CLT is the theoretical foundation for Z-tests, t-tests, confidence intervals, and much of classical inferential statistics.
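The theorem can be explored with a small simulation, sketched here under stated assumptions. The population is an exponential distribution with mean 1 and standard deviation 1, which is strongly skewed. The sample size (30), number of trials (5000), and random seed are arbitrary choices for the illustration. The check is informal: it compares the centre, spread, and central 95% coverage of the simulated sample means with what a normal approximation predicts. It is not a formal normality test.

```python
import math
import random
import statistics

def simulate_sample_means(n, trials, rng):
    """Draw `trials` samples of size n from Exponential(rate=1) and return their means."""
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(trials)]

rng = random.Random(0)  # fixed seed so the run is repeatable
n, trials = 30, 5000
means = simulate_sample_means(n, trials, rng)

mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)
predicted_se = 1.0 / math.sqrt(n)  # population sd divided by sqrt(n)

# Share of sample means within 1.96 predicted standard errors of the population mean.
within = sum(abs(m - 1.0) <= 1.96 * predicted_se for m in means) / trials

print(mean_of_means, sd_of_means, predicted_se, within)
```

If the normal approximation is reasonable at this sample size, the mean of the sample means should sit near 1, their standard deviation near 1/√30 ≈ 0.183, and the coverage share near 0.95.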

The term "correlation" comes from the Latin "correlatio" (mutual relation), popularised by Francis Galton in 1888. Karl Pearson formalised the product-moment correlation coefficient formula in 1895, hence the eponym "Pearson's r".

Tags: statistics, bivariate-analysis, association, linear-relationship, probability