Regression Analysis

Also known as: Linear regression, Least squares regression, OLS regression

Regression analysis is a set of statistical methods for estimating the relationship between a dependent variable (response) and one or more independent variables (predictors), typically by fitting a mathematical model that minimises prediction error. Simple linear regression fits a straight line through bivariate data using the method of least squares; multiple regression extends this to several predictors. It is used extensively in economics, biology, engineering, and machine learning for prediction, forecasting, and causal inference.

Key Formula

ŷ = β₀ + β₁x, where each observation satisfies y = β₀ + β₁x + ε

LaTeX: \hat{y} = \beta_0 + \beta_1 x, \qquad y = \beta_0 + \beta_1 x + \varepsilon

Symbol	Meaning	Unit
\hat{y}	Predicted value of the dependent variable	same as y
y	Observed value of the dependent variable	same as y
\beta_0	Intercept (value of ŷ when x = 0)	same as y
\beta_1	Slope (change in ŷ per unit increase in x)	y-units per x-unit
x	Independent (predictor) variable	units of x
\varepsilon	Error term (observed y minus the value on the line)	same as y
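
The least-squares estimates of the intercept and slope can be derived by minimising the sum of squared errors. The following is a brief sketch of that standard derivation for n data points (x_i, y_i); the name S for the objective function is introduced here for illustration only.

```latex
S(\beta_0, \beta_1) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2

\frac{\partial S}{\partial \beta_0} = -2 \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i) = 0

\frac{\partial S}{\partial \beta_1} = -2 \sum_{i=1}^{n} x_i (y_i - \beta_0 - \beta_1 x_i) = 0

\beta_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2},
\qquad
\beta_0 = \frac{\sum y_i - \beta_1 \sum x_i}{n}
```

Solving the two first-order conditions simultaneously gives the closed-form expressions on the last line, which require only the sums Σx, Σy, Σx², and Σxy.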

Worked Example

Problem

Data: x (hours studied) = [2, 4, 6, 8], y (exam score) = [50, 65, 75, 85]. Fit a simple linear regression and predict score for 7 hours.

Solution

Step 1: n = 4, Σx = 20, Σy = 275, Σx² = 120, Σxy = 2×50 + 4×65 + 6×75 + 8×85 = 1490.

Step 2: β₁ = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²) = (4×1490 − 20×275) / (4×120 − 400) = (5960 − 5500)/(480 − 400) = 460/80 = 5.75.

Step 3: β₀ = (Σy − β₁Σx)/n = (275 − 5.75×20)/4 = (275 − 115)/4 = 40.00.

Step 4: ŷ(7) = 40 + 5.75×7 = 40 + 40.25 = 80.25.

Answer

Regression line: ŷ = 40 + 5.75x; predicted score for 7 hours = 80.25
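
The calculation can also be checked programmatically. The sketch below uses only the Python standard library and applies the same closed-form sums to the hours-studied data; the function name fit_simple_linear is a hypothetical helper chosen for this illustration, not part of any library.

```python
def fit_simple_linear(xs, ys):
    """Return (intercept, slope) of the least-squares line through (xs, ys)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")

    # The four sums used in the closed-form least-squares solution.
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    denom = n * sum_xx - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are identical; slope is undefined")

    slope = (n * sum_xy - sum_x * sum_y) / denom
    intercept = (sum_y - slope * sum_x) / n
    return intercept, slope


hours = [2, 4, 6, 8]        # x: hours studied
scores = [50, 65, 75, 85]   # y: exam score

b0, b1 = fit_simple_linear(hours, scores)
predicted = b0 + b1 * 7     # predicted score for 7 hours of study
print(f"y_hat = {b0:.2f} + {b1:.2f}x; prediction at x = 7: {predicted:.2f}")
```

Running it on the data above is a quick way to verify the hand calculation, and the same function can be reused on any other bivariate data set.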

Types of Regression Analysis and Their Applications

Type	Response Variable	Predictors	Typical Application
Simple linear	Continuous	1 continuous	Height vs weight
Multiple linear	Continuous	≥2 mixed	House price prediction
Logistic	Binary (0/1)	Mixed	Disease classification
Polynomial	Continuous	1 with powers	Curvilinear growth
Ridge / Lasso	Continuous	Many (regularised)	High-dimensional data
Poisson	Count data	Mixed	Event rate modelling

Interactive Tools

Desmos Regression

Perform and visualise linear and polynomial regression on custom data sets

Khan Academy — Regression

Lessons on scatter plots, correlation, and linear regression

Wolfram Alpha

Compute regression coefficients, R², and residuals automatically

Related Terms

Mathematics

Correlation Coefficient

The Pearson correlation coefficient (r) is a dimensionless statistic that measures the strength and direction of the linear relationship between two continuous variables, ranging from −1 (perfect negative correlation) to +1 (perfect positive correlation), with 0 indicating no linear association. It is calculated as the covariance of the two variables divided by the product of their standard deviations. While correlation quantifies association, it does not imply causation — a fundamental principle in statistical reasoning.
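
As a minimal sketch of that definition, the function below computes r from the sums of centred products, reusing the hours-studied data from the worked example. The name pearson_r is a hypothetical helper; the common 1/(n − 1) factors in the covariance and the standard deviations cancel, so they are omitted.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sums of centred products; the 1/(n - 1) scaling cancels in the ratio.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)

    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return s_xy / math.sqrt(s_xx * s_yy)


hours = [2, 4, 6, 8]
scores = [50, 65, 75, 85]
r = pearson_r(hours, scores)
print(f"r = {r:.4f}")
```

For this data r is close to +1, reflecting the nearly straight-line relationship between hours studied and exam score.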

Mathematics

Central Limit Theorem

The Central Limit Theorem (CLT) states that the sampling distribution of the mean of n independent, identically distributed observations approaches a normal distribution as n increases, regardless of the shape of the underlying population distribution, provided the population has a finite mean and variance. A common rule of thumb treats n ≥ 30 as large enough for the approximation to be useful, although strongly skewed or heavy-tailed populations can require larger samples. The CLT is the theoretical foundation for Z-tests, t-tests, confidence intervals, and much of classical inferential statistics.
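
The theorem can be illustrated with a small simulation. The sketch below is an assumption-laden example rather than a proof: it draws repeated samples of size 30 from an exponential distribution with mean 1 and standard deviation 1 (a strongly skewed population chosen for illustration), then compares the spread of the sample means with the theoretical value σ/√n. The helper name sample_means, the seed, and the replicate count are arbitrary choices.

```python
import random
import statistics


def sample_means(draw, n, replicates, rng):
    """Return `replicates` sample means, each computed from n calls to draw(rng)."""
    return [
        statistics.fmean([draw(rng) for _ in range(n)])
        for _ in range(replicates)
    ]


rng = random.Random(0)   # fixed seed so the run is repeatable
n = 30                   # sample size
replicates = 2000        # number of simulated samples

# Exponential population with rate 1: mean 1, standard deviation 1, right-skewed.
means = sample_means(lambda r: r.expovariate(1.0), n, replicates, rng)

mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)
theoretical_sd = 1.0 / n ** 0.5   # sigma / sqrt(n)

print(f"mean of sample means: {mean_of_means:.3f} (population mean 1.000)")
print(f"sd of sample means:   {sd_of_means:.3f} (theory {theoretical_sd:.3f})")
```

Plotting a histogram of means would show a roughly bell-shaped distribution even though the individual draws are far from normal.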

Mathematics

Hypothesis Testing

Hypothesis testing is a formal statistical procedure for making decisions about a population parameter based on sample data, by evaluating evidence against a null hypothesis (H₀) in favour of an alternative hypothesis (H₁). A test statistic is computed and compared to a critical value or converted to a p-value; if the result is statistically significant (p < α), the null hypothesis is rejected. It underpins scientific research, clinical trials, quality assurance, and data-driven decision-making across all quantitative disciplines.
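
To make the procedure concrete, the sketch below implements a two-sided one-sample Z-test, assuming the population standard deviation is known. The function name two_sided_z_test and all numbers in the usage lines are hypothetical values chosen for illustration; only statistics.NormalDist from the Python standard library is used.

```python
from statistics import NormalDist


def two_sided_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Test H0: mu = mu0 against H1: mu != mu0 with known sigma.

    Returns (z, p_value, reject_h0).
    """
    standard_error = sigma / n ** 0.5
    z = (sample_mean - mu0) / standard_error
    # Two-sided p-value: probability of a statistic at least this extreme under H0.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha


# Hypothetical data: 25 observations with mean 52, testing H0: mu = 50, sigma = 5.
z, p, reject = two_sided_z_test(sample_mean=52.0, mu0=50.0, sigma=5.0, n=25)
print(f"z = {z:.2f}, p = {p:.4f}, reject H0 at alpha = 0.05: {reject}")
```

Here the test statistic is z = 2.00 and the p-value is about 0.046, so the null hypothesis is rejected at α = 0.05 but would not be at α = 0.01.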

Francis Galton coined the term "regression" in 1886 from the Latin "regressio" (a return), observing that extreme parental heights tended to "regress" toward the mean in offspring. Karl Pearson and Udny Yule later formalised linear regression as a statistical method.

Tags: statistics, modelling, prediction, least-squares, machine-learning