Regression Analysis

Also known as: Linear regression, Least squares regression, OLS regression

Regression analysis is a set of statistical methods for estimating the relationship between a dependent variable (response) and one or more independent variables (predictors), typically by fitting a mathematical model that minimises prediction error. Simple linear regression fits a straight line through bivariate data using the method of least squares; multiple regression extends this to several predictors. It is used extensively in economics, biology, engineering, and machine learning for prediction, forecasting, and causal inference.

Key Formula

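Model: y = β₀ + β₁x + ε
Fitted line: ŷ = β₀ + β₁x

LaTeX: y = \beta_0 + \beta_1 x + \varepsilon, \qquad \hat{y} = \beta_0 + \beta_1 x

The error term ε belongs to the model for the observed value y. The predicted value ŷ lies exactly on the fitted line and carries no error term.

Symbol | Meaning | Unit
y | Observed value of the dependent variable | units of y
\hat{y} | Predicted value of the dependent variable | same as y
\beta_0 | Intercept (value of ŷ when x = 0) | same as y
\beta_1 | Slope (change in ŷ per unit increase in x) | y-units per x-unit
x | Independent (predictor) variable | units of x
\varepsilon | Error term (deviation of the observed y from β₀ + β₁x) | same as y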
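
The least-squares criterion behind the formula can be written out explicitly. The block below is a sketch in the notation of this entry; the objective name S and the bar notation for sample means are introduced here for illustration. Setting the two partial derivatives of S to zero gives the closed-form estimates used in the worked example.

```latex
\begin{align}
S(\beta_0, \beta_1) &= \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^2 \\
\beta_1 &= \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left( \sum x_i \right)^2} \\
\beta_0 &= \frac{\sum y_i - \beta_1 \sum x_i}{n} = \bar{y} - \beta_1 \bar{x}
\end{align}
```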

Worked Example

Problem

Data: x (hours studied) = [2, 4, 6, 8], y (exam score) = [50, 65, 75, 85]. Fit a simple linear regression and predict score for 7 hours.

Solution

Step 1: n = 4, Σx = 20, Σy = 275, Σx² = 120, Σxy = 2×50 + 4×65 + 6×75 + 8×85 = 1490.

Step 2: β₁ = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²) = (4×1490 − 20×275) / (4×120 − 400) = (5960 − 5500)/(480 − 400) = 460/80 = 5.75.

Step 3: β₀ = (Σy − β₁Σx)/n = (275 − 5.75×20)/4 = (275 − 115)/4 = 40.00.

Step 4: ŷ(7) = 40 + 5.75×7 = 40 + 40.25 = 80.25.

Answer

Regression line: ŷ = 40 + 5.75x; predicted score for 7 hours = 80.25
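
The same calculation can be checked by machine. The following is a minimal Python sketch that applies the summation formulas from the solution to the data in the problem statement; the function name `fit_simple_linear` and the variable names are illustrative choices, not part of any library.

```python
# Data from the worked example: hours studied and exam scores.
hours = [2, 4, 6, 8]
scores = [50, 65, 75, 85]


def fit_simple_linear(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_x2 = sum(xi * xi for xi in x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    # Closed-form least-squares estimates for a single predictor.
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n
    return intercept, slope


intercept, slope = fit_simple_linear(hours, scores)
predicted_7 = intercept + slope * 7  # prediction for 7 hours of study

print(f"y_hat = {intercept:.2f} + {slope:.2f}x")
print(f"predicted score for 7 hours = {predicted_7:.2f}")
```

Computing the sums directly from the raw lists is a useful guard against arithmetic slips in Σxy, which is the term most easily miscalculated by hand.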

Types of Regression Analysis and Their Applications

Type | Response Variable | Predictors | Typical Application
Simple linear | Continuous | 1 continuous | Height vs weight
Multiple linear | Continuous | ≥2 mixed | House price prediction
Logistic | Binary (0/1) | Mixed | Disease classification
Polynomial | Continuous | 1 with powers | Curvilinear growth
Ridge / Lasso | Continuous | Many (regularised) | High-dimensional data
Poisson | Count data | Mixed | Event rate modelling
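
To illustrate what "regularised" means in the Ridge row, here is a minimal Python sketch of ridge regression restricted to a single predictor. It assumes the penalty is applied to the squared slope only and the intercept is left unpenalised, in which case the slope has the closed form S_xy / (S_xx + penalty). The function name `fit_ridge_simple` and the parameter name `penalty` are hypothetical, and the sketch does not cover Lasso, which has no such simple closed form in general.

```python
# Data reused from the worked example on this page.
hours = [2, 4, 6, 8]
scores = [50, 65, 75, 85]


def fit_ridge_simple(x, y, penalty):
    """Return (intercept, slope) minimising
    sum((y - b0 - b1*x)^2) + penalty * b1^2 for a single predictor.

    Assumption: the intercept b0 is not penalised.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    # The penalty inflates the denominator, shrinking the slope toward 0.
    slope = s_xy / (s_xx + penalty)
    intercept = mean_y - slope * mean_x
    return intercept, slope


ols_intercept, ols_slope = fit_ridge_simple(hours, scores, penalty=0.0)
ridge_intercept, ridge_slope = fit_ridge_simple(hours, scores, penalty=20.0)

print(f"penalty 0:  slope = {ols_slope:.3f}")
print(f"penalty 20: slope = {ridge_slope:.3f}")
```

With a penalty of 0 the function reduces to ordinary least squares; larger penalties pull the slope toward zero, which is the behaviour that stabilises estimates when there are many correlated predictors.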

Interactive Tools

Desmos Regression

Perform and visualise linear and polynomial regression on custom data sets

Khan Academy — Regression

Lessons on scatter plots, correlation, and linear regression

Wolfram Alpha

Compute regression coefficients, R², and residuals automatically

Figure: Scatter plot with fitted linear regression line and residuals illustrated. Source: Wikimedia Commons, CC BY-SA.

Related Terms

Mathematics

Correlation Coefficient

The Pearson correlation coefficient (r) is a dimensionless statistic that measures the strength and direction of the linear relationship between two continuous variables, ranging from −1 (perfect negative correlation) to +1 (perfect positive correlation), with 0 indicating no linear association. It is calculated as the covariance of the two variables divided by the product of their standard deviations. While correlation quantifies association, it does not imply causation — a fundamental principle in statistical reasoning.
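
The definition above (covariance divided by the product of the standard deviations) can be sketched in a few lines of Python. The function name `pearson_r` is an illustrative choice, and the data are the hours and scores from the worked example on this page.

```python
import math

# Data reused from the regression worked example.
hours = [2, 4, 6, 8]
scores = [50, 65, 75, 85]


def pearson_r(x, y):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    # The 1/(n-1) factors of the covariance and the two standard
    # deviations cancel, so they are omitted.
    return s_xy / math.sqrt(s_xx * s_yy)


r = pearson_r(hours, scores)
print(f"r = {r:.4f}, r squared = {r * r:.4f}")
```

In simple linear regression, the square of r equals the coefficient of determination R², the share of the variance in y accounted for by the fitted line.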

Central Limit Theorem

The Central Limit Theorem (CLT) states that, for independent observations drawn from the same population, the sampling distribution of the sample mean approaches a normal distribution as the sample size n increases, regardless of the shape of the underlying population distribution, provided the population has a finite mean and variance. A common rule of thumb treats n ≥ 30 as large enough for the normal approximation, although strongly skewed or heavy-tailed populations can require larger samples. The CLT is the theoretical foundation for Z-tests, t-tests, confidence intervals, and much of classical inferential statistics.
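
A small simulation can make the theorem concrete. The sketch below draws repeated samples from a strongly right-skewed population (an exponential distribution with mean 1 and standard deviation 1) and summarises the resulting sample means. The population, sample size, replication count, and seed are all assumptions chosen for illustration.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 30      # n in each sample
REPLICATIONS = 5000   # number of sample means to collect


def sample_mean(n):
    """Mean of n draws from an exponential population (mean 1, sd 1)."""
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))


means = [sample_mean(SAMPLE_SIZE) for _ in range(REPLICATIONS)]

mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)
expected_sd = 1.0 / math.sqrt(SAMPLE_SIZE)  # sigma / sqrt(n)

# Under a normal approximation, about 95% of sample means should fall
# within 1.96 standard errors of the population mean.
share_within = sum(abs(m - 1.0) < 1.96 * expected_sd for m in means) / REPLICATIONS

print(f"mean of sample means: {mean_of_means:.3f}")
print(f"sd of sample means:   {sd_of_means:.3f} (sigma/sqrt(n) = {expected_sd:.3f})")
print(f"share within 1.96 SE: {share_within:.3f}")
```

Although individual exponential draws are far from normal, the share of sample means inside the 1.96 standard error band should come out close to 95%, which is what the normal approximation predicts.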

Hypothesis Testing

Hypothesis testing is a formal statistical procedure for making decisions about a population parameter based on sample data, by evaluating evidence against a null hypothesis (H₀) in favour of an alternative hypothesis (H₁). A test statistic is computed and compared to a critical value or converted to a p-value; if the result is statistically significant (p < α), the null hypothesis is rejected. It underpins scientific research, clinical trials, quality assurance, and data-driven decision-making across all quantitative disciplines.
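
The decision rule described above can be sketched with a two-sided one-sample Z-test, one of the simplest cases. The example assumes the population standard deviation is known, and the numbers (sample mean 52, null mean 50, standard deviation 5, n = 25) are hypothetical values chosen for illustration. The function name `two_sided_z_test` is likewise illustrative; `NormalDist` comes from Python's standard `statistics` module.

```python
import math
from statistics import NormalDist


def two_sided_z_test(sample_mean, null_mean, population_sd, n, alpha=0.05):
    """Return (z, p_value, reject_null) for a two-sided one-sample Z-test.

    Assumption: the population standard deviation is known.
    """
    standard_error = population_sd / math.sqrt(n)
    z = (sample_mean - null_mean) / standard_error
    # Two-sided p-value: probability of a statistic at least this extreme
    # in either tail if the null hypothesis is true.
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha


# Hypothetical inputs for illustration only.
z, p_value, reject_null = two_sided_z_test(
    sample_mean=52.0, null_mean=50.0, population_sd=5.0, n=25
)

print(f"z = {z:.2f}, p = {p_value:.4f}, reject H0: {reject_null}")
```

Here the test statistic is 2 standard errors from the null value, which gives a p-value just under 0.05, so H₀ is rejected at α = 0.05 but would not be at α = 0.01.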

Francis Galton coined the term "regression" in 1886 from the Latin "regressio" (a return), observing that the children of unusually tall or unusually short parents tended to have heights closer to the population mean, that is, to "regress" toward it. Karl Pearson and Udny Yule later formalised linear regression as a statistical method.

Tags: statistics, modelling, prediction, least-squares, machine-learning