Supervised Learning

Also known as: Labeled Learning

Supervised learning is a machine learning approach where a model is trained on a labeled dataset, meaning each training example is paired with the correct output (label). The model learns a mapping from inputs to outputs by minimizing the difference between its predictions and the true labels. It is the most widely used ML paradigm and underpins applications such as image recognition, speech transcription, and credit scoring.
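
To make "learning a mapping by minimizing the difference between predictions and true labels" concrete, here is a minimal sketch in plain Python. It assumes a one-variable linear model fitted by gradient descent on the mean squared error; the toy dataset, the function name `fit_linear`, and the learning rate and step count are illustrative choices, not part of the definition above.

```python
# Toy labeled dataset: each input x is paired with its correct output y.
# The values are made up for illustration and follow y = 2x + 1 exactly.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def fit_linear(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y ~ w*x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction errors under the current parameters.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Move the parameters a small step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


w, b = fit_linear(xs, ys)
print(w, b)
```

On this noise-free data the fitted parameters should come out very close to w = 2 and b = 1, the rule used to generate the labels. Real datasets contain noise, so the fit would only approximate the underlying relationship.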

Key Formula

Mean Squared Error Loss = (1/n) * sum of (actual - predicted)^2

LaTeX: L = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2

Symbol	Meaning	Unit
L	Mean Squared Error loss	squared units of y
n	Number of training examples	count
y_i	True label for example i	depends on problem
\hat{y}_i	Predicted output for example i	depends on problem

Worked Example

Problem

A model predicts house prices (in lakhs INR) for 4 houses. True prices: [50, 80, 65, 90]. Predicted prices: [48, 85, 60, 92]. Calculate the Mean Squared Error.

Solution

Step 1. Compute the squared differences:
(50 − 48)² = 4
(80 − 85)² = 25
(65 − 60)² = 25
(90 − 92)² = 4

Step 2. Sum the squared differences: 4 + 25 + 25 + 4 = 58

Step 3. Divide by n = 4: MSE = 58 / 4 = 14.5

Answer

MSE = 14.5 (lakhs INR)²
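
The same calculation can be checked in a few lines of Python. This is a minimal sketch; the function name `mean_squared_error` is chosen here for illustration and is written from scratch rather than taken from any library.

```python
def mean_squared_error(actual, predicted):
    """Return the mean of the squared differences between two equal-length lists."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one example is required")
    n = len(actual)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / n


# Values from the worked example (house prices in lakhs INR).
true_prices = [50, 80, 65, 90]
predicted_prices = [48, 85, 60, 92]

mse = mean_squared_error(true_prices, predicted_prices)
print(mse)  # 14.5
```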

Common Supervised Learning Algorithms

Algorithm	Task Type	Strengths	Limitations
Linear Regression	Regression	Interpretable, fast	Assumes linearity
Logistic Regression	Classification	Probabilistic output	Linear boundary only
Decision Tree	Both	Easy to visualize	Prone to overfitting
Random Forest	Both	High accuracy	Computationally heavy
Support Vector Machine	Classification	Effective in high dimensions	Slow on large datasets

Interactive Tools

Scikit-learn Supervised Learning Guide

Comprehensive guide to supervised learning algorithms in Python

Khan Academy — Statistics

Foundational statistics needed to understand supervised learning

Brilliant.org — Supervised Learning

Interactive course covering regression and classification methods

Related Terms

Computer Science

Machine Learning

Machine learning is a branch of artificial intelligence in which systems learn from data to improve their performance on tasks without being explicitly programmed for each task. It works by identifying statistical patterns in training data and using those patterns to make predictions or decisions on new, unseen data. Machine learning powers applications ranging from spam filters and recommendation engines to medical diagnosis and autonomous vehicles.

Overfitting

Overfitting occurs when a machine learning model learns the training data too well — including its noise and random fluctuations — to the point where it performs poorly on new, unseen data. An overfitted model has high training accuracy but low validation/test accuracy, indicating it has memorized patterns specific to the training set rather than generalizing. Overfitting is more likely with complex models, small datasets, or insufficient regularization.
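
The gap between training and test accuracy can be illustrated with a small, fully deterministic sketch. Everything here is an assumption made for illustration: the toy data, the two deliberately flipped training labels standing in for noise, a 1-nearest-neighbor rule as the memorizing model, and a fixed (not learned) threshold as the simple model.

```python
# Hypothetical toy data: the true rule is "label 1 when x >= 5".
# Two training labels (x = 3 and x = 8) are deliberately flipped to mimic noise.
train_x = [1, 2, 3, 4, 6, 7, 8, 9]
train_y = [0, 0, 1, 0, 1, 1, 0, 1]
test_x = [1.4, 2.9, 3.1, 6.6, 7.9, 8.1]
test_y = [0, 0, 0, 1, 1, 1]


def nearest_neighbor_predict(x):
    """Memorizing model: copy the label of the closest training point."""
    closest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[closest]


def threshold_predict(x):
    """Simple model: a single fixed threshold at x = 5."""
    return 1 if x >= 5 else 0


def accuracy(predict, xs, ys):
    """Fraction of examples for which the prediction matches the label."""
    correct = sum(1 for x, y in zip(xs, ys) if predict(x) == y)
    return correct / len(xs)


print("memorizer train:", accuracy(nearest_neighbor_predict, train_x, train_y))
print("memorizer test: ", accuracy(nearest_neighbor_predict, test_x, test_y))
print("threshold train:", accuracy(threshold_predict, train_x, train_y))
print("threshold test: ", accuracy(threshold_predict, test_x, test_y))
```

The memorizing model reproduces every training label, including the two noisy ones, and then repeats those mistakes on nearby test points. The threshold rule misses the noisy training labels but matches the underlying pattern on the test set.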

Cross-Validation

Cross-validation is a statistical technique for evaluating a machine learning model's ability to generalize to an independent dataset. The most common form, k-fold cross-validation, partitions the training data into k equally sized subsets; the model is trained on k−1 folds and evaluated on the remaining fold, repeating this process k times and averaging the results. Cross-validation provides a more reliable performance estimate than a single train-test split and helps in selecting hyperparameters and comparing models.
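
The k-fold procedure described above can be sketched in plain Python. This is an illustrative outline under stated assumptions: the helper names `k_fold_indices` and `cross_validate` are hypothetical, folds are contiguous and unshuffled, and when the number of examples is not divisible by k the first folds receive one extra example each. The "model" in the usage lines is just the mean of the training labels, chosen to keep the example short.

```python
def k_fold_indices(n_samples, k):
    """Return a list of (train_indices, test_indices) pairs, one per fold."""
    if k < 2 or k > n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for fold in range(k):
        # The first `extra` folds get one additional example.
        size = base + (1 if fold < extra else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        folds.append((train_idx, test_idx))
        start += size
    return folds


def cross_validate(xs, ys, k, fit, score):
    """Train on k-1 folds, score on the held-out fold, and average the k scores."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(score(model,
                            [xs[i] for i in test_idx],
                            [ys[i] for i in test_idx]))
    return sum(scores) / len(scores)


# Usage with a deliberately trivial model: always predict the training mean.
def fit_mean(train_x, train_y):
    return sum(train_y) / len(train_y)


def mse_score(model, test_x, test_y):
    return sum((y - model) ** 2 for y in test_y) / len(test_y)


xs = [0, 1, 2, 3, 4, 5]
ys = [1, 2, 3, 4, 5, 6]
average_mse = cross_validate(xs, ys, k=3, fit=fit_mean, score=mse_score)
print(average_mse)
```

In practice the data is usually shuffled before splitting so that each fold is representative; that step is omitted here to keep the output deterministic.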

The term "supervised" derives from Latin "supervidere" (to oversee). In the ML context it was popularized in the 1980s–1990s to distinguish training with labeled examples (a "supervisor" provides correct answers) from unsupervised approaches.

supervised-learning, classification, regression, labels, training-data