Random Forest

Also known as: Random Decision Forest, Random Trees

A random forest is an ensemble machine learning algorithm that constructs a large number of decision trees during training and combines their outputs. For classification it returns the mode (majority vote) of the classes predicted by the individual trees; for regression it returns their mean prediction. Each tree is trained on a bootstrap sample of the data and considers only a random subset of features at each split. This injected randomness decorrelates the trees, so averaging them reduces variance and overfitting. Introduced in its now-standard form by Leo Breiman in 2001, random forests remain among the most widely used and robust general-purpose algorithms.
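The training and prediction steps described above can be sketched in plain Python. This is a minimal, unoptimized illustration, not any library's implementation. It assumes numeric features, class labels that are not dicts, Gini impurity as the split criterion, and hypothetical names (`fit_forest`, `predict_forest`, `max_features`, `max_depth`) chosen only for this example.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    """Most common label (the mode)."""
    return Counter(labels).most_common(1)[0][0]

def build_tree(X, y, max_features, rng, depth=0, max_depth=5):
    """Grow one CART-style tree; internal nodes are dicts, leaves are class labels."""
    if depth >= max_depth or len(set(y)) == 1:
        return majority(y)
    # Feature subsampling: only max_features randomly chosen columns are tried at this node.
    candidates = rng.sample(range(len(X[0])), max_features)
    best = None  # (weighted child impurity, feature index, threshold)
    for f in candidates:
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # no valid split among the sampled features
        return majority(y)
    _, f, t = best
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([X[i] for i in li], [y[i] for i in li], max_features, rng, depth + 1, max_depth),
        "right": build_tree([X[i] for i in ri], [y[i] for i in ri], max_features, rng, depth + 1, max_depth),
    }

def predict_tree(node, x):
    """Follow threshold tests from the root down to a leaf label."""
    while isinstance(node, dict):
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node

def fit_forest(X, y, n_trees=25, max_features=1, seed=0):
    """Train n_trees trees, each on its own bootstrap sample of the rows."""
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], max_features, rng))
    return forest

def predict_forest(forest, x):
    """Classification output: majority vote (mode) over the trees' predictions."""
    return majority([predict_tree(tree, x) for tree in forest])

# Toy usage: two identical numeric features, class 0 for small values, class 1 for large.
X = [[i, i] for i in range(10)]
y = [0] * 5 + [1] * 5
forest = fit_forest(X, y)
print(predict_forest(forest, [0, 0]), predict_forest(forest, [9, 9]))
```

In practice one would use a library implementation such as scikit-learn's `RandomForestClassifier`, whose `n_estimators` and `max_features` parameters play roles similar to `n_trees` and `max_features` here. Note that scikit-learn combines trees by averaging their predicted class probabilities rather than by the hard majority vote shown in this sketch.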

Key Formula

Prediction (regression) = (1/B) × sum of the B individual tree predictions T_b(x); for classification, the forest instead takes the majority vote of the T_b(x)

LaTeX: \hat{y} = \frac{1}{B} \sum_{b=1}^{B} T_b(\mathbf{x})

Symbol	Meaning	Unit
\hat{y}	Ensemble prediction (mean over trees)	target units
B	Number of trees in the forest	count
T_b(\mathbf{x})	Prediction of the b-th tree for input x	target units
\mathbf{x}	Input feature vector	feature units (per component)
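The key formula can be evaluated directly once each tree is treated as a function from a feature vector to a prediction. In this sketch the "trees" are hypothetical stand-in lambdas rather than fitted models; `forest_regress` implements the averaging formula and `forest_classify` the majority-vote counterpart used for classification.

```python
from collections import Counter

def forest_regress(trees, x):
    """y_hat = (1/B) * sum_b T_b(x): the mean of the B tree predictions."""
    return sum(tree(x) for tree in trees) / len(trees)

def forest_classify(trees, x):
    """Classification counterpart: the mode (majority vote) of the tree predictions."""
    return Counter(tree(x) for tree in trees).most_common(1)[0][0]

# Hypothetical stand-in "trees": any callable mapping a feature vector to a prediction.
reg_trees = [lambda x: 2.0 * x[0], lambda x: 2.0 * x[0] + 1.0, lambda x: 2.0 * x[0] - 1.0]
clf_trees = [lambda x: "spam", lambda x: "ham", lambda x: "spam"]

print(forest_regress(reg_trees, [3.0]))   # (6.0 + 7.0 + 5.0) / 3 -> 6.0
print(forest_classify(clf_trees, [3.0]))  # two of three votes -> spam
```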

Random Forest vs. Single Decision Tree — Key Comparisons

Property	Single Decision Tree	Random Forest	Reason
Variance	High	Low	Ensemble averaging
Bias	Low	Slightly higher	Acceptable trade-off
Overfitting risk	High	Low	Bootstrap + feature sampling
Interpretability	High	Low	Ensemble is a black box
Training speed	Fast	Slower	Multiple trees needed
Feature importance	Available	Averaged, more stable	More reliable ranking
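The "Variance: Low" row follows from averaging. For B identically distributed tree predictions, each with variance σ² and pairwise correlation ρ, the variance of their mean is ρσ² + (1 − ρ)σ²/B, a standard identity for averages of correlated variables. The sketch below evaluates that expression and checks it against a small Monte Carlo simulation in which the correlation comes from a shared Gaussian component; the Gaussian model and the function names are assumptions made for illustration.

```python
import math
import random
import statistics

def averaged_variance(sigma2, rho, B):
    """Variance of the mean of B identically distributed predictions,
    each with variance sigma2 and pairwise correlation rho."""
    return rho * sigma2 + (1.0 - rho) * sigma2 / B

def simulate_averaged_variance(sigma, rho, B, trials=20000, seed=0):
    """Empirical variance of the B-tree average under a shared-component model."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        shared = rng.gauss(0.0, sigma)  # component common to all trees -> correlation rho
        preds = [math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, sigma)
                 for _ in range(B)]
        means.append(sum(preds) / B)
    return statistics.pvariance(means)

print(averaged_variance(1.0, 0.3, 10))
print(simulate_averaged_variance(1.0, 0.3, 10))
```

With σ² = 1, ρ = 0.3 and B = 10 the averaged variance is about 0.37, against 1.0 for a single tree. As B grows it approaches ρσ², not zero, which is why random feature subsampling, by lowering ρ, matters as much as adding more trees.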

Related Terms

Decision Tree (ML)

A decision tree is a supervised machine learning model that splits data into branches based on feature values, forming a tree structure where each internal node represents a feature test, each branch represents an outcome, and each leaf node holds a prediction. Trees are trained by choosing splits that maximise information gain or minimise Gini impurity at each step. They are highly interpretable and serve as the building block for ensemble methods like random forests and gradient boosting.
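The split criteria named above can be computed in a few lines. This sketch assumes binary threshold splits on a single numeric feature (values x <= t go left). `split_gain` returns information gain when given `entropy` and the Gini decrease when given `gini`; `best_threshold` is a hypothetical helper that tries each observed value as a threshold and keeps the one with the highest gain.

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def split_gain(parent, left, right, impurity=entropy):
    """Parent impurity minus the size-weighted impurity of the two children."""
    weighted = (len(left) * impurity(left) + len(right) * impurity(right)) / len(parent)
    return impurity(parent) - weighted

def best_threshold(values, labels, impurity=entropy):
    """Return (threshold, gain) for the best split 'value <= threshold'."""
    best_t, best_gain = None, -1.0
    for t in sorted(set(values)):
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        if not left or not right:
            continue  # a split must send at least one sample each way
        gain = split_gain(labels, left, right, impurity)
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain

print(best_threshold([1, 2, 3, 4, 5, 6], ["a", "a", "a", "b", "b", "b"]))
```

A perfect split of an evenly balanced two-class node yields an information gain of 1 bit and a Gini decrease of 0.5; a split that leaves both children as mixed as the parent yields a gain of 0.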

Feature Engineering

Feature engineering is the process of using domain knowledge to select, transform, or create input variables (features) from raw data to improve the performance of machine learning models. It bridges raw data and predictive algorithms by producing representations that algorithms can learn from more effectively. Techniques include normalization, one-hot encoding, polynomial feature creation, and dimensionality reduction.
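Three of the techniques listed can be sketched in plain Python: min-max normalization (one common form of normalization; z-score standardization is another), one-hot encoding of a categorical column, and degree-2 polynomial features. The function names are hypothetical and the code is illustrative, not a library API.

```python
def min_max_normalize(column):
    """Rescale numeric values linearly to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def one_hot(column):
    """Return (sorted categories, one 0/1 indicator row per value)."""
    categories = sorted(set(column))
    return categories, [[1 if v == c else 0 for c in categories] for v in column]

def degree2_features(row):
    """Original features followed by all squares and pairwise products (i <= j)."""
    out = list(row)
    for i in range(len(row)):
        for j in range(i, len(row)):
            out.append(row[i] * row[j])
    return out

print(min_max_normalize([10, 20, 30]))           # [0.0, 0.5, 1.0]
print(one_hot(["red", "green", "red", "blue"]))  # (['blue', 'green', 'red'], [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]])
print(degree2_features([2, 3]))                  # [2, 3, 4, 6, 9]
```

Because tree splits depend only on the ordering of a feature's values, random forests are generally insensitive to monotone rescaling such as min-max normalization.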

Regularization (ML)

Regularization in machine learning refers to techniques that add a penalty term to the loss function to discourage model complexity, thereby reducing overfitting and improving generalisation to unseen data. The two most common forms are L1 (Lasso) regularization, which promotes sparsity by penalising the absolute values of weights, and L2 (Ridge) regularization, which penalises the squared values, shrinking all weights toward zero. Regularization is a fundamental concept in statistical learning theory, closely tied to the bias–variance trade-off.
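A minimal sketch of the two penalties added to a mean-squared-error loss, assuming a linear model with no intercept term. Conventions vary (some texts scale the L2 term by 1/2 or leave the bias unpenalized), so the exact scaling here is an assumption. The two helper functions show the single-weight proximal (shrinkage) step each penalty induces with unit step size: the L1 step sets small weights exactly to zero, which is the source of sparsity, while the L2 step rescales every weight toward zero without reaching it.

```python
def l1_penalty(weights, lam):
    """Lasso penalty: lam * sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam):
    """Ridge penalty: lam * sum of squared weights."""
    return lam * sum(w * w for w in weights)

def regularized_mse(weights, X, y, lam, kind="l2"):
    """Mean squared error of a linear model (no intercept) plus an L1 or L2 penalty."""
    if kind not in ("l1", "l2"):
        raise ValueError("kind must be 'l1' or 'l2'")
    preds = [sum(w * xi for w, xi in zip(weights, row)) for row in X]
    mse = sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)
    penalty = l1_penalty(weights, lam) if kind == "l1" else l2_penalty(weights, lam)
    return mse + penalty

def soft_threshold(w, t):
    """L1 proximal step: move w toward zero by t; |w| <= t becomes exactly 0."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0

def ridge_shrink(w, t):
    """L2 proximal step for penalty t * w**2: scale w by 1 / (1 + 2t); never exactly 0."""
    return w / (1.0 + 2.0 * t)

weights = [0.05, -0.3, 2.0]
print([soft_threshold(w, 0.1) for w in weights])  # small weight zeroed, others shifted
print([ridge_shrink(w, 0.1) for w in weights])    # every weight scaled, none zeroed
```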

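The name "random forest" comes from Leo Breiman's 2001 paper "Random Forests", which extended Tin Kam Ho's 1995 random decision forests; Breiman and Adele Cutler later registered "Random Forests" as a trademark. "Random" refers to the random feature subsampling at each split and the random bootstrap sampling of training data; "forest" is a metaphor for a collection of trees.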

Tags: ensemble-learning, random-forest, classification, regression, supervised-learning