Computer ScienceAI & Machine LearningMedium

Decision Tree (ML)

Also known as:Classification TreeRegression TreeCART

A decision tree is a supervised machine learning model that splits data into branches based on feature values, forming a tree structure where each internal node represents a feature test, each branch represents an outcome, and each leaf node holds a prediction. Trees are trained by choosing splits that maximise information gain or minimise Gini impurity at each step. They are highly interpretable and serve as the building block for ensemble methods like random forests and gradient boosting.

Key Formula

Information Gain = Entropy(S) - weighted sum of child entropies

LaTeX: IG(S, A) = H(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} H(S_v)

SymbolMeaningUnit
IG(S, A)Information gain of attribute A on dataset Sbits
H(S)Entropy of the dataset Sbits
|S_v|Number of samples where attribute A equals value vcount
|S|Total number of samples in Scount

Worked Example

Problem

A dataset of 10 emails has 6 spam and 4 not-spam. After splitting on "contains free", 4 emails land in the "yes" branch (all spam) and 6 in the "no" branch (2 spam, 4 not-spam). Calculate the information gain.

Solution

Step 1 — Parent entropy: H(S) = -(6/10)log₂(6/10) - (4/10)log₂(4/10) = 0.971 bits. Step 2 — Child entropy "yes" (4 spam, 0 not-spam): H = 0 bits (pure). Step 3 — Child entropy "no" (2 spam, 4 not-spam): H = -(2/6)log₂(2/6) - (4/6)log₂(4/6) = 0.918 bits. Step 4 — Weighted child entropy: (4/10)×0 + (6/10)×0.918 = 0.551 bits. Step 5 — IG = 0.971 - 0.551 = 0.420 bits.

Answer

Information Gain = 0.420 bits

Decision Tree Split Criteria Comparison

CriterionFormula BasisTaskRangeNotes
Gini Impurity1 - Σpᵢ²Classification[0, 0.5]Faster to compute
Entropy / Info Gain-Σpᵢlog₂pᵢClassification[0, 1]More balanced splits
Variance ReductionVar(parent) - Var(children)RegressionDomain-dependentUsed in CART
Chi-Squareχ² statisticClassification[0, ∞)Statistical significance

Interactive Tools

Scikit-learn Decision Trees

Open Tool

Brilliant.org Decision Trees

Open Tool

Google ML Crash Course

Open Tool
Decision tree diagram for predicting Titanic survivors using passenger features

Wikimedia Commons, CC BY-SA

Related Terms

Computer Science

Random Forest

A random forest is an ensemble machine learning algorithm that constructs a large number of decision trees during training and outputs the class that is the mode of the classes (classification) or mean prediction (regression) of the individual trees. Each tree is trained on a bootstrap sample of the data and uses a random subset of features at each split, introducing diversity that reduces variance and overfitting. Introduced by Leo Breiman in 2001, random forests are among the most widely used and robust general-purpose algorithms.

Computer Science

Feature Engineering

Feature engineering is the process of using domain knowledge to select, transform, or create input variables (features) from raw data to improve the performance of machine learning models. It bridges raw data and predictive algorithms by producing representations that algorithms can learn from more effectively. Techniques include normalization, one-hot encoding, polynomial feature creation, and dimensionality reduction.

Computer Science

Regularization (ML)

Regularization in machine learning refers to techniques that add a penalty term to the loss function to discourage model complexity, thereby reducing overfitting and improving generalisation to unseen data. The two most common forms are L1 (Lasso) regularization, which promotes sparsity by penalising the absolute values of weights, and L2 (Ridge) regularization, which penalises the squared values, shrinking all weights toward zero. Regularization is a fundamental concept in statistical learning theory, closely tied to the bias–variance trade-off.

The term adapts the mathematical "tree" graph concept to decision-making. Decision trees in statistics were formalised by Leo Breiman et al. in their 1984 book "Classification and Regression Trees" (CART). The word "decision" derives from Latin decidere (to cut off, determine).

decision-treeclassificationregressionsupervised-learninginterpretable-ml