A disjoint set (also called union-find or merge-find set) is a data structure that keeps track of a collection of non-overlapping sets and supports two primary operations: Union (merging two sets) and Find (determining which set an element belongs to by returning a representative). With path compression and union by rank optimizations, operations run in nearly O(1) amortized time per operation (Inverse Ackermann function). Disjoint sets are foundational in Kruskal's minimum spanning tree algorithm, network connectivity checking, and image segmentation.
| Strategy | Find Complexity | Union Complexity | Space |
|---|---|---|---|
| Naive (no optimization) | O(n) | O(1) | O(n) |
| Union by Rank | O(log n) | O(log n) | O(n) |
| Path Compression | O(log n) amortized | O(log n) | O(n) |
| Both (Rank + Path Compression) | O(α(n)) ≈ O(1) | O(α(n)) ≈ O(1) | O(n) |
Wikimedia Commons, CC BY-SA
A graph is a non-linear data structure consisting of a finite set of vertices (nodes) and a set of edges that connect pairs of vertices. Graphs can be directed (edges have a direction) or undirected, and weighted (edges carry a cost) or unweighted. They are used to model real-world problems such as social networks, transportation systems, web page link structures, and dependency resolution.
A binary tree is a hierarchical data structure in which each node has at most two children, referred to as the left child and the right child. It is one of the most fundamental structures in computer science, forming the basis for more specialized trees like binary search trees and heaps. Binary trees are widely used in expression parsing, Huffman coding, and database indexing.
A heap is a specialized tree-based data structure that satisfies the heap property: in a max-heap, each parent node is greater than or equal to its children; in a min-heap, each parent is less than or equal to its children. Heaps are typically implemented as complete binary trees stored in arrays, enabling O(1) access to the minimum or maximum element. They are the underlying structure of priority queues and are used in heap sort and Dijkstra's shortest path algorithm.
The term "disjoint set" derives from mathematical set theory, where "disjoint" (from Latin "disjunctus," separated) refers to sets with no elements in common. The alternative name "union-find" describes the two primary operations it supports. The data structure was developed by Bernard Galler and Michael Fischer in 1964.