A hash table (also called a hash map) is a data structure that implements an associative array by using a hash function to map keys to array indices (buckets), enabling average-case O(1) time for insertion, deletion, and lookup operations. When multiple keys hash to the same index (a collision), the table resolves it via chaining (each bucket holds a linked list of entries) or open addressing (probing for an alternate empty slot). Hash tables underlie Python dictionaries, Java HashMaps, database indexing, caches, and symbol tables in compilers, making them one of the most widely used data structures in software engineering.
h(k) = k mod m
LaTeX: h(k) = k \bmod m
| Symbol | Meaning | Unit |
|---|---|---|
| h(k) | Hash index (bucket) for key k | dimensionless |
| k | Key (or its integer representation) | dimensionless |
| m | Number of buckets in the hash table (ideally prime) | dimensionless |
Problem
Insert keys {15, 27, 43, 36, 10} into a hash table with m = 7 buckets using h(k) = k mod 7. Use chaining for collision resolution. What is the load factor?
Solution
Step 1: Compute hash values: h(15) = 15 mod 7 = 1; h(27) = 27 mod 7 = 6; h(43) = 43 mod 7 = 1 (collision with 15); h(36) = 36 mod 7 = 1 (collision); h(10) = 10 mod 7 = 3. Step 2: Build table: Bucket 0: empty; Bucket 1: [15 → 43 → 36]; Bucket 2: empty; Bucket 3: [10]; Bucket 4: empty; Bucket 5: empty; Bucket 6: [27]. Step 3: Load factor α = n/m = 5/7 ≈ 0.714. Step 4: Average lookup in non-empty bucket = 1 + α/2 ≈ 1.357 comparisons.
Answer
Load factor α ≈ 0.714; Bucket 1 holds chain [15→43→36]; average lookup ≈ 1.36 comparisons
| Strategy | Method | Load Factor Limit | Cache Performance | Worst-Case Lookup |
|---|---|---|---|---|
| Chaining | Linked list per bucket | Any (α can exceed 1) | Poor (pointer chasing) | O(n) if all keys collide |
| Linear Probing | Probe next slot | α < 1 (≤ 0.7 recommended) | Excellent (contiguous) | O(n) clustering |
| Quadratic Probing | Probe slot +1², +2², ... | α < 0.5 for full coverage | Good | O(n) secondary clustering |
| Double Hashing | Second hash for step size | α < 1 | Moderate | O(n) theoretical |
| Robin Hood Hashing | Displace if current has lower psl | α < 0.9 practical | Good | O(log n) expected |
Wikimedia Commons, CC BY-SA
An array is a fundamental data structure that stores elements of the same type in contiguous memory locations, enabling O(1) random access via an integer index computed as base_address + index × element_size. Arrays are the basis for many higher-level structures (stacks, queues, heaps, hash tables) and are cache-efficient due to spatial locality. In advanced usage, multi-dimensional arrays, dynamic arrays (vectors), and circular arrays extend the basic model to support resizing, matrix operations, and efficient queue implementations.
A linked list is a linear data structure in which elements (nodes) are stored at arbitrary memory locations, with each node containing a data field and one or more pointers linking it to adjacent nodes. Unlike arrays, linked lists do not require contiguous memory, enabling O(1) insertion and deletion at known positions without shifting elements, but they sacrifice O(1) random access (requiring O(n) traversal) and incur pointer overhead per node. Variants include singly linked lists, doubly linked lists, circular linked lists, and skip lists, each trading memory for specific operation efficiency.
A binary search tree (BST) is a binary tree in which every node satisfies the property that all keys in its left subtree are less than the node's key and all keys in its right subtree are greater. This ordering property makes search, insertion, and deletion operations efficient, averaging O(log n) for balanced trees. BSTs are foundational in database indexing, symbol tables, and dynamic set operations.
The term "hash" in computing derives from the culinary meaning "to chop and mix," metaphorically describing how a hash function mixes the bits of a key to produce an index. The concept of hash tables was independently discovered by several researchers around 1953–1956, including Hans Peter Luhn at IBM who described it in a 1953 internal memo. The term "hashing" was popularised by Arnold Dumey in 1956 and formalized by Robert Morris in 1968.