Computer ScienceData StructuresMedium

Bloom Filter

Also known as:Probabilistic setApproximate membership structure

A Bloom filter is a space-efficient probabilistic data structure used to test whether an element is a member of a set. It can yield false positives (reporting an element is present when it is not) but never false negatives (it will never miss a true member). It works by hashing elements into a bit array using multiple hash functions. Bloom filters are used in databases, web caches, network routers, and spell checkers where memory efficiency is critical and occasional false positives are tolerable.

Key Formula

P_fp = (1 - e^(-k*n/m))^k

LaTeX: P_{fp} = \left(1 - e^{-kn/m}\right)^k

Symbol	Meaning	Unit
P_fp	Probability of a false positive	dimensionless (0 to 1)
k	Number of hash functions	count
n	Number of elements inserted	count
m	Size of the bit array (number of bits)	bits
e	Euler's number (≈ 2.718)	dimensionless

Worked Example

Problem

A Bloom filter uses m = 1000 bits, k = 7 hash functions, and has n = 100 elements inserted. What is the probability of a false positive?

Solution

Step 1: Identify variables: m = 1000, k = 7, n = 100. Step 2: Compute the exponent: -k*n/m = -(7 × 100) / 1000 = -700 / 1000 = -0.7. Step 3: Compute e^(-0.7) ≈ 0.4966. Step 4: Compute 1 - e^(-0.7) = 1 - 0.4966 = 0.5034. Step 5: Raise to the power k: (0.5034)^7 ≈ 0.00819.

Answer

False positive probability ≈ 0.82% (approximately 1 in 122 queries will be a false positive)

Bloom Filter vs Hash Set: Trade-offs

Property	Bloom Filter	Hash Set	Notes
Space per element	Very low (~10 bits)	High (pointer + data)	Bloom filter wins for large n
False positives	Possible (tunable)	None	Trade-off for space savings
False negatives	Never	Never	Both are reliable for misses
Deletion support	No (standard)	Yes	Counting Bloom filter adds deletion
Lookup time	O(k) — very fast	O(1) amortized	k is small constant

Interactive Tools

Bloom Filter Calculator

Open Tool

Brilliant.org — Bloom Filters

Open Tool

WolframAlpha — Probability Calculations

Open Tool

Wikimedia Commons, CC BY-SA

Related Terms

Computer Science

Trie (data structure)

A trie (also called a prefix tree or digital tree) is a tree-like data structure used to store strings where each node represents a single character of a key, and paths from root to leaf spell out complete keys. The root represents an empty string, and each edge corresponds to a character. Tries are highly efficient for prefix-based search operations, making them ideal for autocomplete systems, spell checkers, and IP routing tables.

Computer Science

Disjoint Set

A disjoint set (also called union-find or merge-find set) is a data structure that keeps track of a collection of non-overlapping sets and supports two primary operations: Union (merging two sets) and Find (determining which set an element belongs to by returning a representative). With path compression and union by rank optimizations, operations run in nearly O(1) amortized time per operation (Inverse Ackermann function). Disjoint sets are foundational in Kruskal's minimum spanning tree algorithm, network connectivity checking, and image segmentation.

Computer Science

Graph (computer science)

A graph is a non-linear data structure consisting of a finite set of vertices (nodes) and a set of edges that connect pairs of vertices. Graphs can be directed (edges have a direction) or undirected, and weighted (edges carry a cost) or unweighted. They are used to model real-world problems such as social networks, transportation systems, web page link structures, and dependency resolution.

The Bloom filter was conceived by Burton Howard Bloom in 1970, and is named after him. The word "filter" reflects its function of filtering membership queries. Bloom published it in a paper titled "Space/Time Trade-offs in Hash Coding with Allowable Errors" in the Communications of the ACM.

bloom-filterprobabilistichashingspace-efficientdata-structuremembership-testing