
Cache Memory

Also known as: CPU cache, high-speed buffer memory

Cache memory is a small, high-speed memory layer placed between the processor and main memory (RAM) that stores copies of frequently accessed data and instructions to reduce average memory access latency. Modern processors use a multi-level cache hierarchy (L1, L2, L3), each level larger and slower than the previous, organized around the principles of temporal locality (recently used data will likely be reused) and spatial locality (nearby data will likely be accessed soon). Cache performance is measured by the hit rate — the fraction of memory requests satisfied by the cache — and miss penalty — the extra time needed to fetch data from a lower level.
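To make temporal and spatial locality concrete, here is a minimal Python sketch of a hypothetical direct-mapped cache. The geometry (64 lines of 64 bytes, 4 KB total) is an assumption chosen for illustration, real L1 caches are set-associative, and `simulate_direct_mapped` is an illustrative name rather than a library API. The function replays a trace of byte addresses and reports the hit rate.

```python
def simulate_direct_mapped(addresses, num_lines=64, line_size=64):
    """Return the hit rate of a hypothetical direct-mapped cache for a byte-address trace."""
    tags = [None] * num_lines  # stored tag for each cache line; None means the line is empty
    hits = 0
    for addr in addresses:
        block = addr // line_size   # memory block (cache-line-sized chunk) holding this byte
        index = block % num_lines   # direct-mapped: each block can live in exactly one line
        tag = block // num_lines    # remaining high-order bits identify which block is cached
        if tags[index] == tag:
            hits += 1
        else:
            tags[index] = tag       # miss: fetch the block, evicting whatever was there
    return hits / len(addresses)


# Sequential walk over 4-byte elements: 16 consecutive elements share each 64-byte line.
sequential = [i * 4 for i in range(4096)]
# Stride of one full line: every access touches a block never seen before.
strided = [i * 64 for i in range(4096)]
# Repeatedly cycling through 32 lines: a small working set that is reused.
reuse = [(i % 32) * 64 for i in range(4096)]

print(simulate_direct_mapped(sequential))  # 0.9375
print(simulate_direct_mapped(strided))     # 0.0
print(simulate_direct_mapped(reuse))       # 0.9921875
```

The sequential trace hits on 15 of every 16 accesses because each line fetched on a miss also brings in the next 15 elements (spatial locality). The strided trace never reuses a line, so every access misses. The reuse trace misses only on its first pass over its 32 lines and hits afterwards (temporal locality).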

Key Formula

T_avg = h × Tc + (1 − h) × Tm

LaTeX: T_{\text{avg}} = h \cdot T_c + (1 - h) \cdot T_m

Symbol | Meaning | Unit
T_avg | Average memory access time | ns
h | Cache hit rate (fraction of accesses found in cache) | dimensionless
Tc | Cache access time | ns
Tm | Main memory access time on a miss | ns
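Rearranging the formula links it to the miss penalty described earlier. Reading Tm as the total time of an access that misses (an interpretation of the symbol table, not an additional claim of the source), the extra cost of a miss over a hit is Tm − Tc. The last line checks the result against the worked example's values.

```latex
\begin{aligned}
T_{\text{avg}} &= h\,T_c + (1-h)\,T_m \\
               &= T_c - (1-h)\,T_c + (1-h)\,T_m \\
               &= T_c + (1-h)\,(T_m - T_c) \\
\text{e.g. } h = 0.92,\ T_c = 4,\ T_m = 80:\quad
T_{\text{avg}} &= 4 + 0.08 \times 76 = 4 + 6.08 = 10.08\ \text{ns}
\end{aligned}
```

In this form every access pays the cache time Tc, and the fraction 1 − h of accesses that miss additionally pays the penalty Tm − Tc.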

Worked Example

Problem

A processor has an L1 cache with access time Tc = 4 ns and main memory access time Tm = 80 ns. If the cache hit rate h = 0.92, calculate the average memory access time.

Solution

Step 1: Apply the formula T_avg = h × Tc + (1 − h) × Tm.
Step 2: Substitute the values: T_avg = 0.92 × 4 + (1 − 0.92) × 80.
Step 3: Evaluate: T_avg = 3.68 + 0.08 × 80 = 3.68 + 6.40 = 10.08 ns.

Answer

Average memory access time = 10.08 ns, much closer to cache speed (4 ns) than main memory (80 ns).
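The same calculation can be written as a small Python function. The name `average_access_time` is illustrative, and the hit rates in the sweep are arbitrary values chosen to show how strongly T_avg depends on h when Tm is 20 times Tc.

```python
def average_access_time(hit_rate: float, t_cache: float, t_memory: float) -> float:
    """T_avg = h * Tc + (1 - h) * Tm; all times must use the same unit (ns here)."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * t_cache + (1.0 - hit_rate) * t_memory


# Worked example: Tc = 4 ns, Tm = 80 ns, h = 0.92
print(f"{average_access_time(0.92, 4, 80):.2f} ns")  # 10.08 ns

# Sensitivity of T_avg to the hit rate, keeping the same Tc and Tm
for h in (0.80, 0.90, 0.95, 0.99):
    print(f"h = {h:.2f}: {average_access_time(h, 4, 80):.2f} ns")
# h = 0.80: 19.20 ns
# h = 0.90: 11.60 ns
# h = 0.95: 7.80 ns
# h = 0.99: 4.76 ns
```

Raising h from 0.90 to 0.99 cuts T_avg by more than half, because each of the remaining misses costs a full 80 ns.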

Cache Levels in Modern x86-64 Processors

Cache Level | Typical Size | Access Latency | Shared? | Replacement Policy
L1-I (Instruction) | 32–64 KB per core | 4–5 cycles | No (per core) | LRU or pseudo-LRU
L1-D (Data) | 32–64 KB per core | 4–5 cycles | No (per core) | LRU or pseudo-LRU
L2 (Unified) | 256 KB–2 MB per core | 12–15 cycles | No (per core) | LRU or pseudo-LRU
L3 (Last-Level Cache) | 4–64 MB total | 30–45 cycles | Yes (all cores, or per core complex on AMD Zen) | Adaptive / QLRU
DRAM (Main Memory) | 4–512 GB | 200–300 cycles | Yes (system-wide) | N/A
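For a multi-level hierarchy, one way to extend the single-level formula is to apply it recursively: the time charged for a miss at one level is the average access time of the level below it. The Python sketch below assumes that model (like the single-level formula, it ignores the time spent checking the level that missed). The hit times fall within the ranges in the table above, while the local hit rates (the fraction of requests reaching a level that hit there) are invented for illustration. `LEVELS`, `DRAM_LATENCY`, and `effective_latency` are hypothetical names.

```python
# Hypothetical hierarchy: (name, hit time in cycles, local hit rate).
# Hit times fall within the table's ranges; the hit rates are illustrative, not measured.
LEVELS = [
    ("L1-D", 4, 0.95),
    ("L2", 14, 0.80),
    ("L3", 40, 0.60),
]
DRAM_LATENCY = 250  # cycles, within the table's 200-300 range


def effective_latency(levels, memory_latency):
    """Apply T = h*T_hit + (1 - h)*T_next from the last cache level back to L1.

    T_next starts as the main-memory latency; each level's result becomes
    the miss time of the level above it.
    """
    t_next = memory_latency
    for _name, hit_time, hit_rate in reversed(levels):
        t_next = hit_rate * hit_time + (1 - hit_rate) * t_next
    return t_next


print(f"{effective_latency(LEVELS, DRAM_LATENCY):.2f} cycles")  # 5.60 cycles
```

Working backward: L3 averages 0.6 × 40 + 0.4 × 250 = 124 cycles, L2 averages 0.8 × 14 + 0.2 × 124 = 36 cycles, and L1 averages 0.95 × 4 + 0.05 × 36 = 5.6 cycles. Under these assumptions the hierarchy delivers an average close to L1 speed even though DRAM is about 60 times slower.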

Figure: L1, L2, and L3 cache hierarchy between CPU cores and main memory. (Image: Wikimedia Commons, CC BY-SA)

Related Terms

Memory Hierarchy (computer)

The memory hierarchy in computer systems is a structured pyramid of storage levels organized by speed, cost, and capacity, where faster and more expensive memory (registers, cache) sits close to the processor and slower, cheaper, larger storage (RAM, SSD, HDD) resides farther away. The hierarchy exploits the principle of locality — programs tend to reuse recently accessed data (temporal locality) and access nearby memory addresses (spatial locality) — to make the average memory access time approach that of the fastest level. Effective hierarchy design is critical to bridging the speed gap between the processor and main memory.

Microprocessor Architecture

Microprocessor architecture describes the internal organization and design of a microprocessor, including the arrangement of its arithmetic logic unit (ALU), control unit, registers, cache, buses, and instruction set, which collectively determine how the processor fetches, decodes, and executes instructions. Architectures are broadly classified as RISC (Reduced Instruction Set Computer) or CISC (Complex Instruction Set Computer), each with distinct trade-offs in instruction complexity, pipeline depth, and energy efficiency. Modern processors incorporate multiple cores, branch prediction, out-of-order execution, and deep cache hierarchies to maximize performance.

Computer Pipeline

A computer pipeline is a hardware technique that overlaps the execution of multiple instructions by dividing instruction processing into discrete sequential stages — typically fetch, decode, execute, memory access, and write-back — so that each stage operates on a different instruction simultaneously, analogous to an assembly line. Pipelining increases instruction throughput (instructions completed per second) without reducing the time to complete a single instruction (latency), ideally executing one instruction per clock cycle at steady state. Pipeline performance is limited by hazards: structural hazards (resource conflicts), data hazards (dependency between instructions), and control hazards (branches altering instruction flow).
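The throughput-versus-latency point can be checked with a minimal Python sketch of the ideal case, assuming every stage takes exactly one clock cycle and no hazards stall the pipeline (both simplifying assumptions). The function names are hypothetical. A k-stage pipeline needs k cycles to fill, after which one instruction completes per cycle.

```python
def unpipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Each instruction finishes all stages before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Ideal pipeline: n_stages cycles to fill, then one completion per cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1


stages = 5  # fetch, decode, execute, memory access, write-back
n = 100
print(unpipelined_cycles(n, stages))  # 500
print(pipelined_cycles(n, stages))    # 104
print(f"speedup: {unpipelined_cycles(n, stages) / pipelined_cycles(n, stages):.2f}")  # speedup: 4.81
# A single instruction still needs 5 cycles either way: latency is unchanged.
print(pipelined_cycles(1, stages), unpipelined_cycles(1, stages))  # 5 5
```

As the instruction count grows, the speedup n·k / (k + n − 1) approaches the number of stages k, which corresponds to the steady-state rate of one instruction per cycle.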

Etymology

From French "cache" (a hiding place, store), derived from "cacher" (to hide), from Vulgar Latin *coacticare (to store up, compress). The term was applied to computer memory at IBM in the late 1960s, in papers by Liptay and others describing a small, fast "buffer" hidden between the CPU and core memory. The word entered common computing vocabulary around 1968.
