Computer Pipeline

Also known as: Instruction pipeline, CPU pipeline, Pipelining

A computer pipeline is a hardware technique that overlaps the execution of multiple instructions by dividing instruction processing into discrete sequential stages (typically fetch, decode, execute, memory access, and write-back), so that each stage operates on a different instruction at the same time, analogous to an assembly line. Pipelining increases instruction throughput (instructions completed per second) but does not reduce the latency of any single instruction; at steady state, an ideal pipeline completes one instruction per clock cycle. In practice, performance is limited by three kinds of hazards: structural hazards (two instructions needing the same hardware resource in the same cycle), data hazards (an instruction depending on a result that an earlier instruction has not yet produced), and control hazards (branches changing which instruction should be fetched next).

Key Formula

CPI_pipeline = 1 + stalls_per_instruction

LaTeX: CPI_{pipeline} = 1 + \text{stalls per instruction}

Symbol	Meaning	Unit
CPI_pipeline	Average clock cycles per instruction in a pipelined processor	cycles/instruction
1	Ideal CPI for a fully pipelined processor with no hazards	cycles/instruction
stalls_per_instruction	Average pipeline stall cycles per instruction introduced by hazards	cycles/instruction

Worked Example

Problem

A 5-stage pipeline runs at 2 GHz. A program has 10% branch instructions, each causing a 3-cycle stall (no branch prediction). Data hazards add 0.1 stall cycles per instruction on average. Calculate the effective CPI and throughput.

Solution

Step 1: Stalls per instruction = branch stalls + data stalls = (0.10 × 3) + 0.1 = 0.30 + 0.10 = 0.40 cycles/instruction.

Step 2: CPI = 1 + 0.40 = 1.40 cycles/instruction.

Step 3: Throughput = clock frequency / CPI = 2×10⁹ / 1.40 ≈ 1.43×10⁹ instructions/second.

Answer

CPI = 1.40; effective throughput ≈ 1.43 billion instructions per second (GIPS), versus 2 GIPS for an ideal pipeline.
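The same calculation can be reproduced in a few lines of Python. This is a minimal sketch: the function names `effective_cpi` and `throughput_ips` are illustrative choices, not part of any library, and each stall source is modeled as a (fraction of instructions, stall cycles each) pair.

```python
def effective_cpi(stall_sources):
    """Return the ideal CPI of 1 plus the average stall cycles per instruction.

    stall_sources: iterable of (fraction_of_instructions, stall_cycles_each).
    """
    return 1.0 + sum(fraction * stalls for fraction, stalls in stall_sources)


def throughput_ips(clock_hz, cpi):
    """Instructions per second = clock frequency / CPI."""
    return clock_hz / cpi


# Values taken from the worked example.
stalls = [
    (0.10, 3),    # 10% of instructions are branches, 3 stall cycles each
    (1.00, 0.1),  # data hazards: 0.1 stall cycles averaged over all instructions
]
cpi = effective_cpi(stalls)
ips = throughput_ips(2e9, cpi)

print(f"CPI = {cpi:.2f}")                      # CPI = 1.40
print(f"Throughput = {ips / 1e9:.2f} GIPS")    # Throughput = 1.43 GIPS
```

Expressing every hazard as a fraction and a per-occurrence penalty makes it easy to try variations, for example a lower effective branch penalty when branch prediction is added.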

Classic 5-Stage RISC Pipeline Stages

Stage	Abbreviation	Operation Performed	Key Hardware	Hazard Risk
Instruction Fetch	IF	Read next instruction from memory using PC	PC register, instruction memory	Control hazard
Instruction Decode	ID	Decode opcode, read register file, sign-extend immediates	Register file, decoder	Data hazard (RAW)
Execute	EX	Perform ALU operation or compute address	ALU, forwarding unit	Data hazard (forwarded)
Memory Access	MEM	Read/write data memory for load/store instructions	Data cache	Structural hazard
Write Back	WB	Write ALU result or loaded data to register file	Register file write port	Data hazard (stale register)
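To see how the five stages overlap, the sketch below builds a timing diagram for an ideal pipeline with no hazards and no stalls. It assumes the simplest possible model, in which instruction i (counting from 0) occupies stage s (counting from 0) during cycle i + s; the helper names `ideal_schedule` and `format_schedule` are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def ideal_schedule(num_instructions, stages=STAGES):
    """Return one row per instruction, listing the stage occupied in each cycle.

    Assumes no stalls: instruction i enters stage s in cycle i + s (0-based).
    """
    total_cycles = len(stages) + num_instructions - 1
    rows = []
    for i in range(num_instructions):
        row = [""] * total_cycles
        for s, name in enumerate(stages):
            row[i + s] = name
        rows.append(row)
    return rows


def format_schedule(rows):
    """Render the schedule as a fixed-width text diagram (cycles numbered from 1)."""
    header = " " * 6 + "".join(f"{c + 1:<5}" for c in range(len(rows[0])))
    lines = [header]
    for i, row in enumerate(rows):
        lines.append(f"I{i + 1:<5}" + "".join(f"{cell:<5}" for cell in row))
    return "\n".join(lines)


rows = ideal_schedule(4)
print(format_schedule(rows))
print("Pipelined cycles:", len(rows[0]))        # 5 + 4 - 1 = 8
print("Unpipelined cycles:", 4 * len(STAGES))   # 4 * 5 = 20
```

Reading any column of the diagram top to bottom shows each stage working on a different instruction in the same cycle. In this model, n instructions on a k-stage pipeline finish in k + n - 1 cycles rather than n × k.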

Related Terms

Engineering

Microprocessor Architecture

Microprocessor architecture describes the internal organization and design of a microprocessor, including the arrangement of its arithmetic logic unit (ALU), control unit, registers, cache, buses, and instruction set, which collectively determine how the processor fetches, decodes, and executes instructions. Architectures are broadly classified as RISC (Reduced Instruction Set Computer) or CISC (Complex Instruction Set Computer), each with distinct trade-offs in instruction complexity, pipeline depth, and energy efficiency. Modern processors incorporate multiple cores, branch prediction, out-of-order execution, and deep cache hierarchies to maximize performance.

Engineering

Cache Memory

Cache memory is a small, high-speed memory layer placed between the processor and main memory (RAM) that stores copies of frequently accessed data and instructions to reduce average memory access latency. Modern processors use a multi-level cache hierarchy (L1, L2, L3), each level larger and slower than the previous, organized around the principles of temporal locality (recently used data will likely be reused) and spatial locality (nearby data will likely be accessed soon). Cache performance is measured by the hit rate — the fraction of memory requests satisfied by the cache — and miss penalty — the extra time needed to fetch data from a lower level.
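Hit rate and miss penalty are commonly combined into the average memory access time, AMAT = hit time + miss rate × miss penalty. The sketch below applies that standard formula to a single cache level; the numbers (1-cycle hit, 95% hit rate, 100-cycle miss penalty) are made up for illustration.

```python
def average_memory_access_time(hit_time, hit_rate, miss_penalty):
    """AMAT = hit time + miss rate * miss penalty, for a single cache level."""
    miss_rate = 1.0 - hit_rate
    return hit_time + miss_rate * miss_penalty


# Hypothetical values: 1-cycle hit, 95% hit rate, 100-cycle miss penalty.
amat = average_memory_access_time(hit_time=1, hit_rate=0.95, miss_penalty=100)
print(f"AMAT = {amat:.1f} cycles")   # AMAT = 6.0 cycles
```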

Engineering

Boolean Logic Circuit

A Boolean logic circuit is a digital electronic circuit that performs logical operations on binary inputs (0 or 1) using combinations of fundamental logic gates — AND, OR, NOT, NAND, NOR, XOR, and XNOR — to produce a binary output defined by Boolean algebra. These circuits are the building blocks of all digital computers, forming the basis of arithmetic units, control logic, and memory elements. The behavior of any combinational logic circuit can be fully described by a Boolean expression or truth table.
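As a small illustration of describing a combinational circuit by both a Boolean expression and a truth table, the sketch below models a half adder (chosen here only as a familiar example): the sum bit is the XOR of the inputs and the carry bit is their AND.

```python
from itertools import product


def half_adder(a, b):
    """Return (sum, carry): sum = a XOR b, carry = a AND b."""
    return a ^ b, a & b


# Enumerate every input combination to build the truth table.
truth_table = [(a, b, *half_adder(a, b)) for a, b in product((0, 1), repeat=2)]

print("a b sum carry")
for a, b, s, c in truth_table:
    print(a, b, s, "  ", c)
```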

The word "pipe" comes from Old English/Old High German "pipa" (pipe, tube); "pipeline" was later extended metaphorically to any assembly-line process. "Pipeline" as a computing concept was introduced by IBM in the 1960s for the Stretch (7030) and System/360 Model 91 processors. The 5-stage RISC pipeline was popularized by Hennessy and Patterson in their influential textbook Computer Architecture: A Quantitative Approach (1990).

Tags: pipeline, cpu, cpi, hazards, computer-architecture, instruction-throughput