Computer ScienceNetworking & SystemsMedium

Distributed Systems

Also known as:Distributed ComputingDistributed Architecture

A distributed system is a collection of independent computers that appear to users as a single coherent system, coordinating their actions by passing messages over a network to achieve a common computational goal. Key challenges in distributed systems include handling partial failures (where some nodes crash but others continue), achieving consensus on shared state, ensuring data consistency across replicas, and tolerating network partitions. Modern distributed systems underpin the internet's most critical infrastructure — from Google's search index and Amazon's shopping cart to financial clearing systems and distributed databases.

Core Challenges in Distributed Systems

Challenge	Description	Solution Approaches
Partial Failure	Some nodes crash while others run	Replication, heartbeats, circuit breakers
Network Partition	Network splits system into disconnected segments	CAP theorem trade-offs, quorum protocols
Consistency	All nodes see the same data	Consensus algorithms (Raft, Paxos)
Latency	Communication delays are unpredictable	Async messaging, eventual consistency
Clock Skew	Clocks on different nodes diverge	Logical clocks, vector clocks, NTP

Interactive Tools

MIT 6.824 Distributed Systems (Free Lectures)

Open Tool

Brilliant.org Computer Science

Open Tool

Codecademy Cloud and Systems

Open Tool

Diagram of a distributed system with multiple networked nodes communicating via message passing

Wikimedia Commons, CC BY-SA

Related Terms

Computer Science

CAP Theorem

The CAP Theorem, formulated by Eric Brewer in 2000, states that a distributed data system can simultaneously guarantee at most two of three properties: Consistency (every read receives the most recent write), Availability (every request receives a non-error response), and Partition Tolerance (the system continues operating despite network partitions that prevent some nodes from communicating). Since network partitions are an unavoidable reality in distributed systems, designers must choose between CP systems (consistent but potentially unavailable during partitions, like HBase) and AP systems (available but potentially returning stale data, like Cassandra). The CAP theorem is foundational to understanding trade-offs in distributed database design.

Computer Science

Microservices Architecture

Microservices architecture is a software design approach where an application is structured as a collection of small, independently deployable services, each responsible for a specific business capability and communicating over well-defined APIs. Unlike monolithic architectures where all components share a single codebase and runtime, microservices can be developed, deployed, and scaled independently using different programming languages and data stores. This pattern enables large engineering teams to work in parallel, improves fault isolation, and supports continuous delivery pipelines at scale.

Computer Science

Message Queue

A message queue is an asynchronous communication mechanism where a producer application sends messages to a buffer (the queue), and a consumer application retrieves and processes them independently, enabling decoupled communication between system components without requiring both parties to be available simultaneously. Message queues implement the producer-consumer pattern and support key enterprise integration patterns such as load leveling (absorbing traffic spikes), fan-out (broadcasting to multiple consumers), and retry logic (requeuing failed messages). Technologies like RabbitMQ, Apache Kafka, and AWS SQS are foundational to event-driven microservices architectures, enabling reliable asynchronous workflows such as order processing, email delivery, and data pipeline ingestion.

The concept of distributed computing emerged in the 1970s with ARPANET. "Distributed" derives from Latin "distribuere" (to divide and assign). Leslie Lamport's 1978 paper "Time, Clocks, and the Ordering of Events in a Distributed System" is considered foundational to the field.

distributed-systemsnetworkingfault-tolerancescalabilityconsensuscloud