NoSQL databases are non-relational data storage systems that provide flexible schemas and horizontal scalability by abandoning the traditional table-row-column model of SQL databases in favour of document, key-value, column-family, or graph data models. They were designed to address the limitations of relational databases when handling massive volumes of unstructured or semi-structured data, high write throughput, and distributed deployments across commodity hardware. Popular NoSQL databases like MongoDB, Redis, Cassandra, and Neo4j are widely used in web-scale applications, real-time analytics, caching layers, and social network graphs.
| Type | Data Model | Example DB | Best Use Case |
|---|---|---|---|
| Document | JSON/BSON documents | MongoDB | Content management, user profiles |
| Key-Value | Key → Value pairs | Redis, DynamoDB | Caching, session storage |
| Column-Family | Column families | Cassandra, HBase | Time-series, write-heavy workloads |
| Graph | Nodes and edges | Neo4j, ArangoDB | Social networks, recommendation engines |
| Search Engine | Inverted index | Elasticsearch | Full-text search, log analytics |
Wikimedia Commons, CC BY-SA
SQL (Structured Query Language) is a domain-specific language used for managing, querying, and manipulating data stored in relational database management systems (RDBMS), which organize data into tables with rows and columns linked by foreign keys. SQL provides four main classes of operations — Data Query Language (SELECT), Data Manipulation Language (INSERT, UPDATE, DELETE), Data Definition Language (CREATE, ALTER, DROP), and Data Control Language (GRANT, REVOKE). It remains the most widely used database language in industry, underpinning systems from small web applications to enterprise data warehouses handling petabytes of data.
The CAP Theorem, formulated by Eric Brewer in 2000, states that a distributed data system can simultaneously guarantee at most two of three properties: Consistency (every read receives the most recent write), Availability (every request receives a non-error response), and Partition Tolerance (the system continues operating despite network partitions that prevent some nodes from communicating). Since network partitions are an unavoidable reality in distributed systems, designers must choose between CP systems (consistent but potentially unavailable during partitions, like HBase) and AP systems (available but potentially returning stale data, like Cassandra). The CAP theorem is foundational to understanding trade-offs in distributed database design.
A distributed system is a collection of independent computers that appear to users as a single coherent system, coordinating their actions by passing messages over a network to achieve a common computational goal. Key challenges in distributed systems include handling partial failures (where some nodes crash but others continue), achieving consensus on shared state, ensuring data consistency across replicas, and tolerating network partitions. Modern distributed systems underpin the internet's most critical infrastructure — from Google's search index and Amazon's shopping cart to financial clearing systems and distributed databases.
The term "NoSQL" was first used in 1998 by Carlo Strozzi for his lightweight relational database that did not use SQL. It was repopularised in 2009 by Johan Oskarsson for a meetup about distributed, non-relational databases. It is often retroactively interpreted as "Not Only SQL" to clarify that SQL may still be used alongside.