A simple guide to understanding CAP Theorem for Distributed Systems

Introduction

Distributed systems are a collection of autonomous computers that work together as a single system. These systems have become increasingly popular due to their ability to provide better performance, scalability, and fault-tolerance. However, ensuring consistency, availability, and partition tolerance in distributed systems can be a challenging task. In this article, we will discuss the CAP theorem, which provides a fundamental understanding of the trade-offs that need to be made when designing distributed systems. CAP theorem was formulated by Eric Brewer around the year 2000. This post is also available in YouTube here.

What is CAP Theorem?

CAP theorem states that in any distributed system, we can have at most two of the following desirable properties: consistency, availability, and partition tolerance. What it means is the system can either have consistency and availability (CA), availability and partition tolerance (AP), or consistency and partition tolerance (CP), but never all three things like C, A and B.

Terminologies in CAP Theorem

Before we dive further, let’s first understand some terminologies used in it.

Consistency – Consistency means any user should be able to see the most recent data written to the system, no matter which node they connect to.
Availability – Availability means every request, whether it’s a read or write, from any user should return a response. It may return stale data but will never return a not available error.
Partition Tolerance – Partition means communication break between the nodes in the system. Partition tolerance means the system continues to work in spite of the partition.

When does CAP Theorem hold good?

CAP theorem holds good only in systems that has data replicated across multiple nodes and those nodes are distributed across different networks. Also It holds good only when the network between the nodes breaks leading to replication failure. That is, the nodes themselves are functioning, but only the duplication between some of those nodes are broken.

Example to Understand CAP Theorem

Let’s consider an example to understand CAP theorem better. Below picture shows a rough design of a distributed system with peer-to-peer replication. The nodes DB1, DB2, and DB3 are capable of accepting writes and replicating those writes to other nodes. The blue dotted lines indicate the network through which the replication occurs. Note that CAP theorem holds good only when the network between the databases used for replication breaks (the blue dotted lines).

Conclusion

In conclusion, CAP theorem provides a fundamental understanding of the trade-offs that need to be made when designing distributed systems. It simplifies to when a network partition occurs in a distributed database system, it can either be consistent or available but cannot be both consistent and available. Therefore, it’s important to consider the CAP theorem when designing and implementing distributed systems to ensure they meet the necessary requirements of consistency, availability, and partition tolerance.