
CAP Theorem in Practice: What It Actually Means for Your System

A pragmatic look at the CAP theorem: what consistency and availability mean for real workloads, and how PACELC describes the trade-offs better.

5 min read · By Codeloom
Intermediate 10 min read

What you'll learn

  • What CAP actually states (and what it does not)
  • Why CP and AP are choices made per-request, not per-system
  • How PACELC extends CAP for the no-partition case
  • How real databases position themselves
  • How to pick a consistency model for a feature

Prerequisites

  • Familiar with how APIs work
  • Basic database knowledge

What and Why

The CAP theorem says: during a network partition, a distributed data store must choose between consistency and availability. That is it. It is not a buffet where you pick two of three on a calm afternoon. It is a statement about behavior when nodes cannot talk to each other.

This matters because partitions are not exotic. A misconfigured firewall, a flaky switch, a noisy neighbor in a shared VPC, or a region-to-region link saturating during a deploy can each cause one. Partitions happen weekly in any large fleet. CAP forces you to decide, in advance, what your system does when its nodes cannot agree.

Mental Model

Define the terms carefully because most arguments about CAP are arguments about definitions.

  • Consistency (C) in CAP means linearizability: every read returns the latest acknowledged write. This is not the “C” in ACID, which is about preserving application invariants such as constraints and foreign keys.
  • Availability (A) means every request received by a non-failing node gets a non-error response. Not “the service is up”. A response that says “I am not sure, ask later” counts as unavailable in CAP terms.
  • Partition tolerance (P) means the system continues to operate despite messages being lost between nodes. In any real distributed system you cannot opt out of P, so the live choice is C versus A during a partition.

The honest framing: when partitioned, do you refuse writes to preserve a single source of truth, or do you accept writes on both sides and reconcile later?

           Network Partition
 +-------+ X X X X X X X +-------+
 | DC-A  |   no traffic  | DC-B  |
 +-------+               +-------+
     |                       |
 client wants to write     client wants to write
     |                       |
 CP: reject (no quorum)    AP: accept, reconcile later
A partition forces the choice
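
The choice in the diagram can be sketched in a few lines. This is a toy model, not how any particular database implements it: the `Replica` class, its `mode` setting, and the `NoQuorumError` exception are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field


class NoQuorumError(Exception):
    """Raised by a CP replica that cannot reach a majority of the cluster."""


@dataclass
class Replica:
    mode: str             # "CP" or "AP" (hypothetical setting)
    cluster_size: int     # total replicas, including this one
    reachable_peers: int  # peers this replica can currently talk to
    data: dict = field(default_factory=dict)
    pending_merge: list = field(default_factory=list)

    def has_quorum(self) -> bool:
        # This replica plus its reachable peers must form a strict majority.
        return self.reachable_peers + 1 > self.cluster_size // 2

    def write(self, key: str, value: str) -> str:
        if self.has_quorum():
            self.data[key] = value
            return "committed"
        if self.mode == "CP":
            # Consistency first: refuse rather than risk divergent histories.
            raise NoQuorumError(f"cannot commit {key!r} without a majority")
        # Availability first: accept locally and queue the write for reconciliation.
        self.data[key] = value
        self.pending_merge.append((key, value))
        return "accepted-locally"
```

In a five-node cluster split 3/2, a replica on the minority side has `reachable_peers=1`. In CP mode its `write` raises `NoQuorumError`; in AP mode it returns `"accepted-locally"` and the write waits in `pending_merge` until the partition heals.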

Architecture

In practice, CAP shows up as configuration knobs.

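  • Quorum reads and writes. With N replicas, requiring W write acknowledgements and R read responses such that W + R > N guarantees that every read quorum overlaps the latest acknowledged write quorum. That overlap is necessary for linearizable behavior but not sufficient on its own; concurrent writes and partial failures still need read repair or consensus. Lose quorum, lose availability. Cassandra exposes this directly as per-request consistency levels; DynamoDB exposes only a choice between eventually consistent and strongly consistent reads.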
  • Leader election. Systems like etcd, ZooKeeper, and Spanner use consensus protocols (Raft, Zab, and Paxos respectively) to elect a single writer. Lose contact with the majority, lose writes on the minority side. This is CP.
  • Multi-leader or leaderless replication with conflict resolution. CouchDB, Cassandra with last-write-wins, CRDT-based stores. Both sides accept writes during a partition and merge afterward. This is AP.
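
The W + R > N rule from the quorum bullet is easy to verify by brute force. The sketch below is plain arithmetic, not any database's API, and the function names are made up for this example. It checks the inequality, confirms by enumeration that every possible write set shares a replica with every possible read set, and shows how many replica failures each setting survives.

```python
from itertools import combinations


def quorums_overlap(n: int, w: int, r: int) -> bool:
    """The quorum rule: read and write sets are forced to share a replica."""
    return w + r > n


def every_quorum_pair_intersects(n: int, w: int, r: int) -> bool:
    """Brute-force check: does each possible write set meet each read set?"""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


def tolerated_failures(n: int, w: int, r: int) -> int:
    """Replicas that can be unreachable while reads and writes both succeed."""
    return n - max(w, r)


if __name__ == "__main__":
    for n, w, r in [(3, 2, 2), (3, 1, 1), (3, 3, 1), (5, 3, 3)]:
        print(
            f"N={n} W={w} R={r}",
            f"overlap={quorums_overlap(n, w, r)}",
            f"brute_force={every_quorum_pair_intersects(n, w, r)}",
            f"tolerates={tolerated_failures(n, w, r)}",
        )
```

With N=3, W=2, R=2 the sets always overlap and one replica can be down. With W=1, R=1 two replicas can be down, but a read may miss the latest write entirely. That is the availability-versus-consistency knob in miniature.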

PACELC is a sharper lens: if Partitioned, choose A or C; Else (normal operation), choose Latency or Consistency. Most workloads live in the “else” clause. In PACELC notation, Spanner is PC/EC (consistent at the cost of latency). DynamoDB is PA/EL by default but can be made EC with strongly consistent reads. Cassandra is PA/EL out of the box.
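
The “else” clause is easiest to see with numbers. Here is a rough sketch, assuming a coordinator that sends a request to all replicas in parallel and returns once the k fastest have answered. The function name and the latencies are hypothetical.

```python
def quorum_latency_ms(replica_latencies_ms: list[float], k: int) -> float:
    """Latency of waiting for the k fastest replicas out of all of them."""
    if not 1 <= k <= len(replica_latencies_ms):
        raise ValueError("k must be between 1 and the number of replicas")
    # The request completes when the k-th fastest response arrives.
    return sorted(replica_latencies_ms)[k - 1]


if __name__ == "__main__":
    # Hypothetical round trips: same rack, same region, another continent.
    latencies = [2.0, 5.0, 80.0]
    for k in (1, 2, 3):
        print(f"wait for {k} of 3 replicas: {quorum_latency_ms(latencies, k)} ms")
```

Waiting for one replica costs 2 ms, a majority costs 5 ms, and all three cost 80 ms. No partition is involved; this is the latency price of stronger consistency during normal operation, which is exactly what the E in PACELC is about.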

Trade-offs

The dirty secret of CAP is that the choice is rarely system-wide. It is per-request, per-table, sometimes per-field.

  • Strong consistency for money, eventual for likes. A payment ledger needs linearizable writes. A counter showing how many people liked a post does not. Mixing models inside one product is normal.
  • Read-your-writes is often enough. Full linearizability is expensive. Session-level guarantees (“this user sees their own writes immediately”) cover most UX needs at a fraction of the coordination cost.
  • CRDTs change the conversation. Conflict-free replicated data types let you accept writes everywhere and merge deterministically. Great for shopping carts, counters, presence. Bad for anything that needs a unique constraint.
  • Availability is not uptime. A CP system can be “up” 99.99% and still reject writes during a partition. To users that looks like a partial outage. Make sure your SLO captures what you actually care about.
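
To make the CRDT bullet concrete, here is a minimal grow-only counter (G-Counter), one of the simplest CRDTs. It is a teaching sketch, not a production library; the `GCounter` class and its field names are chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class GCounter:
    """Grow-only counter: each node increments only its own slot."""
    node_id: str
    counts: dict = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        # Per-node maximum is commutative, associative, and idempotent,
        # so replicas converge regardless of merge order or repetition.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

    def value(self) -> int:
        return sum(self.counts.values())


if __name__ == "__main__":
    a, b = GCounter("dc-a"), GCounter("dc-b")
    a.increment(3)   # likes recorded in DC-A during the partition
    b.increment(2)   # likes recorded in DC-B during the partition
    a.merge(b)
    b.merge(a)
    print(a.value(), b.value())
```

Both sides accept increments while partitioned and agree on the total once they exchange state. The same structure cannot enforce “at most one user with this email”, which is why unique constraints still need coordination.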

The biggest mistake I see: treating CAP as a one-time architectural choice. Real systems mix CP and AP components and route requests to the right backend based on the feature’s tolerance.
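
One way to make that routing explicit is a small policy table. The sketch below is illustrative only: the feature names, the backend labels, and the `backend_for` helper are assumptions for the example, not part of any framework.

```python
from enum import Enum


class Consistency(Enum):
    LINEARIZABLE = "linearizable"
    READ_YOUR_WRITES = "read-your-writes"
    EVENTUAL = "eventual"


# Hypothetical per-feature policy: one line per feature, reviewed like code.
POLICY = {
    "checkout": Consistency.LINEARIZABLE,
    "profile_edit": Consistency.READ_YOUR_WRITES,
    "like_count": Consistency.EVENTUAL,
    "recommendations": Consistency.EVENTUAL,
}

# Hypothetical backends that provide each guarantee.
BACKENDS = {
    Consistency.LINEARIZABLE: "cp-store",
    Consistency.READ_YOUR_WRITES: "ap-store+session-pinning",
    Consistency.EVENTUAL: "ap-store",
}


def backend_for(feature: str) -> str:
    """Pick a backend from the feature's declared consistency need."""
    # Unclassified features fall back to the strongest guarantee.
    level = POLICY.get(feature, Consistency.LINEARIZABLE)
    return BACKENDS[level]


if __name__ == "__main__":
    for feature in ("checkout", "like_count", "new_unclassified_feature"):
        print(feature, "->", backend_for(feature))
```

Defaulting unknown features to the strongest store is a deliberate choice here: a feature that is accidentally too consistent is slow, while one that is accidentally too weak is wrong.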

Practical Tips

  1. Classify each feature by consistency need. Write a one-line policy: “checkout requires linearizable reads; product recommendations are eventual.”
  2. Use the strongest store that meets your latency budget. If a CP database fits, use it. Eventual consistency adds engineering tax everywhere downstream.
  3. Measure partition rate. If your inter-AZ link partitions for 30 seconds once a quarter, AP might be overkill. If you operate across continents, you will see partitions weekly.
  4. Design the degraded mode. What does the UI show when writes are rejected? A clear “saving failed, retry” message is better than silently dropping data.
  5. Beware split-brain reconciliation. AP systems require merge logic. If your merge is “last write wins”, every pair of concurrent writes silently loses one side. Use vector clocks, CRDTs, or explicit conflict resolution.
  6. Document your guarantees. Tell your callers exactly what they get: linearizable, sequential, causal, eventual. Vagueness invites bugs at the boundary.
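
The difference between last-write-wins and vector clocks in tip 5 can be shown in a short sketch. The `Write` class and the resolver functions are hypothetical helpers written for this example; real stores differ in how they track and surface conflicts.

```python
from dataclasses import dataclass


@dataclass
class Write:
    value: str
    wall_clock: float  # timestamp used by last-write-wins
    clock: dict        # vector clock: node id -> events seen from that node


def compare(a: dict, b: dict) -> str:
    """Order two vector clocks: 'before', 'after', 'equal', or 'concurrent'."""
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


def resolve_lww(a: Write, b: Write) -> Write:
    # Later timestamp wins; the other write is discarded without a trace.
    return a if a.wall_clock >= b.wall_clock else b


def resolve_with_clocks(a: Write, b: Write) -> list:
    # Keep only the causally later write, or both when neither saw the other.
    order = compare(a.clock, b.clock)
    if order == "before":
        return [b]
    if order in ("after", "equal"):
        return [a]
    return [a, b]  # concurrent: surface both for an explicit merge
```

Take two writes made on opposite sides of a partition, one with clock `{"A": 1}` and one with `{"B": 1}`. `resolve_lww` keeps whichever has the later timestamp and drops the other. `resolve_with_clocks` returns both, so the application can merge them instead of losing one.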

Wrap-up

CAP is a forcing function for honesty. It says you cannot pretend distributed systems behave like a single machine. Once you accept that partitions happen, you start designing for the choice instead of pretending it does not exist. Pick the strongest consistency you can afford for each feature, document what you promise, and design the failure mode as carefully as the happy path. The systems that survive at scale are the ones whose authors took CAP literally.