System Design: Distributed Cache from Scratch

Intermediate 10 min read

What you'll learn

✓Why distributed caches exist
✓Consistent hashing for sharding
✓Replication and failover
✓Eviction policies and TTLs
✓How to deal with cache stampedes

Prerequisites

•Familiar with HTTP and databases

What and Why

Reads dominate most workloads. A cache holds hot data in memory so reads skip the slower database. A distributed cache spreads that memory across many machines so you can store more data and serve more traffic than any single host could.

Memcached and Redis are the household names. The interesting design questions are about partitioning, replication, and what happens when nodes come and go.

Mental Model

Think of the cache as a giant hash map split across N shards. A client picks the shard for a key, sends the request directly, and gets a value back. The job of the design is to make that mapping stable as N changes, to keep data alive when a machine dies, and to choose what to evict when memory fills up.

Architecture

Consistent hashing maps keys and nodes to points on a ring. A key belongs to the next node clockwise. Adding or removing a node only re-shuffles keys near that node, not the whole keyspace. Virtual nodes smooth out load imbalance.

         [Key A]
          |
 Node1 ---+--- Node2
  /              \
[Key D]          [Key B]
  \              /
 Node4 ---+--- Node3
          |
       [Key C]

Consistent hashing ring

Each shard runs primary plus replica nodes. Writes go to the primary; replicas catch up asynchronously. On primary failure, a coordinator promotes a replica. Many production systems use Raft or a watcher service like Zookeeper for this.

Clients use a smart library that knows the ring layout and routes directly to the owning node. A small proxy tier can hide the ring from clients at the cost of one extra hop.

Trade-offs

Strong consistency is expensive. Most caches accept eventual consistency on the replica path and last-write-wins on conflicts. If you need stricter guarantees, use the database, not the cache.

LRU eviction is intuitive but vulnerable to scans that touch many cold keys. LFU and TinyLFU resist scan pollution but need extra bookkeeping. Mixed workloads often pick LRU plus admission policies.

TTL-only caches are simple but invite stampedes when a hot key expires. Adding a small random jitter to TTLs avoids synchronized expirations.

Practical Tips

Always pair the cache with a clear policy: read-through, write-through, or cache-aside. Cache-aside is the most common because it keeps the database as the source of truth and the cache as an optimization.

Defend against thundering herds with single-flight: when a key misses, only one request fetches from the database while others wait.

Monitor hit rate, eviction rate, and tail latencies. A high hit rate with a high eviction rate often means your working set is bigger than memory; consider adding nodes.

Never use the cache as your only store. If it goes down, the service should degrade in performance, not lose data.

Wrap-up

A distributed cache is a study in trade-offs: latency versus durability, simplicity versus consistency, memory versus accuracy. Start with consistent hashing and cache-aside. Add replication and admission policies as load grows. The result is a small, sharp tool that quietly carries most of your read traffic.