System Design: Distributed Cache from Scratch
Design a distributed cache using consistent hashing, replication, and eviction policies to speed up reads at internet scale.
What you'll learn
- ✓Why distributed caches exist
- ✓Consistent hashing for sharding
- ✓Replication and failover
- ✓Eviction policies and TTLs
- ✓How to deal with cache stampedes
Prerequisites
- •Familiar with HTTP and databases
What and Why
Reads dominate most workloads. A cache holds hot data in memory so reads skip the slower database. A distributed cache spreads that memory across many machines so you can store more data and serve more traffic than any single host could.
Memcached and Redis are the household names. The interesting design questions are about partitioning, replication, and what happens when nodes come and go.
Mental Model
Think of the cache as a giant hash map split across N shards. A client picks the shard for a key, sends the request directly, and gets a value back. The job of the design is to make that mapping stable as N changes, to keep data alive when a machine dies, and to choose what to evict when memory fills up.
Architecture
Consistent hashing maps keys and nodes to points on a ring. A key belongs to the next node clockwise. Adding or removing a node only re-shuffles keys near that node, not the whole keyspace. Virtual nodes smooth out load imbalance.
[Key A]
|
Node1 ---+--- Node2
/ \
[Key D] [Key B]
\ /
Node4 ---+--- Node3
|
[Key C] Each shard runs primary plus replica nodes. Writes go to the primary; replicas catch up asynchronously. On primary failure, a coordinator promotes a replica. Many production systems use Raft or a watcher service like Zookeeper for this.
Clients use a smart library that knows the ring layout and routes directly to the owning node. A small proxy tier can hide the ring from clients at the cost of one extra hop.
Trade-offs
Strong consistency is expensive. Most caches accept eventual consistency on the replica path and last-write-wins on conflicts. If you need stricter guarantees, use the database, not the cache.
LRU eviction is intuitive but vulnerable to scans that touch many cold keys. LFU and TinyLFU resist scan pollution but need extra bookkeeping. Mixed workloads often pick LRU plus admission policies.
TTL-only caches are simple but invite stampedes when a hot key expires. Adding a small random jitter to TTLs avoids synchronized expirations.
Practical Tips
Always pair the cache with a clear policy: read-through, write-through, or cache-aside. Cache-aside is the most common because it keeps the database as the source of truth and the cache as an optimization.
Defend against thundering herds with single-flight: when a key misses, only one request fetches from the database while others wait.
Monitor hit rate, eviction rate, and tail latencies. A high hit rate with a high eviction rate often means your working set is bigger than memory; consider adding nodes.
Never use the cache as your only store. If it goes down, the service should degrade in performance, not lose data.
Wrap-up
A distributed cache is a study in trade-offs: latency versus durability, simplicity versus consistency, memory versus accuracy. Start with consistent hashing and cache-aside. Add replication and admission policies as load grows. The result is a small, sharp tool that quietly carries most of your read traffic.
Related articles
- System Design Distributed Locks with Redis: What Works, What Breaks
A practical look at distributed locking with Redis: SET NX EX, Redlock, fencing tokens, and the failure modes that cause data corruption.
- System Design System Design: Token Bucket Rate Limiter
Design a distributed rate limiter using the token bucket algorithm with Redis, handling bursty traffic while protecting backend services.
- System Design CAP Theorem in Practice: What It Actually Means for Your System
A pragmatic look at the CAP theorem: what consistency and availability mean for real workloads, and how PACELC describes the trade-offs better.
- System Design Consistent Hashing Explained for Engineers Who Operate Real Systems
How consistent hashing actually works in production: virtual nodes, rebalancing, hot keys, and why naive modulo hashing fails at scale.