System Design: Design a Distributed Cache (Redis-like)

Intermediate 11 min read

What you'll learn

✓Shard keys with consistent hashing
✓Pick a replication and consistency model
✓Implement eviction policies cleanly
✓Decide whether to persist or stay pure cache
✓Handle node failures without thundering herds

Prerequisites

•Familiarity with hash maps and TTLs.
•Comfort with networked services. See [Big O Notation Explained](/blog/big-o-notation-explained).

A distributed cache is the system that sits between your app and your database and stops you from going bankrupt. Redis and Memcached are the templates. Designing one yourself forces you to think about partitioning, replication, eviction, and what happens when a node dies during a key write.

Functional Requirements

Get, set, delete by key.
TTL per key.
Atomic increment.
Optional data structures (lists, sets, hashes).
Pub/sub channels (optional).

Non-Functional Requirements

Read QPS: 1M per cluster, scaling linearly with nodes.
Latency: p99 under 1 ms within a datacenter.
Memory per node: 64 to 256 GB.
Durability: best-effort by default, optional snapshot or AOF.
Availability: 99.99 percent with replication.

High-Level Architecture

Client library hashes the key and routes the request to the owning shard.
Each shard has a primary and one or more replicas.
A cluster manager (or gossip protocol) tracks node membership, shard ownership, and failover.
Optional persistence layer writes snapshots or an append-only log to disk and to object storage.

The data plane is the cache. The control plane is the membership and topology.

Data Model

The on-node store is a hash map from key to value plus metadata:

entry {
  key:        bytes
  value:      bytes
  expires_at: uint64 (0 = no expiry)
  lru_link:   doubly linked list pointer
  size:       uint32
}

A separate eviction structure (LRU linked list, or an approximation like Redis’s sampled LRU) tracks access order. A min-heap or sorted set tracks expirations.

Key APIs

GET  key                -> value | nil
SET  key value [EX ttl]
DEL  key
INCR key                -> new int value
EXPIRE key ttl

Pipelining and multi-key commands let clients batch round trips, which matters more for throughput than any single optimization.

Partitioning

Consistent hashing. Hash the key into a 2^32 ring. Each node owns a contiguous arc. Adding or removing a node only moves the keys on the neighbor arcs, not the entire keyspace. Use virtual nodes (each physical node claims 128 to 256 points on the ring) to balance load.

Hash slot (Redis-style). Map every key to one of 16384 slots via CRC16, and assign slot ranges to nodes. Simpler resharding because slots are first-class units that move atomically.

Both work. Hash slots are easier to operate; consistent hashing is easier to scale to thousands of nodes.

Replication and Consistency

Async replication. Primary acks the write immediately and ships to replicas in the background. Fast, but a primary crash loses the last few milliseconds of writes.

Sync replication. Primary waits for at least one replica to ack. Slower writes, no data loss on single-node failure.

Quorum. For stronger guarantees, require a majority of N replicas to ack. Costly for cache workloads — usually overkill.

For a cache, async with one or two replicas is the right default. Cache is supposed to be lossy.

Eviction

When memory is full, evict. Common policies:

LRU. Evict least recently used. Best for skewed workloads.
LFU. Evict least frequently used. Better when the working set has stable popularity.
Random. Cheap and surprisingly competitive.
TTL-based. Evict the soonest-to-expire.

Redis uses sampled LRU and LFU — pick K random keys and evict the oldest. This avoids maintaining a true LRU list for every operation, which is the right cost tradeoff.

Persistence

Pure cache: no persistence. Restart means warm-up from the source of truth.

Snapshot: fork and write a memory image to disk every N minutes. Cheap, with a small RPO.

AOF (append-only file): log every write. Higher durability, higher write amplification.

Most teams pick snapshot plus AOF on the primary, and rely on replicas as the realtime backup.

Scaling and Tradeoffs

Client-side hashing vs proxy. Client-side is faster — no extra hop — but harder to upgrade. A proxy (twemproxy, mcrouter, Redis Cluster proxy) gives a single endpoint at the cost of a hop.

Hot keys. A single hot key can swamp a node. Mitigations: replicate the hot key to multiple nodes, add a small in-process LRU on the client, or shard the key by suffixing a random bucket.

Thundering herd. When a hot key expires, every requester misses at once and stampedes the database. Add request coalescing (one missed request fills the cache, others wait) or use a SETNX lock during regeneration.

Multi-DC. Replicate async across regions for read locality. Writes go to the home region. Conflict resolution is usually last-write-wins for a cache.

Eviction storms. A sudden memory pressure spike can evict half the working set. Provision headroom — target 70 percent fill.

Failover. Replica is promoted on primary failure via sentinel or a Raft-based controller. Clients refresh topology and retry. Brief unavailability is acceptable.

For deployment patterns, the cluster fits naturally on Kubernetes with a StatefulSet per shard group — see What is Kubernetes. For managed alternatives, see What is AWS and ElastiCache.

What to Say in an Interview

Pick consistent hashing or hash slots and justify your choice. Saying “Redis uses 16384 hash slots” signals you have read the source.
State your replication mode (async with one replica) and admit the data loss window.
Cover hot keys and thundering herds explicitly. These are the two failure modes that bring down real caches.
Mention sampled LRU rather than true LRU as a real-world implementation choice.
Be explicit: this is a cache. It is allowed to lose data.

Wrap up

A distributed cache is a sharded hash map with replication, eviction, and a control plane to track membership. Use consistent hashing or hash slots, async replication, sampled LRU, and accept the durability tradeoffs. The hard problems are hot keys, herd protection, and failover — solve those and you have built Redis from scratch.