Consistent Hashing Explained for Engineers Who Operate Real Systems
How consistent hashing actually works in production: virtual nodes, rebalancing, hot keys, and why naive modulo hashing fails at scale.
What you'll learn
- ✓Why modulo sharding breaks when nodes change
- ✓How a hash ring distributes keys with minimal movement
- ✓Why virtual nodes are non-negotiable
- ✓How to handle hot keys and skew
- ✓When consistent hashing is the wrong tool
Prerequisites
- •Familiar with how APIs work
- •Basic knowledge of hash functions
What and Why
Consistent hashing is a partitioning scheme that minimizes data movement when the set of nodes changes. It is the backbone of caches like Memcached clusters, datastores like Cassandra and DynamoDB, and CDNs that route requests to origin servers. If you have ever written node = hash(key) % N, you have already met its dumber cousin.
The motivation is operational. Modulo hashing remaps almost every key when N changes by one. Adding a single cache node invalidates roughly (N-1)/N of your entries. At scale, that is a thundering herd and a cold-cache incident. Consistent hashing makes adding or removing a node move only K/N keys on average.
Mental Model
Imagine a circle from 0 to 2^32. Hash each node id onto the circle. Hash each key onto the circle. To find which node owns a key, walk clockwise from the key’s position until you hit a node. That node owns the key.
When a node leaves, only the keys between it and its predecessor migrate, and they all go to its clockwise neighbor. When a node joins, it claims a contiguous arc from one existing node. No global reshuffle.
0/2^32
.
N1 . . K (key hashes here)
. .
. . -> walks clockwise, lands on N2
. .
. .
N3 . . N2
. Architecture
A real implementation uses virtual nodes (vnodes). Each physical node is hashed to many points on the ring, often 100 to 500. Without vnodes, three physical nodes produce three arcs of wildly uneven size, and one node receives most of the load. With 200 vnodes per node, the law of large numbers smooths the distribution.
A client library typically does the routing. It keeps a sorted array of (hash, node) pairs and performs a binary search per key lookup. Updates to the ring are pushed by a coordination layer like ZooKeeper, etcd, or a gossip protocol.
Replication is layered on top. A common pattern: walk clockwise from the key, take the first R distinct physical nodes, replicate to all of them. This gives you R replicas with predictable placement and a clear preference list for reads.
Trade-offs
Consistent hashing is not free.
- Skew is real. Even with vnodes, a single hot key still pins to a single node. Trending tweets, celebrity profiles, viral product pages. You solve hot keys with a different layer: replication, caching, or request coalescing.
- Rebalancing is not instant. Moving K/N keys can still be terabytes. Datastores stream data slowly to avoid saturating the network. Plan for hours of degraded performance after a scale event.
- Ring drift. If clients disagree on the ring composition, they route the same key to different nodes. You get split-brain caches and inconsistent reads. The coordination layer must be reliable and the propagation delay bounded.
- Range queries die. Consistent hashing destroys key order. If you need range scans, use range partitioning instead and accept the rebalancing pain.
- Tail latency. The node that owns a key might be slow or down. Strategies like hedged requests and “sloppy quorum” trade consistency for latency.
A subtle point: bounded-load consistent hashing (introduced by Google) caps how much load any node can take. If a node hits capacity, the next key spills to the next vnode. This eliminates the worst skew at the cost of weaker locality guarantees.
Practical Tips
- Use a well-mixed hash function. MurmurHash3 or xxHash, not Java’s default
hashCode. Cryptographic hashes are overkill and slower. - Tune vnodes per node. 100 to 500 is the usual range. More vnodes give smoother distribution at the cost of routing table size.
- Hash the node id with a salt that includes the vnode index.
hash(node_id + ":" + i). Keep it deterministic so any client computes the same ring. - Cache the sorted ring. Recompute only when membership changes. Binary search is O(log V) where V is total vnodes.
- Plan for hot keys separately. Add a small in-process LRU in the client for very-hot keys, or shard a hot key across multiple ring positions with a suffix.
- Don’t use it where it doesn’t fit. Stateless services route fine with random load balancing. Consistent hashing only pays off when each request has affinity to specific state.
A minimal Python sketch:
import bisect
from hashlib import md5
class Ring:
def __init__(self, nodes, vnodes=200):
self.ring = []
for n in nodes:
for i in range(vnodes):
h = int(md5(f"{n}:{i}".encode()).hexdigest(), 16)
bisect.insort(self.ring, (h, n))
def get(self, key):
h = int(md5(key.encode()).hexdigest(), 16)
i = bisect.bisect(self.ring, (h,)) % len(self.ring)
return self.ring[i][1]
This is enough to understand the mechanic. A production version handles weights, removal, and replication preference lists.
Wrap-up
Consistent hashing is a precise tool. It solves the specific problem of partitioning state across a changing set of nodes with minimal disruption. It does not solve hot keys, range queries, or coordination. Pair it with virtual nodes, a reliable membership service, and a hot-key strategy. When you next see a cache cluster scale smoothly from twelve nodes to twenty without an outage, this is the math underneath.
Related articles
- System Design CAP Theorem in Practice: What It Actually Means for Your System
A pragmatic look at the CAP theorem: what consistency and availability mean for real workloads, and how PACELC describes the trade-offs better.
- System Design Designing Rate Limiters: A System Design Deep Dive
A senior-engineer guide to designing rate limiters: algorithms, distributed coordination, trade-offs, and production patterns that actually scale.
- System Design Distributed Locks with Redis: What Works, What Breaks
A practical look at distributed locking with Redis: SET NX EX, Redlock, fencing tokens, and the failure modes that cause data corruption.
- System Design System Design: Distributed Cache from Scratch
Design a distributed cache using consistent hashing, replication, and eviction policies to speed up reads at internet scale.