Consistent Hashing Explained for Engineers Who Operate Real Systems

Intermediate 10 min read

What you'll learn

✓Why modulo sharding breaks when nodes change
✓How a hash ring distributes keys with minimal movement
✓Why virtual nodes are non-negotiable
✓How to handle hot keys and skew
✓When consistent hashing is the wrong tool

Prerequisites

•Familiar with how APIs work
•Basic knowledge of hash functions

What and Why

Consistent hashing is a partitioning scheme that minimizes data movement when the set of nodes changes. It is the backbone of caches like Memcached clusters, datastores like Cassandra and DynamoDB, and CDNs that route requests to origin servers. If you have ever written node = hash(key) % N, you have already met its dumber cousin.

The motivation is operational. Modulo hashing remaps almost every key when N changes by one. Adding a single cache node invalidates roughly (N-1)/N of your entries. At scale, that is a thundering herd and a cold-cache incident. Consistent hashing makes adding or removing a node move only K/N keys on average.

Mental Model

Imagine a circle from 0 to 2^32. Hash each node id onto the circle. Hash each key onto the circle. To find which node owns a key, walk clockwise from the key’s position until you hit a node. That node owns the key.

When a node leaves, only the keys between it and its predecessor migrate, and they all go to its clockwise neighbor. When a node joins, it claims a contiguous arc from one existing node. No global reshuffle.

           0/2^32
            .
       N1 .       . K (key hashes here)
         .         .
        .           .  -> walks clockwise, lands on N2
        .           .
         .         .
       N3 .       . N2
            .

Hash ring with three nodes and one key

Architecture

A real implementation uses virtual nodes (vnodes). Each physical node is hashed to many points on the ring, often 100 to 500. Without vnodes, three physical nodes produce three arcs of wildly uneven size, and one node receives most of the load. With 200 vnodes per node, the law of large numbers smooths the distribution.

A client library typically does the routing. It keeps a sorted array of (hash, node) pairs and performs a binary search per key lookup. Updates to the ring are pushed by a coordination layer like ZooKeeper, etcd, or a gossip protocol.

Replication is layered on top. A common pattern: walk clockwise from the key, take the first R distinct physical nodes, replicate to all of them. This gives you R replicas with predictable placement and a clear preference list for reads.

Trade-offs

Consistent hashing is not free.

Skew is real. Even with vnodes, a single hot key still pins to a single node. Trending tweets, celebrity profiles, viral product pages. You solve hot keys with a different layer: replication, caching, or request coalescing.
Rebalancing is not instant. Moving K/N keys can still be terabytes. Datastores stream data slowly to avoid saturating the network. Plan for hours of degraded performance after a scale event.
Ring drift. If clients disagree on the ring composition, they route the same key to different nodes. You get split-brain caches and inconsistent reads. The coordination layer must be reliable and the propagation delay bounded.
Range queries die. Consistent hashing destroys key order. If you need range scans, use range partitioning instead and accept the rebalancing pain.
Tail latency. The node that owns a key might be slow or down. Strategies like hedged requests and “sloppy quorum” trade consistency for latency.

A subtle point: bounded-load consistent hashing (introduced by Google) caps how much load any node can take. If a node hits capacity, the next key spills to the next vnode. This eliminates the worst skew at the cost of weaker locality guarantees.

Practical Tips

Use a well-mixed hash function. MurmurHash3 or xxHash, not Java’s default hashCode. Cryptographic hashes are overkill and slower.
Tune vnodes per node. 100 to 500 is the usual range. More vnodes give smoother distribution at the cost of routing table size.
Hash the node id with a salt that includes the vnode index. hash(node_id + ":" + i). Keep it deterministic so any client computes the same ring.
Cache the sorted ring. Recompute only when membership changes. Binary search is O(log V) where V is total vnodes.
Plan for hot keys separately. Add a small in-process LRU in the client for very-hot keys, or shard a hot key across multiple ring positions with a suffix.
Don’t use it where it doesn’t fit. Stateless services route fine with random load balancing. Consistent hashing only pays off when each request has affinity to specific state.

A minimal Python sketch:

import bisect
from hashlib import md5

class Ring:
    def __init__(self, nodes, vnodes=200):
        self.ring = []
        for n in nodes:
            for i in range(vnodes):
                h = int(md5(f"{n}:{i}".encode()).hexdigest(), 16)
                bisect.insort(self.ring, (h, n))

    def get(self, key):
        h = int(md5(key.encode()).hexdigest(), 16)
        i = bisect.bisect(self.ring, (h,)) % len(self.ring)
        return self.ring[i][1]

This is enough to understand the mechanic. A production version handles weights, removal, and replication preference lists.

Wrap-up

Consistent hashing is a precise tool. It solves the specific problem of partitioning state across a changing set of nodes with minimal disruption. It does not solve hot keys, range queries, or coordination. Pair it with virtual nodes, a reliable membership service, and a hot-key strategy. When you next see a cache cluster scale smoothly from twelve nodes to twenty without an outage, this is the math underneath.