Database Sharding Explained: Keys, Strategies, and Trade-offs

Beginner 11 min read

What you'll learn

✓Define sharding and how it differs from replication
✓Compare range, hash, directory, and geo-based partitioning
✓Pick a good shard key for your access patterns
✓Recognize and mitigate hot shards
✓Plan resharding strategies that minimize downtime

Prerequisites

•Basic familiarity with relational databases

Sharding is what you reach for when a single database server cannot hold your data or handle your traffic. Conceptually it is simple — split the data across many machines — but the choice of partitioning scheme and shard key shapes every operational decision that follows.

Sharding Versus Replication

Replication makes multiple copies of the same data on different machines, primarily for read scalability and fault tolerance. Sharding splits the data so that each machine holds a different slice, primarily for write scalability and total capacity.

You usually want both. Production systems shard horizontally and then replicate each shard for durability. A common configuration is sixteen shards with three replicas each, for forty-eight database processes total.

Choosing a Shard Key

The shard key decides which row goes to which shard. The ideal key has three properties:

High cardinality so the data spreads evenly.
Stable so a row never has to move shards.
Aligned with your access patterns so single-shard queries serve most reads.

Picking the shard key is the single most consequential decision in a sharded design. Get it right and growth is smooth. Get it wrong and you will be migrating data under load.

Range-Based Sharding

In range sharding, you assign contiguous ranges of the key to each shard. For example, user IDs 1 through 1,000,000 live on shard 1, and IDs 1,000,001 through 2,000,000 live on shard 2.

The good news: range scans are efficient because they stay within a small number of shards. The bad news: skewed keys produce hot shards. If user IDs are monotonically increasing, all new writes land on the last shard.

Range sharding fits well for time-series data when you also rotate write targets, like one shard per month with a write head pointer.

Hash-Based Sharding

Hash sharding applies a hash function to the shard key and then takes the result modulo the number of shards.

shard_id = hash(user_id) % num_shards

This distributes data uniformly, which is the main draw. The downside is that range scans become expensive — adjacent keys land on different shards, so a “list users in this date range” query has to fan out everywhere.

A serious operational issue is resharding. If you change num_shards, most keys move, which is catastrophic for a live system. Consistent hashing solves this by mapping shards onto a virtual ring; adding or removing a shard only moves a fraction of the data.

Directory-Based Sharding

In directory sharding, a lookup service tells you which shard owns each key. The mapping is explicit and can be arbitrarily complex.

This is the most flexible scheme: you can move individual keys, group customers onto shards by SLA, or split a hot tenant into its own shard. The trade-off is that the directory becomes a critical dependency that must itself be scaled and made highly available.

Many large multi-tenant SaaS products use directory sharding, with the tenant ID looked up to find their shard.

Geo-Based Sharding

Geo sharding partitions by region or country. European users go to a European shard, US users to a US shard, and so on. This is often required for data residency laws.

The downsides are familiar by now: cross-region queries are slow, and traffic skews between regions create unbalanced load. Mitigation usually involves further hash sharding within each region.

Hot Shards and How to Mitigate Them

A hot shard is one that takes disproportionate load. Causes include:

A skewed shard key, such as a celebrity user with millions of followers.
Time-based keys that always write to the newest shard.
A popular tenant in a multi-tenant system.

Mitigation strategies:

Hash the key with a salt to scatter writes.
Split the hot entity across multiple shards using a sub-key, sometimes called “sharded counters.”
Move the hot tenant to its own dedicated shard via directory sharding.
Add a write-through cache to absorb traffic spikes.

The earlier you detect a hot shard, the cheaper it is to fix. Monitor per-shard CPU, query rate, and storage growth.

Cross-Shard Queries

Some queries inherently span shards: “count active users globally” or “top 10 products by revenue.” Options include:

Scatter-gather: send the query to every shard, then aggregate in the application layer.
Materialized aggregates: precompute the result and store it in a small global table.
Read-only replica with denormalized data for analytical queries.

Treat cross-shard queries as expensive by default. If they appear in your hot path, your shard key is probably wrong.

Resharding Without Downtime

You will eventually outgrow your initial shard count. A standard playbook:

Double-write to the old shard and the new shard layout.
Backfill the new layout in the background from a snapshot.
Read from the new layout, validating against the old.
Cut over reads to the new layout once parity is confirmed.
Stop writing to the old layout and decommission.

Consistent hashing reduces the amount of data you need to backfill. Tools like Vitess automate much of this for MySQL.

Transactions Across Shards

Distributed transactions across shards are slow and complex. Two-phase commit is the textbook answer but locks resources on every shard for the full transaction window. Most production systems avoid cross-shard transactions by:

Co-locating related data on the same shard via a parent key.
Using sagas (compensating actions) instead of strict ACID guarantees.
Accepting eventual consistency for non-critical relationships.

If your business model requires strict cross-entity transactions across many shards, you may need to rethink the sharding boundary itself.

Wrapping Up

Sharding turns a single database scaling problem into many smaller problems plus a coordination problem. The discipline that pays off is picking a shard key that matches your reads, monitoring for skew, and treating resharding as a normal — if rare — operation rather than a catastrophe. Master those three habits and your sharded system can scale for years before requiring a rethink.