
Database Sharding Strategies Explained

Compare range, hash, and directory-based sharding strategies, with guidance on choosing shard keys and operating sharded systems.

3 min read · By Codeloom
Intermediate 10 min read

What you'll learn

  • Why sharding becomes necessary
  • Range vs hash vs directory sharding
  • How to pick a shard key
  • Cross-shard queries and joins
  • Rebalancing without downtime

Prerequisites

  • Familiarity with HTTP and relational databases

What and Why

A single database server has limits. Eventually you outgrow its CPU, memory, or storage. Sharding splits one logical database into many physical ones, each holding a slice of the data. The application talks to a routing layer that hides the split.

Sharding is one of the biggest operational commitments you can make. Once data is split, undoing it is a long road. The choice of sharding strategy and shard key shapes the next several years of your team’s work.

Mental Model

A shard is a self-contained mini-database. A shard key tells you which shard a row lives on. Every read and write has to supply that key so the router can compute the target shard. If a query has no shard key, the router cannot narrow it down and must fan the query out to every shard (scatter-gather), which is the slowest and most expensive kind of query you can write.

So the design game is: pick a shard key that most queries can use.
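The difference between a keyed query and a keyless one can be sketched in a few lines. This is a minimal illustration, not a real router: the in-memory `SHARDS` list, the modulo placement, and the function names are all assumptions made for the example.

```python
from typing import Optional

# Hypothetical in-memory shards: each dict maps user_id -> row.
# Rows are placed by user_id % 2, so the shard key is user_id.
SHARDS = [
    {2: {"name": "bo", "plan": "free"}, 4: {"name": "di", "plan": "pro"}},
    {1: {"name": "al", "plan": "pro"}, 3: {"name": "cy", "plan": "free"}},
]


def shard_for(user_id: int) -> int:
    """Map a shard key to a shard index (plain modulo, for illustration only)."""
    return user_id % len(SHARDS)


def get_user(user_id: int) -> Optional[dict]:
    """Keyed lookup: the shard key is known, so exactly one shard is touched."""
    return SHARDS[shard_for(user_id)].get(user_id)


def find_by_plan(plan: str) -> list:
    """Keyless query: 'plan' says nothing about placement, so scan every shard."""
    hits = []
    for shard in SHARDS:
        hits.extend(row for row in shard.values() if row["plan"] == plan)
    return hits
```

`get_user` does the same amount of work no matter how many shards exist. `find_by_plan` gets slower with every shard you add, which is why the shard key should match the queries you run most.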

Architecture

Three common strategies:

  • Range: rows whose key falls in [A, M) go to shard 1, [M, Z] to shard 2. Great for time-series data; vulnerable to hot ranges.
  • Hash: apply a hash to the key and take it modulo the shard count. Spreads load evenly; range scans now cross all shards.
  • Directory: maintain a lookup table mapping key to shard. Flexible, but every request pays an extra lookup hop, and the directory becomes a system of record that must stay available.
App -> Router (knows shard key strategy)
          |
 +--------+--------+
 v                 v
Shard 1            Shard 2
[A, M)             [M, Z]
Query routing across shards
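Each of the three strategies boils down to a small routing function. The sketch below assumes four shards, lowercase string keys, and a hand-written directory; the boundary values, tenant names, and `DEFAULT_SHARD` are hypothetical.

```python
import bisect
import hashlib

NUM_SHARDS = 4  # assumed shard count

# Range: sorted split points. Shard 0 holds keys < "g", shard 1 holds ["g", "m"),
# shard 2 holds ["m", "t"), and shard 3 holds everything from "t" upward.
RANGE_BOUNDS = ["g", "m", "t"]


def range_shard(key: str) -> int:
    """Binary-search the split points to find the range that contains the key."""
    return bisect.bisect_right(RANGE_BOUNDS, key)


def hash_shard(key: str) -> int:
    """Hash the key and take it modulo the shard count.

    A stable digest is used because Python's built-in hash() for strings
    is randomized per process and would route differently after a restart.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


# Directory: an explicit key -> shard table (hypothetical tenants).
DIRECTORY = {"acme": 2, "globex": 0}
DEFAULT_SHARD = 1  # assumed placement for keys not yet in the directory


def directory_shard(key: str) -> int:
    """Look the key up in the table; fall back to a default shard."""
    return DIRECTORY.get(key, DEFAULT_SHARD)
```

Note what each function depends on: the range router needs the split points, the hash router needs only the shard count, and the directory router needs the whole table. That dependency is what you have to change, and ship safely, when the cluster grows.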

Trade-offs

Range sharding makes time-series queries fast but tends to concentrate writes on the newest shard, because recent keys all fall in the last range. Pre-splitting ranges and rolling over to new shards on a schedule both help.

Hash sharding removes the hotspots caused by sequential keys but breaks ordered reads, and a single very hot key still lands on one shard. If your queries are mostly point lookups, hash is fine.
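To see why ordered reads get expensive under hash sharding, here is a rough sketch of a range scan. Each shard is modeled as a plain list of keys (an assumption for the example), and `ordered_scan` is a hypothetical helper, not a library function.

```python
import heapq


def ordered_scan(shards, low, high):
    """Return all keys in [low, high) in sorted order.

    Under hash sharding, neighbouring keys live on unrelated shards, so the
    range predicate is sent to every shard and the sorted partial results
    are combined with a k-way merge.
    """
    per_shard = [sorted(k for k in shard if low <= k < high) for shard in shards]
    return list(heapq.merge(*per_shard))
```

With range sharding the same scan would touch only the shards whose ranges overlap `[low, high)`. Here it always touches all of them.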

Directory sharding gives you the most flexibility (move a tenant to a new shard, isolate noisy tenants) but the directory becomes a critical service. Cache it aggressively.

Cross-shard joins are expensive in every strategy. Denormalize on write or move the join into the application.
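Moving the join into the application usually means fetching one side, batching the lookups for the other side by shard, and stitching the rows together in memory. The sketch below assumes users are sharded by `user_id` modulo the shard count; the data and function names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical user shards: user_id -> name, placed by user_id % 2.
USER_SHARDS = [{2: "bo", 4: "di"}, {1: "al", 3: "cy"}]


def user_shard(user_id: int) -> int:
    return user_id % len(USER_SHARDS)


def fetch_users(user_ids) -> dict:
    """Group ids by shard so each shard is queried once, not once per row."""
    by_shard = defaultdict(set)
    for uid in user_ids:
        by_shard[user_shard(uid)].add(uid)
    found = {}
    for idx, ids in by_shard.items():
        shard = USER_SHARDS[idx]
        found.update({uid: shard[uid] for uid in ids if uid in shard})
    return found


def join_orders_with_users(orders: list) -> list:
    """Application-side join: attach user_name to each order row."""
    users = fetch_users(o["user_id"] for o in orders)
    return [{**o, "user_name": users.get(o["user_id"])} for o in orders]
```

The batching step matters more than the join itself. One request per shard is tolerable; one request per order row is not.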

Practical Tips

Pick a shard key that matches your dominant access pattern. For a SaaS product, the tenant id is usually right. For a social network, the user id. For analytics, time.

Plan for rebalancing. Consistent hashing or virtual shards keep most data in place when you add a node. Naive mod N remaps most keys when N changes, so most of your data has to move.
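You can measure the difference yourself. The sketch below grows a cluster from four nodes to five and counts how many keys change owner under naive mod N and under a consistent hash ring with virtual nodes. The `Ring` class, node names, and the choice of 100 virtual nodes are assumptions for the example. For a uniform hash you would expect roughly N/(N+1) of keys to move under mod N and roughly 1/(N+1) on the ring.

```python
import bisect
import hashlib


def h(value: str) -> int:
    """Stable 64-bit hash derived from SHA-256."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


def mod_shard(key: str, n: int) -> int:
    """Naive placement: hash modulo the node count."""
    return h(key) % n


class Ring:
    """Minimal consistent hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=100):
        # Each node is placed at `vnodes` pseudo-random points on the ring.
        self._points = sorted(
            (h(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._hashes = [point for point, _ in self._points]

    def lookup(self, key: str) -> str:
        # The owner is the first point clockwise from the key's hash,
        # wrapping around to the start of the ring.
        i = bisect.bisect_right(self._hashes, h(key)) % len(self._points)
        return self._points[i][1]


def moved_fraction(keys, before, after) -> float:
    """Fraction of keys whose owner differs between two placement functions."""
    return sum(before(k) != after(k) for k in keys) / len(keys)


keys = [f"user-{i}" for i in range(10_000)]
old_ring = Ring(["n0", "n1", "n2", "n3"])
new_ring = Ring(["n0", "n1", "n2", "n3", "n4"])

mod_moved = moved_fraction(keys, lambda k: mod_shard(k, 4), lambda k: mod_shard(k, 5))
ring_moved = moved_fraction(keys, old_ring.lookup, new_ring.lookup)
print(f"mod N moved {mod_moved:.0%}, ring moved {ring_moved:.0%}")
```

On the ring, every key that moves goes to the new node. Nothing is shuffled between the existing nodes, which is what makes the migration incremental.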

Watch for hot shards. A single noisy tenant on a tenant-sharded system can crush a node. Have a runbook for moving that tenant onto a dedicated shard.
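Spotting a noisy tenant can start as a simple share-of-traffic check. The sketch below assumes you can get a stream of tenant ids, one per request, from logs or metrics; `hot_tenants` and the 50% threshold are hypothetical choices, not a standard.

```python
from collections import Counter


def hot_tenants(requests, shard_of, share_threshold=0.5):
    """Return (shard, tenant) pairs where one tenant exceeds the given
    share of all requests hitting its shard.

    requests: iterable of tenant ids, one entry per request
    shard_of: function mapping a tenant id to its shard
    """
    per_shard = Counter()
    per_tenant = Counter()
    for tenant in requests:
        shard = shard_of(tenant)
        per_shard[shard] += 1
        per_tenant[(shard, tenant)] += 1
    return sorted(
        (shard, tenant)
        for (shard, tenant), count in per_tenant.items()
        if count / per_shard[shard] > share_threshold
    )
```

This flags dominance, not overload: a tenant alone on a quiet shard is always at 100%. In practice you would pair it with an absolute load threshold before paging anyone.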

Keep a “shard map” service or config, version it, and roll out changes carefully. The day you ship a bad shard map is the day every query goes to the wrong place.
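A cheap safeguard is to validate every new shard map before it ships. The sketch below assumes a map made of half-open slot ranges over a fixed keyspace of 1024 slots; the dictionary layout, `KEYSPACE`, and `validate_shard_map` are illustrative assumptions, not a standard format.

```python
KEYSPACE = 1024  # assumed number of hash slots, covering [0, KEYSPACE)


def validate_shard_map(new_map, current_version, known_shards):
    """Return a list of problems; an empty list means no issue was found."""
    problems = []
    if new_map["version"] != current_version + 1:
        problems.append("version must be current + 1")

    ranges = sorted(new_map["ranges"], key=lambda r: r["start"])
    if not ranges:
        return problems + ["map has no ranges"]

    # The ranges must tile the keyspace: no gaps, no overlaps.
    if ranges[0]["start"] != 0:
        problems.append("keyspace does not start at 0")
    if ranges[-1]["end"] != KEYSPACE:
        problems.append("keyspace does not end at KEYSPACE")
    for prev, cur in zip(ranges, ranges[1:]):
        if prev["end"] < cur["start"]:
            problems.append(f"gap between {prev['end']} and {cur['start']}")
        elif prev["end"] > cur["start"]:
            problems.append(f"overlap at {cur['start']}")

    # Every range must point at a shard that actually exists.
    for r in ranges:
        if r["shard"] not in known_shards:
            problems.append(f"unknown shard {r['shard']}")
    return problems
```

Running a check like this in CI, and again in the service that loads the map, catches the gap, overlap, and typo classes of mistakes before they reach the router.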

Wrap-up

Sharding solves real scale problems but introduces real operational complexity. Choose your shard key with care, pick the strategy that matches your access patterns, and invest early in tooling for rebalancing and observability. Done right, a sharded system can scale almost linearly. Done wrong, it becomes a permanent tax.