Database Sharding Strategies Explained
Compare range, hash, and directory-based sharding strategies, with guidance on choosing shard keys and operating sharded systems.
What you'll learn
- Why sharding becomes necessary
- Range vs hash vs directory sharding
- How to pick a shard key
- Cross-shard queries and joins
- Rebalancing without downtime
Prerequisites
- Familiarity with HTTP and databases
What and Why
A single database server has limits. Eventually you outgrow its CPU, memory, or storage. Sharding splits one logical database into many physical ones, each holding a slice of the data. The application talks to a routing layer that hides the split.
Sharding is one of the biggest operational commitments you can make. Once data is split, undoing it is a long road. The choice of sharding strategy and shard key shapes the next several years of your team’s work.
Mental Model
A shard is a self-contained mini-database. A shard key tells you which shard a row lives on. Every read and write has to supply that key so the router can find the right shard. If a query has no shard key, it must hit every shard, which makes it the slowest and most expensive kind of query you can write.
So the design game is: pick a shard key that most queries can use.
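The difference between a keyed query and a keyless one can be sketched in a few lines. This is a toy model, not a real router: the in-memory `shards` list, the modulo routing, and the `get_by_user_id` / `find_by_email` helpers are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical in-memory "shards": each one maps user_id -> row.
NUM_SHARDS = 4
shards = [dict() for _ in range(NUM_SHARDS)]

def shard_for(user_id: int) -> int:
    """Map the shard key to a shard index (plain modulo, for illustration only)."""
    return user_id % NUM_SHARDS

def insert(row: dict) -> None:
    shards[shard_for(row["user_id"])][row["user_id"]] = row

def get_by_user_id(user_id: int) -> Optional[dict]:
    """The query carries the shard key, so exactly one shard is consulted."""
    return shards[shard_for(user_id)].get(user_id)

def find_by_email(email: str) -> Tuple[List[dict], int]:
    """The query lacks the shard key, so every shard has to be scanned."""
    matches, shards_touched = [], 0
    for shard in shards:
        shards_touched += 1
        matches.extend(row for row in shard.values() if row["email"] == email)
    return matches, shards_touched

for uid in range(1, 9):
    insert({"user_id": uid, "email": f"user{uid}@example.com"})
```

`get_by_user_id(5)` touches one shard, while `find_by_email(...)` reports `NUM_SHARDS` shards touched no matter how many rows match. That fan-out cost is what the shard key choice is trying to avoid.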
Architecture
Three common strategies:
- Range: rows whose key falls in [A, M) go to shard 1, [M, Z] to shard 2. Great for time-series data; vulnerable to hot ranges.
- Hash: apply a hash to the key and take it modulo the shard count. Spreads load evenly; range scans now cross all shards.
- Directory: maintain a lookup table mapping key to shard. Flexible but adds a hop and a service of record.
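As a rough sketch, the three strategies boil down to three small routing functions. The boundary list, the SHA-256 based hash, and the tenant names in `DIRECTORY` are illustrative assumptions, not part of any particular database.

```python
import bisect
import hashlib

# Range: sorted split points. Keys below "M" go to shard 0, the rest to shard 1.
RANGE_BOUNDS = ["M"]

def range_shard(key: str) -> int:
    # Normalize case so "alice" and "Alice" land in the same range.
    return bisect.bisect_right(RANGE_BOUNDS, key.upper())

def hash_shard(key: str, num_shards: int) -> int:
    # Use a stable hash: Python's built-in hash() of a str varies between processes.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# Directory: an explicit lookup table (hypothetical tenants).
DIRECTORY = {"acme": 0, "globex": 1}

def directory_shard(key: str) -> int:
    # Raises KeyError for a key that has not been assigned to a shard yet.
    return DIRECTORY[key]
```

Note how only the directory version can place a specific key on a specific shard by hand; the other two derive placement from the key itself.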
      App -> Router (knows shard key strategy)
               |
      +--------+--------+
      v                 v
   Shard 1           Shard 2
    [A-M)             [M-Z]
Trade-offs
Range sharding makes time-series queries fast but tends to concentrate writes on the newest shard. Pre-splitting and rolling new shards helps.
Hash sharding solves hotspotting but breaks ordered reads. If your queries are mostly point lookups, hash is fine.
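To see why ordered reads suffer, compare which shards a range scan has to visit under each scheme. The split points in `BOUNDS` are made up for this sketch.

```python
import bisect

# Hypothetical range layout with four shards:
# shard 0: keys < 100, shard 1: [100, 200), shard 2: [200, 300), shard 3: >= 300
BOUNDS = [100, 200, 300]
NUM_SHARDS = len(BOUNDS) + 1

def shards_for_scan_range_sharded(lo: int, hi: int) -> list:
    """Shards that overlap the inclusive key range [lo, hi]."""
    first = bisect.bisect_right(BOUNDS, lo)
    last = bisect.bisect_right(BOUNDS, hi)
    return list(range(first, last + 1))

def shards_for_scan_hash_sharded(lo: int, hi: int) -> list:
    """Hashing scatters adjacent keys, so in general any shard may hold a match."""
    return list(range(NUM_SHARDS))
```

A scan over keys 120 to 180 stays on a single shard with range sharding, but the hash-sharded version has to ask all four.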
Directory sharding gives you the most flexibility (move a tenant to a new shard, isolate noisy tenants) but the directory becomes a critical service. Cache it aggressively.
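One way to cache the directory is a small TTL cache in the router. The sketch below is an assumption-heavy illustration: `CachingRouter`, the `lookup` function standing in for the directory service, and the 30-second default TTL are all hypothetical.

```python
import time

class CachingRouter:
    """Caches tenant -> shard lookups for a fixed TTL (illustrative sketch)."""

    def __init__(self, directory_lookup, ttl_seconds=30.0, clock=time.monotonic):
        self._lookup = directory_lookup   # callable that asks the directory service
        self._ttl = ttl_seconds
        self._clock = clock               # injectable so tests can control time
        self._cache = {}                  # tenant_id -> (shard, expires_at)

    def shard_for(self, tenant_id):
        now = self._clock()
        entry = self._cache.get(tenant_id)
        if entry is not None and entry[1] > now:
            return entry[0]               # fresh cache hit, no directory hop
        shard = self._lookup(tenant_id)
        self._cache[tenant_id] = (shard, now + self._ttl)
        return shard

    def invalidate(self, tenant_id):
        """Drop a cached entry, e.g. right after a tenant has been moved."""
        self._cache.pop(tenant_id, None)

# Stand-in for the directory service of record (hypothetical data).
directory = {"acme": 0, "globex": 1}
calls = []

def lookup(tenant_id):
    calls.append(tenant_id)
    return directory[tenant_id]
```

The trade-off is staleness: after a tenant moves, a router may keep sending traffic to the old shard until the TTL expires or `invalidate` is called, so a move procedure has to account for that window.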
Cross-shard joins are expensive in every strategy. Denormalize on write or move the join into the application.
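Moving the join into the application might look roughly like this. The data layout is invented for the example: users are assumed to be sharded by `user_id` and orders by `order_id`, so a user's orders can live on a different shard than the user row.

```python
# Hypothetical shards. Routing is key % 2 for both tables.
user_shards = [
    {2: {"user_id": 2, "name": "Grace"}},   # shard 0
    {1: {"user_id": 1, "name": "Ada"}},     # shard 1
]
order_shards = [
    [{"order_id": 10, "user_id": 2, "total": 30},
     {"order_id": 12, "user_id": 2, "total": 5}],    # shard 0
    [{"order_id": 11, "user_id": 1, "total": 15}],   # shard 1
]

def orders_with_user_names(user_ids):
    """Application-side join: fetch each side separately, then combine in memory."""
    # Step 1: point lookups for users. Each one hits only the owning shard.
    users = {}
    for uid in set(user_ids):
        row = user_shards[uid % len(user_shards)].get(uid)
        if row is not None:
            users[uid] = row
    # Step 2: orders are not sharded by user_id, so scatter to every order shard.
    joined = []
    for shard in order_shards:
        for order in shard:
            if order["user_id"] in users:
                # Step 3: attach the user's name to the order row.
                joined.append({**order, "name": users[order["user_id"]]["name"]})
    return sorted(joined, key=lambda r: r["order_id"])
```

The scatter in step 2 is exactly the cost that denormalizing on write avoids: if each order row already stored the user's name, or if orders were sharded by `user_id`, the whole request would stay on one shard.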
Practical Tips
Pick a shard key that matches your dominant access pattern. For a SaaS product, the tenant id is usually right. For a social network, the user id. For analytics, time.
Plan for rebalancing. Consistent hashing or virtual shards keep most data in place when you add a node. Naive mod N requires moving most of your data when N changes.
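The gap between the two approaches is easy to measure. The sketch below grows a cluster from four to five nodes and counts how many keys change owner under mod N and under a simple hash ring with virtual nodes. The `HashRing` class, the node names, and the choice of 100 virtual nodes per node are assumptions for illustration.

```python
import bisect
import hashlib

def stable_hash(value: str) -> int:
    """64-bit hash that is stable across processes (unlike built-in hash())."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")

def mod_shard(key: str, n: int) -> int:
    return stable_hash(key) % n

class HashRing:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=100):
        # Each node is placed on the ring at `vnodes` pseudo-random points.
        self._points = sorted(
            (stable_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._points]

    def node_for(self, key: str) -> str:
        # The owner is the first ring point clockwise from the key's hash.
        idx = bisect.bisect_right(self._hashes, stable_hash(key)) % len(self._hashes)
        return self._points[idx][1]

keys = [f"user-{i}" for i in range(10_000)]

# Fraction of keys whose shard changes when going from 4 to 5 shards.
moved_mod = sum(mod_shard(k, 4) != mod_shard(k, 5) for k in keys) / len(keys)

old_ring = HashRing(["n0", "n1", "n2", "n3"])
new_ring = HashRing(["n0", "n1", "n2", "n3", "n4"])
moved_ring = sum(old_ring.node_for(k) != new_ring.node_for(k) for k in keys) / len(keys)
```

In expectation, mod N relocates about four fifths of the keys on this change, while the ring relocates about one fifth, and every key the ring moves lands on the new node `n4`. Exact fractions depend on the hash and the number of virtual nodes.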
Watch for hot shards. A single noisy tenant on a tenant-sharded system can crush a node. Have a runbook to split that tenant into its own shards.
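A first step in such a runbook is simply noticing the skew. The following is a minimal sketch that assumes you already collect per-shard and per-tenant request counts; the function names and the "twice the mean" threshold are arbitrary choices for illustration.

```python
from collections import Counter

def hot_shards(request_counts: dict, threshold: float = 2.0) -> list:
    """Return shard ids whose load exceeds `threshold` times the mean load."""
    if not request_counts:
        return []
    mean = sum(request_counts.values()) / len(request_counts)
    return sorted(s for s, count in request_counts.items() if count > threshold * mean)

def top_tenants(tenant_counts: dict, n: int = 3) -> list:
    """Return the n busiest tenants on a shard as (tenant, count) pairs."""
    return Counter(tenant_counts).most_common(n)
```

For example, with counts `{0: 100, 1: 120, 2: 900, 3: 80}` the mean is 300, so only shard 2 crosses the threshold of 600. Running `top_tenants` on that shard's per-tenant counts then points at the tenant to isolate.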
Keep a “shard map” service or config and rev it carefully. The day you ship a bad shard map is the day every query goes to the wrong place.
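One cheap safeguard is to validate a new shard map before it ships. The sketch below checks a range-based map for gaps and overlaps. The `(start, end, shard_id)` tuple format with half-open ranges is an assumption for this example, not a standard.

```python
def validate_range_map(shard_map, key_min, key_max):
    """Check that half-open [start, end) ranges cover [key_min, key_max) exactly.

    shard_map is a list of (start, end, shard_id) tuples.
    Returns a list of problem descriptions; an empty list means the map is usable.
    """
    if not shard_map:
        return ["shard map is empty"]
    problems = []
    ranges = sorted(shard_map, key=lambda r: (r[0], r[1]))
    for start, end, shard_id in ranges:
        if start >= end:
            problems.append(f"empty or inverted range [{start}, {end}) on {shard_id}")
    if ranges[0][0] != key_min:
        problems.append(f"map starts at {ranges[0][0]}, expected {key_min}")
    # Compare each range with its neighbor to find holes and double ownership.
    for (_, prev_end, _), (next_start, _, _) in zip(ranges, ranges[1:]):
        if prev_end < next_start:
            problems.append(f"gap: [{prev_end}, {next_start}) has no shard")
        elif prev_end > next_start:
            problems.append(f"overlap: [{next_start}, {prev_end}) has two shards")
    last_end = max(end for _, end, _ in ranges)
    if last_end != key_max:
        problems.append(f"map ends at {last_end}, expected {key_max}")
    return problems
```

Running a check like this in CI or as a deploy gate, and refusing to publish a map with a non-empty problem list, catches the most obvious class of bad shard maps before any query sees them.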
Wrap-up
Sharding solves real scale problems but introduces real operational complexity. Choose your shard key with care, pick the strategy that matches your access patterns, and invest early in tooling for rebalancing and observability. Done right, a sharded system can scale almost linearly. Done wrong, it becomes a permanent tax.