Database Sharding Explained With Real Examples

Intermediate 9 min read

What you'll learn

✓What sharding solves and what it does not
✓How to pick a shard key
✓The difference between range, hash, and directory sharding
✓How cross-shard queries and transactions break down
✓When to delay sharding and what to try first

Prerequisites

•Basic relational concepts: see [SQL SELECT Basics](/blog/sql-select-basics)
•Familiarity with indexes: see [SQL Indexes and Performance](/blog/sql-indexes-and-performance)

Sharding means splitting a single logical database into multiple physical databases, where each shard holds a slice of the data. It is the answer to a specific problem: one machine cannot hold or serve all of the data. Used too early, it is the source of most of your pain.

Why one machine eventually is not enough

A single Postgres or MySQL box can comfortably handle terabytes of data and tens of thousands of queries per second if tuned well. You hit limits when:

The working set no longer fits in RAM and disk reads dominate.
Writes saturate one disk or one WAL.
A single table has billions of rows and index maintenance hurts.
A regional team needs data near them for latency.

Before you shard, exhaust the cheaper options: read replicas, better indexes (see SQL Indexes and Performance), connection pooling, caching, partitioning, and vertical scaling. Sharding is the last lever.

The shard key is the whole game

A shard key is the column or set of columns the system uses to decide which shard a row lives on. Pick well and most queries touch one shard. Pick badly and every query fans out to all shards.

Good shard keys are:

Present in almost every query you care about.
High cardinality, so data spreads evenly.
Stable, so rows do not move shards over time.
Aligned with your tenancy model, often tenant_id or user_id.

Bad shard keys include timestamps (all writes hit the newest shard), low-cardinality enums (a few hot shards), and anything you sometimes need to change.

Three common strategies

Range sharding

Split the keyspace into contiguous ranges. Shard A holds users 0 to 1M, shard B holds 1M to 2M, and so on.

Pros: range scans are cheap, easy to reason about.
Cons: hot ranges. If new users are most active, the newest shard burns.

Hash sharding

Hash the shard key and assign by the hash. Even distribution, no hot range.

Pros: balanced load, simple to grow with consistent hashing.
Cons: range scans become cross-shard. Sequential IDs no longer cluster.

Directory or lookup sharding

Maintain a small table that maps shard key to shard. A request looks up the shard, then routes.

Pros: flexible, supports moving tenants between shards.
Cons: the directory becomes a critical service. Cache it aggressively.

Most large multi-tenant systems end up with directory sharding by tenant. It lets you move noisy neighbors to dedicated shards without a migration.

What gets harder

Sharding breaks the comfortable assumption that you can join, group, or transact across all of your data.

Cross-shard joins

A query like:

SELECT u.id, count(o.id)
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.id;

is trivial in one database and a project in a sharded one. Solutions: denormalize, materialize aggregates per shard, or push the JOIN out to a separate analytics warehouse. Revisit SQL Joins and ask which of yours actually require global visibility.

Cross-shard transactions

ACID across shards needs two-phase commit, which is slow and operationally fragile. Most systems avoid it. Instead, design for single-shard transactions and use the outbox pattern plus idempotent consumers for cross-shard side effects.

Unique constraints

A UNIQUE(email) constraint cannot be enforced locally if users are sharded by user_id. Options: a separate uniqueness service, a global secondary index, or pre-check with eventual conflict resolution.

Re-sharding

You will eventually need to split a shard. Plan for it from day one. Consistent hashing or directory sharding makes this tolerable. Hard-coded modulo sharding makes this a multi-quarter project.

A concrete example

Imagine a SaaS app with 50,000 tenants. You decide to shard by tenant_id with directory sharding.

type ShardId = "shard-a" | "shard-b" | "shard-c";

async function getShardFor(tenantId: string): Promise<ShardId> {
  const cached = lru.get(tenantId);
  if (cached) return cached;
  const row = await directory.lookup(tenantId);
  lru.set(tenantId, row.shardId);
  return row.shardId;
}

async function loadInvoice(tenantId: string, invoiceId: string) {
  const shard = await getShardFor(tenantId);
  return pool(shard).query(
    "SELECT * FROM invoices WHERE tenant_id = $1 AND id = $2",
    [tenantId, invoiceId]
  );
}

Now most reads and writes route to a single shard. When a tenant grows to 30 percent of a shard, you migrate them to a dedicated shard and update the directory. Existing connections drain and reconnect.

For admin reports, you build a streaming pipeline that fans out per shard and aggregates results in the app, or you ship change events into a warehouse.

Operational realities

Backups are per shard. So is restore. Test it.
Schema migrations must roll across all shards. Use a tool that tracks state per shard.
Monitoring needs per-shard metrics. Averages hide hot shards.
New environments need shard-aware bootstrapping.

The team cost of sharding is permanent. Budget for it.

When not to shard

If your database is under 500 GB, your p99 query is under 50 ms, and your write rate is in the low thousands, you do not need sharding. You need indexes, a read replica, and a cache. Sharding is for when you have already done those things and the numbers still do not work.

Wrap up

Sharding is a powerful tool for genuinely large systems and a tax on everything you do. Delay it as long as honest measurement allows. When you commit, choose a shard key your queries already use, pick a strategy that allows re-sharding, and design every feature with the assumption that data is partitioned. The payoff is that your database stops being the ceiling.