Caching Strategies in System Design: A Practical Guide

Beginner 11 min read

What you'll learn

✓ Tell apart cache-aside, read-through, write-through, write-behind, and refresh-ahead
✓ Reason about consistency, latency, and durability trade-offs
✓ Pick the right eviction policy for your workload
✓ Avoid the most common caching pitfalls in interviews
✓ Sketch a caching layer for a typical web service

Prerequisites

• Basic familiarity with web services and databases

Caching is often the single highest-leverage optimization in a web system. A cache hit on a popular item can be a hundred times faster than the underlying database query. But caching introduces a second copy of the data, and the strategies for keeping the cache and the database aligned are where engineers earn their stripes. This guide walks through the canonical strategies and when each fits.

Why Cache at All

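The motivation is well known but worth restating. A main-memory access takes roughly one hundred nanoseconds; a query against a remote, SSD-backed database takes roughly one to ten milliseconds. That is a gap of four to five orders of magnitude. A cache on another host adds a network round trip, which narrows the gap, but a simple key lookup is still usually far cheaper than a database query. If 80 percent of requests are served from a process-local or memory-tier cache, only 20 percent reach the database: its load drops fivefold, and the misses, not the hits, now dominate average latency.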
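To make the 80 percent figure concrete, here is a back-of-the-envelope calculation. It assumes a cache-aside read path in which every request pays for the cache lookup and each miss additionally pays for the database query. The 100 ns and 5 ms inputs are illustrative values picked from the ranges above, not measurements, and `average_latency_ms` is a hypothetical helper.

```python
def average_latency_ms(hit_ratio: float, cache_ms: float, db_ms: float) -> float:
    """Expected per-request latency for a cache-aside read path.

    Every request pays for the cache lookup; only misses also pay for the database.
    """
    miss_ratio = 1.0 - hit_ratio
    return cache_ms + miss_ratio * db_ms


# Illustrative inputs only: 0.0001 ms (100 ns) cache lookup, 5 ms database query.
no_cache = average_latency_ms(0.0, 0.0, 5.0)
with_cache = average_latency_ms(0.8, 0.0001, 5.0)
print(f"no cache: {no_cache:.4f} ms, 80% hits: {with_cache:.4f} ms")
```

With these inputs the cached path averages about 1 ms against 5 ms uncached, and nearly all of that 1 ms comes from the 20 percent of misses. That is why raising the hit ratio from 80 to 90 percent roughly halves average latency again.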

Caches also smooth bursty workloads. A launch-day spike of users all hitting the same hot product page can be served almost entirely from cache, as long as the page is already cached when the spike arrives.

Cache-Aside (Lazy Loading)

This is by far the most common pattern. The application code is responsible for managing the cache.

on read(key):
  value = cache.get(key)
  if value is None:
    value = db.get(key)
    cache.set(key, value, ttl)
  return value

on write(key, value):
  db.put(key, value)
  cache.delete(key)

The cache holds only items that have actually been requested; data nobody reads never occupies cache memory. On a write, you invalidate the cached entry so the next reader pulls fresh data from the database.

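Cache-aside is simple, robust to cache outages (the application falls back to the database), and gives you full control. The downsides are an extra round trip on every miss and a window of stale reads, because the database write and the cache delete are separate, non-atomic steps. For example, a reader that fetched the old value just before a write can put it back into the cache just after the delete, and that stale entry then survives until its TTL expires.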
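Here is a minimal, runnable sketch of the cache-aside pseudocode in Python. The `TTLCache` class is a hypothetical in-process stand-in for a real cache server, and the plain dict `db` stands in for the database; both are assumptions for illustration. One deliberate deviation from the pseudocode: the sketch does not cache keys the database does not have, so a missing key simply misses again on the next read.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Hypothetical in-process cache with per-entry expiry, standing in for a cache server."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._data: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expires_at)
        self._clock = clock

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: behave like a miss
            return None
        return value

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._data[key] = (value, self._clock() + ttl)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


db: Dict[str, Any] = {"user:1": "Ada"}  # stand-in for the database
cache = TTLCache()
TTL_SECONDS = 60.0


def read(key: str) -> Optional[Any]:
    value = cache.get(key)
    if value is None:                   # miss: go to the database
        value = db.get(key)
        if value is not None:           # only cache rows that exist
            cache.set(key, value, TTL_SECONDS)
    return value


def write(key: str, value: Any) -> None:
    db[key] = value                     # 1. update the source of truth
    cache.delete(key)                   # 2. invalidate; the next read refills


print(read("user:1"))  # Ada (miss, loaded from db)
write("user:1", "Grace")
print(read("user:1"))  # Grace (miss after invalidation)
```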

Read-Through

Read-through pushes the cache-fill logic into the cache itself (or a thin layer in front of it). On a miss, the cache fetches from the database and populates itself before returning.

The application reads only from the cache. This simplifies application code and centralizes the policy, but it means your cache must know how to talk to the database. Many client libraries support this via a “loader” callback.

Read-through is great when many services share the same backing data and you want a single source of cache logic.
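A minimal read-through sketch in Python, assuming a hypothetical `ReadThroughCache` class and a `load_product` loader that simulates a database query. The point is the shape: callers only ever call `get`, and the loader callback runs inside the cache on a miss. TTLs and eviction are left out to keep it short.

```python
from typing import Callable, Dict, Generic, Hashable, List, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class ReadThroughCache(Generic[K, V]):
    """Callers only talk to the cache; the cache calls the loader on a miss."""

    def __init__(self, loader: Callable[[K], V]):
        self._loader = loader           # e.g. a function that queries the database
        self._store: Dict[K, V] = {}

    def get(self, key: K) -> V:
        if key not in self._store:
            self._store[key] = self._loader(key)  # fill the cache on a miss
        return self._store[key]


queries: List[int] = []


def load_product(product_id: int) -> str:
    queries.append(product_id)          # record each simulated database query
    return f"product-{product_id}"


products: ReadThroughCache[int, str] = ReadThroughCache(load_product)
products.get(7)
products.get(7)
products.get(8)
print(queries)  # [7, 8]: the second get(7) was a cache hit
```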

Write-Through

In write-through, every write goes to the cache, which synchronously writes to the database before acknowledging.

on write(key, value):
  cache.set(key, value)
  db.put(key, value)  # synchronous, in the same request
  ack()

The cache and database stay in lockstep. Reads are simpler because the cache always holds the latest value for keys written through it, provided nothing updates the database by another path. The cost is write latency: every write pays for both layers.

Write-through pairs naturally with read-through, and several distributed caches support the two together by letting you plug in database loader and writer callbacks.
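A minimal write-through sketch in Python, using a dict as a hypothetical stand-in for the database. Note one choice that differs from the pseudocode above: this sketch writes the database first and the cache second, so that if the database write fails, the exception propagates before the cache is updated and the cache never holds a value the database rejected. Reads fall back to the database on a miss, which is the read-through pairing.

```python
from typing import Any, Dict, MutableMapping, Optional


class WriteThroughCache:
    """Every write updates the database and the cache before returning."""

    def __init__(self, db: MutableMapping[str, Any]):
        self._db = db                   # stand-in for the database
        self._cache: Dict[str, Any] = {}

    def write(self, key: str, value: Any) -> None:
        # Database first: if this raises, the cache is never touched, so the
        # cache cannot end up holding a value the database rejected.
        self._db[key] = value
        self._cache[key] = value
        # Returning normally is the acknowledgement: both layers are updated.

    def read(self, key: str) -> Optional[Any]:
        if key in self._cache:
            return self._cache[key]
        value = self._db.get(key)       # read-through on a miss, e.g. after eviction
        if value is not None:
            self._cache[key] = value
        return value


db: Dict[str, Any] = {}
prices = WriteThroughCache(db)
prices.write("price:42", 19.99)
print(db["price:42"], prices.read("price:42"))  # 19.99 19.99


class UnavailableDB(dict):
    """Simulates a database that rejects every write."""

    def __setitem__(self, key, value):
        raise OSError("database unavailable")


broken = WriteThroughCache(UnavailableDB())
try:
    broken.write("price:42", 0.0)
except OSError:
    pass                                # the caller sees the failure; no ack
print(broken.read("price:42"))          # None: the cache was not updated either
```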

Write-Behind (Write-Back)

Write-behind acknowledges the write as soon as the cache accepts it, then asynchronously flushes to the database in batches.

on write(key, value):
  cache.set(key, value)
  queue.enqueue((key, value))
  ack()

background (loops forever):
  batch = queue.drain(up_to=1000)
  if batch is not empty:
    db.bulk_put(batch)

Writes feel instant, and you can coalesce many updates to the same key into one database operation. The catch is durability: if the cache process crashes before the queue is drained, those writes are lost.

Use write-behind for high-throughput, loss-tolerant workloads like analytics counters or recently-viewed lists. Never use it for financial transactions.
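A minimal write-behind sketch in Python. It assumes a hypothetical `bulk_put` batch writer and a single background worker calling `flush()`. Unlike the pseudocode's queue, pending writes live in a dict keyed by cache key, which is one way to get the coalescing described above: repeated writes to the same key between flushes collapse into one database row (last write wins).

```python
import threading
from typing import Any, Callable, Dict, List, Tuple

Batch = List[Tuple[str, Any]]


class WriteBehindCache:
    """Acknowledge writes once they are in memory; persist them later in batches."""

    def __init__(self, bulk_put: Callable[[Batch], None], max_batch: int = 1000):
        self._bulk_put = bulk_put       # hypothetical batch writer for the database
        self._max_batch = max_batch
        self._cache: Dict[str, Any] = {}
        self._dirty: Dict[str, Any] = {}  # key -> latest unflushed value
        self._lock = threading.Lock()

    def write(self, key: str, value: Any) -> None:
        with self._lock:
            self._cache[key] = value
            self._dirty[key] = value    # replaces any earlier pending value: coalescing
        # Returning here is the "ack"; nothing has reached the database yet.

    def read(self, key: str) -> Any:
        with self._lock:
            return self._cache.get(key)

    def flush(self) -> int:
        """Persist up to max_batch pending writes; meant for one background worker."""
        with self._lock:
            keys = list(self._dirty)[: self._max_batch]
            batch = [(k, self._dirty.pop(k)) for k in keys]
        if batch:
            self._bulk_put(batch)       # a crash before this line loses the batch
        return len(batch)


db: Dict[str, Any] = {}
batches: List[Batch] = []


def bulk_put(batch: Batch) -> None:
    batches.append(batch)
    db.update(batch)


wb = WriteBehindCache(bulk_put)
for n in (1, 2, 3):
    wb.write("views:home", n)
print(wb.read("views:home"), db)  # 3 {}  (acknowledged, not yet in the database)
print(wb.flush(), db)             # 1 {'views:home': 3}
```

Anything still in the dirty map when the process dies is lost, which is exactly the durability risk described above; a production version would also need to re-queue a batch whose `bulk_put` call fails, since this sketch drops it.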

Refresh-Ahead

Refresh-ahead proactively reloads hot entries before they expire. When an entry is read and its TTL is close to running out, the cache starts a reload in the background and keeps serving the current value in the meantime.

This avoids the latency spike that cache-aside hits whenever a popular key expires. The trade-off is more background work — you may refresh entries that nobody requests again — and slightly stale data for the refresh window.

Refresh-ahead shines for predictable, read-heavy workloads with a small hot set, like home page widgets or top-N leaderboards.
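A minimal refresh-ahead sketch in Python. It assumes a hypothetical `loader` function and a `refresh_factor` of 0.8, meaning an entry that is read after 80 percent of its TTL has elapsed triggers a background reload on a thread pool, while the caller gets the current value immediately. The class name and the injectable clock (used to make the demo deterministic) are also assumptions.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Hashable, List, Set, Tuple


class RefreshAheadCache:
    """Serve cached values, but reload an entry in the background once it is
    older than refresh_factor * ttl and someone reads it."""

    def __init__(self, loader: Callable[[Hashable], Any], ttl: float,
                 refresh_factor: float = 0.8,
                 clock: Callable[[], float] = time.monotonic):
        self._loader = loader                  # hypothetical loader, e.g. a DB query
        self._ttl = ttl
        self._refresh_after = ttl * refresh_factor
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[Any, float]] = {}  # key -> (value, loaded_at)
        self._refreshing: Set[Hashable] = set()
        self._lock = threading.Lock()
        self._pool = ThreadPoolExecutor(max_workers=2)

    def get(self, key: Hashable) -> Any:
        now = self._clock()
        with self._lock:
            entry = self._entries.get(key)
        if entry is None or now - entry[1] >= self._ttl:
            value = self._loader(key)          # miss or hard expiry: load inline
            with self._lock:
                self._entries[key] = (value, self._clock())
            return value
        value, loaded_at = entry
        if now - loaded_at >= self._refresh_after:
            with self._lock:
                start = key not in self._refreshing
                self._refreshing.add(key)
            if start:                          # at most one refresh per key at a time
                self._pool.submit(self._refresh, key)
        return value                           # caller never waits for the refresh

    def _refresh(self, key: Hashable) -> None:
        try:
            value = self._loader(key)
            with self._lock:
                self._entries[key] = (value, self._clock())
        finally:
            with self._lock:
                self._refreshing.discard(key)

    def close(self) -> None:
        self._pool.shutdown(wait=True)         # wait for in-flight refreshes


# Demo with a fake clock so the timing is deterministic.
now = [0.0]
loads: List[Hashable] = []


def load(key: Hashable) -> str:
    loads.append(key)
    return f"{key}-v{len(loads)}"


cache = RefreshAheadCache(load, ttl=10.0, clock=lambda: now[0])
print(cache.get("top10"))   # top10-v1 (miss, loaded inline)
now[0] = 9.0                # past 0.8 * ttl, still before expiry
print(cache.get("top10"))   # top10-v1, and a background reload is scheduled
cache.close()
print(cache.get("top10"))   # top10-v2, loaded by the background refresh
```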

Eviction Policies

When a cache fills up, it must evict. The common policies are:

LRU (Least Recently Used): evict the entry not touched for the longest. Great default.
LFU (Least Frequently Used): evict the entry with the lowest hit count. Good when frequency is more stable than recency.
FIFO: evict the oldest inserted entry. Cheap to implement but rarely optimal.
TTL-only: every entry expires after a fixed time. Simple, predictable.
Random: evict a random entry. Surprisingly effective, and cheap because it needs no per-access bookkeeping.

Most real systems combine TTL with LRU: an entry is removed when its TTL expires or, earlier, when memory pressure forces an LRU eviction.
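A minimal sketch of the TTL-plus-LRU combination in Python, built on `collections.OrderedDict`, whose `move_to_end` and `popitem(last=False)` give constant-time recency updates and LRU eviction. The `LRUTTLCache` name and the injectable clock are assumptions for the example. Expired entries are removed lazily, when they are next read or when they reach the LRU end.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable, Optional, Tuple


class LRUTTLCache:
    """Bounded cache: entries expire after `ttl` seconds, and when the cache is
    over capacity the least recently used entry is evicted."""

    def __init__(self, capacity: int, ttl: float,
                 clock: Callable[[], float] = time.monotonic):
        self._capacity = capacity
        self._ttl = ttl
        self._clock = clock
        # key -> (value, expires_at); iteration order runs from least to most recently used
        self._data: "OrderedDict[Hashable, Tuple[Any, float]]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]         # TTL expiry, checked lazily on read
            return None
        self._data.move_to_end(key)     # mark as most recently used
        return value

    def set(self, key: Hashable, value: Any) -> None:
        self._data[key] = (value, self._clock() + self._ttl)
        self._data.move_to_end(key)     # new or updated entries are most recent
        if len(self._data) > self._capacity:
            self._data.popitem(last=False)  # evict the least recently used entry


now = [0.0]                              # fake clock for a deterministic demo
lru = LRUTTLCache(capacity=2, ttl=10.0, clock=lambda: now[0])
lru.set("a", 1)
lru.set("b", 2)
lru.get("a")                             # "a" becomes most recently used
lru.set("c", 3)                          # over capacity: evicts "b"
print(lru.get("b"), lru.get("a"), lru.get("c"))  # None 1 3
now[0] = 11.0
print(lru.get("a"))                      # None: expired after 10 s
```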

The Hardest Part: Invalidation

Phil Karlton’s quip — “there are only two hard things in computer science: cache invalidation and naming things” — still holds. The patterns above each give you a different shape of inconsistency. A few rules help:

Treat the database as the source of truth. The cache is an optimization, not a record.
Pair every write with explicit invalidation (cache-aside) or write-through.
Use short TTLs as a safety net. Even if you forget to invalidate somewhere, the cache will heal itself.
Be wary of dogpile (thundering herd) on expiry: when many requests miss the same hot key at once, only one should refill while the others wait. Use a per-key lock or single-flight pattern.
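A minimal single-flight sketch in Python using only `threading`. The first caller for a key becomes the leader and runs the loader; callers that arrive while that load is in flight wait on an event and reuse its result instead of hitting the database themselves. The `SingleFlight` class and the `slow_load` function are hypothetical, loosely modeled on the idea behind Go's `singleflight` package; the demo relies on a short sleep so that all ten threads arrive while the load is still running.

```python
import threading
import time
from typing import Any, Callable, Dict, List, Optional


class _Call:
    """One in-flight load: followers wait on `done`, then read result or error."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.result: Any = None
        self.error: Optional[BaseException] = None


class SingleFlight:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._calls: Dict[Any, _Call] = {}

    def do(self, key: Any, fn: Callable[[], Any]) -> Any:
        with self._lock:
            call = self._calls.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._calls[key] = call
        if not leader:
            call.done.wait()            # follower: wait for the leader's load
            if call.error is not None:
                raise call.error
            return call.result
        try:
            call.result = fn()          # leader: the only caller that hits the database
            return call.result
        except BaseException as exc:
            call.error = exc
            raise
        finally:
            with self._lock:
                del self._calls[key]    # later misses start a fresh load
            call.done.set()


loads: List[str] = []
release = threading.Event()


def slow_load() -> str:
    loads.append("hot")
    release.wait(timeout=5)             # hold the load open so callers pile up
    return "fresh"


flight = SingleFlight()
results: List[str] = []
threads = [threading.Thread(target=lambda: results.append(flight.do("hot", slow_load)))
           for _ in range(10)]
for t in threads:
    t.start()
time.sleep(0.2)                          # let every thread join the in-flight call
release.set()
for t in threads:
    t.join()
print(len(loads), len(results))          # database loads vs. callers served
```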

Choosing for an Interview

When the interviewer asks “how would you cache this?”, a strong answer covers:

Read versus write ratio. Caching pays off most on read-heavy workloads.
Tolerance for stale reads. Strong consistency rules out write-behind.
Item size and cardinality. A million tiny items behave differently than a thousand huge documents.
Eviction and TTL choices, and what happens on a cache miss.
Failure modes: what happens when the cache goes down.

If you cover these five points, you will sound like an engineer who has run caches in production, not just read about them.

Wrapping Up

Caching is half algorithm and half discipline. Pick the simplest strategy that meets your consistency requirements, default to cache-aside with LRU and a sensible TTL, and only escalate to write-through or refresh-ahead when you can articulate why. The strategies in this guide give you a vocabulary; using it well is mostly about being honest about the failure modes you can live with.