Caching Strategies in System Design: A Practical Guide
Compare cache-aside, read-through, write-through, write-behind, and refresh-ahead. Learn when each strategy fits, what consistency you give up, and how to choose for interviews.
What you'll learn
- Tell apart cache-aside, read-through, write-through, write-behind, and refresh-ahead
- Reason about consistency, latency, and durability trade-offs
- Pick the right eviction policy for your workload
- Avoid the most common caching pitfalls in interviews
- Sketch a caching layer for a typical web service
Prerequisites
- Basic familiarity with web services and databases
Caching is often the highest-leverage optimization in a web system. A cache hit on a popular item can be a hundred times faster than the underlying database query. But caching introduces a second copy of the data, and the strategies for keeping the cache and the database aligned are where engineers earn their stripes. This guide walks through the canonical strategies and when each fits.
Why Cache at All
The motivation is well known but worth restating. A main-memory access takes roughly one hundred nanoseconds; a remote SSD-backed database query takes roughly one to ten milliseconds. That is a gap of four to five orders of magnitude for an in-process cache. A networked cache adds a round trip, which narrows the gap but usually still leaves the cache well ahead. If 80 percent of requests can be served from a process-local or memory-tier cache, you cut average latency dramatically and reduce load on the slower tier.
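To put rough numbers on that claim, the sketch below computes average read latency under a simple two-tier model. The function name, the 5 ms database figure (a midpoint of the range above), and the assumption that a miss pays for both the failed cache lookup and the database query are illustrative choices, not measurements.

```python
def expected_latency_ms(hit_ratio: float, cache_ms: float, db_ms: float) -> float:
    """Average read latency for a simple two-tier model.

    Assumes a hit costs cache_ms and a miss costs cache_ms + db_ms
    (the failed cache lookup plus the database query).
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    miss_ratio = 1.0 - hit_ratio
    return hit_ratio * cache_ms + miss_ratio * (cache_ms + db_ms)


# Illustrative numbers: 100 ns in-process lookup, 5 ms database query.
CACHE_MS = 0.0001
DB_MS = 5.0

no_cache = expected_latency_ms(0.0, CACHE_MS, DB_MS)    # every read misses
with_cache = expected_latency_ms(0.8, CACHE_MS, DB_MS)  # 80 percent hit ratio
```

With these assumed numbers the average drops from about 5 ms to about 1 ms. The remaining misses dominate the average, which is why raising the hit ratio usually matters more than making hits faster.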
Caches also smooth bursty workloads. A launch-day surge of users all hitting the same hot product page can be served almost entirely from cache.
Cache-Aside (Lazy Loading)
This is by far the most common pattern. The application code is responsible for managing the cache.
on read(key):
    value = cache.get(key)
    if value is None:
        value = db.get(key)
        cache.set(key, value, ttl)
    return value

on write(key, value):
    db.put(key, value)
    cache.delete(key)
The cache only stores items that have been requested; cold items never enter the cache. On a write, you invalidate the cache so the next reader pulls fresh data from the database.
Cache-aside is simple, robust to cache outages (the application falls back to the database), and gives you full control. The downsides are extra round trips on misses and a window of stale reads if the cache and database updates are not atomic.
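A runnable version of the read and write paths might look like the sketch below. `TTLCache` is a hypothetical in-memory stand-in for a real cache such as Redis or Memcached, a plain dict plays the database, and `db_reads` exists only to show when the database is touched.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Tiny in-memory cache with per-entry expiry (stand-in for a real cache)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[Any, float]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._entries[key] = (value, self._clock() + ttl)

    def delete(self, key: str) -> None:
        self._entries.pop(key, None)


class CacheAsideStore:
    """Application-side read and write paths for cache-aside."""

    def __init__(self, cache: TTLCache, db: Dict[str, Any], ttl: float = 60.0) -> None:
        self.cache, self.db, self.ttl = cache, db, ttl
        self.db_reads = 0  # counter used only to show when the database is hit

    def read(self, key: str) -> Optional[Any]:
        value = self.cache.get(key)
        if value is None:  # miss: load from the database and fill the cache
            self.db_reads += 1
            value = self.db.get(key)
            if value is not None:
                self.cache.set(key, value, self.ttl)
        return value

    def write(self, key: str, value: Any) -> None:
        self.db[key] = value    # database first: it is the source of truth
        self.cache.delete(key)  # invalidate so the next read refills


store = CacheAsideStore(TTLCache(), db={"user:1": "Ada"})
first = store.read("user:1")    # miss, reads the database
second = store.read("user:1")   # hit, served from the cache
store.write("user:1", "Grace")  # invalidates the cached entry
third = store.read("user:1")    # miss again, sees the new value
```

One limitation of this sketch: because `None` signals a miss, it cannot cache the fact that a key is absent from the database, so lookups for missing keys always reach the database.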
Read-Through
Read-through pushes the cache-fill logic into the cache itself (or a thin layer in front of it). On a miss, the cache fetches from the database and populates itself before returning.
The application reads only from the cache. This simplifies application code and centralizes the policy, but it means your cache must know how to talk to the database. Many client libraries support this via a “loader” callback.
Read-through is great when many services share the same backing data and you want a single source of cache logic.
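The loader-callback idea can be sketched in a few lines. `ReadThroughCache` and `load_user` are hypothetical names; real libraries expose the same shape under their own APIs, and this version omits TTLs and eviction to keep the fill logic visible.

```python
from typing import Callable, Dict, Generic, Hashable, List, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class ReadThroughCache(Generic[K, V]):
    """Cache that fills itself on a miss by calling a loader callback."""

    def __init__(self, loader: Callable[[K], V]) -> None:
        self._loader = loader
        self._entries: Dict[K, V] = {}

    def get(self, key: K) -> V:
        if key not in self._entries:
            # Miss: the cache, not the caller, fetches from the backing store.
            self._entries[key] = self._loader(key)
        return self._entries[key]


loader_calls: List[int] = []


def load_user(user_id: int) -> str:
    """Stand-in for a database query; records each call."""
    loader_calls.append(user_id)
    return f"user-{user_id}"


users = ReadThroughCache(load_user)
a = users.get(7)  # miss: the loader runs
b = users.get(7)  # hit: the loader is not called again
```

The caller only ever sees `get`. Where the data comes from and how it is filled stays inside the cache layer, which is what lets several services share one policy.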
Write-Through
In write-through, every write goes to the cache, which synchronously writes to the database before acknowledging.
on write(key, value):
    db.put(key, value)     # synchronous, in the same request
    cache.set(key, value)  # only after the database write succeeds
    ack()
The cache and database stay in lockstep. Reads are simpler because the cache always holds the latest value for any key written through it. The cost is write latency: every write pays for both layers.
Write-through pairs naturally with read-through and is the default model in many distributed caches.
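A minimal sketch of a write-through cache with a read-through miss path follows. The class name and the dict standing in for the database are assumptions. This sketch persists to the database before updating the in-memory entry, so a failed database write cannot leave an unpersisted value in the cache; that ordering is a choice made here, not the only valid one.

```python
from typing import Any, Dict, Optional


class WriteThroughCache:
    """Cache front for a backing store: writes reach the store synchronously."""

    def __init__(self, db: Dict[str, Any]) -> None:
        self._db = db
        self._entries: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        self._db[key] = value       # persist first; if this raises, the cache is untouched
        self._entries[key] = value  # then update the cached copy; returning is the ack

    def get(self, key: str) -> Optional[Any]:
        if key in self._entries:
            return self._entries[key]
        value = self._db.get(key)   # read-through on a miss
        if value is not None:
            self._entries[key] = value
        return value


db: Dict[str, Any] = {}
cache = WriteThroughCache(db)
cache.put("cart:9", ["book"])
in_db = db["cart:9"]            # already persisted when put() returns
in_cache = cache.get("cart:9")  # served from the cache, no database read
```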
Write-Behind (Write-Back)
Write-behind acknowledges the write as soon as the cache accepts it, then asynchronously flushes to the database in batches.
on write(key, value):
    cache.set(key, value)
    queue.enqueue((key, value))
    ack()

background:
    batch = queue.drain(up_to=1000)
    db.bulk_put(batch)
Writes feel instant, and you can coalesce many updates to the same key into one database operation. The catch is durability: if the cache process crashes before the queue is drained, those writes are lost.
Use write-behind for high-throughput, loss-tolerant workloads like analytics counters or recently-viewed lists. Never use it for financial transactions.
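The coalescing behavior is easier to see in code. In this sketch the pending queue is a dict keyed by cache key, so repeated writes to one key collapse into a single database operation. `WriteBehindCache`, `bulk_put`, and `max_batch` are hypothetical names, and the sketch is single-threaded: it assumes a background worker calls `flush()` periodically and that no write arrives while a flush is running.

```python
from typing import Any, Callable, Dict, Optional


class WriteBehindCache:
    """Acknowledges writes immediately and flushes them to the database later."""

    def __init__(self, bulk_put: Callable[[Dict[str, Any]], None], max_batch: int = 1000) -> None:
        self._bulk_put = bulk_put
        self._max_batch = max_batch
        self._entries: Dict[str, Any] = {}
        self._dirty: Dict[str, Any] = {}  # key -> latest value not yet in the database

    def put(self, key: str, value: Any) -> None:
        self._entries[key] = value
        self._dirty[key] = value  # replaces any pending write to the same key

    def get(self, key: str) -> Optional[Any]:
        return self._entries.get(key)

    def flush(self) -> int:
        """Write up to max_batch pending keys; return how many were flushed."""
        keys = list(self._dirty)[: self._max_batch]
        if not keys:
            return 0
        batch = {key: self._dirty[key] for key in keys}
        self._bulk_put(batch)  # if this raises, the keys stay dirty for a retry
        for key in keys:
            del self._dirty[key]
        return len(batch)


db: Dict[str, Any] = {}
cache = WriteBehindCache(bulk_put=db.update)
for count in range(1, 4):
    cache.put("page:views", count)  # three writes to the same key
cache.put("page:likes", 1)
db_before_flush = dict(db)          # nothing has reached the database yet
flushed = cache.flush()             # two keys written, not four operations
```

Everything in `_dirty` lives only in process memory, which is exactly the durability gap described above: a crash before `flush()` loses those writes.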
Refresh-Ahead
Refresh-ahead proactively refreshes hot entries before they expire. The cache tracks access patterns and reloads entries whose TTL is close to expiring, while continuing to serve the existing cached value until the reload completes.
This avoids the latency spike that cache-aside hits whenever a popular key expires. The trade-off is more background work — you may refresh entries that nobody requests again — and slightly stale data for the refresh window.
Refresh-ahead shines for predictable, read-heavy workloads with a small hot set, like home page widgets or top-N leaderboards.
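One possible shape for refresh-ahead is sketched below. It triggers the reload on access, only for keys that are actually requested inside a refresh window before expiry, which is a simplification of tracking access patterns. The class name, the `refresh_window` parameter, the injectable clock, and the use of a small thread pool are all assumptions made for illustration.

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Dict, Tuple


class RefreshAheadCache:
    """Serves cached values and reloads them in the background shortly before expiry."""

    def __init__(self, loader: Callable[[str], Any], ttl: float, refresh_window: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._ttl = ttl
        self._refresh_window = refresh_window  # start reloading this long before expiry
        self._clock = clock
        self._lock = threading.Lock()
        self._entries: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expires_at)
        self._refreshing: Dict[str, Future] = {}          # reloads currently in flight
        self._pool = ThreadPoolExecutor(max_workers=2)

    def _load_and_store(self, key: str) -> Any:
        try:
            value = self._loader(key)
            with self._lock:
                self._entries[key] = (value, self._clock() + self._ttl)
            return value
        finally:
            with self._lock:
                self._refreshing.pop(key, None)

    def get(self, key: str) -> Any:
        with self._lock:
            entry = self._entries.get(key)
            now = self._clock()
            if entry is not None and now < entry[1]:
                value, expires_at = entry
                near_expiry = expires_at - now <= self._refresh_window
                if near_expiry and key not in self._refreshing:
                    # Queue one background reload; keep serving the current value.
                    self._refreshing[key] = self._pool.submit(self._load_and_store, key)
                return value
        return self._load_and_store(key)  # missing or expired: load synchronously

    def close(self) -> None:
        """Wait for queued reloads to finish and stop the worker threads."""
        self._pool.shutdown(wait=True)


now = [0.0]  # fake clock so the example is deterministic
versions = iter(["v1", "v2"])
cache = RefreshAheadCache(lambda key: next(versions), ttl=10.0,
                          refresh_window=2.0, clock=lambda: now[0])

first = cache.get("home")   # cold miss: loads "v1" synchronously
now[0] = 9.0                # one second before expiry, inside the refresh window
second = cache.get("home")  # still "v1"; a background reload is queued
cache.close()               # wait for the reload to finish
third = cache.get("home")   # "v2", with no synchronous load on this request
```

The second request returns immediately with the old value, and the third sees fresh data without any caller having waited on the loader. That is the latency spike being traded for extra background work.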
Eviction Policies
When a cache fills up, it must evict. The common policies are:
- LRU (Least Recently Used): evict the entry that has gone untouched the longest. Great default.
- LFU (Least Frequently Used): evict the entry with the lowest hit count. Good when frequency is more stable than recency.
- FIFO: evict the oldest inserted entry. Cheap to implement but rarely optimal.
- TTL-only: every entry expires after a fixed time. Simple, predictable.
- Random: evict a random entry. Surprisingly effective, and it needs no per-access bookkeeping.
Most real systems combine TTL with LRU: entries expire after a TTL or when memory pressure forces eviction.
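The TTL-plus-LRU combination can be sketched with `collections.OrderedDict` from the standard library. The class name, the single fixed TTL for every entry, and the injectable clock are assumptions; expired entries are removed lazily when read rather than by a background sweep.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable, Optional, Tuple


class LRUTTLCache:
    """Bounded cache: entries expire after ttl seconds or are evicted LRU-first."""

    def __init__(self, capacity: int, ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._ttl = ttl
        self._clock = clock
        # key -> (value, expires_at); least recently used entry comes first
        self._entries: "OrderedDict[Hashable, Tuple[Any, float]]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]      # expired: drop it and report a miss
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key: Hashable, value: Any) -> None:
        self._entries[key] = (value, self._clock() + self._ttl)
        self._entries.move_to_end(key)
        while len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry


now = [0.0]  # fake clock so the example is deterministic
cache = LRUTTLCache(capacity=2, ttl=30.0, clock=lambda: now[0])
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")            # touch "a", so "b" becomes least recently used
cache.set("c", 3)         # over capacity: "b" is evicted
evicted = cache.get("b")  # None, removed by memory pressure
kept = cache.get("a")     # 1, survived because it was used recently
now[0] = 31.0
expired = cache.get("a")  # None, removed by TTL
```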
The Hardest Part: Invalidation
Phil Karlton’s quip — “there are only two hard things in computer science: cache invalidation and naming things” — still holds. The patterns above each give you a different shape of inconsistency. A few rules help:
- Treat the database as the source of truth. The cache is an optimization, not a record.
- Pair every write with explicit invalidation (cache-aside) or write-through.
- Use short TTLs as a safety net. Even if you forget to invalidate somewhere, the cache will heal itself.
- Be wary of dogpile (thundering herd) on expiry: when many requests miss the same hot key at once, only one should refill while the others wait. Use a per-key lock or single-flight pattern.
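The per-key lock idea from the last rule can be sketched as follows. `SingleFlightCache` and `slow_loader` are hypothetical names. The sketch has no TTL, never removes locks for old keys, and relies on CPython's dict operations being safe to call from several threads, so treat it as an illustration of the double-checked pattern rather than a production cache.

```python
import threading
import time
from typing import Any, Callable, Dict, List

_MISSING = object()


class SingleFlightCache:
    """On a miss, one thread per key runs the loader while the others wait."""

    def __init__(self, loader: Callable[[str], Any]) -> None:
        self._loader = loader
        self._entries: Dict[str, Any] = {}
        self._key_locks: Dict[str, threading.Lock] = {}
        self._guard = threading.Lock()  # protects _key_locks

    def _lock_for(self, key: str) -> threading.Lock:
        with self._guard:
            return self._key_locks.setdefault(key, threading.Lock())

    def get(self, key: str) -> Any:
        value = self._entries.get(key, _MISSING)
        if value is not _MISSING:  # fast path: hit, no locking
            return value
        with self._lock_for(key):  # one loader per key at a time
            value = self._entries.get(key, _MISSING)
            if value is not _MISSING:  # another thread filled it while we waited
                return value
            value = self._loader(key)
            self._entries[key] = value
            return value


load_calls: List[str] = []


def slow_loader(key: str) -> str:
    """Stand-in for a slow database query; records each call."""
    load_calls.append(key)
    time.sleep(0.05)
    return f"value-for-{key}"


cache = SingleFlightCache(slow_loader)
results: List[str] = []


def worker() -> None:
    results.append(cache.get("hot"))


threads = [threading.Thread(target=worker) for _ in range(8)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

Eight concurrent requests for the same cold key produce a single loader call; the other seven block briefly on the key's lock and then read the freshly filled entry.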
Choosing for an Interview
When the interviewer asks “how would you cache this?”, a strong answer covers:
- Read versus write ratio. Caching pays off most on read-heavy workloads.
- Tolerance for stale reads and lost writes. Strict consistency or durability requirements rule out write-behind.
- Item size and cardinality. A million tiny items behave differently than a thousand huge documents.
- Eviction and TTL choices, and what happens on a cache miss.
- Failure modes: what happens when the cache goes down.
If you cover these five points, you will sound like an engineer who has run caches in production, not just read about them.
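The failure-mode point can be made concrete with a read path that degrades to the database when the cache is unreachable. This is a sketch under assumptions: `CacheUnavailable` is a hypothetical exception standing in for whatever connection error a real cache client raises, and the cache and database are passed in as plain callables.

```python
import logging
from typing import Any, Callable, Dict, Optional

logger = logging.getLogger("cache")


class CacheUnavailable(Exception):
    """Hypothetical error raised when the cache cannot be reached."""


def read_with_fallback(key: str,
                       cache_get: Callable[[str], Optional[Any]],
                       cache_set: Callable[[str, Any], None],
                       db_get: Callable[[str], Optional[Any]]) -> Optional[Any]:
    """Cache-aside read that still answers when the cache is down."""
    try:
        value = cache_get(key)
        if value is not None:
            return value
    except CacheUnavailable:
        logger.warning("cache unavailable, reading %s from the database", key)
        return db_get(key)  # skip the refill: the cache is down
    value = db_get(key)
    if value is not None:
        try:
            cache_set(key, value)
        except CacheUnavailable:
            logger.warning("cache unavailable, could not store %s", key)
    return value


def broken_get(key: str) -> Optional[Any]:
    raise CacheUnavailable("connection refused")


def broken_set(key: str, value: Any) -> None:
    raise CacheUnavailable("connection refused")


db: Dict[str, Any] = {"sku:42": "blue widget"}
healthy: Dict[str, Any] = {}

during_outage = read_with_fallback("sku:42", broken_get, broken_set, db.get)
after_recovery = read_with_fallback("sku:42", healthy.get, healthy.__setitem__, db.get)
```

Falling back keeps the service correct, but every request now reaches the database, so the interview follow-up is whether the database can absorb that load or whether you also need rate limiting or load shedding.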
Wrapping Up
Caching is half algorithm and half discipline. Pick the simplest strategy that meets your consistency requirements, default to cache-aside with LRU and a sensible TTL, and only escalate to write-through or refresh-ahead when you can articulate why. The strategies in this guide give you a vocabulary; using it well is mostly about being honest about the failure modes you can live with.