System Design: Newsfeed Architecture (Fanout, Ranking, Caching)

Intermediate 11 min read

What you'll learn

✓ Fanout-on-write vs fanout-on-read trade-offs
✓ Why hybrid models exist for high-follower users
✓ How ranking sits on top of retrieval
✓ How caching layers stack in a feed system
✓ Where backpressure and write amplification bite you

Prerequisites

• Familiar with how APIs work
• Basic distributed systems

What and Why

A newsfeed is the home screen of any social product: a personalized list of posts from accounts a user follows, ordered by time or by rank. Twitter (X), Instagram, LinkedIn, and TikTok all solve variants of the same problem with different ranking philosophies. The interesting engineering is not in displaying the list. It is in producing it cheaply for hundreds of millions of users with sub-second latency, even on cold caches.

Mental Model

A feed has three stages: retrieval, ranking, and delivery.

Retrieval: gather candidate posts the user might want to see.
Ranking: order them by some objective (recency, predicted engagement, ad slots).
Delivery: serve to clients with pagination and freshness signals.

The hardest stage is retrieval, and the central question is: do we materialize each user’s feed when posts are written (push) or when the feed is read (pull)?

Architecture

Post Write -> Fanout Worker
               |
      +--------+---------+
      |                  |
follower < N        celebrity (N+ followers)
      |                  |
      v                  v
 Push to follower    Skip fanout
 inboxes (Redis)     (pull at read time)
                          |
Read: user pulls inbox + pulls latest from celebrities they follow
      |
      v
   Ranking model -> Top N -> Client

Hybrid fanout feed pipeline

The components:

Inbox store. A per-user list (Redis list or sorted set) of recent post IDs. Bounded to a few hundred entries. Trimmed on every insert.
Post store. The canonical posts in a wide-column store or document DB. Looked up by ID after retrieval.
Fanout service. On a new post, looks up the author’s follower list and pushes the post ID into each follower’s inbox. This is the write amplification.
Pull path for celebrities. A post from a user with 50 million followers cannot be fanned out cheaply. Such posts stay in the author’s own outbox; at read time, the feed service merges the reader’s inbox with the latest posts from each celebrity they follow.
Ranking service. A model (often a gradient-boosted tree or a small transformer) scores candidates by predicted engagement, freshness, and policy signals (diversity, ads, safety).
Edge cache. CDN or app cache holds the last-served feed page so back-button and quick refresh are free.
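
The push and pull write paths described above can be sketched with in-memory dictionaries standing in for Redis. This is a minimal sketch, not a production design: `FeedStore`, `CELEBRITY_THRESHOLD`, and `INBOX_LIMIT` are hypothetical names chosen for illustration, and both constants are set tiny so the behavior is easy to follow.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 3   # hypothetical; real systems use a far larger cutoff
INBOX_LIMIT = 5           # hypothetical bound on inbox length


class FeedStore:
    """In-memory stand-in for the inbox and outbox stores."""

    def __init__(self):
        self.followers = defaultdict(set)   # author -> follower IDs
        self.inboxes = defaultdict(list)    # user -> post IDs, newest first
        self.outboxes = defaultdict(list)   # author -> post IDs, newest first

    def follow(self, follower, author):
        self.followers[author].add(follower)

    def publish(self, author, post_id):
        """Record the post; return how many inbox writes the fanout performed."""
        self.outboxes[author].insert(0, post_id)
        audience = self.followers[author]
        if len(audience) >= CELEBRITY_THRESHOLD:
            return 0  # celebrity: skip fanout, readers pull from the outbox
        for follower in audience:
            inbox = self.inboxes[follower]
            inbox.insert(0, post_id)
            del inbox[INBOX_LIMIT:]  # trim on every insert
        return len(audience)


store = FeedStore()
store.follow("alice", "bob")
for fan in ("alice", "carol", "dave"):
    store.follow(fan, "star")

normal_writes = store.publish("bob", "p1")    # fanned out to alice's inbox
celeb_writes = store.publish("star", "p2")    # stays in star's outbox only
```

The return value of `publish` is the write amplification for that post: it grows with the follower count for normal authors and drops to zero once the author crosses the threshold.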

Trade-offs

The core decision is fanout-on-write versus fanout-on-read.

Fanout-on-write (push). Cheap reads, expensive writes. Great when the follower distribution is flat. A post from an author with 200 followers becomes 200 small writes; the reader does one lookup. Twitter has publicly described precomputing home timelines this way.
Fanout-on-read (pull). Cheap writes, expensive reads. The reader queries every account they follow and merges the results. Falls apart once a user follows hundreds of accounts, because every feed load issues one query per followed account.
Hybrid. Push for normal users, pull for celebrities. The system picks a follower-count threshold (typically 100k to 1M) above which an author’s posts are not fanned out. Large social networks generally converge on some variant of this.
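
To make the asymmetry concrete, here is some rough arithmetic for the three strategies. All inputs are illustrative assumptions rather than measurements, and `push_writes`, `pull_reads`, and `hybrid_reads` are hypothetical helper names.

```python
def push_writes(follower_count: int) -> int:
    """Fanout-on-write: one inbox write per follower for each post."""
    return follower_count


def pull_reads(following_count: int) -> int:
    """Fanout-on-read: one outbox query per followed account per feed load."""
    return following_count


def hybrid_reads(celebs_followed: int) -> int:
    """Hybrid: one inbox lookup plus one outbox query per followed celebrity."""
    return 1 + celebs_followed


# Illustrative numbers only.
normal_post = push_writes(200)             # 200 small writes
celebrity_post = push_writes(50_000_000)   # 50 million writes for one post
pure_pull_load = pull_reads(800)           # 800 queries per feed load
hybrid_load = hybrid_reads(12)             # 13 queries per feed load
```

Under these assumptions, the hybrid model caps the write cost of any single post at the threshold and keeps the read cost proportional to the number of celebrities followed, which is usually far smaller than the total follow count.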

Other trade-offs:

Inbox bound. If you store only the last 500 post IDs per user, a user who is offline for two weeks can receive more fanout writes than the inbox holds, and the oldest entries are trimmed before they are ever read. Decide whether the inbox is the source of truth or a cache.
Ranking latency. A neural ranker over 500 candidates with embedding lookups can take 100ms+. Pre-compute features in batches and keep models small for the realtime path.
Consistency. Two devices may see the feed in slightly different orders due to inbox replication lag. Acceptable for social; not for finance.
Backfill cost. A new follow should retroactively include some history; otherwise the feed feels empty. This is a separate batch process.
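
A backfill step for a new follow might look like the sketch below. It assumes inbox and outbox entries are `(timestamp, post_id)` pairs kept newest-first; `backfill_on_follow`, `BACKFILL_COUNT`, and `INBOX_LIMIT` are hypothetical names, and a real system would likely run this as a queued batch job rather than inline.

```python
BACKFILL_COUNT = 3   # hypothetical: how much history a new follow pulls in
INBOX_LIMIT = 5      # hypothetical inbox bound


def backfill_on_follow(inbox, author_outbox):
    """Merge the author's most recent posts into the follower's inbox.

    Both arguments are lists of (timestamp, post_id) tuples, newest first.
    Returns a new inbox, newest first, without duplicates, trimmed to the bound.
    """
    recent = author_outbox[:BACKFILL_COUNT]
    merged = sorted(set(inbox) | set(recent), reverse=True)
    return merged[:INBOX_LIMIT]


inbox = [(105, "a5"), (103, "a3"), (101, "a1")]
outbox = [(104, "b4"), (102, "b2"), (100, "b0"), (90, "b-old")]
new_inbox = backfill_on_follow(inbox, outbox)
```

Because the merged list is trimmed to the same bound as regular fanout, backfill interacts with the inbox-bound trade-off: pulling in history can push out older entries the user has not yet seen.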

Practical Tips

Store post IDs in inboxes, not posts. Decouples inbox storage from post mutations like edits and deletions.
Use sorted sets for ordering. Redis ZADD with a score lets you trim, paginate, and merge multiple inboxes efficiently.
Async fanout with a queue. The post write returns immediately; a Kafka consumer handles the fanout. A 30-second delay on feed appearance is acceptable; a 10-second post latency is not.
Separate ranking from retrieval. Retrieve more candidates than you need (say, 500 for a 50-post page). Ranking gives you headroom to apply business rules without re-querying.
Cache the rendered feed page. The first page of the feed is the same for many quick refreshes. Cache it for 30 seconds keyed by (user, version).
Plan for delete and edit. A deleted post in 50 million inboxes is a lot of cleanup. Use a tombstone marker the read path filters out.
Instrument fanout latency. Measure the p99 of “post write to visible in inbox” per follower-count bucket. This is where outages hide.
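
The tombstone tip can be sketched as a read-path filter. This assumes deleted post IDs are kept in a set the feed service can check cheaply; `tombstones` and `filter_deleted` are hypothetical names, and in practice the marker might live in Redis or be a flag on the post record.

```python
def filter_deleted(candidate_ids, tombstones):
    """Drop tombstoned post IDs while preserving the candidates' order."""
    return [post_id for post_id in candidate_ids if post_id not in tombstones]


tombstones = {"p2", "p5"}                            # IDs of deleted posts
candidates = ["p6", "p5", "p4", "p3", "p2", "p1"]    # retrieved, newest first
visible = filter_deleted(candidates, tombstones)
page = visible[:3]                                   # page cut after filtering
```

Filtering before the page cut is where over-retrieval pays off: the page stays full even when some candidates turn out to be deleted.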

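# Simplified hybrid retrieval (uses the redis-py client).
# following_celebs, fetch_posts, and rank are application helpers defined elsewhere.
import redis

r = redis.Redis(decode_responses=True)  # assumes Redis on localhost:6379

def get_feed(user_id):
    # Push path: newest 500 post IDs already fanned out to this user's inbox.
    inbox_ids = r.zrevrange(f"inbox:{user_id}", 0, 499)
    # Pull path: newest 50 post IDs from each followed celebrity's outbox.
    celeb_ids = []
    for celeb in following_celebs(user_id):
        celeb_ids.extend(r.zrevrange(f"outbox:{celeb}", 0, 49))
    # Deduplicate IDs, hydrate from the post store, then rank and cut the page.
    candidates = fetch_posts(set(inbox_ids) | set(celeb_ids))
    return rank(user_id, candidates)[:50]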
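
When a feed is ordered purely by recency, the merge of inbox and celebrity outboxes can be done without a ranking model. The sketch below is one possible approach, assuming each source is a list of `(timestamp, post_id)` pairs already sorted newest-first; `merge_newest_first` is a hypothetical helper, not part of the retrieval code above.

```python
import heapq


def merge_newest_first(sources, limit):
    """K-way merge of newest-first (timestamp, post_id) lists, deduplicated."""
    seen = set()
    result = []
    # reverse=True tells heapq.merge the inputs are sorted largest-first.
    for _timestamp, post_id in heapq.merge(*sources, reverse=True):
        if len(result) >= limit:
            break
        if post_id in seen:
            continue  # same post reached the reader through two sources
        seen.add(post_id)
        result.append(post_id)
    return result


inbox = [(110, "a"), (107, "b"), (101, "c")]
celeb_outbox = [(109, "x"), (107, "b"), (100, "y")]
feed = merge_newest_first([inbox, celeb_outbox], limit=4)
```

`heapq.merge` is lazy, so the loop stops consuming the sources as soon as the page is full instead of sorting every candidate.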

Wrap-up

A newsfeed is a retrieval-plus-ranking system with brutal asymmetries in follower count. Pure push and pure pull are both wrong; production systems are hybrid by necessity. Decide where you draw the celebrity threshold, design the inbox as a bounded cache rather than a database, and let ranking absorb the messiness. The goal is a feed page that loads in 200ms with five-nines availability while quietly handling a million posts per second behind the scenes.