System Design Interview: A 7-Step Framework

What you'll learn

✓ The 7 steps and what to deliver in each
✓ How to do back-of-envelope estimation without panic
✓ What an API contract looks like at a whiteboard
✓ How to decide what to deep-dive into
✓ How to surface bottlenecks before the interviewer asks

Prerequisites

• A first look at distributed systems concepts

System design interviews intimidate candidates because they feel open-ended. They are not — the structure is just hidden. Top performers all follow roughly the same script. Internalize this 7-step framework and the room stops feeling like an interrogation and starts feeling like a design review.

Step 1: Clarify

Spend the first 3-5 minutes asking questions. Never start drawing boxes immediately. Your goals:

Pin down the functional requirements (what the system does).
Pin down the non-functional requirements (scale, latency, availability, consistency).
Identify the out-of-scope items so you do not get pulled in five directions.

For “design Twitter”:

“Are we focused on the timeline read path, the posting write path, or both?”
“What scale are we targeting — 10 million DAU or 500 million?”
“Do we need search? Direct messages? Trends?”
“What is the tolerable latency on the home timeline?”

If you skip this step, the interviewer will pull you back to it in the worst possible way — by telling you halfway through that you have been solving the wrong problem.

Step 2: Estimate

Back-of-envelope numbers anchor every design decision that follows. You only need three numbers: reads per second, writes per second, and storage growth per day.

For Twitter at 200M DAU:

Daily active users:       200,000,000
Avg tweets per user/day:  2
Total tweets/day:         400,000,000
Tweets per second avg:    ~4,600
Read-to-write ratio:      ~100:1
Reads per second avg:     ~460,000
Avg tweet size:           ~300 bytes
Storage per day:          ~120 GB
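The arithmetic behind the table is easy to script, which helps when an interviewer changes an assumption mid-conversation. This is a minimal sketch: the inputs are the same assumptions as the table, storage counts raw tweet text only (no replication or media), and it uses decimal gigabytes (10^9 bytes). It also shows that the table's ~4,600 and ~460,000 are rounded from roughly 4,630 and 463,000, and that ~120 GB a day compounds to about 44 TB a year.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def estimate(dau: int, tweets_per_user: int, read_write_ratio: int,
             tweet_bytes: int) -> dict:
    """Back-of-envelope numbers for the Twitter example (raw tweet text only)."""
    tweets_per_day = dau * tweets_per_user
    writes_per_sec = tweets_per_day / SECONDS_PER_DAY
    reads_per_sec = writes_per_sec * read_write_ratio
    storage_gb_per_day = tweets_per_day * tweet_bytes / 1e9  # decimal GB
    return {
        "tweets_per_day": tweets_per_day,
        "writes_per_sec": round(writes_per_sec),
        "reads_per_sec": round(reads_per_sec),
        "storage_gb_per_day": round(storage_gb_per_day),
        "storage_tb_per_year": round(storage_gb_per_day * 365 / 1000, 1),
    }


if __name__ == "__main__":
    for name, value in estimate(200_000_000, 2, 100, 300).items():
        print(f"{name:>20}: {value:,}")
```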

Round generously. Nobody is grading you on precision — they are grading you on whether you remember that reads dwarf writes and storage adds up.

Step 3: API design

Define the public contract before drawing internals. A few endpoints are enough:

POST   /tweets         body: content                -> tweet_id
GET    /tweets/{id}                                 -> tweet
GET    /users/{id}/timeline?cursor=...              -> tweets[]
POST   /follow         body: target_user_id         -> ok

This is where you sneak in pagination (cursor, not offset), authentication (assume a bearer token), and idempotency (a client-supplied request ID on each POST, so retries do not create duplicates). Small touches signal seniority.
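To make the contract concrete, here is a minimal in-memory sketch of two of those touches: an idempotent POST /tweets and cursor pagination. The class name TweetService, the request_id parameter, and newest-first ordering by tweet_id are assumptions for illustration, not part of any real service; the same cursor logic would apply to the timeline endpoint.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import count, islice


@dataclass
class Tweet:
    tweet_id: int
    user_id: int
    content: str


class TweetService:
    """Hypothetical in-memory stand-in for POST /tweets and a paginated read."""

    def __init__(self) -> None:
        self._next_id = count(1)                           # increasing tweet_id
        self._tweets: list[Tweet] = []                     # append order == id order
        self._by_request: dict[tuple[int, str], int] = {}  # (user, request_id) -> id

    def post_tweet(self, user_id: int, content: str, request_id: str) -> int:
        # Idempotency: a retry carrying the same client-supplied request_id
        # returns the original tweet_id instead of creating a duplicate.
        key = (user_id, request_id)
        if key in self._by_request:
            return self._by_request[key]
        tweet = Tweet(next(self._next_id), user_id, content)
        self._tweets.append(tweet)
        self._by_request[key] = tweet.tweet_id
        return tweet.tweet_id

    def tweets_by_user(self, user_id: int, cursor: int | None = None,
                       limit: int = 20) -> tuple[list[Tweet], int | None]:
        # Cursor pagination: the cursor is the last tweet_id the client received.
        # Unlike OFFSET, tweets posted between requests cannot shift the pages.
        newest_first = (t for t in reversed(self._tweets) if t.user_id == user_id)
        if cursor is not None:
            newest_first = (t for t in newest_first if t.tweet_id < cursor)
        page = list(islice(newest_first, limit + 1))  # one extra: is there more?
        next_cursor = page[limit - 1].tweet_id if len(page) > limit else None
        return page[:limit], next_cursor
```

A client keeps passing back next_cursor until it comes back as None.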

Step 4: Data model

Sketch the core tables and their access patterns. For Twitter:

users(user_id, handle, name, created_at)
tweets(tweet_id, user_id, content, created_at)
follows(follower_id, followee_id, created_at)
timeline_cache(user_id, tweet_id, score)

State the storage choice for each: users in Postgres, tweets in a sharded relational store or Cassandra, the timeline cache in Redis. Mention why: “tweets pile up at ~400 million a day, far more than one node can hold, so we shard by user_id to spread the write load and keep each user’s tweets on one shard.”
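Sharding by user_id comes down to a routing function that every service agrees on. A minimal sketch, assuming a fixed, hypothetical count of 16 shards and an MD5-based hash (chosen only because it gives the same answer in every process, unlike Python's salted string hash):

```python
import hashlib
from collections import Counter

NUM_SHARDS = 16  # hypothetical shard count


def shard_for_user(user_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Map a user_id to a shard; the result is identical on every host."""
    digest = hashlib.md5(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shard_for_tweet(tweet: dict) -> int:
    # Tweets are routed by author, so "latest tweets by user X" hits one shard.
    return shard_for_user(tweet["user_id"])


if __name__ == "__main__":
    load = Counter(shard_for_user(uid) for uid in range(100_000))
    print(min(load.values()), max(load.values()))  # both close to 6,250
```

Plain modulo routing means that changing NUM_SHARDS remaps most users; consistent hashing or a shard directory are the usual ways around that, at the cost of extra machinery.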

Step 5: High-level architecture

Draw the boxes. Aim for 6-10 components, no more. A reasonable Twitter sketch:

Load balancer
API gateway
Tweet service
Timeline service
Follow graph service
Cache (Redis)
Message queue (Kafka)
Storage (Postgres + Cassandra)
Fan-out workers

Trace one user flow end to end: “User posts a tweet. The tweet service writes to storage and publishes to Kafka. Fan-out workers consume the event and push the tweet into each follower’s timeline cache.” Then trace a second flow: “User opens the app. The timeline service reads from Redis. On a miss, it falls back to a database query and warms the cache.”

Two complete flows is the sweet spot. More is noise.
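Both flows can be sketched end to end in memory. Everything here is a stand-in under stated assumptions: dicts play storage and Redis, a queue.Queue plays Kafka, and the 800-entry timeline cap is a hypothetical number. The fan-out worker only updates timelines that are already cached; a cold user's timeline is rebuilt from storage on the read path, which is the cache-miss fallback from the second flow.

```python
from __future__ import annotations

from collections import defaultdict, deque
from queue import Queue

TIMELINE_LEN = 800  # hypothetical cap on cached entries per user


class TwitterSketch:
    """In-memory stand-ins: dicts for storage and Redis, a Queue for Kafka."""

    def __init__(self) -> None:
        self.tweets: dict[int, tuple[int, str]] = {}  # storage: id -> (author, text)
        self.followers = defaultdict(set)             # follow graph: author -> followers
        self.following = defaultdict(set)             # follow graph: user -> followees
        self.events: Queue = Queue()                  # "Kafka": ids of new tweets
        self.cache: dict[int, deque] = {}             # "Redis": user -> ids, newest first
        self._next_id = 1

    def follow(self, follower: int, followee: int) -> None:
        self.followers[followee].add(follower)
        self.following[follower].add(followee)

    # Flow 1: user posts a tweet.
    def post(self, author: int, content: str) -> int:
        tweet_id = self._next_id
        self._next_id += 1
        self.tweets[tweet_id] = (author, content)  # write to storage first...
        self.events.put(tweet_id)                  # ...then publish the event
        return tweet_id

    def run_fanout_worker(self) -> None:
        # Drain the queue, pushing each tweet into followers' cached timelines.
        while not self.events.empty():
            tweet_id = self.events.get()
            author, _ = self.tweets[tweet_id]
            for follower in self.followers[author]:
                timeline = self.cache.get(follower)
                # Cold timelines are skipped (the read path rebuilds them), and
                # ids already present are skipped so redelivery is harmless.
                if timeline is not None and tweet_id not in timeline:
                    timeline.appendleft(tweet_id)  # maxlen drops the oldest entry

    # Flow 2: user opens the app.
    def home_timeline(self, user: int, limit: int = 20) -> list[int]:
        timeline = self.cache.get(user)
        if timeline is None:  # cache miss: query storage, then warm the cache
            ids = sorted((tid for tid, (author, _) in self.tweets.items()
                          if author in self.following[user]), reverse=True)
            timeline = deque(ids[:TIMELINE_LEN], maxlen=TIMELINE_LEN)
            self.cache[user] = timeline
        return list(timeline)[:limit]


if __name__ == "__main__":
    app = TwitterSketch()
    app.follow(follower=2, followee=1)
    app.post(1, "hello")
    print(app.home_timeline(2))  # miss: rebuilt from storage
    app.post(1, "again")
    app.run_fanout_worker()
    print(app.home_timeline(2))  # hit: fan-out already pushed the new tweet
```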

Step 6: Deep dive

The interviewer will pick one component and ask you to go deeper. Common targets:

The timeline cache: push vs pull vs hybrid for celebrities with millions of followers.
The follow graph: how to scale to billions of edges, partitioning strategy.
The tweet store: sharding key, secondary indexes, hotspot mitigation.

Be ready with a real opinion. For the timeline: hybrid is the right answer. Use push fan-out for users with under ~10k followers, pull for celebrities, merge at read time. State the trade-off out loud — push optimizes reads, pull optimizes writes, hybrid balances both at the cost of complexity.
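The hybrid read path can be sketched as a k-way merge: the user's pushed timeline plus the recent tweets of any celebrities they follow, pulled at read time. This assumes tweet_ids are time-ordered (a larger id means newer), uses the ~10k cut-off from the text as a constant, and takes plain dicts as hypothetical stand-ins for the timeline cache, the follow graph, and per-author recent tweets.

```python
from __future__ import annotations

import heapq

CELEBRITY_THRESHOLD = 10_000  # the ~10k follower cut-off from the text


def is_celebrity(follower_count: int) -> bool:
    return follower_count >= CELEBRITY_THRESHOLD


def fan_out_targets(follower_count: int, followers: list[int]) -> list[int]:
    # Write side: regular authors push to every follower; celebrities push to nobody.
    return [] if is_celebrity(follower_count) else list(followers)


def read_home_timeline(user_id: int, pushed_cache: dict, following: dict,
                       follower_counts: dict, recent_by_author: dict,
                       limit: int = 20) -> list[int]:
    """Merge the pushed timeline with celebrity tweets pulled at read time.

    All tweet-id lists are newest first; ids are assumed time-ordered.
    """
    pushed = pushed_cache.get(user_id, [])
    pulled = [recent_by_author.get(author, [])
              for author in following.get(user_id, ())
              if is_celebrity(follower_counts.get(author, 0))]
    merged, seen = [], set()
    # heapq.merge with reverse=True expects every input sorted in descending order.
    for tweet_id in heapq.merge(pushed, *pulled, reverse=True):
        if tweet_id not in seen:  # an author who crossed the threshold may appear twice
            seen.add(tweet_id)
            merged.append(tweet_id)
            if len(merged) == limit:
                break
    return merged
```

The cost of the pull side grows with the number of celebrities a user follows, which is why the threshold is a tuning knob rather than a fixed rule.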

Show what you would build, what you would avoid, and why. Vague answers (“we could use a queue”) read as junior. Specific answers (“Kafka with a partition per shard, 7-day retention, idempotent producers plus transactions for exactly-once processing”) read as senior.
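Whatever delivery guarantee the log offers, a fan-out worker that restarts mid-batch can still see an event twice, so its side effects should be idempotent. A minimal consumer-side sketch, assuming each event carries a unique event_id (a hypothetical field) and keeping the processed-ID set in memory:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TweetPosted:
    event_id: str   # hypothetical: unique per event, unchanged on redelivery
    author_id: int
    tweet_id: int


class FanOutConsumer:
    """Applies each event's side effects at most once, even if delivered twice."""

    def __init__(self, followers: dict[int, set[int]]) -> None:
        self.followers = followers
        self.timelines: dict[int, list[int]] = {}  # follower -> ids, newest first
        self.processed: set[str] = set()           # in memory here; durable in practice

    def handle(self, event: TweetPosted) -> bool:
        """Return True if the event was applied, False if it was a duplicate."""
        if event.event_id in self.processed:
            return False
        for follower in self.followers.get(event.author_id, set()):
            self.timelines.setdefault(follower, []).insert(0, event.tweet_id)
        # A real worker would record event_id in the same transaction as the
        # writes; otherwise a crash between the two steps could apply it twice.
        self.processed.add(event.event_id)
        return True
```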

Step 7: Bottlenecks and trade-offs

End the interview by surfacing the weak points before they are pointed out. For Twitter:

Celebrity fan-out is the obvious choke point. Mitigation: hybrid timeline strategy.
Hot tweets create read hotspots. Mitigation: a tweet-level cache with TTL and request coalescing.
The follow graph is huge. Mitigation: shard by user_id, replicate for read traffic.
Eventual consistency on timelines is acceptable. State that you are accepting it and why.
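The hot-tweet mitigation combines two ideas: entries expire after a TTL, and concurrent misses for the same tweet share a single backend load instead of stampeding the store. A minimal thread-based sketch; the class name, the loader callback, and the 5-second default TTL are assumptions for illustration.

```python
from __future__ import annotations

import threading
import time
from typing import Callable


class CoalescingTTLCache:
    """Tweet cache with TTL expiry and single-flight loading per key."""

    def __init__(self, loader: Callable[[int], str], ttl: float = 5.0) -> None:
        self._loader = loader          # hypothetical backend read, e.g. the tweet store
        self._ttl = ttl
        self._lock = threading.Lock()
        self._entries: dict[int, tuple[float, str]] = {}  # key -> (expires_at, value)
        self._inflight: dict[int, threading.Event] = {}   # keys currently loading

    def get(self, key: int) -> str:
        while True:
            with self._lock:
                entry = self._entries.get(key)
                if entry is not None and entry[0] > time.monotonic():
                    return entry[1]                # fresh hit
                done = self._inflight.get(key)
                is_leader = done is None
                if is_leader:                      # first miss: this caller loads
                    done = threading.Event()
                    self._inflight[key] = done
            if not is_leader:
                done.wait()                        # coalesced: wait for the leader,
                continue                           # then re-check the cache
            try:
                value = self._loader(key)
                with self._lock:
                    self._entries[key] = (time.monotonic() + self._ttl, value)
                return value
            finally:
                with self._lock:
                    del self._inflight[key]
                done.set()                         # wake waiters even if loading failed
```

If the loader raises, waiting callers loop around and one of them retries the load, so errors are never cached.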

Acknowledging trade-offs is not weakness — it is the defining mark of a senior engineer. Designs that pretend they have no weaknesses get marked down because the interviewer knows the weaknesses exist.

Time budget for a 45-minute interview

Clarify:           5 min
Estimate:          3 min
API:               4 min
Data model:        5 min
High-level:        10 min
Deep dive:         12 min
Bottlenecks:       5 min
Buffer:            1 min

Adjust as needed but keep the order. Out-of-order designs feel chaotic to the interviewer even when the technical content is correct.
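For timed practice runs, it helps to turn the budget into checkpoints: the minute by which each step should be finished. A small sketch using the numbers from the table above:

```python
from itertools import accumulate

# Minutes per step, copied from the 45-minute budget above.
BUDGET = [
    ("Clarify", 5), ("Estimate", 3), ("API", 4), ("Data model", 5),
    ("High-level", 10), ("Deep dive", 12), ("Bottlenecks", 5), ("Buffer", 1),
]


def checkpoints(budget):
    """Return (step, minute by which the step should be finished), in order."""
    ends = accumulate(minutes for _, minutes in budget)
    return [(step, end) for (step, _), end in zip(budget, ends)]


if __name__ == "__main__":
    for step, end in checkpoints(BUDGET):
        print(f"{step:<12} done by minute {end}")
```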

Practice without a partner

Pick a problem, set a 45-minute timer, and run the whole framework solo. Record yourself talking through it. Re-watch the next day — you will be embarrassed, and that embarrassment is the fastest path to improvement.

The framework is not the answer to system design. It is the scaffolding that lets your knowledge come out under pressure. Drill it until it is automatic; then your real skill, judgment about distributed systems, gets to shine.