System Design Interview: A 7-Step Framework
A repeatable 7-step framework for system design interviews. Clarify, estimate, API, data model, high-level, deep dive, bottlenecks — with concrete examples for each step.
What you'll learn
- The 7 steps and what to deliver in each
- How to do back-of-envelope estimation without panic
- What an API contract looks like at a whiteboard
- How to decide what to deep-dive into
- How to surface bottlenecks before the interviewer asks
Prerequisites
- A first look at distributed systems concepts
System design interviews intimidate candidates because they feel open-ended. They are not — the structure is just hidden. Top performers all follow roughly the same script. Internalize this 7-step framework and the room stops feeling like an interrogation and starts feeling like a design review.
Step 1: Clarify
Spend the first 3-5 minutes asking questions. Never start drawing boxes immediately. Your goals:
- Pin down the functional requirements (what the system does).
- Pin down the non-functional requirements (scale, latency, availability, consistency).
- Identify the out-of-scope items so you do not get pulled in five directions.
For “design Twitter”:
- “Are we focused on the timeline read path, the posting write path, or both?”
- “What scale are we targeting — 10 million DAU or 500 million?”
- “Do we need search? Direct messages? Trends?”
- “What is the tolerable latency on the home timeline?”
If you skip this step, the interviewer will pull you back to it in the worst possible way — by telling you halfway through that you have been solving the wrong problem.
Step 2: Estimate
Back-of-envelope numbers anchor every design decision that follows. You only need three numbers: reads per second, writes per second, and storage growth per day.
For Twitter at 200M DAU:
Daily active users: 200,000,000
Avg tweets per user/day: 2
Total tweets/day: 400,000,000
Tweets per second avg: ~4,600
Read-to-write ratio: ~100:1
Reads per second avg: ~460,000
Avg tweet size: ~300 bytes
Storage per day: ~120 GB (400M tweets x 300 bytes, text only)
Round generously. Nobody is grading you on precision — they are grading you on whether you remember that reads dwarf writes and storage adds up.
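The arithmetic above can be reproduced in a few lines, which is handy when you practice with different inputs. This is a minimal sketch: the function name `estimate` and its parameter names are made up for illustration, and it uses decimal gigabytes (10^9 bytes).

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def estimate(dau, tweets_per_user_per_day, read_write_ratio, avg_tweet_bytes):
    """Return the anchor numbers: writes/sec, reads/sec, storage per day."""
    tweets_per_day = dau * tweets_per_user_per_day
    writes_per_sec = tweets_per_day / SECONDS_PER_DAY
    reads_per_sec = writes_per_sec * read_write_ratio
    storage_gb_per_day = tweets_per_day * avg_tweet_bytes / 1e9  # decimal GB
    return {
        "tweets_per_day": tweets_per_day,
        "writes_per_sec": writes_per_sec,
        "reads_per_sec": reads_per_sec,
        "storage_gb_per_day": storage_gb_per_day,
    }


# Inputs taken from the Twitter example above.
numbers = estimate(
    dau=200_000_000,
    tweets_per_user_per_day=2,
    read_write_ratio=100,
    avg_tweet_bytes=300,
)
for name, value in numbers.items():
    print(f"{name}: {value:,.0f}")
```

The unrounded results land close to the figures quoted above (about 4,630 writes per second and about 463,000 reads per second), which is exactly why rounding to ~4,600 and ~460,000 is fine at a whiteboard.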
Step 3: API design
Define the public contract before drawing internals. A few endpoints are enough:
POST /tweets body: content -> tweet_id
GET /tweets/{id} -> tweet
GET /users/{id}/timeline?cursor=... -> tweets[]
POST /follow body: target_user_id -> ok
This is where you sneak in pagination (cursor, not offset), authentication (assume a bearer token), and idempotency (POST with a client-supplied ID). Small touches signal seniority.
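If the interviewer asks what cursor pagination and idempotent POSTs look like in practice, a toy in-memory version is enough to make the point. Everything here is hypothetical: the `TweetApi` class, the `client_id` field, and the cursor encoding are illustrative choices, not a real API. For simplicity the timeline returns only the user's own tweets.

```python
import base64
import json


def _encode_cursor(last_tweet_id):
    # Opaque cursor: clients pass it back without interpreting it.
    raw = json.dumps({"before": last_tweet_id}).encode()
    return base64.urlsafe_b64encode(raw).decode()


def _decode_cursor(cursor):
    return json.loads(base64.urlsafe_b64decode(cursor))["before"]


class TweetApi:
    """In-memory stand-in for POST /tweets and GET /users/{id}/timeline."""

    def __init__(self):
        self._tweets = []        # dicts in insertion order
        self._seen_clients = {}  # (user_id, client_id) -> tweet_id

    def post_tweet(self, user_id, content, client_id):
        # Idempotency: a retry with the same client_id returns the
        # original tweet_id instead of creating a duplicate.
        key = (user_id, client_id)
        if key in self._seen_clients:
            return self._seen_clients[key]
        tweet_id = len(self._tweets) + 1
        self._tweets.append(
            {"tweet_id": tweet_id, "user_id": user_id, "content": content}
        )
        self._seen_clients[key] = tweet_id
        return tweet_id

    def get_timeline(self, user_id, cursor=None, limit=2):
        # The cursor means "give me tweets older than this id", so pages
        # stay stable even when new tweets arrive between requests.
        before_id = _decode_cursor(cursor) if cursor else float("inf")
        older = [
            t for t in reversed(self._tweets)
            if t["user_id"] == user_id and t["tweet_id"] < before_id
        ]
        page = older[:limit]
        has_more = len(older) > limit
        next_cursor = _encode_cursor(page[-1]["tweet_id"]) if has_more else None
        return {"tweets": page, "next_cursor": next_cursor}


api = TweetApi()
ids = [api.post_tweet(1, f"tweet {i}", client_id=f"c{i}") for i in range(3)]
retry_id = api.post_tweet(1, "tweet 0", client_id="c0")  # simulated retry
page1 = api.get_timeline(1)
page2 = api.get_timeline(1, cursor=page1["next_cursor"])
```

The design point worth saying out loud: an offset shifts when new tweets are inserted at the head of the list, while a cursor anchored to a tweet ID does not.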
Step 4: Data model
Sketch the core tables and their access patterns. For Twitter:
users(user_id, handle, name, created_at)
tweets(tweet_id, user_id, content, created_at)
follows(follower_id, followee_id, created_at)
timeline_cache(user_id, tweet_id, score)
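State the storage choice for each: users in Postgres, tweets in a sharded relational store or Cassandra, the timeline cache in Redis. Mention why: “the tweets table takes most of the write volume, so we shard it by user_id to spread the load and keep each author’s tweets together; celebrity accounts can still create hot shards, and we will handle those separately.”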
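Sharding by user_id comes down to a routing function. The sketch below assumes a fixed, hypothetical shard count of 16 and simple hash-modulo routing; the name `shard_for` is made up for illustration.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 16  # hypothetical shard count


def shard_for(user_id, num_shards=NUM_SHARDS):
    """Map a user_id to a shard using a hash that is stable across processes."""
    digest = hashlib.sha256(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# All tweets by one user land on the same shard, so "tweets by user X"
# is a single-shard query.
assert shard_for(42) == shard_for(42)

# Rough check that sequential user IDs spread across the shards.
distribution = Counter(shard_for(uid) for uid in range(10_000))
print(sorted(distribution.items()))
```

One trade-off to mention: with plain modulo routing, changing the shard count remaps most keys. Consistent hashing or a lookup table of key ranges are common ways to soften that, at the cost of more moving parts.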
Step 5: High-level architecture
Draw the boxes. Aim for 6-10 components, no more. A reasonable Twitter sketch:
- Load balancer
- API gateway
- Tweet service
- Timeline service
- Follow graph service
- Cache (Redis)
- Message queue (Kafka)
- Storage (Postgres + Cassandra)
- Fan-out workers
Trace one user flow end to end: “User posts a tweet. The tweet service writes to storage and publishes to Kafka. Fan-out workers consume the event and push the tweet into each follower’s timeline cache.” Then trace a second flow: “User opens the app. The timeline service reads from Redis. On a miss, it falls back to a database query and warms the cache.”
Two complete flows is the sweet spot. More is noise.
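The two flows can be sketched with in-memory stand-ins. The dict and deque below take the place of storage, the follow graph, Redis, and Kafka. All names are hypothetical, and the sketch assumes tweet IDs increase over time, so sorting by ID gives newest-first order. It also makes one design choice the text above does not specify: the worker only pushes into caches that are already warm, so a cold cache is always rebuilt in full on the next read.

```python
from collections import defaultdict, deque

tweets_db = {}                # tweet_id -> (author_id, content); stands in for storage
followers = defaultdict(set)  # author_id -> follower ids; stands in for the follow graph
timeline_cache = {}           # user_id -> tweet_ids, newest first; stands in for Redis
events = deque()              # stands in for a Kafka topic


def post_tweet(tweet_id, author_id, content):
    # Flow 1, part 1: write to storage, then publish an event.
    tweets_db[tweet_id] = (author_id, content)
    events.append(tweet_id)


def run_fanout_worker():
    # Flow 1, part 2: consume events and push into follower timelines.
    while events:
        tweet_id = events.popleft()
        author_id, _ = tweets_db[tweet_id]
        for follower_id in followers[author_id]:
            if follower_id in timeline_cache:  # skip cold caches
                timeline_cache[follower_id].insert(0, tweet_id)


def read_timeline(user_id):
    # Flow 2: read from the cache; on a miss, query storage and warm the cache.
    if user_id not in timeline_cache:
        followees = {a for a, fans in followers.items() if user_id in fans}
        ids = sorted(
            (tid for tid, (author, _) in tweets_db.items() if author in followees),
            reverse=True,
        )
        timeline_cache[user_id] = ids
    return [tweets_db[tid][1] for tid in timeline_cache[user_id]]


followers[1] = {2, 3}
post_tweet(100, 1, "hello")
run_fanout_worker()
first_read = read_timeline(2)   # cache miss: rebuilt from storage
post_tweet(101, 1, "again")
run_fanout_worker()
second_read = read_timeline(2)  # cache hit: includes the pushed tweet
```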
Step 6: Deep dive
The interviewer will pick one component and ask you to go deeper. Common targets:
- The timeline cache: push vs pull vs hybrid for celebrities with millions of followers.
- The follow graph: how to scale to billions of edges, partitioning strategy.
- The tweet store: sharding key, secondary indexes, hotspot mitigation.
Be ready with a real opinion. For the timeline: hybrid is the right answer. Use push fan-out for users with under ~10k followers, pull for celebrities, merge at read time. State the trade-off out loud — push optimizes reads, pull optimizes writes, hybrid balances both at the cost of complexity.
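Show what you would build, what you would avoid, and why. Vague answers (“we could use a queue”) read as junior. Specific answers (“Kafka with a partition per shard, 7-day retention, idempotent producers so retries do not write duplicates, and fan-out consumers that dedupe on tweet_id”) read as senior. Be precise here: an idempotent producer alone does not give end-to-end exactly-once delivery, so say how the consumer side handles duplicates.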
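The hybrid timeline strategy is easier to defend when you can show the merge step. This is a minimal sketch under stated assumptions: the in-memory dicts, the function names, and the explicit `follower_count` table are all hypothetical, and tweets are assumed to arrive in increasing timestamp order so every list stays sorted newest first.

```python
import heapq
from collections import defaultdict
from itertools import islice

CELEBRITY_THRESHOLD = 10_000  # the ~10k follower cutoff mentioned above

follower_count = {}                # author_id -> number of followers
followers = defaultdict(set)       # author_id -> follower ids
followees = defaultdict(set)       # user_id -> authors they follow
pushed = defaultdict(list)         # user_id -> [(ts, tweet_id)], newest first
author_tweets = defaultdict(list)  # author_id -> [(ts, tweet_id)], newest first


def is_celebrity(author_id):
    return follower_count.get(author_id, 0) >= CELEBRITY_THRESHOLD


def on_tweet(author_id, ts, tweet_id):
    author_tweets[author_id].insert(0, (ts, tweet_id))
    if is_celebrity(author_id):
        return  # pull path: no fan-out for celebrities
    for follower_id in followers[author_id]:
        pushed[follower_id].insert(0, (ts, tweet_id))  # push path


def home_timeline(user_id, limit=10):
    # Merge the precomputed (pushed) list with each followed
    # celebrity's own tweet list at read time.
    sources = [pushed[user_id]]
    sources += [author_tweets[a] for a in followees[user_id] if is_celebrity(a)]
    merged = heapq.merge(*sources, reverse=True)  # inputs are newest first
    return [tweet_id for _, tweet_id in islice(merged, limit)]


follower_count.update({"celeb": 5_000_000, "friend": 1})
followers["friend"] = {"u"}
followees["u"] = {"celeb", "friend"}
on_tweet("friend", 1, "f1")
on_tweet("celeb", 2, "c1")
on_tweet("friend", 3, "f2")
timeline = home_timeline("u")
```

The cost shows up in `home_timeline`: every read now does one extra lookup per followed celebrity, which is the complexity you trade for not writing one tweet into millions of caches.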
Step 7: Bottlenecks and trade-offs
End the interview by surfacing the weak points before they are pointed out. For Twitter:
- Celebrity fan-out is the obvious choke point. Mitigation: hybrid timeline strategy.
- Hot tweets create read hotspots. Mitigation: a tweet-level cache with TTL and request coalescing.
- The follow graph is huge. Mitigation: shard by user_id, replicate for read traffic.
- Eventual consistency on timelines is acceptable. State that you are accepting it and why.
Acknowledging trade-offs is not weakness — it is the defining mark of a senior engineer. Designs that pretend they have no weaknesses get marked down because the interviewer knows the weaknesses exist.
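For the hot-tweet mitigation, it helps to be able to explain request coalescing concretely: when many requests miss the cache for the same key at once, only one of them loads from storage and the rest wait for its result. The class below is a minimal single-process sketch using threads. `CoalescingCache` and `slow_loader` are hypothetical names, and a production version would also need eviction and error-handling policy.

```python
import threading
import time


class CoalescingCache:
    """TTL cache where concurrent misses for one key share a single load."""

    def __init__(self, loader, ttl_seconds):
        self._loader = loader
        self._ttl = ttl_seconds
        self._lock = threading.Lock()
        self._values = {}    # key -> (value, expires_at)
        self._inflight = {}  # key -> Event set when the load finishes

    def get(self, key):
        while True:
            with self._lock:
                entry = self._values.get(key)
                if entry is not None and entry[1] > time.monotonic():
                    return entry[0]  # fresh hit
                event = self._inflight.get(key)
                leader = event is None
                if leader:
                    event = threading.Event()
                    self._inflight[key] = event
            if leader:
                try:
                    value = self._loader(key)  # the only call to storage
                    with self._lock:
                        expires_at = time.monotonic() + self._ttl
                        self._values[key] = (value, expires_at)
                    return value
                finally:
                    with self._lock:
                        del self._inflight[key]
                    event.set()  # wake the waiting followers
            event.wait()  # follower: wait, then loop to re-check the cache


load_calls = []


def slow_loader(key):
    load_calls.append(key)
    time.sleep(0.1)  # simulate a slow database read
    return f"tweet:{key}"


cache = CoalescingCache(slow_loader, ttl_seconds=60)
results = []
threads = [
    threading.Thread(target=lambda: results.append(cache.get("42")))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Eight concurrent readers produce one call to `slow_loader`. Without coalescing, a viral tweet whose cache entry expires would send every in-flight request to the database at the same moment.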
Time budget for a 45-minute interview
Clarify: 5 min
Estimate: 3 min
API: 4 min
Data model: 5 min
High-level: 10 min
Deep dive: 12 min
Bottlenecks: 5 min
Buffer: 1 min
Adjust as needed but keep the order. Out-of-order designs feel chaotic to the interviewer even when the technical content is correct.
Practice without a partner
Pick a problem, set a 45-minute timer, and run the whole framework solo. Record yourself talking through it. Re-watch the next day — you will be embarrassed, and that embarrassment is the fastest path to improvement.
The framework is not the answer to system design. It is the scaffolding that lets your knowledge come out under pressure. Drill it until it is automatic; then your real skill, judgment about distributed systems, gets to shine.