Codeloom

System Design: Design a Chat System (WhatsApp/Slack)

Design a real-time chat system like WhatsApp or Slack. Covers WebSockets, presence, message ordering, fan-out, storage, and delivering messages to offline users.

By Yash Kesharwani · Intermediate

What you'll learn

  • Choose between long-poll, SSE, and WebSockets
  • Route messages through a stateful gateway
  • Persist and order messages per conversation
  • Deliver to offline users with push notifications
  • Handle group chats and read receipts at scale

Prerequisites

  • Familiarity with TCP and HTTP basics.
  • Comfort with message queues and pub/sub.

Chat looks simple until you draw it. Then you discover stateful gateways, ordering, presence, delivery receipts, offline storage, and push notifications, all of which have to work while the user roams between cell towers. This design is for a one-to-one and small-group chat system in the WhatsApp or Slack mold.

Functional Requirements

  • One-to-one messaging with text and media.
  • Group chats up to roughly 250 members.
  • Online presence and typing indicators.
  • Delivery and read receipts.
  • Offline messages: the recipient receives messages sent while they were offline.
  • Push notifications when the app is backgrounded.

Non-Functional Requirements

  • 1B daily active users, 100B messages per day.
  • Average write QPS: about 1.2M, peak 5M.
  • End-to-end delivery latency: p99 under 500 ms for online users.
  • Messages must be durable and never reordered within a conversation.
  • Availability target: 99.99 percent.
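The average write rate follows directly from the daily volume. Here is a quick back-of-envelope check that uses only the two figures stated above:

```python
# Back-of-envelope check of the write-QPS figures in the requirements.
SECONDS_PER_DAY = 24 * 60 * 60          # 86,400
MESSAGES_PER_DAY = 100_000_000_000      # 100B messages per day
PEAK_QPS = 5_000_000                    # stated peak

avg_qps = MESSAGES_PER_DAY / SECONDS_PER_DAY
peak_factor = PEAK_QPS / avg_qps

print(f"average write QPS ~ {avg_qps / 1e6:.2f}M")    # average write QPS ~ 1.16M
print(f"peak-to-average ratio ~ {peak_factor:.1f}x")  # peak-to-average ratio ~ 4.3x
```

Dividing 100B by 86,400 seconds gives about 1.16M writes per second, which rounds to the 1.2M above. The 5M peak therefore budgets roughly 4x headroom over the average.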

High-Level Architecture

  • Client opens a persistent connection (WebSocket) to a chat gateway.
  • Gateway is stateful: it tracks which connection belongs to which user. A session store (Redis) maps user_id to gateway_node_id.
  • Sender pushes message to its gateway. Gateway writes to the message store and publishes onto a pub/sub topic keyed by conversation.
  • The recipient’s gateway, which subscribes to the conversation topics of its connected users, receives the message and pushes it down the WebSocket.
  • If the recipient is offline, a push notification service alerts them via APNs or FCM, and the message stays in the message store until the recipient reconnects and syncs.
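The sender-side steps above can be sketched as a toy in-memory model. Every name here is hypothetical. A list stands in for the message store and a dict of lists stands in for the pub/sub layer. The point is the order of operations: assign an id, persist, publish once per conversation, then ack.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    conv_id: int
    msg_id: int
    sender_id: int
    body: str


class SendPath:
    """Toy model of the gateway's send path (all names are hypothetical)."""

    def __init__(self):
        self.message_store = []               # durable store (here: an in-memory list)
        self.topics = defaultdict(list)       # conv_id -> messages published on that topic
        self.last_msg_id = defaultdict(int)   # conv_id -> last msg_id assigned

    def handle_send(self, conv_id, sender_id, body, client_msg_id):
        # 1. Assign the next msg_id from this conversation's sequence.
        self.last_msg_id[conv_id] += 1
        msg = Message(conv_id, self.last_msg_id[conv_id], sender_id, body)
        # 2. Persist first, so nothing is published or acked before it is durable.
        self.message_store.append(msg)
        # 3. Publish once on the conversation's topic; gateways subscribed to
        #    that topic pick it up for their connected members.
        self.topics[conv_id].append(msg)
        # 4. Ack the sender with the server-assigned msg_id.
        return {"type": "ack", "client_msg_id": client_msg_id, "msg_id": msg.msg_id}
```

For example, `SendPath().handle_send(7, 1, "hi", "c-1")` returns an ack frame carrying `msg_id` 1 for conversation 7.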

Data Model

CREATE TABLE conversations (
  conv_id     BIGINT PRIMARY KEY,
  kind        VARCHAR(8),
  created_at  TIMESTAMP
);

CREATE TABLE conversation_members (
  conv_id     BIGINT,
  user_id     BIGINT,
  joined_at   TIMESTAMP,
  PRIMARY KEY (conv_id, user_id)
);

CREATE TABLE messages (
  conv_id     BIGINT,
  msg_id      BIGINT,
  sender_id   BIGINT,
  body        TEXT,
  media_url   TEXT,
  created_at  TIMESTAMP,
  PRIMARY KEY (conv_id, msg_id)
);

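Shard the messages table by conv_id so each conversation’s history lives on one shard. Within a conversation, msg_id is the sort order. Issue it from a single Snowflake-style generator owned by that shard, so IDs within the conversation are monotonic.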
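A minimal sketch of a shard-scoped Snowflake-style generator follows. It assumes the classic 41-bit millisecond timestamp, 10-bit node id, and 12-bit sequence layout, with the node id reused as the conversation shard id. The custom epoch, `NUM_SHARDS`, and modulo sharding are illustrative assumptions. Because every conversation maps to exactly one shard, one generator per shard is enough for per-conversation monotonicity. IDs from different shards are only roughly time-ordered, which is acceptable because ordering is owed only within a conversation.

```python
import threading
import time

EPOCH_MS = 1_577_836_800_000   # 2020-01-01T00:00:00Z; an arbitrary custom epoch (assumption)
SHARD_BITS = 10
SEQ_BITS = 12
MAX_SEQ = (1 << SEQ_BITS) - 1
NUM_SHARDS = 1024              # assumed shard count; must fit in SHARD_BITS


def shard_for(conv_id: int) -> int:
    # Every message of a conversation lands on the same shard.
    return conv_id % NUM_SHARDS


class ShardIdGenerator:
    """Layout: 41-bit ms timestamp | 10-bit shard id | 12-bit sequence.
    IDs from a single instance strictly increase."""

    def __init__(self, shard_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= shard_id < (1 << SHARD_BITS):
            raise ValueError("shard_id out of range")
        self.shard_id = shard_id
        self.clock = clock
        self.last_ts = -1
        self.seq = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            # Never step backwards, even if the wall clock does.
            ts = max(self.clock() - EPOCH_MS, self.last_ts)
            if ts == self.last_ts:
                self.seq += 1
                if self.seq > MAX_SEQ:   # sequence exhausted in this millisecond:
                    ts += 1              # borrow the next millisecond
                    self.seq = 0
            else:
                self.seq = 0
            self.last_ts = ts
            return (ts << (SHARD_BITS + SEQ_BITS)) | (self.shard_id << SEQ_BITS) | self.seq
```

Borrowing the next millisecond on overflow keeps the sketch simple and monotonic. A production generator might instead wait for the clock to advance.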

Read state per user:

key:   read:{conv_id}:{user_id}
value: last_read_msg_id
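A sketch of how the read-state key might be used follows, with an in-memory dict standing in for the key-value store. The forward-only update is a design assumption, not something the text specifies. It prevents a late or duplicated receipt from making already-read messages look unread. In Redis, the same compare-and-set would need to be atomic, for example via a small Lua script.

```python
class ReadState:
    """In-memory stand-in for the read:{conv_id}:{user_id} key space (hypothetical helper)."""

    def __init__(self):
        self.kv = {}

    @staticmethod
    def key(conv_id: int, user_id: int) -> str:
        return f"read:{conv_id}:{user_id}"

    def mark_read(self, conv_id: int, user_id: int, msg_id: int) -> int:
        # Only move the pointer forward; stale receipts are ignored.
        k = self.key(conv_id, user_id)
        self.kv[k] = max(self.kv.get(k, 0), msg_id)
        return self.kv[k]

    def unread_count(self, conv_id: int, user_id: int, msg_ids) -> int:
        # Messages newer than the read pointer are unread.
        last = self.kv.get(self.key(conv_id, user_id), 0)
        return sum(1 for m in msg_ids if m > last)
```

Because msg_id is monotonic per conversation, a single integer per (conversation, user) pair captures read state without storing a flag per message.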

Key APIs

WebSocket frames:

send: { type: "msg", conv_id, body, client_msg_id }
recv: { type: "msg", conv_id, msg_id, sender_id, body, ts }
recv: { type: "ack", client_msg_id, msg_id }
recv: { type: "presence", user_id, state }
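The `client_msg_id` in the send frame is echoed back in the ack. A sketch of the client side shows why: the client can match each ack to the message it sent optimistically and swap in the server-assigned `msg_id`. The `ClientOutbox` helper and its method names are hypothetical. The frame fields follow the shapes listed above.

```python
import json
import uuid


class ClientOutbox:
    """Client-side bookkeeping for send/ack frames (hypothetical helper)."""

    def __init__(self):
        self.pending = {}   # client_msg_id -> send frame still awaiting an ack

    def send(self, conv_id, body) -> str:
        client_msg_id = str(uuid.uuid4())
        frame = {"type": "msg", "conv_id": conv_id, "body": body,
                 "client_msg_id": client_msg_id}
        self.pending[client_msg_id] = frame
        return json.dumps(frame)          # what goes down the WebSocket

    def on_frame(self, raw: str):
        frame = json.loads(raw)
        if frame["type"] == "ack":
            # The echoed client_msg_id identifies which local message was persisted.
            sent = self.pending.pop(frame["client_msg_id"], None)
            status = "acked" if sent else "unknown_ack"
            return (status, sent, frame["msg_id"])
        return (frame["type"], frame, None)

    def unacked(self):
        # Messages the UI should still show as pending.
        return list(self.pending.values())
```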

REST for history and admin:

GET  /api/v1/conversations/:id/messages?before=<msg_id>&limit=50
POST /api/v1/conversations
POST /api/v1/conversations/:id/read
  body: last_read_msg_id
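The `before=<msg_id>&limit=50` history call is keyset pagination. In SQL terms, it corresponds roughly to `WHERE conv_id = ? AND msg_id < ? ORDER BY msg_id DESC LIMIT 50`, which the `(conv_id, msg_id)` primary key serves directly. The sketch below mimics that over a sorted list. The function name and the returned cursor are illustrative assumptions.

```python
import bisect


def history_page(msg_ids, before=None, limit=50):
    """Return up to `limit` msg_ids strictly older than `before`, newest first.
    `msg_ids` must be sorted ascending, like the (conv_id, msg_id) key order."""
    end = len(msg_ids) if before is None else bisect.bisect_left(msg_ids, before)
    start = max(0, end - limit)
    page = msg_ids[start:end][::-1]
    # The oldest id on this page becomes the next ?before= cursor; None means no more history.
    next_cursor = page[-1] if start > 0 else None
    return page, next_cursor
```

Unlike offset pagination, the cursor stays correct while new messages keep arriving at the head of the conversation.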

Scaling and Tradeoffs

Transport. WebSockets give bidirectional push at low overhead. Long polling is a fallback for restrictive networks. Server-Sent Events are one-way only — fine for receiving but you still need an HTTP POST for sending.

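Stateful gateways. A gateway holds N connections in memory (think 100k per node). Route users to gateways by hashing user_id onto a consistent-hash ring, so adding a node moves only a small fraction of users instead of reshuffling everyone. A session store maps each user to their gateway. Each entry is refreshed on connect and carries a TTL, so entries for dead connections expire.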
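A minimal consistent-hash ring with virtual nodes illustrates the routing idea. The choice of SHA-256 truncated to 64 bits and 100 virtual nodes per gateway are assumptions for the sketch. Gateway names like "gw-1" are hypothetical.

```python
import bisect
import hashlib


def _h(key: str) -> int:
    # Stable 64-bit hash; Python's built-in hash() is salted per process.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self.ring = []                      # sorted list of (hash, node)
        for n in nodes:
            self.add(n)

    def add(self, node: str):
        # Each gateway owns many small arcs of the ring, which evens out load.
        for i in range(self.vnodes):
            bisect.insort(self.ring, (_h(f"{node}#{i}"), node))

    def node_for(self, user_id) -> str:
        # Walk clockwise to the first virtual node at or after the user's hash.
        h = _h(f"user:{user_id}")
        idx = bisect.bisect(self.ring, (h, "")) % len(self.ring)
        return self.ring[idx][1]
```

When a fourth gateway joins a three-node ring, only the users whose arc it takes over (about a quarter) move, and every one of them moves to the new node. Nobody is shuffled between the existing gateways.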

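Routing messages. The pub/sub layer (Kafka or Redis Pub/Sub) is keyed by conversation, not by user. Every gateway with a connected member of the conversation subscribes to its topic, so each message is published once and reaches each interested gateway once. Publishing a separate copy per recipient would cost N copies per message in an N-member group, and N-squared when every member posts.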
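One way a gateway might track its conversation subscriptions is with reference counting. The gateway subscribes to a topic when the first local member of that conversation connects and unsubscribes when the last one leaves. The class below is a hypothetical sketch. The `subscribed` set stands in for real subscribe and unsubscribe calls to the broker.

```python
from collections import defaultdict


class GatewaySubscriptions:
    """Per-gateway topic bookkeeping (hypothetical helper)."""

    def __init__(self):
        self.local_users = defaultdict(set)   # conv_id -> locally connected members
        self.subscribed = set()               # conversation topics this gateway listens to

    def on_connect(self, user_id, conv_ids):
        for c in conv_ids:
            self.local_users[c].add(user_id)
            self.subscribed.add(c)            # idempotent: only the first member really subscribes

    def on_disconnect(self, user_id, conv_ids):
        for c in conv_ids:
            self.local_users[c].discard(user_id)
            if not self.local_users[c]:       # last local member left
                self.subscribed.discard(c)
                del self.local_users[c]

    def deliver(self, conv_id, msg):
        # One copy arrives per gateway; fan out locally to each member's socket.
        return [(u, msg) for u in sorted(self.local_users.get(conv_id, ()))]
```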

Ordering. Per-conversation ordering is the only guarantee you owe. Generate msg_id from a single sequence per conversation. The sequence service can be a small leader-elected counter per shard, or a Snowflake generator scoped to the conversation shard.
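The counter variant can be sketched as the single-writer core of a shard leader. Leader election itself is out of scope here. The class is a hypothetical sketch, and a thread lock stands in for "only the leader assigns ids." Unlike Snowflake ids, this counter is dense (1, 2, 3, ...) per conversation. The cost is that every id must come from one owner.

```python
import threading
from collections import defaultdict


class ConversationSequencer:
    """One monotonic counter per conversation, owned by the shard's leader (sketch)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last = defaultdict(int)   # conv_id -> last msg_id handed out

    def next(self, conv_id: int) -> int:
        # The lock serializes concurrent senders, giving one total order per conversation.
        with self._lock:
            self._last[conv_id] += 1
            return self._last[conv_id]
```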

Delivery receipts. Sender gets an ack with the assigned msg_id as soon as the message is durable. The recipient’s client sends a delivered receipt on receive and a read receipt on view. These are themselves small messages.
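Because msg_id is monotonic, receipts can be stored as per-recipient pointers rather than a flag per message. The sender's view of any message can then be derived from two integers. The sketch assumes a `last_delivered` pointer alongside the `last_read_msg_id` pointer from the data model. That delivered pointer is a hypothetical addition, not part of the original design.

```python
def receipt_status(msg_id: int, last_delivered: int, last_read: int) -> str:
    """Sender-side state of one message, derived from the recipient's pointers.
    `last_delivered` is a hypothetical counterpart to last_read_msg_id."""
    if msg_id <= last_read:
        return "read"
    if msg_id <= last_delivered:
        return "delivered"
    return "sent"   # acked by the server, i.e. durable, but not yet on the recipient's device
```

With `last_delivered=5` and `last_read=3`, message 2 shows as read, message 5 as delivered, and message 6 as sent.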

Offline storage. When the recipient is offline, the message is already in the messages table. On reconnect, the client asks for messages with msg_id > last_seen. Push notification is best-effort and triggered by a worker reading the pub/sub topic.
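The reconnect catch-up is a range scan on the same ordered key. A sketch over a sorted list of msg_ids follows. The batch size is an assumption, and batching keeps a long offline gap from arriving as one huge response.

```python
import bisect


def sync_since(msg_ids, last_seen: int, batch: int = 100):
    """Yield batches of msg_ids greater than last_seen, oldest first.
    `msg_ids` must be sorted ascending. The client would advance its
    last_seen after processing each batch."""
    i = bisect.bisect_right(msg_ids, last_seen)
    while i < len(msg_ids):
        yield msg_ids[i:i + batch]
        i += batch
```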

Group chats. Same model. The gateway looks up members, finds which are online, and delivers to those connections. Cap group size to keep fan-out tractable; for huge broadcast groups, use a different broadcast pipeline.
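The group path can be sketched as splitting the member list into per-gateway deliveries and push targets. In this sketch, `sessions` is a plain dict standing in for the session store (user_id to gateway node), and the cap reuses the roughly 250-member limit from the requirements. The function name is hypothetical. In practice the cap would more likely be enforced when members join.

```python
MAX_GROUP_SIZE = 250   # from the functional requirements


def fan_out(sender_id, members, sessions):
    """Group recipients by gateway for online delivery; collect offline users for push.
    `sessions` maps user_id -> gateway node id for connected users (stand-in for the session store)."""
    if len(members) > MAX_GROUP_SIZE:
        raise ValueError("group exceeds size cap; use the broadcast pipeline")
    by_gateway, offline = {}, []
    for user in members:
        if user == sender_id:
            continue                                  # the sender gets an ack, not a copy
        node = sessions.get(user)
        if node is None:
            offline.append(user)                      # handed to the push worker
        else:
            by_gateway.setdefault(node, []).append(user)
    return by_gateway, offline
```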

Media. Upload media to object storage and put a signed URL in the message’s media_url field. Never push media bytes through the gateway.
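To illustrate what "signed URL" means, here is a hand-rolled HMAC scheme with an expiry. Real object stores such as S3 and GCS provide their own presigning APIs, which you would use instead. The key, host, and query parameter names below are assumptions for the sketch. It also assumes URL-safe object keys and a base URL without a path.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-signing-key"   # hypothetical; a real key would come from a secrets manager


def _sign(object_key: str, expires: int) -> str:
    return hmac.new(SECRET, f"{object_key}:{expires}".encode(), hashlib.sha256).hexdigest()


def sign_media_url(base: str, object_key: str, ttl_s: int = 3600, now=None) -> str:
    expires = int((time.time() if now is None else now) + ttl_s)
    return f"{base}/{object_key}?" + urlencode({"expires": expires, "sig": _sign(object_key, expires)})


def verify_media_url(url: str, now=None) -> bool:
    parsed = urlparse(url)
    object_key = parsed.path.lstrip("/")
    q = parse_qs(parsed.query)
    expires = int(q["expires"][0])
    current = time.time() if now is None else now
    # Constant-time comparison; reject tampered paths and expired links.
    return hmac.compare_digest(q["sig"][0], _sign(object_key, expires)) and current < expires
```

The message then carries only this URL. Recipients fetch the bytes directly from storage, so the gateway never handles media payloads.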

End-to-end encryption. WhatsApp uses the Signal protocol, so the server only sees ciphertext. This pushes server-side features such as search and content moderation onto the client, or rules them out.

For deployment patterns of stateful gateways, see What is Kubernetes. A StatefulSet plus a headless service is the usual recipe.

What to Say in an Interview

  • Pick WebSockets and justify stateful gateways. Mention consistent hashing for connection routing.
  • Make per-conversation ordering explicit, and place the sequence generator on the conversation shard.
  • Walk through the online and offline delivery paths separately. Most candidates conflate them.
  • Mention pub/sub keyed by conversation, not by user. This is the scaling unlock.
  • Call out E2E encryption as a constraint that disables server-side features. Even if not asked, this signals real product awareness.

Wrap up

A chat system is a stateful gateway sitting on top of a sharded message log with a pub/sub between them. Get the ordering right per conversation, route by user with consistent hashing, and handle offline users via the same store plus a push worker. Everything else — presence, typing, receipts — is small messages on the same pipe.