Skip to content
C Codeloom
System Design

System Design: Building a Scalable Chat Application

Design a real-time chat system that supports millions of users with low latency messaging, presence, and message persistence at scale.

·3 min read · By Codeloom
Intermediate 10 min read

What you'll learn

  • Connection model for real-time chat
  • How to fan-out messages
  • Storage choices for chat history
  • Presence and typing indicators
  • Scaling WebSockets horizontally

Prerequisites

  • Familiar with HTTP and databases

What and Why

Chat is one of the most requested system design questions because it touches real-time communication, eventual consistency, persistence, and fan-out. A good chat system must deliver messages in under a second, survive server restarts, and let users read history from any device.

We will focus on a one-to-one and small-group chat. Massive group chats (10k+ members) add fan-out problems we will only briefly touch on.

Mental Model

Think of chat as three loops:

  1. A long-lived connection so the server can push messages instantly.
  2. A durable log so messages survive crashes and reconnects.
  3. A routing layer that knows which server holds which user’s connection.

If you keep these three concerns separate, scaling becomes a matter of growing each piece independently.

Architecture

Clients connect to a gateway over WebSocket. The gateway authenticates the user, registers the connection in a routing table (often Redis), and forwards inbound messages to a message service. The message service writes to a durable store and publishes to a pub/sub topic so other gateways can deliver the message to recipients.

Client A -> Gateway 1 -> Message Service -> DB (append)
                              |
                              v
                          Pub/Sub
                              |
                  +-----------+-----------+
                  v                       v
              Gateway 2               Gateway 3
                  |                       |
              Client B                Client C
Chat message flow

For storage, a wide-column store like Cassandra or DynamoDB works well because each conversation is an append-only log keyed by conversation_id and timestamp. Reads are range scans on the partition key.

Presence (“online” indicators) lives in Redis with TTLs that refresh on heartbeat. Typing indicators are ephemeral and skip persistence entirely.

Trade-offs

WebSockets give low latency but make horizontal scaling tricky because connections are sticky. Long polling is simpler but burns CPU. Server-sent events work for one-way streams but not bidirectional chat.

Choosing Cassandra trades flexible querying for predictable write throughput. If you need complex filters, you would layer search on top using Elasticsearch.

Strict ordering across servers is expensive. Most chat systems accept per-conversation ordering only, using a server-assigned timestamp or sequence number.

Practical Tips

Use a client_message_id for idempotency so retries do not create duplicates. Acknowledge messages in two stages: “received by server” and “delivered to recipient.” This makes the UI feel responsive while keeping correctness.

Keep group fan-out under control by using a “fan-out on read” strategy for large groups: instead of pushing to every member, store once and let clients pull. Small groups can use “fan-out on write” for snappier delivery.

Compress payloads with permessage-deflate, but watch CPU. Cap message size to a few KB and force media to go through a separate upload endpoint that returns a URL.

Use a single connection per device, not per tab. A shared worker on the web client can multiplex many tabs through one socket.

Wrap-up

A chat system is a fun problem because it forces you to think about pushing data, not pulling it. Start with three pillars: persistent gateways, durable storage, and pub/sub routing. Scale each pillar separately and the rest of the design tends to fall into place.

In an interview, lead with the data model, then the connection layer, then fan-out. That order proves you understand the constraints before you reach for fancy tech.