Message Queues: Kafka vs RabbitMQ (When to Pick Which)

Intermediate 10 min read

What you'll learn

✓ How a log differs from a queue semantically
✓ Where Kafka beats RabbitMQ and vice versa
✓ How partitioning and consumer groups work
✓ When ordering and replay actually matter
✓ How to pick based on workload, not hype

Prerequisites

• Familiar with how APIs work
• Basic messaging concepts

What and Why

Kafka and RabbitMQ both move messages between producers and consumers, but they are not interchangeable. They were built for different problems and reflect different worldviews. RabbitMQ is a broker that routes messages to consumers and forgets them when acknowledged. Kafka is a durable, partitioned, replayable log that consumers read at their own pace. Picking the wrong one buys you months of fighting your tooling.

Mental Model

RabbitMQ is a post office. A letter arrives, gets routed to a mailbox, the recipient picks it up, and it is gone. Multiple consumers compete for items from the same queue, with each item delivered to one of them.

Kafka is a newspaper archive. Every issue is appended to a numbered shelf. Any subscriber can come read any issue, in order, at their own pace. Reading does not remove anything; issues leave the archive only when the retention window expires.

This difference cascades into everything: ordering, throughput, consumer model, replay, even how you do failure recovery.

Architecture

RabbitMQ:
Producer -> Exchange -> Queue -> [Consumer1 or Consumer2 or ...]
                                one message, one consumer
                                ack -> removed

Kafka:
Producer -> Topic (partition 0 | partition 1 | partition 2)
                     |
          +----------+-----------+
          |                      |
     Consumer Group A      Consumer Group B
     (offset = 47)         (offset = 12)
     reads independently   reads independently

Kafka log vs RabbitMQ queue model

Kafka topics are partitioned. Each partition is an ordered, immutable log. Consumers in a group split the partitions among themselves; each group, independently of the others, sees every message. Offsets are tracked per group, so a new group can start from the earliest retained record and an existing group can rewind to replay.
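
The per-group offset idea can be sketched in a few lines of plain Python. This is a toy in-memory stand-in for one partition, not the Kafka client API; the class `ToyLog`, its methods, and the group names are hypothetical names chosen for illustration.

```python
from collections import defaultdict


class ToyLog:
    """In-memory stand-in for a single partition (illustrative only)."""

    def __init__(self):
        self.records = []                 # append-only; reads never remove anything
        self.offsets = defaultdict(int)   # group name -> next offset to read

    def append(self, value):
        self.records.append(value)
        return len(self.records) - 1      # offset assigned to the new record

    def poll(self, group, max_records=10):
        start = self.offsets[group]
        batch = self.records[start:start + max_records]
        self.offsets[group] = start + len(batch)   # record this group's new position
        return batch

    def seek(self, group, offset):
        self.offsets[group] = offset      # moves one group's position only


log = ToyLog()
for event in ["signup", "login", "purchase"]:
    log.append(event)

billing = log.poll("billing")       # one group reads everything so far
analytics = log.poll("analytics")   # another group sees the same records
log.seek("billing", 0)              # replay: rewind billing, analytics is unaffected
replayed = log.poll("billing")
```

Both groups receive all three events, and rewinding one group leaves the other's offset untouched. A real consumer group would also spread partitions across its members, which this sketch leaves out.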

RabbitMQ uses exchanges (direct, topic, fanout, headers) to route messages to queues. Queues are competing-consumer by default: each message goes to one of the consumers attached to the queue. Messages are removed on acknowledgment (or when a TTL expires).
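
To make topic-exchange routing concrete, here is a minimal sketch of wildcard matching in plain Python. It follows the AMQP convention that `*` stands for exactly one dot-separated word and `#` for zero or more words. The function names, queue names, and binding patterns are hypothetical, and this is a simplification of what the broker does, not its implementation.

```python
def topic_matches(pattern: str, routing_key: str) -> bool:
    """Toy matcher for topic-exchange binding patterns."""

    def match(p, k):
        if not p:
            return not k                  # pattern exhausted: key must be too
        head, rest = p[0], p[1:]
        if head == "#":
            # '#' may absorb zero or more words: try every split point
            return any(match(rest, k[i:]) for i in range(len(k) + 1))
        if not k:
            return False
        if head == "*" or head == k[0]:   # '*' absorbs exactly one word
            return match(rest, k[1:])
        return False

    return match(pattern.split("."), routing_key.split("."))


# Hypothetical bindings: queue name -> binding pattern
bindings = {
    "audit": "#",
    "eu_orders": "orders.eu.*",
    "all_created": "*.*.created",
}


def route(routing_key):
    """Return the queues that would receive a message with this routing key."""
    return sorted(q for q, pat in bindings.items() if topic_matches(pat, routing_key))


routed = route("orders.eu.created")
```

Here one published message lands in three queues because three bindings match. In Kafka the producer would pick a topic and any further filtering would happen in the consumers.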

Trade-offs

Throughput. Kafka wins by a wide margin. A single broker can sustain hundreds of thousands of messages per second because it appends sequentially to disk and uses zero-copy delivery, and throughput grows further as you add partitions and brokers. RabbitMQ’s per-queue throughput typically maxes out in the tens of thousands of messages per second, depending on message size and durability settings. For analytics, telemetry, and event streaming, Kafka is usually the right choice.

Routing flexibility. RabbitMQ wins. Topic exchanges with wildcard bindings, headers exchanges, RPC patterns, and dead-letter routing are first-class. Kafka has topics; routing logic lives in the consumer or in stream processors. For complex enterprise messaging patterns, RabbitMQ is the more natural fit.

Ordering. Kafka guarantees order within a partition. RabbitMQ preserves order within a queue only while a single consumer reads from it. The moment you add competing consumers to scale out, you give up processing order. In Kafka, you keep order by partitioning on a stable key (e.g., user ID).
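
A small sketch shows why a stable key keeps per-user order. It assumes a simple hash-modulo partitioner; `zlib.crc32` is a stand-in chosen because it is stable across runs (Kafka's Java client uses a murmur2 hash instead), and the event data and helper names are made up.

```python
import zlib

NUM_PARTITIONS = 3


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Stand-in hash: the same key always maps to the same partition
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Hypothetical event stream: (key, action) in production order
events = [
    ("user-1", "login"),
    ("user-2", "login"),
    ("user-1", "add_to_cart"),
    ("user-1", "checkout"),
    ("user-2", "logout"),
]

# Each partition is its own ordered list; there is no order across partitions
partitions = {p: [] for p in range(NUM_PARTITIONS)}
for key, action in events:
    partitions[partition_for(key)].append((key, action))


def events_for(key):
    """Read one key's events back from the single partition that holds them."""
    return [action for k, action in partitions[partition_for(key)] if k == key]
```

Because every event for `user-1` lands in one partition, `events_for("user-1")` returns them in the order they were produced. With a random key per event they would scatter across partitions and lose any relative order.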

Retention and replay. Kafka stores messages for days or weeks regardless of consumption. You can spin up a new consumer and read history. RabbitMQ queues delete acknowledged messages. If you need replay, audit, or late-binding consumers, Kafka’s log model is the realistic option of the two.
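
Retention has a disk cost that is easy to estimate up front. The following back-of-the-envelope sketch multiplies message rate, average size, retention, and replication factor; every number in it is a hypothetical workload, and it ignores compression and index overhead.

```python
SECONDS_PER_DAY = 86_400


def retained_bytes(msgs_per_sec, avg_msg_bytes, retention_days, replication_factor):
    """Rough estimate of disk used by one topic across the cluster."""
    per_day = msgs_per_sec * avg_msg_bytes * SECONDS_PER_DAY
    return per_day * retention_days * replication_factor


# Hypothetical workload: 5,000 msgs/s of 1 KiB each, 7 days retained, 3 replicas
total = retained_bytes(5_000, 1_024, 7, 3)
total_tb = total / 1e12   # decimal terabytes
```

For these assumed numbers the estimate comes to roughly 9.3 TB across the cluster, which is the kind of figure to have in hand before changing the retention setting.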

Per-message guarantees. RabbitMQ supports per-message TTL, priority queues, and per-message acknowledgment with redelivery. Kafka has no per-message equivalents. Kafka’s unit is the offset; you advance or you don’t.
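
The difference between acknowledging a message and advancing an offset can be sketched with two toy models. Neither is a real client API; `ToyQueue`, its methods, and the sample messages are hypothetical.

```python
from collections import deque


class ToyQueue:
    """Stand-in for a queue with per-message acknowledgment (illustrative only)."""

    def __init__(self, messages):
        self.ready = deque(messages)
        self.unacked = {}              # delivery tag -> message in flight
        self._next_tag = 1

    def get(self):
        msg = self.ready.popleft()
        tag = self._next_tag
        self._next_tag += 1
        self.unacked[tag] = msg
        return tag, msg

    def ack(self, tag):
        del self.unacked[tag]          # acknowledged: gone for good

    def nack(self, tag):
        # Toy requeue: only the rejected message goes back to the queue
        self.ready.appendleft(self.unacked.pop(tag))


# Queue model: reject "b" on its own, acknowledge the others
q = ToyQueue(["a", "b", "c"])
deliveries = [q.get() for _ in range(3)]   # three messages in flight at once
for tag, msg in deliveries:
    if msg == "b":
        q.nack(tag)
    else:
        q.ack(tag)
queue_left = list(q.ready)

# Offset model: one position per partition, so stopping at "b" also holds back "c"
records = ["a", "b", "c"]
committed = 0
for offset, msg in enumerate(records):
    if msg == "b":
        break                          # committing past "b" would skip it
    committed = offset + 1
log_left = records[committed:]
```

In the queue model only `b` remains to be redelivered. In the offset model the consumer will see both `b` and `c` again, because the committed position cannot skip over a single record.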

Operational cost. Kafka used to require ZooKeeper and now uses KRaft for cluster metadata. Either way, you need a partition rebalance strategy, JVM tuning, and disk planning. RabbitMQ is lighter to run for small workloads but harder to scale cleanly past a single node.

Latency. RabbitMQ has lower end-to-end latency for small workloads (single-digit ms). Kafka batches and buffers; with batching in play, latency is typically 10-50 ms, but throughput is much higher.

Practical Tips

Pick Kafka for streams, RabbitMQ for tasks. If messages are immutable events that many systems consume, Kafka. If messages are jobs that one worker must do once, RabbitMQ (or a job queue like SQS).
Partition by the key you care about ordering on. If you partition Kafka by user ID, all events for one user are ordered. Don’t partition by random ID and then complain about reordering.
Set Kafka retention deliberately. The 7-day default is rarely right. Compliance and replay needs drive the number. Disk is cheap; rerunning a backfill is not.
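Use idempotent producers. With enable.idempotence=true, a Kafka producer’s retries do not create duplicates within a partition; transactions build on this to give exactly-once semantics for writes made inside a transaction. RabbitMQ publisher retries can produce duplicates, so RabbitMQ pipelines need application-level dedup.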
Plan for poison messages. RabbitMQ has DLX (dead-letter exchange) built in. Kafka brokers have no equivalent, so handle it on the consumer side: move messages that fail processing to a topic.deadletter topic and keep going; do not let one bad message block the main consumer.
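Monitor consumer lag. This is the single most important metric for both systems. Kafka: lag in offsets, per partition. RabbitMQ: queue depth. Alert before RabbitMQ hits its memory limits or Kafka lag approaches the retention window, after which unread messages are deleted.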
Don’t use Kafka as a database. Compacted topics are useful for current-state lookups, but Kafka is not built for random reads. Project the stream into a real DB for queries.
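
Two of the tips above, dead-lettering poison messages and watching lag, can be sketched together in plain Python. This is a toy model with lists standing in for topics; the message values, the `handle` function, and the dead-letter record layout are all hypothetical.

```python
main_topic = ["ok-1", "ok-2", "POISON", "ok-3"]   # stand-in for one partition
dead_letter_topic = []                            # stand-in for a topic.deadletter topic
processed = []
committed = 0                                     # next offset this consumer will read


def handle(msg):
    """Hypothetical business logic that fails on a malformed message."""
    if msg == "POISON":
        raise ValueError("cannot parse message")
    processed.append(msg)


def lag():
    # Lag in offsets: end of the log minus the committed position
    return len(main_topic) - committed


lag_before = lag()
for offset in range(committed, len(main_topic)):
    msg = main_topic[offset]
    try:
        handle(msg)
    except ValueError as err:
        # Park the bad record with enough context to investigate later
        dead_letter_topic.append({"offset": offset, "value": msg, "error": str(err)})
    committed = offset + 1    # advance either way so one bad record cannot stall the partition
lag_after = lag()
```

The consumer finishes with zero lag and one record parked for inspection. Without the `except` branch it would stop at offset 2 and lag would keep growing while it retried the same message.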

A rule of thumb I use in design reviews: ask “how would a new team consume this data six months from now?” If the answer is “we replay it from Kafka,” you have picked correctly. If the answer is “we add a new RabbitMQ binding,” you have also picked correctly. If the answer is “we re-publish from the source,” you have the wrong tool.

Wrap-up

Kafka and RabbitMQ solve overlapping but distinct problems. Kafka is a log; RabbitMQ is a broker. Pick Kafka when you need replay, high throughput, and ordered streams. Pick RabbitMQ when you need flexible routing, per-message control, and competing-consumer task distribution. Some teams run both, and that is fine. The mistake is picking based on what is fashionable rather than what your workload needs. Read your traffic shape first, then choose the tool whose model matches it.