Node Cluster Mode for Multi-Core Scaling

Intermediate 9 min read

What you'll learn

✓ Why a single Node process leaves cores idle
✓ How the cluster module forks workers
✓ Sharing TCP sockets between processes
✓ Graceful restarts for zero-downtime deploys
✓ When to reach for PM2 or container orchestration instead

Prerequisites

• Comfortable with HTML and JavaScript

What and Why

Node.js runs your JavaScript on a single thread. A modern server has eight, sixteen, or more cores. Without help, your application keeps one of them busy and the rest sit idle. The built-in cluster module lets you fork one worker process per core and have them all serve the same port. For CPU-bound request handling, throughput can scale close to linearly with the number of workers, and the request handlers themselves do not need to change.

Mental Model

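Think of cluster mode as a small router. The primary process forks the workers, and each worker calls server.listen on the same port. Instead of binding the port itself, a worker asks the primary to do it, so the primary owns the listening socket. Under the default round-robin policy (every platform except Windows), the primary accepts incoming connections and hands each one to the next worker. Under the alternative policy, SCHED_NONE, the workers share the listening socket and the operating system decides which one accepts. Either way, each worker runs its own single-threaded event loop, so a slow handler blocks only the requests on that worker and leaves the other workers free.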

              Primary process
                  |
      --------------------------
      |           |            |
   Worker 1    Worker 2    Worker N
      |           |            |
   event loop  event loop  event loop
      |           |            |
      ---- shared TCP socket ----

Cluster mode connection flow
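
To make the round-robin hand-off concrete, here is a toy simulation with no real sockets or processes. The function name createRoundRobin and the worker labels are made up for illustration. The cluster module does this internally and with more bookkeeping, so treat this as a sketch of the idea and not of the actual implementation.

```javascript
// Toy model of round-robin distribution: each new connection goes to the
// next worker in the list, wrapping around at the end.
function createRoundRobin(workers) {
  if (workers.length === 0) throw new Error('need at least one worker');
  let next = 0;
  return function assign() {
    const worker = workers[next];
    next = (next + 1) % workers.length;
    return worker;
  };
}

const assign = createRoundRobin(['worker-1', 'worker-2', 'worker-3']);
const assignments = [];
for (let i = 0; i < 5; i++) assignments.push(assign());
console.log(assignments.join(', '));
// prints "worker-1, worker-2, worker-3, worker-1, worker-2"
```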

Hands-on Example

A minimal HTTP server that scales across cores.

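// server.js
import cluster from 'node:cluster';
import http from 'node:http';
import os from 'node:os';
import process from 'node:process';

// One worker per core available to this process (needs Node 18.14 or later).
const cpus = os.availableParallelism();

if (cluster.isPrimary) {
  console.log(`Primary ${process.pid} starting ${cpus} workers`);
  for (let i = 0; i < cpus; i++) cluster.fork();

  // Replace workers that crash. Workers that exit after disconnect() or
  // kill() were stopped on purpose, so leave those alone.
  cluster.on('exit', (worker, code, signal) => {
    if (worker.exitedAfterDisconnect) return;
    console.warn(`Worker ${worker.process.pid} died (${signal ?? code}), restarting`);
    cluster.fork();
  });
} else {
  // Keep a reference to the server so it can be closed during a graceful restart.
  const server = http.createServer((req, res) => {
    res.end(`Hello from worker ${process.pid}\n`);
  });
  // Log only once the port is actually bound.
  server.listen(3000, () => {
    console.log(`Worker ${process.pid} listening on 3000`);
  });
}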

Run it with node server.js and the primary spawns one worker per core. Send a few requests, for example with curl localhost:3000, and the PID in the response changes as new connections land on different workers. A client that reuses a keep-alive connection stays on the same worker.
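
The exit handler above restarts a worker unconditionally, so a worker that crashes during startup would be re-forked in a tight loop. One way to guard against that, sketched here under the assumption that giving up after a burst of crashes is acceptable, is a sliding-window limiter. The names createRestartLimiter, maxRestarts, and windowMs are illustrative and not part of the cluster API.

```javascript
// Allows at most `maxRestarts` restarts within any `windowMs` period.
function createRestartLimiter({ maxRestarts = 5, windowMs = 60_000 } = {}) {
  let recent = []; // timestamps (ms) of restarts that were allowed
  return function shouldRestart(now = Date.now()) {
    // Forget restarts that have fallen out of the window.
    recent = recent.filter((t) => now - t < windowMs);
    if (recent.length >= maxRestarts) return false;
    recent.push(now);
    return true;
  };
}

// Example with explicit timestamps: 3 restarts allowed per 10 seconds.
const shouldRestart = createRestartLimiter({ maxRestarts: 3, windowMs: 10_000 });
const decisions = [0, 1_000, 2_000, 3_000, 13_000].map((t) => shouldRestart(t));
console.log(decisions);
// prints [ true, true, true, false, true ]
```

In the primary, the exit handler would call cluster.fork() only when shouldRestart() returns true, and otherwise log the failure and leave the pool one worker short.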

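For graceful restarts during a deploy, replace workers one at a time.

// Worker side: this belongs in the worker branch, where `server` is a kept
// reference to the http.Server returned by http.createServer().
process.on('SIGTERM', () => {
  console.log(`Worker ${process.pid} draining`);
  // Stop accepting new connections, let in-flight requests finish, then exit.
  server.close(() => process.exit(0));
});

// Primary side: rolling restart on SIGUSR2.
if (cluster.isPrimary) {
  process.on('SIGUSR2', async () => {
    // Snapshot the current workers so the replacements forked below are not revisited.
    for (const worker of Object.values(cluster.workers)) {
      // Start the replacement first so capacity never drops.
      const replacement = cluster.fork();
      await new Promise(r => replacement.once('listening', r));
      // disconnect() closes the worker's servers and lets it exit once idle.
      worker.disconnect();
      await new Promise(r => worker.once('exit', r));
    }
  });
}

Send SIGUSR2 to the primary after a deploy, for example with kill -USR2 followed by the primary's PID, and each worker is replaced by a fresh process that loads the new code from disk. The primary itself keeps running the code it started with. In-flight requests finish before an old worker exits, although open keep-alive connections can delay that exit. The primary's 'exit' handler must ignore workers whose exitedAfterDisconnect flag is set, otherwise every replaced worker is forked twice.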

Common Pitfalls

In-memory state is not shared between workers. Each worker is a separate process with its own heap, so an in-memory cache becomes N independent caches. Use Redis or another shared store for anything that must be consistent across workers. The same warning applies to in-process counters and rate limiters.
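
The rate limiter case is easy to underestimate, so here is a small simulation of it. It uses plain objects to stand in for workers and forks nothing. The class InMemoryRateLimiter and the client id are invented for the example.

```javascript
// A naive in-memory rate limiter: at most `limit` requests per client.
class InMemoryRateLimiter {
  constructor(limit) {
    this.limit = limit;
    this.counts = new Map(); // client id -> requests seen by THIS process
  }
  allow(clientId) {
    const seen = this.counts.get(clientId) ?? 0;
    if (seen >= this.limit) return false;
    this.counts.set(clientId, seen + 1);
    return true;
  }
}

// Simulate 4 workers, each with its own heap and therefore its own limiter.
const workerLimiters = Array.from({ length: 4 }, () => new InMemoryRateLimiter(5));

// One client sends 40 requests, spread round-robin across the workers.
let allowed = 0;
for (let i = 0; i < 40; i++) {
  const limiter = workerLimiters[i % workerLimiters.length];
  if (limiter.allow('client-a')) allowed++;
}
console.log(`allowed ${allowed} requests with a per-client limit of 5`);
// prints "allowed 20 requests with a per-client limit of 5"
```

The effective limit is the configured limit multiplied by the number of workers, which is why the count has to live in a store that every worker can reach.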

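Sticky sessions matter for WebSockets. A single TCP connection always stays on the worker that accepted it, but the default round-robin scheduling sends each new connection to the next worker. That breaks clients whose session spans several connections, such as a WebSocket library that starts with HTTP long-polling before upgrading, or a client that reconnects and expects its session to still be in memory. Either pin clients at a load balancer or route connections yourself in the primary, for example by hashing the client address to pick a worker. Switching the scheduling policy to SCHED_NONE does not give you stickiness; it only hands distribution over to the operating system.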
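
If you do run your own routing, one common approach is to hash the client address so that the same client always maps to the same worker. The sketch below shows only that mapping. The function names fnv1a and pickWorkerIndex are made up here, and the hash is a plain FNV-1a over the string's character codes, chosen for brevity and not for any security property.

```javascript
// FNV-1a 32-bit hash of a string, used only to spread addresses evenly.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// Maps a client address to a stable worker index in [0, workerCount).
function pickWorkerIndex(remoteAddress, workerCount) {
  if (!Number.isInteger(workerCount) || workerCount < 1) {
    throw new RangeError('workerCount must be a positive integer');
  }
  return fnv1a(remoteAddress) % workerCount;
}

const first = pickWorkerIndex('203.0.113.7', 4);
const second = pickWorkerIndex('203.0.113.7', 4);
console.log(first === second); // prints true: same address, same worker
```

In a full setup the primary would accept connections on its own net.Server created with pauseOnConnect: true, pick a worker this way, and pass the socket along with worker.send(message, socket). Behind a reverse proxy every connection carries the proxy's address, so hashing on the remote address alone would send all traffic to a single worker.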

Logs become tangled. Without per-worker prefixes, multiple processes write interleaved lines to stdout. Always include process.pid in your log format, or use a structured logger with worker metadata.
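
A structured logger for this can be very small. The sketch below is one possible shape, not a recommendation of a specific library. The name createWorkerLogger and the field names are assumptions, and the worker id is hard-coded where a real worker would read cluster.worker.id.

```javascript
// Builds a logger that stamps every line with the worker id and pid.
// `write` is injectable so the output can go somewhere other than stdout.
function createWorkerLogger({ workerId, pid }, write = console.log) {
  return function log(level, message, fields = {}) {
    write(JSON.stringify({
      time: new Date().toISOString(),
      level,
      workerId,
      pid,
      message,
      ...fields,
    }));
  };
}

// In a worker, workerId would come from cluster.worker.id.
const log = createWorkerLogger({ workerId: 1, pid: process.pid });
log('info', 'request handled', { path: '/health', ms: 3 });
```

Because each call emits one complete JSON object on one line, interleaved output from several workers can still be split and filtered by workerId or pid afterwards.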

Practical Tips

Measure first. Cluster mode helps CPU-bound workloads. If you are stuck on a slow database, more workers will not help and may even hurt by adding connections to a saturated pool.
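
The connection arithmetic is worth doing before raising the worker count. As a rough sketch, assuming every worker opens its own database pool and the database enforces a hard connection cap, the per-worker pool size can be derived like this. The function and parameter names are illustrative.

```javascript
// Largest per-worker pool size that keeps the total under the database limit.
function poolSizePerWorker({ dbMaxConnections, workers, reserved = 0 }) {
  if (workers < 1) throw new RangeError('workers must be at least 1');
  const usable = dbMaxConnections - reserved;
  return Math.max(0, Math.floor(usable / workers));
}

// Example: a database capped at 100 connections, 10 kept back for admin tools.
console.log(poolSizePerWorker({ dbMaxConnections: 100, workers: 8, reserved: 10 }));
// prints 11
console.log(poolSizePerWorker({ dbMaxConnections: 100, workers: 16, reserved: 10 }));
// prints 5
```

Doubling the workers halves the pool each one can hold, so a database-bound service gains nothing from the extra processes.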

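Set worker memory limits with --max-old-space-size so a leaking worker crashes with an out-of-memory error instead of dragging the whole machine down. The flag caps V8's old-generation heap and takes a size in MiB. The primary forks a replacement only if it has an 'exit' handler that does so, like the one in the hands-on example.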
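
The flag can be applied to workers only, leaving the primary unconstrained, by passing it through the execArgv setting of cluster.setupPrimary. The helper below just builds that argument list. Its name, workerExecArgv, and the 512 MiB figure are assumptions for the example.

```javascript
// Builds the Node flags for workers: the parent's flags plus a heap cap.
function workerExecArgv(maxOldSpaceMb, baseArgv = process.execArgv) {
  if (!Number.isInteger(maxOldSpaceMb) || maxOldSpaceMb <= 0) {
    throw new RangeError('maxOldSpaceMb must be a positive integer');
  }
  // Drop an inherited cap (hyphenated spelling) so the new one is the only one.
  const kept = baseArgv.filter((arg) => !arg.startsWith('--max-old-space-size'));
  return [...kept, `--max-old-space-size=${maxOldSpaceMb}`];
}

console.log(workerExecArgv(512, ['--enable-source-maps']));
// prints [ '--enable-source-maps', '--max-old-space-size=512' ]

// In the primary, before the first fork (cluster imported from 'node:cluster'):
//   cluster.setupPrimary({ execArgv: workerExecArgv(512) });
```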

Container platforms like Kubernetes prefer one process per container. In that world, scaling means more pods, not more workers. Pick one model and stick to it to avoid double-multiplying your CPU usage.
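
One way to keep a single codebase usable in both models is to make the worker count configurable, so a pod can run exactly one worker while a bare-metal host uses every core. The sketch below assumes an environment variable named WEB_CONCURRENCY. That name is a convention borrowed from some hosting platforms, not something Node reads itself, and resolveWorkerCount is a made-up helper.

```javascript
// Decides how many workers to fork.
// `env` and `detectedCores` are parameters to keep the logic easy to check;
// in server.js they would be process.env and os.availableParallelism().
function resolveWorkerCount(env, detectedCores) {
  const raw = env.WEB_CONCURRENCY; // hypothetical override variable
  if (raw !== undefined && raw !== '') {
    const parsed = Number(raw);
    if (!Number.isInteger(parsed) || parsed < 1) {
      throw new RangeError(`WEB_CONCURRENCY must be a positive integer, got "${raw}"`);
    }
    return parsed;
  }
  // No override: use what was detected, but never fewer than one worker.
  return Math.max(1, detectedCores);
}

console.log(resolveWorkerCount({ WEB_CONCURRENCY: '1' }, 16)); // prints 1
console.log(resolveWorkerCount({}, 16)); // prints 16
```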

PM2 wraps the cluster module with friendlier ergonomics, restart policies, and log management. For most production deploys it is worth using PM2 or a similar supervisor rather than rolling your own.

Wrap-up

The cluster module turns a single-threaded server into a multi-process one with surprisingly little code. The pattern shines on bare metal and virtual machines where you control the whole box. Remember that shared state needs an external store, that graceful restarts must be wired up explicitly, and that container platforms often replace the need for cluster mode entirely.