Node Worker Threads vs Cluster

Intermediate 11 min read

What you'll learn

✓ How worker_threads and cluster differ at the OS level
✓ When CPU-bound work needs threads
✓ When to scale a server with cluster
✓ How shared memory works in workers
✓ Pitfalls around memory and IPC

Prerequisites

• Basic Node and event loop knowledge

Node is single-threaded for your JavaScript, but the runtime ships two ways to use more than one core: worker_threads and cluster. They sound similar and are constantly confused, but they solve different problems. Pick the wrong one and you either over-engineer a web server or starve a CPU-bound job.

The short answer

cluster starts additional copies of the whole Node process (it uses child_process.fork under the hood). Each worker is an OS process with its own memory, its own event loop, and its own V8 instance. You typically use it to scale an HTTP server across cores.

worker_threads runs JavaScript in real OS threads inside the same process. Workers share the process (same PID, same address space), but each has its own event loop and its own V8 isolate, so ordinary JavaScript objects and globals are not shared. You typically use it to offload CPU-bound work without blocking your main loop.
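To make the "same process, separate isolate" point concrete, here is a minimal sketch. The helper name probeWorker and the global mainOnlyFlag are made up for illustration, and the worker body is inlined with the eval option only to keep the example in one file.

```javascript
const { Worker, threadId } = require('node:worker_threads');

// Set a global on the main thread only (hypothetical name, for the demo).
globalThis.mainOnlyFlag = true;

// Starts a worker that reports what it can see, then exits on its own.
function probeWorker() {
  const source = `
    const { parentPort, threadId, isMainThread } = require('node:worker_threads');
    parentPort.postMessage({
      pid: process.pid,
      threadId,
      isMainThread,
      seesMainGlobal: typeof globalThis.mainOnlyFlag !== 'undefined',
    });
  `;
  return new Promise((resolve, reject) => {
    const worker = new Worker(source, { eval: true });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

probeWorker().then((info) => {
  // Same pid as the main thread, a different threadId, and no access to the global.
  console.log({ mainPid: process.pid, mainThreadId: threadId, worker: info });
});
```

Running the equivalent check inside a cluster worker would report a different pid, which is the practical difference between the two models.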

Mental model

cluster (multi-process):
[primary] --fork--> [proc A: server :3000]
               \--> [proc B: server :3000]
               \--> [proc C: server :3000]
shared listening port, separate memory

worker_threads (multi-thread, one process):
[main thread] <--postMessage--> [worker thread 1]
              <--postMessage--> [worker thread 2]
shared process, can share ArrayBuffers

Cluster vs worker_threads

A useful rule of thumb: if you want more request-handling capacity for an I/O-bound server, use cluster (or a process manager that does the same thing). If a single request triggers heavy computation, use worker_threads.

Hands-on: CPU-bound work with worker_threads

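Deriving a key with scryptSync blocks the event loop for the whole computation, and with strong cost parameters that can be a noticeable fraction of a second per call. Push it to a worker.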

// hash-worker.js
const { parentPort, workerData } = require('node:worker_threads');
const crypto = require('node:crypto');

const hash = crypto.scryptSync(workerData.password, workerData.salt, 64);
parentPort.postMessage(hash);

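// main.js
const { Worker } = require('node:worker_threads');

// Runs one scrypt derivation in a fresh worker and resolves with the derived key.
function hashPassword(password) {
  return new Promise((resolve, reject) => {
    const w = new Worker('./hash-worker.js', {
      // Placeholder salt for the demo; use a random per-password salt in real code.
      workerData: { password, salt: 'fixed-salt' },
    });
    w.on('message', resolve);
    w.on('error', reject);
    // A non-zero exit without a message means the worker died early.
    w.on('exit', (code) => {
      if (code !== 0) reject(new Error(`worker exited ${code}`));
    });
  });
}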

The main thread stays free to serve other requests while the worker churns. For repeated work, use a pool (such as piscina) so you do not pay worker startup cost each call.
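To show what a pool buys you, here is a deliberately tiny hand-rolled one. It is a sketch of the idea, not how piscina is implemented: the class name TinyPool and its methods are invented for this example, it runs one task per worker at a time, and it does not replace crashed workers.

```javascript
const { Worker } = require('node:worker_threads');
const crypto = require('node:crypto');

// Long-lived worker body, inlined via eval to keep the sketch in one file.
const workerSource = `
  const { parentPort } = require('node:worker_threads');
  const crypto = require('node:crypto');
  parentPort.on('message', ({ password, salt }) => {
    parentPort.postMessage(crypto.scryptSync(password, salt, 64));
  });
`;

class TinyPool {
  constructor(size) {
    this.workers = Array.from({ length: size }, () => new Worker(workerSource, { eval: true }));
    this.idle = [...this.workers];
    this.queue = [];
  }

  // Queues a task and resolves with the worker's reply.
  run(task) {
    return new Promise((resolve, reject) => {
      this.queue.push({ task, resolve, reject });
      this._drain();
    });
  }

  // Hands queued tasks to idle workers until one of the two runs out.
  _drain() {
    while (this.idle.length > 0 && this.queue.length > 0) {
      const worker = this.idle.pop();
      const { task, resolve, reject } = this.queue.shift();
      const onMessage = (result) => {
        worker.off('error', onError);
        this.idle.push(worker); // worker is reused, no startup cost next time
        resolve(result);
        this._drain();
      };
      const onError = (err) => {
        worker.off('message', onMessage);
        reject(err); // a crashed worker is not returned to the pool
      };
      worker.once('message', onMessage);
      worker.once('error', onError);
      worker.postMessage(task);
    }
  }

  destroy() {
    return Promise.all(this.workers.map((w) => w.terminate()));
  }
}

async function demo() {
  const pool = new TinyPool(2);
  const salt = crypto.randomBytes(16);
  const keys = await Promise.all(
    ['alpha', 'beta', 'gamma'].map((password) => pool.run({ password, salt })),
  );
  // Results arrive as Uint8Array because postMessage clones the Buffer.
  console.log(keys.map((key) => Buffer.from(key).toString('hex').slice(0, 16)));
  await pool.destroy();
}

demo();
```

Three tasks share two workers here, so the third waits in the queue. A production pool also needs timeouts, cancellation, and worker replacement, which is the argument for using an existing library.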

Hands-on: scaling a server with cluster

const cluster = require('node:cluster');
const os = require('node:os');
const http = require('node:http');

if (cluster.isPrimary) {
  for (let i = 0; i < os.availableParallelism(); i++) cluster.fork();
  cluster.on('exit', (worker) => {
    console.error(`worker ${worker.process.pid} died, respawning`);
    cluster.fork();
  });
} else {
  http.createServer((_, res) => res.end('ok')).listen(3000);
}

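Each worker calls listen on the same port without conflict. By default (on every platform except Windows) the primary process accepts incoming connections and hands them to the workers round-robin; the alternative scheduling policy lets workers accept directly from a shared socket and leaves distribution to the operating system. In practice, most teams use PM2, systemd, or Kubernetes to do this externally, which can give you better observability and zero-downtime restarts.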

Sharing memory between worker threads

Worker threads can share memory with SharedArrayBuffer. cluster workers cannot, because each process has its own address space; they have to coordinate through Redis, a file, or another IPC mechanism.

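const { Worker } = require('node:worker_threads');
const worker = new Worker('./worker.js'); // hypothetical worker script

const shared = new SharedArrayBuffer(1024);
const view = new Int32Array(shared); // 256 Int32 slots backed by the shared bytes
worker.postMessage(shared); // shared, not copied: the worker writes to the same bytes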

Use Atomics for any concurrent mutation. It is easy to write a data race here that only shows up under load.
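As an illustration of that advice, the sketch below has several workers increment one shared counter with Atomics.add. The function name countWithWorkers and its parameters are assumptions made for the example; the worker body is inlined with eval to keep it self-contained.

```javascript
const { Worker } = require('node:worker_threads');

// Each worker increments slot 0 of the shared Int32Array, then exits.
const workerSource = `
  const { workerData } = require('node:worker_threads');
  const counter = new Int32Array(workerData.shared);
  for (let i = 0; i < workerData.iterations; i++) {
    Atomics.add(counter, 0, 1); // atomic read-modify-write
  }
`;

function countWithWorkers(workerCount, iterations) {
  const shared = new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT);
  const counter = new Int32Array(shared);

  const jobs = [];
  for (let i = 0; i < workerCount; i++) {
    jobs.push(new Promise((resolve, reject) => {
      // A SharedArrayBuffer in workerData is shared with the worker, not copied.
      const worker = new Worker(workerSource, {
        eval: true,
        workerData: { shared, iterations },
      });
      worker.once('error', reject);
      worker.once('exit', (code) => {
        if (code === 0) resolve();
        else reject(new Error(`worker exited ${code}`));
      });
    }));
  }

  // Read the final value once every worker has finished.
  return Promise.all(jobs).then(() => Atomics.load(counter, 0));
}

countWithWorkers(4, 100000).then((total) => {
  console.log(total); // 400000
});
```

Replacing Atomics.add with a plain counter[0]++ turns this into exactly the kind of race described above: the total may come up short, and only intermittently.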

Common pitfalls

Spawning a worker per request. Worker startup costs tens of milliseconds and tens of MB. Use a pool.
Using cluster for CPU-bound jobs inside a single request. The request still blocks one worker; you have only spread the misery.
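Treating postMessage as cheap. Structured cloning copies. Big payloads should be transferred (pass the ArrayBuffer in the transferList argument of postMessage, which moves it and leaves the sender's copy detached) or placed in a SharedArrayBuffer.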
Forgetting to handle worker error and exit. A silent crash kills throughput and you will not know why.
Logging from many workers to one file. Concurrent writes can interleave and garble lines. Send logs to stdout and let the platform collect them.
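Assuming session affinity with cluster. Round-robin distribution sends each new connection to whichever worker is next, so a protocol that spreads one session over several connections (for example, a WebSocket library with an HTTP long-polling fallback) needs sticky routing that cluster does not provide by itself.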
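For the postMessage cost pitfall, here is a minimal sketch of moving an ArrayBuffer instead of copying it. It uses a MessageChannel inside one thread so it runs as a single file; worker.postMessage and parentPort.postMessage accept the same second argument. The function name transferDemo is made up for the example.

```javascript
const { MessageChannel } = require('node:worker_threads');

// Sends a buffer through a channel with a transfer list and reports
// how many bytes each side holds afterwards.
function transferDemo(sizeBytes) {
  return new Promise((resolve) => {
    const { port1, port2 } = new MessageChannel();
    const payload = new ArrayBuffer(sizeBytes);

    port2.on('message', (received) => {
      port1.close(); // closing one end closes the channel
      resolve({
        senderBytes: payload.byteLength, // 0: the sender's buffer is detached
        receiverBytes: received.byteLength,
      });
    });

    // The second argument is the transfer list: the buffer is moved, not cloned.
    port1.postMessage(payload, [payload]);
  });
}

transferDemo(16 * 1024 * 1024).then((sizes) => console.log(sizes));
```

The trade-off is ownership: after the transfer the sender can no longer read the data, so this fits hand-off workloads, while SharedArrayBuffer fits data both sides keep using.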

Practical tips

Profile first. If your event loop delay (perf_hooks.monitorEventLoopDelay()) is high, you have CPU-bound work that wants threads. If no single task blocks for long but one process cannot keep up while other cores sit idle, you want more processes.
Use piscina for a robust worker pool. It handles queueing, abort signals, and backpressure.
For servers, prefer running multiple containers over cluster. The orchestrator already does process management; do not reinvent it.
Keep worker scripts small. Smaller code, faster startup, faster recovery from crashes.
Measure end-to-end. autocannon for throughput, clinic.js for diagnosing event-loop blocks.
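As a starting point for the "profile first" tip, here is a small sketch around perf_hooks.monitorEventLoopDelay. The function name measureLoopDelay and the busy-wait are stand-ins for real CPU-bound work; the histogram reports nanoseconds, which the sketch converts to milliseconds.

```javascript
const { monitorEventLoopDelay } = require('node:perf_hooks');

// Blocks the loop for roughly blockMs and reports the delay the monitor observed.
function measureLoopDelay(blockMs) {
  return new Promise((resolve) => {
    const histogram = monitorEventLoopDelay({ resolution: 10 }); // sample every 10 ms
    histogram.enable();

    // Give the sampler a moment to start, then block synchronously.
    setTimeout(() => {
      const end = Date.now() + blockMs;
      while (Date.now() < end) {
        // busy-wait standing in for CPU-bound work
      }

      // Let the delayed sample land before reading the histogram.
      setTimeout(() => {
        histogram.disable();
        resolve({
          maxMs: histogram.max / 1e6,
          p99Ms: histogram.percentile(99) / 1e6,
        });
      }, 50);
    }, 50);
  });
}

measureLoopDelay(200).then((stats) => console.log(stats));
```

In a real service you would keep one histogram enabled, export its percentiles periodically, and call reset() between reporting intervals.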

Wrap-up

cluster scales request handling, worker_threads unblocks the event loop, and they are not interchangeable. Most production setups use external process management plus a worker pool for the occasional CPU-heavy task. Choose based on whether your bottleneck is connections or computation, and let the runtime do what it is good at: keeping one event loop responsive.