Node Streams and Backpressure Explained

Intermediate 10 min read

What you'll learn

✓ What a stream really is in Node
✓ How push and pull data flow differ
✓ Why backpressure prevents memory blowups
✓ How to use pipeline and async iteration
✓ Common stream pitfalls and fixes

Prerequisites

• Comfortable with JavaScript and async I/O

If you have ever loaded a 5 GB file with fs.readFile and watched your process die, you already understand why streams exist. Streams let Node move data in chunks so memory stays flat regardless of input size. The catch is that streams only work well when producers and consumers stay in sync, and that synchronization is called backpressure.

What a stream actually is

A stream is an EventEmitter with a buffer and a contract. Readable streams produce data, writable streams consume it, duplex streams do both, and transform streams sit in the middle and mutate chunks as they pass through. The buffer has a highWaterMark, which is a soft threshold rather than a hard cap. For byte streams the default is 64 KiB since Node 22 and 16 KiB in earlier versions; in object mode it is 16 objects.

When the buffer fills, the stream tells you to slow down. When it drains, it tells you to resume. That signal is backpressure. Without it, a fast producer can flood a slow consumer and Node will happily queue gigabytes in memory until the process crashes.
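To see the signal itself, here is a minimal sketch. The sink, its 4-byte highWaterMark, and the 10 ms delay are made up for illustration; the only point is what write() returns and when drain fires.

```javascript
const { Writable } = require('node:stream');

// Hypothetical slow sink: tiny 4-byte highWaterMark, each write takes 10 ms.
function makeSlowSink() {
  return new Writable({
    highWaterMark: 4,
    write(_chunk, _enc, cb) {
      setTimeout(cb, 10);
    },
  });
}

function demoWriteSignal() {
  const sink = makeSlowSink();
  const first = sink.write('ab');  // 2 bytes buffered, below the mark
  const second = sink.write('cd'); // 4 bytes buffered, mark reached
  return new Promise((resolve) => {
    // drain fires once the buffered writes have been flushed
    sink.once('drain', () => resolve({ first, second, drained: true }));
  });
}

demoWriteSignal().then(console.log);
// { first: true, second: false, drained: true }
```

Note that write() still accepts the chunk when it returns false. The return value is advice, not a refusal, which is why ignoring it leads to unbounded buffering.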

Mental model

Producer --chunk--> [buffer | highWaterMark] --chunk--> Consumer
                     ^                            |
                     |                            |
                write() returns false      drain event
                     |                            |
                     +--- pause production -------+

Stream data flow with backpressure

The flow is: write() returns false once the buffered data reaches highWaterMark, the producer pauses, and the writable emits drain when its buffer has emptied and it is ready for more.
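Written by hand, that contract might look like the sketch below. writeAll and makeSink are hypothetical helpers, and the 8-byte highWaterMark is deliberately tiny so the pause path actually runs.

```javascript
const { once } = require('node:events');
const { Writable } = require('node:stream');

// Write every chunk from an iterable, pausing whenever write() returns false.
async function writeAll(chunks, dst) {
  let waits = 0;
  for (const chunk of chunks) {
    if (!dst.write(chunk)) {
      waits += 1;
      await once(dst, 'drain'); // resume only after the buffer has emptied
    }
  }
  dst.end();
  await once(dst, 'finish'); // also rejects if the stream emits error
  return waits; // how many times the producer had to pause
}

// Hypothetical in-memory sink that records what it receives.
function makeSink(received) {
  return new Writable({
    highWaterMark: 8,
    write(chunk, _enc, cb) {
      received.push(chunk.toString());
      setImmediate(cb); // simulate a consumer that is not instantaneous
    },
  });
}
```

Calling writeAll(['aaaa', 'bbbb', 'cccc', 'dddd'], makeSink([])) pauses at least once, because the second 4-byte chunk brings the buffer up to the mark. This is the bookkeeping that pipeline does for you.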

Hands-on: copy a file with proper backpressure

The naive version looks fine but ignores write()’s return value.

// BAD: no backpressure
const fs = require('node:fs');
const src = fs.createReadStream('big.log');
const dst = fs.createWriteStream('copy.log');

src.on('data', (chunk) => {
  dst.write(chunk); // ignores return value
});
src.on('end', () => dst.end());

If reads outpace writes (very common when writing across disks or to network), the writable buffer grows without bound. The fix is pipeline, which handles backpressure, errors, and cleanup for you.

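const { pipeline } = require('node:stream/promises');
const fs = require('node:fs');

// CommonJS has no top-level await, so run the pipeline inside an async function.
async function copyFile() {
  await pipeline(
    fs.createReadStream('big.log'),
    fs.createWriteStream('copy.log'),
  );
}

copyFile().catch(console.error);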

For transforms, drop one in the middle. Here is a line-counting transform.

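const { Transform } = require('node:stream');
const { pipeline } = require('node:stream/promises');
const fs = require('node:fs');

// Pass-through transform that counts newlines and reports the total at the end.
const countLines = () => {
  let count = 0;
  return new Transform({
    transform(chunk, _enc, cb) {
      count += chunk.toString().split('\n').length - 1;
      cb(null, chunk); // forward the chunk unchanged
    },
    flush(cb) {
      console.error(`lines: ${count}`);
      cb();
    },
  });
};

// CommonJS has no top-level await, so wrap the pipeline in an async function.
async function copyAndCount() {
  await pipeline(
    fs.createReadStream('big.log'),
    countLines(),
    fs.createWriteStream('copy.log'),
  );
}

copyAndCount().catch(console.error);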

Async iteration: the modern way

Readable streams are async iterables. This often replaces hand-written event handlers and gets backpressure for free, because the loop body awaits each chunk.

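import { createReadStream } from 'node:fs';

// handleChunk is a placeholder for your own async work. Do not name it
// process: that would shadow Node's global process object.
async function handleChunk(chunk) { /* ... */ }

for await (const chunk of createReadStream('big.log', { encoding: 'utf8' })) {
  await handleChunk(chunk); // the next chunk is not requested until this resolves
}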

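Inside that for await, the stream is read only when the loop asks for the next chunk, so it buffers no more than roughly highWaterMark while your handler awaits. No manual pause/resume. Keep in mind that leaving the loop early, whether through break, return, or a thrown error, destroys the stream by default, so wrap the loop in try/catch if a failure should be handled rather than left to propagate.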
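A small sketch of what happens when the loop exits early. readUntil and its stop condition are hypothetical; Readable.from is used only so the example runs without a file on disk.

```javascript
const { Readable } = require('node:stream');

// Read chunks until a hypothetical stop value shows up, then leave the loop.
async function readUntil(stream, stopAt) {
  const seen = [];
  try {
    for await (const chunk of stream) {
      seen.push(chunk);
      if (chunk === stopAt) break; // early exit triggers stream cleanup
    }
  } catch (err) {
    // Errors from the stream or from the loop body land here.
    console.error('stream failed:', err.message);
  }
  return { seen, destroyed: stream.destroyed };
}

readUntil(Readable.from(['a', 'b', 'c', 'd']), 'b').then(console.log);
// { seen: [ 'a', 'b' ], destroyed: true }
```

The stream is destroyed after the break, so it cannot be resumed later. If you need to stop and continue, keep the loop running and skip chunks instead.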

Common pitfalls

Ignoring write()’s return value. If it returns false, stop writing until drain fires.
Using pipe without error handlers. pipe does not forward errors to the destination; use pipeline instead.
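Mixing data listeners and pipe. Attaching data switches the stream into flowing mode, and combining on('data'), on('readable'), pipe, or async iteration on one stream leads to unintuitive behavior. Pick one way of consuming each stream.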
Forgetting object mode. If you push objects, set objectMode: true on both ends. A byte-mode stream does not coerce them; it rejects the chunk with ERR_INVALID_ARG_TYPE.
Not calling cb() in transforms. The stream will silently stall.
Setting highWaterMark to giant values to “fix” slowness. It only delays the symptom and hides real bottlenecks.
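As an illustration of the object mode and cb() pitfalls, here is a sketch of an object pipeline. totalBytes and the record shape (a body field) are invented for the example.

```javascript
const { Readable, Transform, Writable } = require('node:stream');
const { pipeline } = require('node:stream/promises');

// Sum the body lengths of hypothetical records flowing through object-mode streams.
async function totalBytes(records) {
  let total = 0;
  await pipeline(
    Readable.from(records), // Readable.from uses object mode by default
    new Transform({
      objectMode: true, // without this, object chunks are rejected
      transform(record, _enc, cb) {
        // Always call cb, otherwise the stream stalls.
        cb(null, { ...record, bytes: record.body.length });
      },
    }),
    new Writable({
      objectMode: true,
      write(record, _enc, cb) {
        total += record.bytes;
        cb();
      },
    }),
  );
  return total;
}
```

For example, totalBytes([{ body: 'abc' }, { body: 'de' }]) resolves to 5.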

Practical tips

Default to pipeline (the promise version). When one stream fails, it destroys the others in the chain that have not already finished and rejects with the error.
For HTTP, you already have streams. req is readable, res is writable. Pipe through gzip with zlib.createGzip().
Measure with process.memoryUsage(). If RSS climbs as input size grows, you are buffering somewhere you should not be.
Prefer async iteration for one-off scripts and pipeline for production wiring.
When wrapping a third-party producer, use Readable.from(asyncIterable) rather than rolling your own _read.
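Two of these tips combine naturally. The sketch below wraps a producer with Readable.from and pipes it through zlib.createGzip(). produceLines stands in for a third-party source and gzipLines is a hypothetical helper; a real program would write to a file or an HTTP response instead of collecting chunks in memory.

```javascript
const { Readable, Writable } = require('node:stream');
const { pipeline } = require('node:stream/promises');
const zlib = require('node:zlib');

// Hypothetical third-party producer, modeled as an async generator.
async function* produceLines(n) {
  for (let i = 0; i < n; i += 1) {
    yield `line ${i}\n`;
  }
}

// Gzip the producer's output and return the compressed bytes.
async function gzipLines(n) {
  const parts = [];
  await pipeline(
    Readable.from(produceLines(n)), // no hand-written _read needed
    zlib.createGzip(),
    new Writable({
      write(chunk, _enc, cb) {
        parts.push(chunk); // collected in memory only to keep the demo small
        cb();
      },
    }),
  );
  return Buffer.concat(parts);
}
```

Because the generator only advances when the readable asks for more, the producer is throttled by the same backpressure as any other stream in the pipeline.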

Wrap-up

Streams are not just an optimization. They are how Node stays small while moving large data. Once you internalize the backpressure contract, pipeline becomes the obvious default, transforms become Lego bricks, and memory profiles stay flat no matter how big the input grows. Reach for buffers only when the data legitimately fits and the simplicity is worth it.