Node.js Graceful Shutdown Patterns

Intermediate 10 min read

What you'll learn

✓ Why graceful shutdown matters
✓ How SIGTERM and SIGINT differ
✓ Draining HTTP connections
✓ Closing databases and queues safely
✓ Hard timeouts as a safety net

Prerequisites

• Node.js basics
• Familiarity with HTTP servers

What and Why

When an orchestrator like Kubernetes redeploys your service, it sends SIGTERM and waits a short grace period before sending SIGKILL. If your process does not handle SIGTERM, it dies abruptly: in-flight requests get cut off, database connections drop mid-transaction, and queue workers abandon jobs without acknowledging them. Users see 502s and you see flaky logs.

Graceful shutdown is the small bit of code that turns that mess into a calm sequence: stop accepting new work, finish what is in flight, close external connections, then exit. It is often fewer than fifty lines, and it is the difference between a quiet deploy and an outage.

Mental Model

Think of shutdown as a state machine with three states. Running: accept everything. Draining: refuse new work, finish current work. Stopped: process exits. The signal moves you from Running to Draining; the last in-flight task moves you to Stopped, or a timeout forces the move.

You almost always need a hard timeout. If a slow database client never finishes closing, you cannot wait forever; the orchestrator will kill you anyway. Better to log the stuck resource and exit with a non-zero code on your own terms.
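The three states and the timeout can be written down directly. The following is a minimal sketch rather than a finished module: createLifecycle, begin, end, drain, and the onStop callback are hypothetical names chosen for illustration, and a real service would call process.exit inside onStop.

```javascript
// Hypothetical sketch of the Running -> Draining -> Stopped state machine.
const State = Object.freeze({
  RUNNING: 'running',
  DRAINING: 'draining',
  STOPPED: 'stopped',
});

function createLifecycle({ timeoutMs, onStop }) {
  let state = State.RUNNING;
  let inFlight = 0;
  let timer = null;

  function stop(reason) {
    if (state === State.STOPPED) return;
    state = State.STOPPED;
    clearTimeout(timer);
    onStop(reason); // e.g. process.exit(reason === 'drained' ? 0 : 1)
  }

  return {
    get state() {
      return state;
    },
    // Returns false when new work must be refused.
    begin() {
      if (state !== State.RUNNING) return false;
      inFlight += 1;
      return true;
    },
    // Call exactly once for every begin() that returned true.
    end() {
      inFlight -= 1;
      if (state === State.DRAINING && inFlight === 0) stop('drained');
    },
    // Entry point for the signal handler: Running -> Draining.
    drain() {
      if (state !== State.RUNNING) return;
      state = State.DRAINING;
      if (inFlight === 0) {
        stop('drained');
        return;
      }
      timer = setTimeout(() => stop('timeout'), timeoutMs);
    },
  };
}
```

Keeping the transitions in one place makes it hard to exit twice or to accept work after the signal has arrived.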

Hands-on Example

An Express server that drains HTTP, closes the database, and falls back to a forced exit.

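import express from 'express';
import http from 'node:http';

// Placeholders so the example runs as-is; swap in your real database client and job queue.
const db = { close: async () => {} };
const queue = { close: async () => {} };

const app = express();
let shuttingDown = false;

// Register this before any route: Express runs handlers in registration order,
// so a route that responds first would never reach this middleware.
app.use((_req, res, next) => {
  if (shuttingDown) res.set('Connection', 'close');
  next();
});

app.get('/', (_req, res) => res.send('ok'));

const server = http.createServer(app);
server.listen(3000);

async function shutdown(signal) {
  if (shuttingDown) return; // ignore repeated signals
  shuttingDown = true;
  console.log(`received ${signal}, draining`);

  // Safety net. unref() stops this timer from keeping the event loop alive on its own.
  const forceExit = setTimeout(() => {
    console.error('forced exit after timeout');
    process.exit(1);
  }, 10_000).unref();

  // Stop accepting connections; the callback runs once existing connections have ended.
  server.close(async () => {
    try {
      await db.close();
      await queue.close();
      clearTimeout(forceExit);
      console.log('clean exit');
      process.exit(0);
    } catch (err) {
      console.error('error during shutdown', err);
      process.exit(1);
    }
  });
}

process.on('SIGTERM', () => shutdown('SIGTERM'));
process.on('SIGINT', () => shutdown('SIGINT'));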

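Both signals go through the same handler. SIGTERM is what an orchestrator (or a plain kill) sends; SIGINT is what Ctrl+C sends from a terminal.

The Connection: close header tells keep-alive clients to stop reusing the socket, which speeds draining noticeably.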
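Idle keep-alive sockets are the other thing that slows draining, because server.close() waits for open connections to end. A possible helper, sketched here under the assumption that you are on Node.js 18.2 or later (where http.Server gained closeIdleConnections()); closeServer is a hypothetical name, not part of any library.

```javascript
// Hypothetical helper: promisified server.close() that also drops idle sockets.
function closeServer(server) {
  return new Promise((resolve, reject) => {
    // The callback receives an error if the server was not running.
    server.close((err) => (err ? reject(err) : resolve()));

    // Only connections that are not serving a request are closed here;
    // in-flight requests are left to finish. Guarded for older runtimes.
    if (typeof server.closeIdleConnections === 'function') {
      server.closeIdleConnections();
    }
  });
}
```

With this in place the shutdown function can await closeServer(server) instead of nesting everything in a callback. Newer Node.js releases (19 and later, per the changelog) already close idle connections inside server.close(), so the extra call should be harmless there.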

  Running
   |  SIGTERM
   v
Draining ----- new requests refused
   |
   |  in-flight done    OR    timeout fires
   v                          v
Stopped (exit 0)         Stopped (exit 1)

Signal triggers drain; last request or timeout triggers exit.

Common Pitfalls

Calling process.exit(0) immediately on SIGTERM cuts open requests. Always close the server first and wait for its callback.

Forgetting to close worker queues. A BullMQ or Kafka consumer that does not call close() may leave jobs in a partial state and force a redelivery storm on the next pod.

Ignoring the orchestrator’s grace period. If Kubernetes gives you 30 seconds and your shutdown takes 60, you get SIGKILL. Set your hard timeout below the orchestrator’s value, not above it.
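One way to keep the two values in step is to derive the hard timeout from the grace period instead of hard-coding both. This is a small sketch: SHUTDOWN_GRACE_MS is a hypothetical environment variable you would set to match the orchestrator's grace period, and the five-second margin is an arbitrary starting point.

```javascript
// Hypothetical helper: compute a hard timeout that fires before SIGKILL arrives.
function hardTimeoutMs(graceMs, marginMs = 5_000) {
  if (!Number.isFinite(graceMs) || graceMs <= marginMs) {
    throw new RangeError(`grace period must exceed the ${marginMs} ms margin`);
  }
  return graceMs - marginMs;
}

// Assumption: SHUTDOWN_GRACE_MS mirrors the orchestrator setting; default 30 s.
const graceMs = Number(process.env.SHUTDOWN_GRACE_MS ?? 30_000);
const timeoutMs = hardTimeoutMs(graceMs);
```

The resulting timeoutMs can replace the literal 10_000 passed to setTimeout in the example above.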

Handling uncaughtException by trying to keep running. The process is in an unknown state; log, run the same shutdown function, and exit.

Practical Tips

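Wire the same shutdown function to SIGTERM, SIGINT, uncaughtException, and unhandledRejection. One path means fewer surprises. For the two error events, exit with a non-zero code once cleanup finishes so the crash is not reported as a clean stop.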
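A sketch of that wiring follows. It assumes a variant of the shutdown function that accepts an exit code as a second argument, which the hands-on example does not do as written; installShutdownHandlers and the proc parameter are hypothetical and exist mainly to make the wiring testable.

```javascript
// Hypothetical wiring: one shutdown path for signals and fatal errors.
// Assumes shutdown(reason, exitCode) performs cleanup, then exits with exitCode.
function installShutdownHandlers(shutdown, proc = process) {
  for (const signal of ['SIGTERM', 'SIGINT']) {
    proc.on(signal, () => shutdown(signal, 0));
  }

  proc.on('uncaughtException', (err) => {
    console.error('uncaught exception', err);
    shutdown('uncaughtException', 1); // non-zero: this was a crash
  });

  proc.on('unhandledRejection', (reason) => {
    console.error('unhandled rejection', reason);
    shutdown('unhandledRejection', 1);
  });
}
```

Call installShutdownHandlers(shutdown) once at startup. Because the shutdown function ignores repeated calls, a signal that arrives during error-triggered cleanup does no harm.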

Readiness probes should flip to unhealthy as soon as draining begins, so load balancers stop sending traffic. Liveness probes should stay healthy until the very end so the orchestrator does not restart you mid-drain.
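The split between the two probes might look like the handler below. Treat it as a sketch: the /readyz and /livez paths are assumed names (use whatever your probe configuration points at), and createProbeHandler is a hypothetical helper that takes a function reporting whether draining has started.

```javascript
// Hypothetical probe handler for a plain Node.js (req, res) server.
function createProbeHandler(isDraining) {
  return function probeHandler(req, res) {
    if (req.url === '/livez') {
      // Liveness stays healthy for the whole drain.
      res.statusCode = 200;
      res.end('alive');
      return;
    }
    if (req.url === '/readyz') {
      // Readiness fails as soon as draining begins.
      const draining = isDraining();
      res.statusCode = draining ? 503 : 200;
      res.end(draining ? 'draining' : 'ready');
      return;
    }
    res.statusCode = 404;
    res.end();
  };
}
```

In the example server this could be attached with http.createServer(createProbeHandler(() => shuttingDown)) on a separate port, or adapted into two Express routes.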

Test shutdown locally with kill -TERM <pid> while sending traffic. The check takes about twenty seconds, which is cheap insurance against a Friday-afternoon outage.

For multi-process clusters, the primary process (named master before Node.js 16) should forward SIGTERM to its workers and wait for each to drain before exiting itself.
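A rough sketch of the forwarding step, assuming the built-in cluster module, whose worker objects expose the child process as worker.process and emit an 'exit' event. drainWorkers is a hypothetical helper name.

```javascript
// Hypothetical helper: forward a signal to every worker and wait for each to exit.
function drainWorkers(workers, signal = 'SIGTERM') {
  return Promise.all(
    workers.map(
      (worker) =>
        new Promise((resolve) => {
          // Subscribe before signalling so a fast exit is not missed.
          worker.once('exit', (code) => resolve({ id: worker.id, code }));
          worker.process.kill(signal); // each worker runs its own drain logic
        }),
    ),
  );
}

// Usage in the primary, assuming `import cluster from 'node:cluster'`:
// process.on('SIGTERM', async () => {
//   await drainWorkers(Object.values(cluster.workers));
//   process.exit(0);
// });
```

The primary still needs its own hard timeout around this call, since a worker that never exits would otherwise keep the promise pending forever.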

Log every step of the shutdown with timestamps. When something hangs in production, those logs are the only thing that tells you which resource to fix.
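A tiny logger is enough for this. The sketch below prefixes each step with an ISO timestamp and the time elapsed since shutdown began; createStepLogger and its injectable now and write parameters are hypothetical, added so the helper can be tested without a real clock.

```javascript
// Hypothetical helper: timestamped shutdown log lines with elapsed time.
function createStepLogger(now = Date.now, write = console.log) {
  const start = now();
  return function logStep(step) {
    const t = now();
    const line = `${new Date(t).toISOString()} +${t - start}ms shutdown: ${step}`;
    write(line);
    return line;
  };
}

// Usage inside shutdown():
// const logStep = createStepLogger();
// logStep('closing http server');
// logStep('closing database');
```

The elapsed offset makes the slow step obvious: a large gap between two consecutive lines points at the resource that hung.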

Wrap-up

Graceful shutdown is small, mechanical, and worth doing right. Catch the signal, drain the server, close external resources, and arm a hard timeout. Wire health checks to reflect the draining state and test the whole thing locally. Do that and rolling deploys become a non-event instead of a recurring incident in your postmortems.