Scaling with the cluster Module

A single Node.js process runs your JavaScript on one thread, so it can saturate only one CPU core no matter how many your server has. The built-in node:cluster module solves this by forking the same program into multiple processes that share a single listening port, so incoming connections are spread across every core. It is the classic, battle-tested way to scale a network server vertically, and because each worker is a full, isolated process, a crash in one does not take down the others.

The master/worker model

Clustering follows a parent/child topology. One primary process (historically called the master) starts first, forks a set of worker processes, and supervises them. The primary does not handle requests itself; its job is to manage the pool. Each worker is an independent Node.js instance with its own event loop, memory, and V8 heap, running the very same entry file.

The trick that makes this useful for servers is shared sockets. When a worker calls server.listen(), the cluster module routes the bind through the primary so that all workers appear to listen on the same port. Under the default round-robin policy, the primary owns the listening socket, accepts each connection, and hands it off to a worker.

Forking one worker per CPU core

The common pattern is to fork as many workers as there are logical CPUs, exposed via os.availableParallelism() (preferred since Node 18.14) rather than the older os.cpus().length.

import cluster from 'node:cluster';
import { availableParallelism } from 'node:os';
import { createServer } from 'node:http';
import process from 'node:process';

const numCPUs = availableParallelism();

if (cluster.isPrimary) {
  console.log(`Primary ${process.pid} starting ${numCPUs} workers`);

  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }

  cluster.on('exit', (worker, code, signal) => {
    console.log(`Worker ${worker.process.pid} died (${signal || code})`);
  });
} else {
  createServer((req, res) => {
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    res.end(`Handled by worker ${process.pid}\n`);
  }).listen(3000);

  console.log(`Worker ${process.pid} listening on 3000`);
}

Sample output (PIDs and line order will vary):

Primary 41280 starting 8 workers
Worker 41283 listening on 3000
Worker 41284 listening on 3000
Worker 41285 listening on 3000
...

The cluster.isPrimary flag (with cluster.isWorker as its inverse) tells each process which branch to run. Because the file is re-executed top to bottom for every fork, keep the primary-only setup inside the isPrimary block.

In CommonJS the same code works with const cluster = require('node:cluster'). Note that cluster.isMaster is the deprecated alias for cluster.isPrimary — prefer the newer name in modern code.
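If the code also has to run on releases that predate availableParallelism() or isPrimary, a small compatibility shim is one option. The sketch below is illustrative only: workerCount and isPrimaryProcess are hypothetical helper names, and the os and cluster modules are passed in as arguments purely to keep the snippet free of imports.

```javascript
// Hypothetical helpers, not part of Node.js. Pass in the modules you
// already imported from 'node:os' and 'node:cluster'.

// Prefer availableParallelism(); fall back to cpus().length on older
// releases, and never return less than one worker.
function workerCount(os) {
  if (typeof os.availableParallelism === 'function') {
    return os.availableParallelism();
  }
  return Math.max(1, os.cpus().length);
}

// Prefer isPrimary; fall back to the deprecated isMaster alias when the
// newer property is undefined.
function isPrimaryProcess(cluster) {
  return cluster.isPrimary ?? cluster.isMaster;
}

// Usage sketch:
//   if (isPrimaryProcess(cluster)) {
//     for (let i = 0; i < workerCount(os); i++) cluster.fork();
//   }
```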

Automatic load balancing

You do not write any dispatch logic yourself — the cluster module distributes incoming connections for you. There are two scheduling policies, controlled by cluster.schedulingPolicy:

Policy	Constant	Behavior	Default on
Round-robin	`cluster.SCHED_RR`	Primary accepts connections and hands them to workers in turn	All platforms except Windows
OS-driven	`cluster.SCHED_NONE`	Workers accept directly; the kernel decides	Windows

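Round-robin generally spreads load more evenly. With SCHED_NONE, distribution depends on the operating system scheduler and can be badly skewed, with a few workers receiving most of the connections. The setting is frozen once the first worker is forked, so set the policy before that: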

import cluster from 'node:cluster';

// Force round-robin everywhere (also settable via NODE_CLUSTER_SCHED_POLICY=rr)
cluster.schedulingPolicy = cluster.SCHED_RR;

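Connections, not individual requests, are what gets distributed: requests on one keep-alive connection stay with the same worker, but a client’s next connection may land on a different one. For that reason, never store session state in a worker’s memory. Use a shared store such as Redis or a database so any worker can serve any client.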
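To make the problem concrete, here is a toy simulation. It is not how Node.js is implemented: createPool and dispatch are hypothetical names, each "worker" is just an object with a private Map, and connections are handed out in turn the way the round-robin policy does.

```javascript
// Toy model of a worker pool: every worker has its own private memory.
function createPool(workerCount) {
  const workers = Array.from({ length: workerCount }, (_, id) => ({
    id,
    sessions: new Map(), // per-process state, invisible to other workers
  }));
  let next = 0;

  return {
    workers,
    // Hand the next connection to the next worker in turn.
    dispatch() {
      const worker = workers[next];
      next = (next + 1) % workers.length;
      return worker;
    },
  };
}

const pool = createPool(2);

// The first connection lands on worker 0, which stores the login locally.
const first = pool.dispatch();
first.sessions.set('sid-123', { user: 'alice' });

// The same client's next connection lands on worker 1, which knows nothing.
const second = pool.dispatch();

console.log(first.id, first.sessions.has('sid-123'));   // 0 true
console.log(second.id, second.sessions.has('sid-123')); // 1 false
```

In a real cluster the fix is to replace the per-worker Map with a store that lives outside the worker processes, so the lookup succeeds no matter which worker receives the connection.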

Restarting dead workers

A worker can crash from an unhandled exception, an out-of-memory condition, or an explicit process.exit(). The primary stays alive, so it can detect the loss via the exit event and immediately fork a replacement to keep the pool at full strength.

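import cluster from 'node:cluster';
import { availableParallelism } from 'node:os';

if (cluster.isPrimary) {
  // Read the core count once instead of on every loop iteration.
  const numCPUs = availableParallelism();
  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }

  cluster.on('exit', (worker, code, signal) => {
    // Skip workers that were shut down on purpose.
    if (!worker.exitedAfterDisconnect) {
      console.error(`Worker ${worker.process.pid} crashed (${signal || code}), respawning`);
      cluster.fork();
    }
  });
} else {
  // Worker branch: start the server here, as in the earlier example.
}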

The worker.exitedAfterDisconnect flag distinguishes an intentional shutdown (via worker.kill() or worker.disconnect()) from an unexpected crash, so you only respawn when something actually went wrong.

Guard against crash loops: if a worker dies instantly on startup, blindly re-forking will spin your CPU at 100%. Track restart timestamps and back off (or give up) if a worker keeps dying within a few seconds of launch.
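One way to implement that guard is a sliding-window counter with exponential back-off. This is a minimal sketch, not a prescribed design: createRestartPolicy, its option names, and the default thresholds are all assumptions chosen for illustration.

```javascript
// Hypothetical restart policy. Returns a function that yields the delay
// (in ms) to wait before the next fork, or null when it is time to give up.
function createRestartPolicy({ windowMs = 10_000, maxRestarts = 5, baseDelayMs = 200 } = {}) {
  let recent = []; // timestamps of restarts inside the sliding window

  return function nextDelay(now = Date.now()) {
    // Forget restarts that are older than the window.
    recent = recent.filter((t) => now - t < windowMs);
    if (recent.length >= maxRestarts) return null;

    // Double the delay for every restart still inside the window.
    const delay = baseDelayMs * 2 ** recent.length;
    recent.push(now);
    return delay;
  };
}

// Demonstration with explicit timestamps so the result is deterministic.
const nextDelay = createRestartPolicy({ windowMs: 10_000, maxRestarts: 3, baseDelayMs: 200 });
console.log(nextDelay(0));      // 200
console.log(nextDelay(1_000));  // 400
console.log(nextDelay(2_000));  // 800
console.log(nextDelay(3_000));  // null (three restarts within 10 s)
console.log(nextDelay(20_000)); // 200 (the window has cleared)
```

In the primary's exit handler, you would call nextDelay() for each unexpected exit, schedule cluster.fork() with setTimeout when it returns a number, and log an error and stop respawning when it returns null.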

Zero-downtime reloads

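The same supervision hooks let you roll out new code without dropping connections. Each fork starts a fresh process that loads the entry file from disk, so new workers pick up updated code while the primary keeps running. Fork a fresh worker, wait for its listening event, then call disconnect() on an old one so it finishes in-flight requests before exiting. Repeat across the pool, one worker at a time.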
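The rolling procedure can be sketched as a small async function. Treat it as a starting point under stated assumptions: rollingRestart and waitFor are hypothetical names, the cluster object is passed in as a parameter rather than imported, and there is no timeout, so a replacement that never reaches listening would stall the loop.

```javascript
// Resolve once `emitter` fires `event`.
function waitFor(emitter, event) {
  return new Promise((resolve) => emitter.once(event, resolve));
}

// Replace every current worker, one at a time. Call this only in the
// primary, passing the object imported from 'node:cluster'.
async function rollingRestart(cluster) {
  // Snapshot the current workers so replacements are not restarted too.
  const oldWorkers = Object.values(cluster.workers);

  for (const oldWorker of oldWorkers) {
    if (oldWorker.isDead()) continue; // already gone, nothing to drain

    const replacement = cluster.fork();
    await waitFor(replacement, 'listening'); // new worker is accepting connections

    const exited = waitFor(oldWorker, 'exit');
    oldWorker.disconnect(); // stop accepting, let in-flight requests finish
    await exited;
  }
}

// Usage sketch (the choice of signal is an assumption):
//   process.on('SIGUSR2', () => rollingRestart(cluster));
```

Because disconnect() sets exitedAfterDisconnect, a respawn handler that checks that flag will not fork an extra worker for each one retired here. Long-lived keep-alive connections can delay a worker's exit, so production code usually adds a timeout that falls back to worker.kill().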

Best practices

Size the pool to availableParallelism(); more workers than cores just adds context-switching overhead.
Always listen for the exit event and respawn crashed workers, with back-off to avoid tight restart loops.
Keep workers stateless — push sessions, caches, and counters into Redis or a database since requests are spread across processes.
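Prefer the round-robin scheduler (the default everywhere except Windows) for the most even load distribution.
Move CPU-heavy work inside a single request to worker_threads rather than relying on clustering. Clustering adds processes so the server can handle more connections, but one long computation still blocks the event loop of the worker handling it.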
In production, let a process manager (PM2, systemd, or a container orchestrator) supervise the primary itself so the whole app restarts if the primary ever dies.