
Avoiding Event Loop Blocking

Node.js runs your JavaScript on a single thread, so every callback, promise continuation, and timer takes its turn on one shared event loop. When a single piece of synchronous code runs for too long (parsing a giant JSON payload, hashing a password with too many rounds, or looping over a million records), nothing else can run. Incoming requests queue up, timers fire late, and health checks time out. A function that takes 200 ms doesn’t slow down just one request; it can add up to 200 ms of latency to every request waiting behind it. Keeping the loop free is one of the most important performance skills in Node.

Why one slow function hurts everything

The event loop processes work in phases (timers, I/O callbacks, setImmediate, etc.) and only moves to the next item when the current callback returns. Asynchronous I/O — disk reads, network calls, database queries — is offloaded to the libuv thread pool or the OS and does not occupy the loop while it waits. Pure CPU work has nowhere to go: it runs inline and holds the thread hostage.

import { createServer } from "node:http";

createServer((req, res) => {
  if (req.url === "/block") {
    // Synchronous busy loop — blocks the loop for ~2 seconds
    const end = Date.now() + 2000;
    while (Date.now() < end) {}
  }
  res.end("ok\n");
}).listen(3000);

Output:

$ curl localhost:3000/block &   # start the blocking request in the background
$ curl localhost:3000/fast      # normally instant, but now stuck behind /block
# /fast returns only after ~2s because the loop is busy

The /fast request has no CPU work of its own, yet it is delayed because the loop is stuck inside /block.
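
The same effect can be reproduced without a server. The following is a minimal sketch, not part of the original example: `blockFor` and `measureTimerLateness` are hypothetical helper names, and the 10 ms and 200 ms figures are arbitrary. It schedules a short timer, then holds the loop with a busy-wait and reports how late the timer callback actually ran.

```javascript
// Busy-wait for `ms` milliseconds; nothing else can run meanwhile.
function blockFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {}
}

// Schedule a timer, then block the loop before it can fire.
// Resolves with how many milliseconds late the callback ran.
function measureTimerLateness(delayMs, blockMs) {
  return new Promise((resolve) => {
    const scheduled = Date.now();
    setTimeout(() => {
      resolve(Date.now() - scheduled - delayMs);
    }, delayMs);
    blockFor(blockMs); // the timer callback must wait for this to return
  });
}

const lateness = await measureTimerLateness(10, 200);
console.log(`timer fired ~${lateness} ms late`);
```

The reported lateness should be close to the length of the block minus the timer's own delay, which is the same delay every queued request would experience.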

Detecting blocking

You cannot fix what you cannot see. The clearest signal is event loop delay — how late timers fire relative to when they were scheduled. Node exposes this directly via perf_hooks.

import { monitorEventLoopDelay } from "node:perf_hooks";

const h = monitorEventLoopDelay({ resolution: 20 });
h.enable();

setInterval(() => {
  // Values are in nanoseconds
  console.log(`p99 loop delay: ${(h.percentile(99) / 1e6).toFixed(1)} ms`);
  h.reset();
}, 1000);

Output:

p99 loop delay: 1.4 ms
p99 loop delay: 2.1 ms
p99 loop delay: 2014.7 ms   <-- a blocking task ran here

Other useful tools:

| Tool | What it shows | When to use |
| --- | --- | --- |
| monitorEventLoopDelay | Live histogram of loop lag | Production metrics, alerting |
| --prof + node --prof-process | V8 sampling profiler output | Finding the hot function |
| node --cpu-prof | .cpuprofile for Chrome DevTools | Visual flame graphs |
| clinic flame / 0x | Flame graphs from real load | Pinpointing CPU hotspots |

A sustained p99 event loop delay above roughly 50-100 ms usually means synchronous work is running on the main thread: CPU-bound code, a *Sync I/O call, or a long garbage-collection pause. Alert on it; do not wait for users to report slowness.
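
One way to turn that guidance into an alert is sketched below. This is an illustration under assumptions rather than a prescribed setup: `watchLoopDelay`, its option names, and the 50 ms threshold are hypothetical, and a real service would send the reading to its metrics or paging system instead of the console.

```javascript
import { monitorEventLoopDelay } from "node:perf_hooks";

// Hypothetical helper: calls `onBreach(p99Ms)` whenever the p99 loop delay
// measured over the last interval exceeds `thresholdMs`.
function watchLoopDelay({ thresholdMs, intervalMs, onBreach }) {
  const histogram = monitorEventLoopDelay({ resolution: 10 });
  histogram.enable();

  const timer = setInterval(() => {
    const p99Ms = histogram.percentile(99) / 1e6; // nanoseconds -> milliseconds
    histogram.reset(); // start a fresh window for the next interval
    if (p99Ms > thresholdMs) onBreach(p99Ms);
  }, intervalMs);
  timer.unref(); // monitoring alone should not keep the process alive

  // Return a function that stops monitoring.
  return () => {
    clearInterval(timer);
    histogram.disable();
  };
}

// Usage (values are assumptions): warn when p99 exceeds 50 ms in any 1 s window.
// const stop = watchLoopDelay({
//   thresholdMs: 50,
//   intervalMs: 1000,
//   onBreach: (ms) => console.warn(`event loop p99 delay ${ms.toFixed(1)} ms`),
// });
```

Resetting the histogram on every tick keeps each reading scoped to one window, so a single old spike does not hold the alert open indefinitely.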

Offloading to worker threads

When a task is genuinely CPU-heavy and unavoidable — image resizing, cryptography, compression, large-scale parsing — move it off the main thread with the worker_threads module. Each worker has its own V8 instance and event loop, so the work runs truly in parallel without touching your request-handling loop.

// pool.js — run CPU work in a worker, await the result
import { Worker } from "node:worker_threads";

export function runTask(workerData) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(new URL("./worker.js", import.meta.url), {
      workerData,
    });
    worker.once("message", resolve);
    worker.once("error", reject);
    worker.once("exit", (code) => {
      if (code !== 0) reject(new Error(`Worker exited with ${code}`));
    });
  });
}

// worker.js: the heavy computation lives here
import { workerData, parentPort } from "node:worker_threads";
import { createHash } from "node:crypto";

let hash = workerData;
for (let i = 0; i < 1_000_000; i++) {
  hash = createHash("sha256").update(hash).digest("hex");
}
parentPort.postMessage(hash);

The main thread stays responsive while worker.js grinds through a million hash rounds. In a real service, keep a fixed pool of workers (one per CPU core) rather than spawning a new one per request — worker startup costs tens of milliseconds. Libraries like piscina provide a battle-tested pool, or you can build one around the snippet above.
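
A fixed pool along those lines might look like the sketch below. It is a simplified illustration, not a replacement for piscina: the `WorkerPool` class and the `{ input, rounds }` task shape are hypothetical, the worker body is inlined as a string (using the `eval: true` option) only to keep the example self-contained, and a crashed worker is not replaced.

```javascript
import { Worker } from "node:worker_threads";
import { cpus } from "node:os";

// Worker body inlined for the sketch; a real service would load a file instead.
const workerSource = `
  const { parentPort } = require("node:worker_threads");
  const { createHash } = require("node:crypto");
  parentPort.on("message", ({ input, rounds }) => {
    let hash = input;
    for (let i = 0; i < rounds; i++) {
      hash = createHash("sha256").update(hash).digest("hex");
    }
    parentPort.postMessage(hash);
  });
`;

class WorkerPool {
  constructor(size = cpus().length) {
    this.queue = []; // tasks waiting for a free worker
    this.workers = [];
    for (let i = 0; i < size; i++) {
      this.workers.push(new Worker(workerSource, { eval: true }));
    }
    this.idle = [...this.workers]; // workers with no task assigned
  }

  // Queue a task and resolve with the worker's reply.
  run(task) {
    return new Promise((resolve, reject) => {
      this.queue.push({ task, resolve, reject });
      this.#dispatch();
    });
  }

  // Hand queued tasks to idle workers until one of the two runs out.
  #dispatch() {
    while (this.idle.length > 0 && this.queue.length > 0) {
      const worker = this.idle.pop();
      const { task, resolve, reject } = this.queue.shift();
      const onMessage = (result) => {
        worker.off("error", onError);
        this.idle.push(worker); // worker is free again
        resolve(result);
        this.#dispatch();
      };
      const onError = (err) => {
        worker.off("message", onMessage);
        reject(err); // sketch only: the failed worker is not replaced
      };
      worker.once("message", onMessage);
      worker.once("error", onError);
      worker.postMessage(task);
    }
  }

  // Terminate every worker; resolves when all have exited.
  close() {
    return Promise.all(this.workers.map((w) => w.terminate()));
  }
}

// Usage: const pool = new WorkerPool(); await pool.run({ input: "seed", rounds: 1_000_000 });
```

Because the workers are created once and reused, the startup cost is paid at boot rather than on every request, and the queue bounds how much CPU work runs at the same time.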

Breaking up long tasks

Not every long task deserves a worker. If the work is mostly synchronous JavaScript over a large collection, you can yield the loop periodically so other callbacks get a turn. Slice the work into batches and hand control back with setImmediate between batches.

// Process a huge array without starving the loop
async function processInBatches(items, batchSize, handle) {
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    for (const item of batch) handle(item);
    // Yield: let queued I/O and timers run before the next batch
    await new Promise((resolve) => setImmediate(resolve));
  }
}

await processInBatches(records, 1000, (r) => transform(r));

setImmediate schedules the continuation for the check phase, which runs right after the loop polls for I/O, so pending requests are serviced between batches. Prefer it over setTimeout(fn, 0), which Node clamps to a minimum delay of 1 ms and which therefore adds avoidable latency to every batch.
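
The cost difference between the two ways of yielding can be measured directly. The sketch below is illustrative only: `yieldWithImmediate`, `yieldWithTimeout`, and `timeYields` are hypothetical names, and the absolute timings will vary by machine and Node version.

```javascript
// Two ways of handing control back to the event loop.
const yieldWithImmediate = () => new Promise((resolve) => setImmediate(resolve));
const yieldWithTimeout = () => new Promise((resolve) => setTimeout(resolve, 0));

// Time how long `count` consecutive yields take with the given strategy.
async function timeYields(yieldFn, count) {
  const start = performance.now();
  for (let i = 0; i < count; i++) await yieldFn();
  return performance.now() - start;
}

const immediateMs = await timeYields(yieldWithImmediate, 200);
const timeoutMs = await timeYields(yieldWithTimeout, 200);
console.log(`setImmediate:  ${immediateMs.toFixed(1)} ms for 200 yields`);
console.log(`setTimeout(0): ${timeoutMs.toFixed(1)} ms for 200 yields`);
```

On a typical machine the setTimeout variant should take noticeably longer, since each yield waits out the timer's minimum delay. That overhead is paid once per batch, so it grows with the number of batches.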

A few common offenders and their fixes:

| Blocking pattern | Fix |
| --- | --- |
| fs.readFileSync in a request path | Use await fs.readFile (async) |
| JSON.parse on multi-MB payloads | Stream-parse, or do it in a worker |
| crypto.pbkdf2Sync / bcrypt sync | Use the async variants |
| Tight loops over big arrays | Batch with setImmediate, or use a worker |
| Synchronous template/regex on huge input | Bound input size; offload heavy cases |
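
As one illustration of swapping a synchronous call for its async variant, the sketch below wraps the callback-based crypto.pbkdf2 in a promise. The `hashPassword` helper, the iteration count, the key length, and the digest are assumptions chosen for the example, not recommendations from the article.

```javascript
import { pbkdf2, randomBytes } from "node:crypto";
import { promisify } from "node:util";

// Promise-returning wrapper around the callback-based async API.
const pbkdf2Async = promisify(pbkdf2);

// Hypothetical helper: derives a 32-byte key without blocking the event loop.
// The derivation runs on the libuv thread pool, unlike crypto.pbkdf2Sync.
async function hashPassword(password, salt, iterations = 100_000) {
  const key = await pbkdf2Async(password, salt, iterations, 32, "sha256");
  return key.toString("hex");
}

const salt = randomBytes(16);
console.log(await hashPassword("correct horse battery staple", salt));
```

The derived key is identical to what the synchronous call would produce with the same inputs; only where the work runs changes.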

Best practices

  • Treat the event loop as a shared resource: any synchronous function over ~10 ms is a latency tax on every concurrent request.
  • Monitor monitorEventLoopDelay p99 in production and alert when it crosses your latency budget.
  • Always prefer the asynchronous form of core APIs (fs.readFile, crypto.pbkdf2, zlib.gzip) over their *Sync counterparts in hot paths.
  • Offload genuinely CPU-bound work to a pooled set of worker_threads, sized to your core count — never spawn a worker per request.
  • Break large synchronous loops into batches and yield with setImmediate so I/O and timers stay responsive.
  • Profile before optimizing: use --cpu-prof or clinic flame to find the actual hot function instead of guessing.
  • Cap the size of untrusted input (request bodies, uploads) so a single payload can’t monopolize the loop.
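
The last point, capping untrusted input, can be sketched as a small body reader that stops once a byte limit is exceeded. `readBodyWithLimit` is a hypothetical helper and the 1 MB figure in the usage comment is an assumption. Frameworks and body-parsing middleware usually offer an equivalent limit option, which is preferable when available.

```javascript
// Hypothetical helper: collect a readable stream (such as an http request)
// into a Buffer, rejecting as soon as more than `maxBytes` have arrived.
async function readBodyWithLimit(stream, maxBytes) {
  const chunks = [];
  let total = 0;
  for await (const chunk of stream) {
    total += chunk.length;
    if (total > maxBytes) {
      // Throwing exits the loop, which also destroys the stream.
      throw new Error(`Body exceeds ${maxBytes} bytes`);
    }
    chunks.push(chunk);
  }
  return Buffer.concat(chunks);
}

// Usage inside a request handler (limit value is an assumption):
// const body = await readBodyWithLimit(req, 1_000_000);
// const data = JSON.parse(body.toString("utf8"));
```

Rejecting early means an oversized payload never reaches JSON.parse, so the worst-case synchronous parse time stays bounded by the limit you choose.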
Last updated June 14, 2026