Interview Questions: Performance & Scaling

Performance and scaling questions separate engineers who can ship a prototype from those who can run Node.js in production. Interviewers want to know whether you understand why a single-threaded runtime can stall, when to reach for clustering versus worker threads, and how to grow a service from one box to many. The answers below target modern Node.js 20/22 LTS and favor built-in tooling over external dependencies.

Blocking the event loop

Q: What does “blocking the event loop” mean, and how do you detect it?

Your JavaScript runs on one thread, so any long synchronous work (a tight loop, a giant JSON.parse, synchronous crypto) stops the event loop from processing other callbacks. While the loop is blocked, every other in-flight request waits, and throughput collapses: the process pegs one core while the rest of the machine may sit idle. The rule of thumb is that no single tick should hold the thread for more than a few milliseconds.

import { createServer } from "node:http";

createServer((req, res) => {
  if (req.url === "/blocking") {
    // Synchronous CPU work freezes ALL connections
    let sum = 0;
    for (let i = 0; i < 5e9; i++) sum += i;
    res.end(`sum=${sum}`);
  } else {
    res.end("fast path\n");
  }
}).listen(3000);

While /blocking runs, a request to the fast path also hangs. Detect this with the built-in perf_hooks event-loop delay monitor:

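import { monitorEventLoopDelay } from "node:perf_hooks";

const RESOLUTION_MS = 20;
const h = monitorEventLoopDelay({ resolution: RESOLUTION_MS });
h.enable();

setInterval(() => {
  // Samples are in nanoseconds and include the sampling interval itself,
  // so subtract the resolution to approximate the extra delay.
  const lagMs = Math.max(0, h.percentile(99) / 1e6 - RESOLUTION_MS);
  console.log(`loop lag p99: ${lagMs.toFixed(1)} ms`);
  h.reset(); // report each second on its own instead of a running total
}, 1000);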

Output:

loop lag p99: 1.2 ms
loop lag p99: 1380.5 ms

A healthy loop delay is sub-millisecond. Sustained tens of milliseconds means user-facing latency; hundreds of milliseconds means an outage in progress.
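
When the work has to stay on the main thread, one way to keep ticks short is to split it into slices and yield between them. The sketch below is a minimal illustration, not a tuned implementation; sumInChunks and its chunkSize parameter are hypothetical names, and the default slice size is an assumption you would measure and adjust.

```javascript
import { setImmediate as yieldToLoop } from "node:timers/promises";

// Sums 0..n-1 in slices, handing control back to the event loop between slices.
// chunkSize is an assumed tuning knob: size it so one slice runs for about a millisecond.
async function sumInChunks(n, chunkSize = 1e6) {
  let sum = 0;
  for (let start = 0; start < n; start += chunkSize) {
    const end = Math.min(start + chunkSize, n);
    for (let i = start; i < end; i++) sum += i;
    await yieldToLoop(); // pending I/O callbacks and timers get to run here
  }
  return sum;
}
```

In the /blocking handler this would replace the loop with something like res.end(`sum=${await sumInChunks(5e9)}`). The total CPU cost is unchanged, so this only protects the latency of other requests; for heavy computation a worker thread is usually the better fit.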

Cluster vs worker threads

Q: When do you use the cluster module versus worker threads?

Both add parallelism, but they solve different problems. The cluster module forks multiple processes that share a listening port. By default the primary process accepts connections and hands them to workers round-robin (on Windows, distribution is left to the operating system), which suits scaling an I/O-bound HTTP server across cores. worker_threads runs JavaScript on additional threads inside one process. Each worker has its own V8 heap and event loop, but threads can share memory explicitly through SharedArrayBuffer, which makes them the right tool for moving CPU-bound computation off the main thread.

Aspect	`cluster`	`worker_threads`
Unit	Separate OS processes	Threads in one process
Memory	Isolated per process	Can share via `SharedArrayBuffer`
Best for	Scaling an HTTP server across cores	Offloading CPU-heavy tasks
Communication	IPC messages	`postMessage` / shared memory
Crash blast radius	One worker process	Can take down the whole process

// cluster: one process per CPU core, all sharing port 3000
import cluster from "node:cluster";
import { availableParallelism } from "node:os";
import { createServer } from "node:http";

if (cluster.isPrimary) {
  for (let i = 0; i < availableParallelism(); i++) cluster.fork();
  // Respawn on crash, but not when a worker was shut down on purpose
  cluster.on("exit", (worker) => {
    if (!worker.exitedAfterDisconnect) cluster.fork();
  });
} else {
  createServer((req, res) => res.end(`pid ${process.pid}\n`)).listen(3000);
}

For CPU work, push it to a thread so the request loop stays responsive:

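// main.js
import { Worker } from "node:worker_threads";

function hash(input) {
  return new Promise((resolve, reject) => {
    // Resolve the worker file relative to this module, not the current working directory
    const w = new Worker(new URL("./hash-worker.js", import.meta.url), { workerData: input });
    w.once("message", resolve);
    w.once("error", reject);
    // Reject if the worker dies without sending a result, so callers never hang
    w.once("exit", (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}

console.log(await hash("password123"));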

// hash-worker.js
import { workerData, parentPort } from "node:worker_threads";
import { scryptSync } from "node:crypto";

// Fixed salt for illustration only; use a random per-password salt in real code
const derived = scryptSync(workerData, "salt", 64).toString("hex");
parentPort.postMessage(derived);
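
The shared-memory option mentioned above can be sketched as follows: several workers increment one counter that lives in a SharedArrayBuffer, using Atomics so that no update is lost. This is an illustrative sketch; runCounters is a hypothetical helper, and the worker source is kept inline with eval: true only so the example fits in one file.

```javascript
import { Worker } from "node:worker_threads";

// Inline worker code (eval: true) is evaluated as CommonJS, hence require().
const workerSource = `
  const { workerData, parentPort } = require("node:worker_threads");
  const counter = new Int32Array(workerData.shared);
  for (let i = 0; i < workerData.increments; i++) Atomics.add(counter, 0, 1);
  parentPort.postMessage("done");
`;

// Starts `workers` threads that each add `increments` to one shared counter.
function runCounters(workers, increments) {
  // A SharedArrayBuffer passed in workerData is shared with the worker, not copied
  const shared = new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT);
  const jobs = Array.from({ length: workers }, () =>
    new Promise((resolve, reject) => {
      const w = new Worker(workerSource, {
        eval: true,
        workerData: { shared, increments },
      });
      w.once("message", resolve);
      w.once("error", reject);
    })
  );
  return Promise.all(jobs).then(() => Atomics.load(new Int32Array(shared), 0));
}

console.log(await runCounters(4, 1000)); // 4000
```

Without Atomics, concurrent read-modify-write cycles on the same slot could overwrite each other. For most request-path offloading, plain postMessage is simpler and safe by construction.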

Caching

Q: How do you use caching to improve Node.js performance?

Caching trades memory for fewer expensive operations—database reads, API calls, or recomputation. In-process caches (a Map or an LRU) are fastest but live and die with the process and don’t share across cluster workers. A distributed cache like Redis survives restarts and is shared by every instance, which is what you want once you scale horizontally.

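const cache = new Map();
const TTL = 30_000; // milliseconds
const MAX_ENTRIES = 1_000; // cap so the Map cannot grow without bound

async function getUser(id, fetchFromDb) {
  const hit = cache.get(id);
  if (hit && Date.now() - hit.at < TTL) return hit.value;

  const value = await fetchFromDb(id);
  cache.delete(id); // re-insert so a refreshed key moves to the newest position
  cache.set(id, { value, at: Date.now() });
  // A Map iterates in insertion order, so the first key is the oldest entry
  if (cache.size > MAX_ENTRIES) cache.delete(cache.keys().next().value);
  return value;
}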

For multi-instance deployments, move this to Redis so a cache write on one node is visible to all. Layer HTTP caching too—ETag and Cache-Control headers let clients and CDNs avoid hitting your origin at all.
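
As a rough sketch of the HTTP layer, the helper below derives an ETag from the response body and answers a matching If-None-Match with 304. The names etagFor and cachedResponse and the 60-second max-age are assumptions made for illustration, and the header comparison is deliberately simplified.

```javascript
import { createHash } from "node:crypto";

// Strong ETag derived from the body bytes; any stable content hash would do.
function etagFor(body) {
  return `"${createHash("sha256").update(body).digest("base64url")}"`;
}

// Returns the status, headers, and body to send for a cacheable response.
// maxAgeSeconds is an assumed policy value, not a recommendation.
function cachedResponse(requestHeaders, body, maxAgeSeconds = 60) {
  const etag = etagFor(body);
  const headers = {
    ETag: etag,
    "Cache-Control": `public, max-age=${maxAgeSeconds}`,
  };
  // Simplification: exact match only. A real If-None-Match header may carry
  // a comma-separated list of tags or "*".
  if (requestHeaders["if-none-match"] === etag) {
    return { status: 304, headers, body: "" };
  }
  return { status: 200, headers, body };
}
```

Inside a request handler this could be used as: const r = cachedResponse(req.headers, payload); res.writeHead(r.status, r.headers).end(r.body). The 304 path still costs a hash of the body, so for expensive payloads you would typically store the ETag alongside the cached value.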

Memory leaks

Q: What causes memory leaks in Node, and how do you find them?

A leak is memory that stays reachable forever, so the garbage collector can never reclaim it. The usual culprits are unbounded caches and Maps, listeners added but never removed, closures capturing large objects, and globals that only ever grow. The symptom is heap usage that trends upward across requests and never comes back down.

import { memoryUsage } from "node:process";

setInterval(() => {
  const { heapUsed, rss } = memoryUsage();
  console.log(`heap ${(heapUsed / 1e6).toFixed(1)} MB · rss ${(rss / 1e6).toFixed(1)} MB`);
}, 5000);

Output:

heap 42.1 MB · rss 88.3 MB
heap 71.6 MB · rss 120.9 MB
heap 103.4 MB · rss 158.2 MB

A steadily climbing heap is the signature of a leak. Capture a heap snapshot with node --inspect and the Chrome DevTools Memory panel, or call v8.writeHeapSnapshot(), then compare two snapshots to find which retained object set is growing.
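
A minimal way to capture snapshots from a running service is to write one on demand. The sketch below assumes SIGUSR2 as the trigger, which is an arbitrary choice for illustration and is not available on Windows; dumpHeap is a hypothetical helper name.

```javascript
import { writeHeapSnapshot } from "node:v8";

// Writes a .heapsnapshot file into the working directory and returns its name.
// Generating a snapshot blocks the event loop and needs substantial extra
// memory, so trigger it sparingly in production.
function dumpHeap() {
  const file = writeHeapSnapshot();
  console.log(`heap snapshot written to ${file}`);
  return file;
}

// Assumed trigger: run `kill -USR2 <pid>` to request a snapshot
process.on("SIGUSR2", dumpHeap);
```

Take one snapshot shortly after startup and another after the heap has grown, then load both into the DevTools Memory panel and use the comparison view to see which objects accumulated in between.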

Bound every cache. An unbounded Map keyed by user input is one of the most common sources of Node memory leaks; use an LRU with a max size, a per-entry TTL, or both.
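
A bounded cache along these lines can be sketched on top of Map, whose insertion order doubles as a recency list. This is an illustrative sketch rather than a replacement for a maintained LRU library; the LruCache class and its default limits are assumptions.

```javascript
// Small LRU cache with a size cap and a per-entry TTL.
class LruCache {
  #max;
  #ttlMs;
  #entries = new Map();

  constructor({ max = 1000, ttlMs = 30_000 } = {}) {
    this.#max = max;
    this.#ttlMs = ttlMs;
  }

  get(key) {
    const entry = this.#entries.get(key);
    if (!entry) return undefined;
    if (Date.now() - entry.at >= this.#ttlMs) {
      this.#entries.delete(key); // expired: drop it so it cannot linger
      return undefined;
    }
    // Re-insert to mark this key as the most recently used
    this.#entries.delete(key);
    this.#entries.set(key, entry);
    return entry.value;
  }

  set(key, value) {
    this.#entries.delete(key);
    this.#entries.set(key, { value, at: Date.now() });
    if (this.#entries.size > this.#max) {
      // The first key in iteration order is the least recently used
      this.#entries.delete(this.#entries.keys().next().value);
    }
  }

  get size() {
    return this.#entries.size;
  }
}
```

Because both reads and writes move a key to the end of the Map, the entry evicted on overflow is always the one that has gone longest without being touched, and memory stays bounded by max regardless of how many distinct keys arrive.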

Horizontal scaling

Q: What are the strategies for scaling Node.js horizontally?

Vertical scaling (a bigger box) hits limits fast because a single Node process runs your JavaScript on one core. Horizontal scaling runs many stateless instances behind a load balancer. The key constraint is statelessness: keep nothing in process memory that has to outlive a single request, so any instance can serve any request. Push shared state into Redis or a database, and store sessions there rather than in memory.

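// Stateless: read identity from the request, never from local memory
import { createServer } from "node:http";

// Placeholder (hypothetical): swap in a lookup through your Redis client
const redisGetSession = async (token) => (token ? { userId: "demo-user" } : null);

createServer(async (req, res) => {
  try {
    const token = req.headers.authorization?.split(" ")[1];
    const session = await redisGetSession(token); // shared store, not local
    res.end(JSON.stringify({ user: session?.userId ?? null }));
  } catch {
    res.writeHead(503).end(); // a store outage must not crash the process
  }
}).listen(process.env.PORT ?? 3000);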

In practice you combine cluster (or a container orchestrator running N replicas) to fill every core on a host, then add hosts behind the balancer to scale out. Use sticky sessions only when you truly need them—stateless designs scale far more cleanly.

Best Practices

Keep every tick short: offload CPU-bound work to worker_threads and never call Sync APIs on the request path.
Run one process per core, through cluster or container replicas, to use all CPUs; a single Node process runs JavaScript on only one core.
Monitor event-loop delay and heap usage in production—rising loop lag and heap are your earliest warning signs.
Bound all caches with a max size or TTL, and move shared cache and session state into Redis once you scale out.
Design instances to be stateless so any node can handle any request behind the load balancer.
Set Cache-Control/ETag headers so CDNs and clients absorb load before it reaches Node.
Always respawn crashed cluster workers and run the process under a supervisor (PM2, systemd, or an orchestrator).