Introduction to Node.js Streams
A stream is an abstraction for reading or writing data piece by piece instead of all at once. Rather than loading an entire file, HTTP body, or database result into memory and then acting on it, streams let you process data in small chunks as it arrives. This keeps memory usage roughly constant no matter how large the source is, which is why nearly every I/O-heavy API in Node.js (fs, http, zlib, crypto, net) is built on top of the stream interface.
Why streams matter
Imagine copying a 4 GB video file. The naive approach reads the whole file into a Buffer, then writes that buffer out — peaking at 4 GB of resident memory and forcing the user to wait for the entire read before a single byte is written. A stream reads a ~64 KB chunk, writes it, then reads the next, holding only one chunk in memory at a time. Throughput stays high, latency to first byte drops to milliseconds, and memory stays bounded.
The payoff shows up most clearly with data whose size you do not control: uploads, log files, network sockets, and query results. Streams turn “how much RAM do I have?” into a non-question.
import { stat } from 'node:fs/promises';
import { createReadStream } from 'node:fs';
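const { size } = await stat('big.log');
let seen = 0;
// Each iteration receives one chunk; earlier chunks can be garbage-collected.
for await (const chunk of createReadStream('big.log')) {
  seen += chunk.length; // count the bytes actually read from the stream
}
console.log(`processed ${seen} of ${size} bytes, peak chunk held in memory: ~64 KB`);
Output:
processed 1073741824 of 1073741824 bytes, peak chunk held in memory: ~64 KB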
A for await...of loop over a readable stream is the simplest way to consume one in modern Node.js. The stream yields Buffer (or string) chunks until the source is exhausted.
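The file-copy scenario described earlier in this section can be sketched as follows. This is a minimal sketch, not a definitive implementation: the temporary directory, the file names source.bin and copy.bin, and the 256 KiB stand-in for the large video file are all assumptions made so the example can run anywhere. It uses pipeline from node:stream/promises to connect the two file streams.

```javascript
import { createReadStream, createWriteStream } from 'node:fs';
import { mkdtemp, readFile, rm, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { pipeline } from 'node:stream/promises';

// Hypothetical paths: a temp directory holding a small stand-in for the big file.
const dir = await mkdtemp(join(tmpdir(), 'stream-copy-'));
const src = join(dir, 'source.bin');
const dst = join(dir, 'copy.bin');
await writeFile(src, Buffer.alloc(256 * 1024, 'a')); // 256 KiB of sample data

// Copy chunk by chunk; the whole file is never held in memory at once.
await pipeline(createReadStream(src), createWriteStream(dst));

const copied = await readFile(dst); // read back only to verify the small sample
console.log(copied.length); // 262144

await rm(dir, { recursive: true, force: true }); // clean up the temp directory
```

The same two-stream pipeline applies unchanged to a multi-gigabyte file; only the verification step, which reads the copy back into memory, would need to go.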
The four stream types
Every stream in Node.js is an instance of one of four base classes, all exported from node:stream. Understanding which type you are dealing with tells you exactly which methods and events are available.
| Type | Direction | You call | Examples |
|---|---|---|---|
| Readable | source → you | .read(), for await | fs.createReadStream, HTTP request body, process.stdin |
| Writable | you → sink | .write(), .end() | fs.createWriteStream, HTTP response, process.stdout |
| Duplex | both, independent | read and write | TCP sockets (net.Socket) |
| Transform | both, write transforms read | pipe through | zlib.createGzip, crypto.createCipheriv |
A Readable stream produces data — you pull chunks out of it. A Writable stream consumes data — you push chunks into it. A Duplex stream is both at once, with its read and write sides operating independently (a socket can receive and send simultaneously). A Transform stream is a special Duplex where what you read out is a function of what you wrote in; gzip compression is the classic case — bytes go in one side, smaller compressed bytes come out the other.
import { createReadStream, createWriteStream } from 'node:fs';
import { createGzip } from 'node:zlib';
const source = createReadStream('app.log'); // Readable
const gzip = createGzip(); // Transform
const dest = createWriteStream('app.log.gz'); // Writable
source.pipe(gzip).pipe(dest);
dest.on('finish', () => console.log('done'));
Output:
done
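To make the Transform idea concrete, here is a minimal sketch of a custom Transform whose output is a function of its input. The upperCase stream and the in-memory sample strings are hypothetical and exist only for illustration; the Transform constructor and its transform option are part of node:stream.

```javascript
import { Readable, Transform } from 'node:stream';

// Hypothetical transform: whatever is written in comes out upper-cased.
const upperCase = new Transform({
  transform(chunk, _encoding, callback) {
    // First argument is an error (none here), second is the transformed chunk.
    callback(null, chunk.toString().toUpperCase());
  },
});

const pieces = [];
// Readable.from supplies in-memory sample chunks; the transform is itself readable.
for await (const chunk of Readable.from(['ada ', 'lin']).pipe(upperCase)) {
  pieces.push(chunk.toString());
}
console.log(pieces.join('')); // ADA LIN
```

Because a Transform is both writable and readable, it can sit in the middle of any chain exactly where createGzip() sits in the example above.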
Chunks and object mode
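By default streams carry binary data as Buffer objects. How much data a stream buffers, and therefore roughly how large a chunk can get, is governed by its highWaterMark option: 64 KB for fs.createReadStream, and 16 KB for most other byte streams such as sockets in Node.js releases before 22, where the general default was raised to 64 KB. The “chunk” is simply whatever slice of data the stream hands you on a given 'data' event or loop iteration; its size is an implementation detail you should not depend on.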
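Because chunk boundaries are arbitrary, code that needs lines or records has to reassemble them itself. The following is a minimal sketch of that idea, assuming newline-delimited UTF-8 text; splitLines is a hypothetical helper, and the three sample chunks are deliberately split mid-line to simulate what a real source might deliver.

```javascript
import { Readable } from 'node:stream';

// Hypothetical helper: yields complete lines no matter how the input was chunked.
// Note: a multi-byte character split across two Buffer chunks would need
// node:string_decoder; that case is left out of this sketch.
async function* splitLines(source) {
  let pending = ''; // text after the last newline seen so far
  for await (const chunk of source) {
    pending += chunk.toString('utf8');
    const parts = pending.split('\n');
    pending = parts.pop(); // last piece may be an incomplete line
    yield* parts;
  }
  if (pending.length > 0) yield pending; // final line without trailing newline
}

// Sample chunks that cut both lines in the middle.
const chunks = Readable.from(['hel', 'lo\nwor', 'ld\n']);
const lines = [];
for await (const line of splitLines(chunks)) lines.push(line);
console.log(lines); // [ 'hello', 'world' ]
```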
Streams can also run in object mode, where each chunk is an arbitrary JavaScript value instead of bytes. This is what lets you build pipelines over records — parsed CSV rows, database documents, log entries — using the same primitives.
import { Readable } from 'node:stream';
const records = Readable.from([
  { id: 1, name: 'ada' },
  { id: 2, name: 'lin' },
]);
for await (const row of records) {
  console.log(row.name);
}
Output:
ada
lin
Readable.from() builds an object-mode stream from any iterable or async iterable — a quick way to feed array or generator data into a stream pipeline.
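The generator case might look like the sketch below. The countTo generator is hypothetical and stands in for any source that produces records lazily, such as a paginated query; readableObjectMode is the standard property that reports whether a readable is in object mode.

```javascript
import { Readable } from 'node:stream';

// Hypothetical async generator standing in for a lazily produced record source.
async function* countTo(limit) {
  for (let i = 1; i <= limit; i++) {
    yield { n: i };
  }
}

const stream = Readable.from(countTo(3));
const values = [];
for await (const item of stream) {
  values.push(item.n); // each chunk is a plain object, not a Buffer
}
console.log(stream.readableObjectMode, values); // true [ 1, 2, 3 ]
```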
Backpressure in one paragraph
The most important idea in streaming is backpressure: the mechanism that keeps a fast producer from overwhelming a slow consumer. When you write to a Writable and its internal buffer fills past highWaterMark, .write() returns false, signalling “pause, I’m full.” A well-behaved producer stops until the 'drain' event fires. You rarely manage this by hand: pipe() and pipeline() wire backpressure up automatically, pausing the source whenever the destination falls behind. Ignoring backpressure (for example, looping write() while always discarding the return value) lets unwritten chunks pile up in the writable's buffer and is a common cause of runaway memory in Node.js programs.
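For the cases where you do write by hand, the write-then-wait-for-drain pattern can be sketched like this. It is an illustration under stated assumptions, not a definitive implementation: slowSink is a hypothetical sink that takes 5 ms per chunk, and its 4-byte highWaterMark is deliberately tiny so that backpressure triggers immediately.

```javascript
import { once } from 'node:events';
import { Writable } from 'node:stream';

const received = [];
// Hypothetical slow consumer with a tiny buffer so write() reports "full" quickly.
const slowSink = new Writable({
  highWaterMark: 4, // bytes; chosen only to force backpressure in this demo
  write(chunk, _encoding, callback) {
    received.push(chunk.toString());
    setTimeout(callback, 5); // simulate slow I/O
  },
});

let pauses = 0;
for (let i = 0; i < 10; i++) {
  const canContinue = slowSink.write(`line ${i}\n`);
  if (!canContinue) {
    pauses++;
    await once(slowSink, 'drain'); // stop producing until the buffer empties
  }
}

const finished = once(slowSink, 'finish'); // subscribe before calling end()
slowSink.end();
await finished;
console.log(`wrote ${received.length} chunks, paused ${pauses} time(s)`);
```

The producer never has more than one pending chunk in the sink's buffer, which is exactly what pipe() and pipeline() arrange for you automatically.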
Always connect streams with pipeline (from node:stream/promises) rather than chaining .pipe() manually. pipeline propagates errors, honours backpressure, and destroys every stream on failure; .pipe() does none of that cleanup and can leak file descriptors when something throws midway.
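A minimal sketch of the pipeline approach follows. To stay self-contained it compresses in-memory sample data into a hypothetical collecting sink instead of using real files; with files you would pass createReadStream and createWriteStream in the same positions.

```javascript
import { Readable, Writable } from 'node:stream';
import { pipeline } from 'node:stream/promises';
import { createGzip, gunzipSync } from 'node:zlib';

const compressed = [];
// Hypothetical sink that collects the gzip output in memory for the demo.
const sink = new Writable({
  write(chunk, _encoding, callback) {
    compressed.push(chunk);
    callback();
  },
});

let roundTrip = null;
try {
  // One awaited call wires up backpressure, error propagation, and cleanup.
  await pipeline(
    Readable.from([Buffer.from('hello '), Buffer.from('streams')]),
    createGzip(),
    sink,
  );
  // Decompress only to show that the pipeline produced valid gzip data.
  roundTrip = gunzipSync(Buffer.concat(compressed)).toString();
  console.log(roundTrip); // hello streams
} catch (err) {
  // Any stage failing rejects the promise; every stream has been destroyed.
  console.error('pipeline failed:', err);
}
```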
Best Practices
- Reach for streams whenever data size is large, unbounded, or unknown — files, uploads, sockets, and query results.
- Consume readables with for await...of or pipeline; avoid accumulating every chunk into one giant buffer.
- Use stream.pipeline (or pipeline from node:stream/promises) to connect streams so errors and cleanup are handled for you.
- Never ignore the return value of .write() in hand-rolled loops: respect backpressure and wait for 'drain'.
- Use object mode (Readable.from, { objectMode: true }) to stream records rather than raw bytes when appropriate.
- Treat chunk boundaries as arbitrary; never assume a chunk equals a line, message, or record without a parser.
- Prefer the built-in node:stream primitives and Transform streams over manual buffering for data-processing pipelines.