Performance Overview

Kafka is built for high throughput, but the defaults are deliberately conservative — they favor safety and broad compatibility over raw speed. Getting the most out of a cluster means understanding the handful of levers that actually move the needle and how each one trades against the others. Almost every tuning decision in Kafka is a point on a triangle: throughput, latency, and durability. You can push hard on any one corner, but rarely on all three at once. This page maps the levers to their effects so the detailed tuning pages make sense in context.

The throughput–latency–durability triangle

Before touching a single config, it helps to name what you are optimizing for. These three goals pull in different directions:

Throughput — total records (or bytes) per second the system can sustain. Driven by batching, compression, partition count, and parallelism.
Latency — how long an individual record takes from send() to being readable by a consumer. Hurt by waiting to batch and by stronger durability guarantees.
Durability — the guarantee that an acknowledged record survives broker failures. Driven by acks, replication, and flush behavior.

                 Durability
                    /\
                   /  \
                  /    \
                 /      \
        Latency /________\ Throughput

A throughput-first pipeline lingers to build big compressed batches and accepts higher per-record latency. A latency-first request/response flow sends tiny, immediate, uncompressed batches. A durability-first ledger uses acks=all and waits for every in-sync replica to confirm each write. Knowing which corner you are aiming at turns tuning from guesswork into a deliberate set of choices.
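The three profiles described above can be written down as starting-point producer settings. This is only a sketch: the keys are standard Kafka producer configs, but the specific values and the `profile_for` helper are assumptions chosen for illustration, not recommendations from this page.

```python
# Illustrative producer settings for each corner of the triangle.
# The keys are real Kafka producer configs; the values are assumed
# starting points for experimentation, not recommendations.
PROFILES = {
    "throughput": {
        "linger.ms": 20,             # wait to build fuller batches
        "batch.size": 262144,        # 256 KiB
        "compression.type": "lz4",
        "acks": "1",
    },
    "latency": {
        "linger.ms": 0,              # send as soon as possible
        "batch.size": 16384,         # 16 KiB, the client default
        "compression.type": "none",
        "acks": "1",
    },
    "durability": {
        "linger.ms": 5,
        "batch.size": 65536,         # 64 KiB
        "compression.type": "lz4",
        "acks": "all",               # wait for all in-sync replicas
        "enable.idempotence": True,  # retries cannot introduce duplicates
    },
}


def profile_for(goal: str) -> dict:
    """Return a copy of the settings for 'throughput', 'latency', or 'durability'."""
    return dict(PROFILES[goal])
```

Treat each profile as a baseline to measure against, then adjust one setting at a time.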

The levers and what they do

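Each major lever affects the three goals differently. The table summarizes the direction of impact: ⬆ means the goal improves and ⬇ means it gets worse, so ⬇ under Latency means records take longer to arrive. The rest of this section explains the mechanics.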

Lever	Throughput	Latency	Durability	Where it lives
Batching (`batch.size`, `linger.ms`)	⬆⬆	⬇ (waits)	—	Producer
Compression (`compression.type`)	⬆	slight ⬇	—	Producer
Partition count	⬆⬆	—	—	Topic
`acks` (0 / 1 / all)	⬆ at lower acks	⬆ at lower acks	⬇ at lower acks	Producer
Replication factor + min ISR	⬇ slightly	slight ⬇	⬆⬆	Topic / broker
Page cache (OS)	⬆⬆	⬆	—	Broker / OS

Batching

Batching is the single biggest throughput multiplier. The producer accumulates records destined for the same partition and ships them in one request, amortizing network and broker overhead across many records. linger.ms lets the producer wait briefly to fill batches; under sustained load this is nearly free, because batches fill on size before the timer fires. The cost is latency only when traffic is too sparse to fill a batch.
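The interaction between batch.size and linger.ms can be estimated with a little arithmetic. The sketch below assumes a deliberately simplified model (a steady arrival rate on a single partition, and a batch sent when it either fills or the linger timer expires); the function name and the model itself are illustrative, and real producers have additional triggers for sending.

```python
def batch_wait_ms(records_per_sec: float, record_bytes: int,
                  batch_size: int, linger_ms: float) -> tuple[float, str]:
    """Estimate how long a batch waits before it is sent, and why.

    Simplified model: records arrive at a steady rate for one partition,
    and a batch is sent when it reaches batch_size or when linger_ms
    expires, whichever happens first.
    """
    bytes_per_ms = records_per_sec * record_bytes / 1000.0
    if bytes_per_ms <= 0:
        return linger_ms, "linger"
    fill_ms = batch_size / bytes_per_ms
    if fill_ms <= linger_ms:
        return fill_ms, "size"      # batch fills before the timer fires
    return linger_ms, "linger"      # sparse traffic: the timer fires first
```

With 100,000 records/s of 1,024-byte records on one partition, a 131,072-byte batch fills in about 1.28 ms under this model, well before a 10 ms linger. At 100 records/s the same batch would take over a second to fill, so the linger timer decides and each record can wait up to the full 10 ms.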

Compression

Compression runs per batch, so it pairs naturally with batching: bigger batches compress better. It cuts network bytes and broker disk usage. As long as the broker or topic compression.type is left at its default of producer, the compressed batch is stored and replicated as-is, so consumers also fetch fewer bytes. The trade is CPU on the producer to compress and on the consumer to decompress. lz4 and zstd give the best ratio-to-CPU balance for most workloads; zstd typically achieves the higher compression ratio, while lz4 is typically the faster of the two.
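The claim that bigger batches compress better is easy to see outside Kafka. The sketch below uses Python's zlib as a stand-in for Kafka's codecs (an assumption made so the example stays self-contained), and the JSON events are hypothetical; it compares compressing records one at a time with compressing them as a single batch.

```python
import json
import zlib


def compressed_sizes(records: list[bytes]) -> tuple[int, int]:
    """Return (total bytes when each record is compressed alone,
               bytes when all records are compressed as one batch)."""
    individually = sum(len(zlib.compress(r)) for r in records)
    as_batch = len(zlib.compress(b"".join(records)))
    return individually, as_batch


# Hypothetical JSON events that share field names, as real records often do.
records = [
    json.dumps({"user_id": i, "event": "page_view", "path": f"/items/{i % 50}"}).encode()
    for i in range(1000)
]
individually, as_batch = compressed_sizes(records)
print(f"one by one: {individually} bytes, as one batch: {as_batch} bytes")
```

The batch wins because the compressor can exploit repetition across records (field names, common values) that it cannot see when each record is compressed in isolation.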

Partition count

Partitions are the unit of parallelism. A topic with one partition is processed by at most one consumer in a group, no matter how many you start. More partitions allow more producers, more consumers, and more broker disks to work in parallel, raising aggregate throughput. But partitions are not free: each adds open file handles, replication overhead, and rebalance cost, and too many can hurt end-to-end latency and controller load.
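A common rule of thumb for a starting partition count, not something this page prescribes, is to divide the target throughput by the measured per-partition throughput on the produce side and on the consume side, and take the larger result. The function below is a sketch of that heuristic; the per-partition figures are inputs you would need to measure on your own hardware.

```python
import math


def min_partitions(target_mb_s: float,
                   producer_mb_s_per_partition: float,
                   consumer_mb_s_per_partition: float) -> int:
    """Rule-of-thumb partition count: enough partitions that neither the
    produce side nor the consume side is the bottleneck."""
    for_producers = target_mb_s / producer_mb_s_per_partition
    for_consumers = target_mb_s / consumer_mb_s_per_partition
    return max(1, math.ceil(max(for_producers, for_consumers)))
```

For example, a 200 MB/s target with 20 MB/s per partition on the produce side and 10 MB/s per partition on the consume side gives 20 partitions. The result is a floor for parallelism, not a reason to add partitions beyond what the workload needs.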

acks and replication

acks controls when a producer considers a write successful and is the clearest durability/throughput trade in Kafka:

acks=0 — fire and forget; highest throughput, no delivery guarantee.
acks=1 — leader-only acknowledgment; fast, but a leader crash before replication loses data.
acks=all — leader plus all in-sync replicas; safest, with extra latency for the replication round trip.

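Replication factor sets how many copies of each partition exist; min.insync.replicas sets how many of them must be in sync for an acks=all write to be accepted. Note that min.insync.replicas is enforced only for producers using acks=all. Together they define your durability floor at the cost of some throughput and latency. A durability-leaning Spring Boot producer configuration could look like this: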

spring:
  kafka:
    producer:
      acks: all              # wait for all in-sync replicas
      batch-size: 131072     # 128 KB batches
      compression-type: zstd
      properties:
        linger.ms: 10        # build fuller batches under light load
        enable.idempotence: true   # safe retries; no duplicates
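The durability floor set by replication factor and min.insync.replicas comes down to simple arithmetic. The sketch below spells it out for acks=all producers; the function name and returned keys are illustrative, and it assumes unclean leader election is disabled.

```python
def durability_floor(replication_factor: int, min_insync_replicas: int) -> dict:
    """Summarize what acks=all guarantees for a topic, under the usual
    assumption that unclean leader election is disabled."""
    if not 1 <= min_insync_replicas <= replication_factor:
        raise ValueError("need 1 <= min.insync.replicas <= replication factor")
    return {
        # An acknowledged write is on at least this many replicas.
        "copies_at_ack": min_insync_replicas,
        # Replicas that can be lost right after an ack without losing the record.
        "failures_survived": min_insync_replicas - 1,
        # Brokers that can be down while acks=all writes are still accepted.
        "brokers_down_while_writable": replication_factor - min_insync_replicas,
    }
```

With a replication factor of 3 and min.insync.replicas of 2, an acknowledged record is on at least two replicas, survives the loss of one of them, and the partition keeps accepting acks=all writes with one broker down.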

Page cache

Kafka does not maintain its own in-memory cache of messages; it relies on the OS page cache. Recently written records are usually still in cache when consumers read them, so reads are served from RAM without touching disk. On connections without TLS, the broker also hands those pages straight to the network socket with sendfile (the “zero-copy” path), skipping a copy through the JVM. This is why Kafka brokers want lots of free RAM rather than a huge JVM heap, and why consumer lag matters: a lagging consumer reads cold data from disk instead of warm data from cache.

Give the broker JVM a modest heap (often 6–8 GB is plenty) and leave the rest of system RAM to the OS page cache. A large heap steals memory from the cache and only adds GC pressure — Kafka’s performance comes from the page cache, not the heap.
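A rough way to reason about heap versus page cache is to estimate how many seconds of recent writes the cache can hold. The sketch below is a back-of-the-envelope model: it assumes all RAM not taken by the JVM heap or reserved for the OS and other processes is available for caching log data, and the function and parameter names are hypothetical.

```python
def page_cache_window_s(total_ram_gb: float, heap_gb: float,
                        os_reserved_gb: float, write_mb_s: float) -> float:
    """Rough number of seconds of recent writes the page cache can hold.

    Simplification: all RAM not used by the JVM heap or reserved for the OS
    and other processes is available to cache log segments. write_mb_s is
    the total rate written to this broker's disks, including replica traffic.
    """
    cache_mb = max(0.0, total_ram_gb - heap_gb - os_reserved_gb) * 1024
    return cache_mb / write_mb_s
```

On a 64 GB broker with a 6 GB heap, 2 GB reserved, and 100 MB/s of writes, the model gives roughly 573 seconds. Under these assumptions, a consumer that lags by more than about nine and a half minutes is likely to be reading from disk.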

A simple throughput check

The bundled kafka-producer-perf-test.sh script is the fastest way to establish a baseline before tuning anything.

kafka-producer-perf-test.sh \
  --topic perf-test \
  --num-records 5000000 \
  --record-size 1024 \
  --throughput -1 \
  --producer-props bootstrap.servers=localhost:9092 \
      batch.size=131072 linger.ms=10 compression.type=lz4 acks=1

Output (illustrative; the real tool also appends latency percentiles):

5000000 records sent, 412337 records/sec (402.67 MB/sec), 8.41 ms avg latency, 142.00 ms max latency.

Re-run with one variable changed at a time — acks=all, a different compression.type, a larger batch.size — and watch how records/sec and average latency move. Measuring one lever at a time is the core discipline of Kafka tuning.
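Comparing runs is easier when the summary line is parsed into numbers. The sketch below extracts the headline fields from a line in the format shown above; the regular expression, the `parse_summary` helper, and the sample line are assumptions for illustration, and any extra fields in real output are ignored.

```python
import re

# Matches the headline fields of a perf-test summary line; anything after
# the max latency (such as latency percentiles) is ignored.
SUMMARY = re.compile(
    r"(?P<records>\d+) records sent, "
    r"(?P<rps>[\d.]+) records/sec \((?P<mbps>[\d.]+) MB/sec\), "
    r"(?P<avg_ms>[\d.]+) ms avg latency, "
    r"(?P<max_ms>[\d.]+) ms max latency"
)


def parse_summary(line: str) -> dict:
    """Extract the headline numbers from a perf-test summary line."""
    m = SUMMARY.search(line)
    if m is None:
        raise ValueError(f"unrecognized summary line: {line!r}")
    return {
        "records": int(m["records"]),
        "records_per_sec": float(m["rps"]),
        "mb_per_sec": float(m["mbps"]),
        "avg_latency_ms": float(m["avg_ms"]),
        "max_latency_ms": float(m["max_ms"]),
    }


# Hypothetical summary line for a run with 1024-byte records.
line = ("1000000 records sent, 200000.0 records/sec (195.31 MB/sec), "
        "5.00 ms avg latency, 90.00 ms max latency.")
run = parse_summary(line)
# Sanity check: MB/sec should equal records/sec * record size / 2**20.
expected_mb = run["records_per_sec"] * 1024 / 2**20
```

Storing one parsed dictionary per run, keyed by the single setting that changed, makes it straightforward to attribute each difference in records/sec or latency to one lever.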

Best Practices

Decide which corner of the triangle you are optimizing for before changing configs; let that goal drive every choice.
Treat batching and compression as a pair — large batches with lz4/zstd give the best throughput per CPU.
Size partitions for your target parallelism, but resist over-partitioning; it adds replication and rebalance overhead.
Match acks and min.insync.replicas to your real durability requirement, not a reflexive acks=all everywhere.
Leave most system RAM to the OS page cache and keep the broker heap modest.
Benchmark with kafka-producer-perf-test.sh / kafka-consumer-perf-test.sh, changing one variable at a time so you can attribute each result.