Latency Tuning
Latency in Kafka is the time it takes for a record to travel from producer.send() to the moment a consumer’s processing code sees it. For order books, fraud scoring, and real-time alerting, every millisecond of that journey is a competitive feature. The defaults in Kafka favour throughput, so a latency-sensitive workload almost always needs deliberate tuning on both the producer and consumer side, plus a clear-eyed acceptance of the trade-offs you make to shave milliseconds off the path.
Where end-to-end latency comes from
End-to-end latency is the sum of several waits, and the biggest ones are usually self-inflicted batching delays rather than network or disk time. Knowing the breakdown tells you which knob actually matters.
send() -> [producer batch wait] -> network -> [broker append + replication] -> [consumer fetch wait] -> deserialize -> process
The two waits you control most directly are the producer batch wait (linger.ms) and the consumer fetch wait (fetch.max.wait.ms). The replication wait is governed by acks. Almost all practical latency tuning is rebalancing these three.
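To make the breakdown concrete, the sketch below adds up a per-stage latency budget and reports the largest contributor. The stage names mirror the pipeline above; the class name `LatencyBudget` and every millisecond figure are invented for illustration (they assume a producer with linger.ms raised to 20 and a consumer with fetch.min.bytes raised), not measurements.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative latency budget. Every figure is a made-up example value;
// replace them with numbers from your own tracing.
final class LatencyBudget {

    // Sum of all stage waits, in milliseconds.
    static long totalMs(Map<String, Long> stages) {
        long total = 0;
        for (long ms : stages.values()) {
            total += ms;
        }
        return total;
    }

    // Name of the stage with the largest wait (the first one wins on ties).
    static String dominantStage(Map<String, Long> stages) {
        String worst = null;
        long worstMs = -1;
        for (Map.Entry<String, Long> e : stages.entrySet()) {
            if (e.getValue() > worstMs) {
                worstMs = e.getValue();
                worst = e.getKey();
            }
        }
        return worst;
    }

    // Hypothetical budget for a throughput-tuned pipeline.
    static Map<String, Long> exampleBudget() {
        Map<String, Long> stages = new LinkedHashMap<>();
        stages.put("producer batch wait", 20L);          // e.g. linger.ms=20
        stages.put("network", 1L);
        stages.put("broker append + replication", 3L);
        stages.put("consumer fetch wait", 50L);          // e.g. fetch.min.bytes raised
        stages.put("deserialize + process", 2L);
        return stages;
    }

    public static void main(String[] args) {
        Map<String, Long> stages = exampleBudget();
        System.out.println("total ms: " + totalMs(stages));
        System.out.println("dominant stage: " + dominantStage(stages));
    }
}
```

With these example figures the two batching waits account for 70 of the 76 ms, which is the pattern the section describes: the configurable waits usually dwarf network and disk time.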
Producer: send immediately
The producer holds each record for up to linger.ms so that it can be batched with others. For low latency you want it to send as soon as a record is ready, so set linger.ms=0 explicitly rather than relying on the default: it was 0 through Kafka 3.x but changed to 5 ms in the 4.0 clients, and many teams raise it for throughput. batch.size is only an upper bound, so a record never waits for a batch to fill beyond linger.ms; a modest value still keeps individual requests small. Avoid compression that adds CPU time on the hot path unless your payloads are large and network-bound.
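import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
// Latency-first producer settings
props.put(ProducerConfig.LINGER_MS_CONFIG, 0); // do not wait to batch
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384); // 16 KiB upper bound per batch, not a target
props.put(ProducerConfig.ACKS_CONFIG, "1"); // leader-only ack; idempotence needs acks=all, so it is off here
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "none");
props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 5);
String key = "order-1001"; // placeholder key
String value = "{\"status\":\"NEW\"}"; // placeholder payload
// close() at the end of the try block flushes any record still buffered
try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
    producer.send(new ProducerRecord<>("orders", key, value));
}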
Tip: A tiny linger.ms like 1-2 ms can paradoxically lower tail latency under bursty load by amortising request overhead, while still feeling instantaneous. Measure p99, not just the average, before settling on 0.
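As a rough illustration of why the average is not enough, the sketch below computes the mean and nearest-rank percentiles over a set of latency samples. The class name `LatencyPercentiles` and the sample values are hypothetical; in practice the samples would come from your own measurements, for example the receive time minus the record timestamp.

```java
import java.util.Arrays;

final class LatencyPercentiles {

    // Nearest-rank percentile: the smallest sample such that at least
    // `percentile` percent of all samples are less than or equal to it.
    static long percentile(long[] samplesMs, double percentile) {
        if (samplesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (percentile <= 0 || percentile > 100) {
            throw new IllegalArgumentException("percentile must be in (0, 100]");
        }
        long[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(percentile * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    static double mean(long[] samplesMs) {
        long sum = 0;
        for (long ms : samplesMs) {
            sum += ms;
        }
        return (double) sum / samplesMs.length;
    }

    // Hypothetical run: 98 fast records and 2 slow outliers.
    static long[] exampleSamples() {
        long[] samples = new long[100];
        Arrays.fill(samples, 2L);
        samples[98] = 250L;
        samples[99] = 250L;
        return samples;
    }

    public static void main(String[] args) {
        long[] samples = exampleSamples();
        System.out.println("mean ms: " + mean(samples));
        System.out.println("p50 ms:  " + percentile(samples, 50));
        System.out.println("p99 ms:  " + percentile(samples, 99));
    }
}
```

In this made-up run the mean stays under 7 ms and the median is 2 ms, yet p99 is 250 ms, so a setting that looks fine on average can still miss a tail-latency target.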
The acks trade-off
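acks controls how many replicas must confirm a write before the broker acknowledges it, that is, before the Future returned by send() completes and any callback fires (send() itself returns immediately). It is the sharpest latency-versus-durability lever you have.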
| acks | Latency | Durability | Use when |
|---|---|---|---|
| 0 | Lowest (fire-and-forget) | None: silent data loss on failure | Metrics, logs you can drop |
| 1 | Low: waits for the leader only | Loses data if the leader dies before replication | Most latency-sensitive apps |
| all | Highest: waits for all in-sync replicas | Strongest: with min.insync.replicas=2, an acknowledged write is on at least two replicas | Payments, ledgers |
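For genuine low latency without abandoning durability, acks=1 is the common sweet spot. Only drop to acks=0 for truly disposable data. If you need acks=all, remember that the leader waits for every replica currently in the ISR, not just min.insync.replicas of them, so min.insync.replicas sets the durability floor rather than the length of the wait. You can claw back some latency by keeping the replication factor modest (e.g. 3 with min.insync.replicas=2) and co-locating brokers to cut replication round-trip time.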
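The trade-off can be sketched as a toy model of the broker-side wait under each setting. This is a simplification, not how the broker is implemented: it assumes followers replicate in parallel, so acks=all costs the leader append plus the slowest in-sync follower. The class name `AckWaitModel` and all timings are hypothetical.

```java
final class AckWaitModel {

    // Broker-side wait, in milliseconds, before a produce request is acknowledged.
    // inSyncFollowerMs holds the replication time of each follower currently in the ISR.
    static long ackWaitMs(String acks, long leaderAppendMs, long[] inSyncFollowerMs) {
        switch (acks) {
            case "0":
                return 0; // fire-and-forget: the producer waits for nothing
            case "1":
                return leaderAppendMs; // leader append only
            case "all":
            case "-1": // "-1" is an alias for "all"
                long slowest = 0;
                for (long ms : inSyncFollowerMs) {
                    slowest = Math.max(slowest, ms);
                }
                return leaderAppendMs + slowest; // gated by the slowest in-sync follower
            default:
                throw new IllegalArgumentException("unknown acks: " + acks);
        }
    }

    public static void main(String[] args) {
        long leaderAppendMs = 1;       // hypothetical
        long[] followersMs = {2, 9};   // hypothetical: one nearby follower, one distant
        System.out.println("acks=0   -> " + ackWaitMs("0", leaderAppendMs, followersMs) + " ms");
        System.out.println("acks=1   -> " + ackWaitMs("1", leaderAppendMs, followersMs) + " ms");
        System.out.println("acks=all -> " + ackWaitMs("all", leaderAppendMs, followersMs) + " ms");
    }
}
```

In the model the distant follower alone pushes acks=all from 3 ms to 10 ms, which is why replica placement matters more than any producer setting once you commit to acks=all.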
Consumer: do not wait for data
The consumer batches too. Each fetch request can be held on the broker for up to fetch.max.wait.ms (default 500 ms) until fetch.min.bytes (default 1) of data is available. For latency you want a response the instant a single record is available, so keep fetch.min.bytes at 1. With that setting fetch.max.wait.ms only bounds how long an empty fetch waits, but lowering it keeps the worst case small if fetch.min.bytes is ever raised.
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;

Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "latency-group");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, 1); // return as soon as data exists
props.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, 10); // cap the wait at 10 ms
props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 100); // keep poll batches small
In Spring Boot, the same settings go in application.yml:
spring:
  kafka:
    consumer:
      fetch-min-size: 1
      fetch-max-wait: 10ms
      max-poll-records: 100
    producer:
      acks: "1"
      properties:
        linger.ms: 0
A tight poll() loop matters as much as the fetch settings: if your handler does slow work between polls, that processing time is added to perceived latency for every later record in the batch. Keep handlers fast or offload heavy work.
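The arithmetic behind that warning is simple enough to sketch. The model below assumes a full poll batch whose records all arrived at the same moment and are handled one after another; the class name `PollBatchDelay` and the 2 ms handler time are hypothetical.

```java
final class PollBatchDelay {

    // Extra latency for the record at 0-based position `index` in a poll batch
    // when every earlier record takes handlerMs to process sequentially.
    static long addedDelayMs(int index, long handlerMs) {
        if (index < 0 || handlerMs < 0) {
            throw new IllegalArgumentException("index and handlerMs must be non-negative");
        }
        return index * handlerMs;
    }

    // Extra latency for the last record of a full batch of maxPollRecords records.
    static long worstCaseDelayMs(int maxPollRecords, long handlerMs) {
        if (maxPollRecords < 1) {
            throw new IllegalArgumentException("maxPollRecords must be >= 1");
        }
        return addedDelayMs(maxPollRecords - 1, handlerMs);
    }

    public static void main(String[] args) {
        long handlerMs = 2; // hypothetical per-record processing time
        System.out.println("max.poll.records=100 -> last record waits "
                + worstCaseDelayMs(100, handlerMs) + " ms");
        System.out.println("max.poll.records=10  -> last record waits "
                + worstCaseDelayMs(10, handlerMs) + " ms");
    }
}
```

Under these assumptions a 2 ms handler adds 198 ms to the last record of a 100-record batch but only 18 ms with a batch of 10, far more than the fetch settings can win back.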
The latency vs throughput tension
Every latency optimisation pushes against throughput. Sending one record per request (linger.ms=0, tiny batches) maximises requests-per-second and per-record overhead; large batches maximise bytes-per-second and amortise that overhead. You cannot have both maxima at once.
linger.ms=0, batch.size small -> low latency, lower max throughput
linger.ms=20, batch.size large -> higher latency, higher max throughput
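A toy model shows why the two maxima conflict. It assumes a single connection with one request in flight at a time, a fixed cost per request, and a fixed cost per record; real producers pipeline requests, so treat the numbers as shape rather than prediction. The class name `BatchingModel` and both cost figures are hypothetical.

```java
final class BatchingModel {

    // Records per second when each request carries recordsPerBatch records and
    // costs requestOverheadMicros plus perRecordMicros for every record in it.
    static long recordsPerSecond(int recordsPerBatch, long requestOverheadMicros, long perRecordMicros) {
        if (recordsPerBatch < 1) {
            throw new IllegalArgumentException("recordsPerBatch must be >= 1");
        }
        long requestMicros = requestOverheadMicros + recordsPerBatch * perRecordMicros;
        if (requestMicros <= 0) {
            throw new IllegalArgumentException("request cost must be positive");
        }
        return recordsPerBatch * 1_000_000L / requestMicros;
    }

    public static void main(String[] args) {
        long overheadMicros = 500;  // hypothetical fixed cost per request
        long perRecordMicros = 10;  // hypothetical cost per record
        System.out.println("1 record per request:    "
                + recordsPerSecond(1, overheadMicros, perRecordMicros) + " records/s");
        System.out.println("100 records per request: "
                + recordsPerSecond(100, overheadMicros, perRecordMicros) + " records/s");
    }
}
```

With these figures, batching 100 records per request raises the ceiling from about 1,960 to about 66,666 records per second, but each record then has to wait for its batch to form, which is exactly the latency being traded away.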
Pick the regime your SLA demands, then tune within it. If you need both low latency and high throughput, scale horizontally with more partitions and consumer instances rather than fattening batches. Also minimise hops: avoid unnecessary intermediate topics, and process in place rather than republishing through extra stages.
Best Practices
- Set linger.ms=0 (or 1-2 ms to protect the tail) and keep batch.size small on latency-critical producers.
- Use acks=1 as the default low-latency setting; reserve acks=all for data you cannot lose, and acks=0 only for disposable streams.
- Keep fetch.min.bytes=1 and lower fetch.max.wait.ms (e.g. 10 ms) so consumers never wait for a batch to fill.
- Keep max.poll.records modest and processing fast so per-record latency stays predictable.
- Avoid compression on small payloads; the CPU cost can outweigh any network gain on the hot path.
- Always measure p99/p999 latency, not the mean; tail behaviour is where latency SLAs are won or lost.
- Scale out with partitions instead of larger batches when you need both low latency and high volume.