Latency Tuning

Latency in Kafka is the time it takes for a record to travel from producer.send() to the moment a consumer’s processing code sees it. For order books, fraud scoring, and real-time alerting, every millisecond of that journey is a competitive feature. The defaults in Kafka favour throughput, so a latency-sensitive workload almost always needs deliberate tuning on both the producer and consumer side, plus a clear-eyed acceptance of the trade-offs you make to shave milliseconds off the path.

Where end-to-end latency comes from

End-to-end latency is the sum of several waits, and the biggest ones are usually self-inflicted batching delays rather than network or disk time. Knowing the breakdown tells you which knob actually matters.

send() -> [producer batch wait] -> network -> [broker append + replication] -> [consumer fetch wait] -> deserialize -> process

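The two waits you control most directly are the producer batch wait (linger.ms) and the consumer fetch wait (fetch.max.wait.ms, together with fetch.min.bytes). The replication wait the producer sees is governed by acks. Consumers, however, only read up to the high watermark, so a record becomes visible to them once the in-sync replicas have it, whatever acks the producer chose. Almost all practical latency tuning is rebalancing these three.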
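As a rough illustration of the breakdown above, the sketch below adds the stages into a worst-case budget. The class name LatencyBudget, its parameter names, and the numbers in the usage note are assumptions made up for illustration; they are not measurements and not part of any Kafka API.

```java
/** Toy additive model of the send-to-process path; every input is an assumed estimate. */
final class LatencyBudget {
    private LatencyBudget() {}

    /**
     * Worst-case end-to-end latency in milliseconds for a single record.
     *
     * @param lingerMs    producer batch wait (linger.ms)
     * @param networkMs   producer-to-broker plus broker-to-consumer transit
     * @param commitMs    broker append plus replication until the record is readable
     * @param fetchWaitMs time the record may wait before a fetch returns it
     * @param processMs   deserialization plus handler time
     */
    static long worstCaseMs(long lingerMs, long networkMs, long commitMs,
                            long fetchWaitMs, long processMs) {
        if (lingerMs < 0 || networkMs < 0 || commitMs < 0 || fetchWaitMs < 0 || processMs < 0) {
            throw new IllegalArgumentException("stage times must be non-negative");
        }
        // The stages are sequential for one record, so the bound is a plain sum.
        return lingerMs + networkMs + commitMs + fetchWaitMs + processMs;
    }
}
```

For example, LatencyBudget.worstCaseMs(20, 2, 5, 500, 1) gives 528, while LatencyBudget.worstCaseMs(0, 2, 5, 10, 1) gives 18. In this toy model the two batching waits dominate everything else, which is the point of the breakdown.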

Producer: send immediately

The producer waits up to linger.ms to fill a batch. For low latency you want it to send as soon as a record is ready, so set linger.ms=0 explicitly. The default was 0 through Kafka 3.x and is 5 ms from Kafka 4.0, and many teams raise it for throughput, so verify what your clients actually run with. Keep batch.size modest: it caps the bytes in one batch, and with a non-zero linger.ms a large cap makes it less likely that a batch fills and ships early. Avoid compression that adds CPU time on the hot path unless your payloads are large and network-bound.

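import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

// Latency-first producer settings
props.put(ProducerConfig.LINGER_MS_CONFIG, 0);          // do not wait to batch
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384);     // 16 KB cap per batch (the client default)
props.put(ProducerConfig.ACKS_CONFIG, "1");             // leader-only ack
props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, false); // idempotence requires acks=all
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "none");
props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 5); // retries may reorder without idempotence

String key = "order-42";          // placeholder key for illustration
String value = "{\"qty\": 1}";    // placeholder payload for illustration
try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
    producer.send(new ProducerRecord<>("orders", key, value)); // asynchronous; close() flushes it
}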

Tip: A tiny linger.ms like 1-2 ms can paradoxically lower tail latency under bursty load by amortising request overhead, while still feeling instantaneous. Measure p99, not just the average, before settling on 0.
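To make the "measure p99, not the average" advice concrete, here is a minimal sketch of a nearest-rank percentile over collected latency samples. The class LatencyPercentiles and its method names are hypothetical helpers, and how you collect the samples (for instance, consumer receive time minus record timestamp) is left as an assumption.

```java
import java.util.Arrays;

/** Hypothetical helper for summarising latency samples given in milliseconds. */
final class LatencyPercentiles {
    private LatencyPercentiles() {}

    /** Nearest-rank percentile: the smallest sample with at least p percent of samples at or below it. */
    static long percentile(long[] samplesMs, double p) {
        if (samplesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = samplesMs.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Arithmetic mean, shown only to contrast with the tail percentiles. */
    static double mean(long[] samplesMs) {
        return Arrays.stream(samplesMs).average().orElse(Double.NaN);
    }
}
```

With 98 samples of 1 ms and 2 samples of 200 ms, the mean is 4.98 ms and the median is 1 ms, but percentile(samples, 99) returns 200. Comparing that figure across linger.ms=0 and linger.ms=1 or 2 under realistic load is what settles the choice.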

The acks trade-off

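acks controls how many replicas must confirm a write before the broker acknowledges it. send() itself is asynchronous and returns a Future immediately; acks determines when that Future (or the send callback) completes. It is the sharpest latency-versus-durability lever you have.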

`acks`	Latency	Durability	Use when
`0`	Lowest (fire-and-forget)	None — silent data loss on failure	Metrics, logs you can drop
`1`	Low — waits for leader only	Loses data if leader dies before replication	Most latency-sensitive apps
`all`	Highest: waits for all in-sync replicas	Strongest; survives a leader failure when `min.insync.replicas` is at least 2	Payments, ledgers

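For genuine low latency without abandoning durability, acks=1 is the common sweet spot. Only drop to acks=0 for truly disposable data. If you need acks=all, the producer waits for every replica currently in sync, so you can claw back some latency by keeping the replication factor modest (e.g. 3, with min.insync.replicas=2) and co-locating brokers to cut replication round-trip time. Note that min.insync.replicas does not shrink the in-sync set; it only sets the minimum number of in-sync replicas required for a write to be accepted.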
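For the acks=all case, a producer configuration might look like the sketch below. It uses plain string keys so it compiles without the Kafka client on the classpath; the class name DurableProducerProps and the broker address passed in are assumptions. min.insync.replicas is a topic or broker setting and is therefore not part of the producer properties.

```java
import java.util.Properties;

/** Hypothetical builder for a durability-first producer that still avoids batching delay. */
final class DurableProducerProps {
    private DurableProducerProps() {}

    static Properties build(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("acks", "all");                // wait for all in-sync replicas
        props.setProperty("enable.idempotence", "true"); // safe retries, ordering preserved
        props.setProperty("linger.ms", "0");             // still send without waiting to batch
        return props;
    }
}
```

The result of DurableProducerProps.build("broker1:9092") can be passed to the KafkaProducer constructor in place of the latency-first properties; the only latency you give up is the replication round trip before the acknowledgement.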

Consumer: do not wait for data

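The consumer side batches too. The broker holds each fetch request until fetch.min.bytes (default 1) of data is available or fetch.max.wait.ms (default 500 ms) elapses, whichever comes first. For latency you want the fetch to return the instant a single record is available, so keep fetch.min.bytes at 1 and lower fetch.max.wait.ms. With fetch.min.bytes=1 the lower timeout mostly acts as a safeguard against that value being raised later, and it costs more empty fetch requests on idle partitions.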

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;

Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "latency-group");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

props.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, 1);        // return as soon as data exists
props.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, 10);     // cap the broker-side wait at 10 ms
props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 100);     // keep poll batches small

In Spring Boot, the same settings go in application.yml:

spring:
  kafka:
    consumer:
      fetch-min-size: 1
      fetch-max-wait: 10ms
      max-poll-records: 100
    producer:
      acks: "1"
      properties:
        linger.ms: 0

A tight poll() loop matters as much as the fetch settings: if your handler does slow work between polls, that processing time is added to perceived latency for every later record in the batch. Keep handlers fast or offload heavy work.
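The effect of handler time on later records in a batch is simple arithmetic, sketched below. PollBatchDelay and its method names are hypothetical, and the model assumes records from one poll() are handled strictly one after another with a constant handler time.

```java
/** Toy model of the queueing delay inside one poll() batch processed sequentially. */
final class PollBatchDelay {
    private PollBatchDelay() {}

    /** Delay added to the record at zero-based position index while earlier records are handled. */
    static long addedDelayMs(int index, long handlerMs) {
        if (index < 0 || handlerMs < 0) {
            throw new IllegalArgumentException("index and handlerMs must be non-negative");
        }
        return index * handlerMs;
    }

    /** Delay seen by the last record when a poll returns a full batch of maxPollRecords. */
    static long worstAddedDelayMs(int maxPollRecords, long handlerMs) {
        return maxPollRecords <= 0 ? 0 : addedDelayMs(maxPollRecords - 1, handlerMs);
    }
}
```

With a 5 ms handler, worstAddedDelayMs(100, 5) is 495 ms, and with 500 records per poll it grows to 2495 ms. That is why a modest max.poll.records and a fast handler matter as much as the fetch settings.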

The latency vs throughput tension

Every latency optimisation pushes against throughput. Sending one record per request (linger.ms=0, tiny batches) maximises requests-per-second and per-record overhead; large batches maximise bytes-per-second and amortise that overhead. You cannot have both maxima at once.

linger.ms=0, batch.size small   ->  low latency,  lower max throughput
linger.ms=20, batch.size large  ->  higher latency, higher max throughput
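The two regimes above can be made concrete with a toy cost model in which each request pays a fixed overhead plus a per-record cost. BatchingModel, its parameters, and the microsecond figures in the usage note are assumptions chosen for illustration, not benchmarks; real clients also pipeline several requests at once, which this sketch ignores.

```java
/** Toy model: requests are sent one after another over a single connection. */
final class BatchingModel {
    private BatchingModel() {}

    /**
     * Records per second when every request costs overheadUs microseconds
     * plus perRecordUs microseconds for each record it carries.
     */
    static double recordsPerSecond(int recordsPerRequest, double overheadUs, double perRecordUs) {
        if (recordsPerRequest <= 0 || overheadUs < 0 || perRecordUs < 0) {
            throw new IllegalArgumentException("invalid model inputs");
        }
        double requestUs = overheadUs + recordsPerRequest * perRecordUs;
        if (requestUs == 0) {
            throw new IllegalArgumentException("request cost must be positive");
        }
        return recordsPerRequest * 1_000_000.0 / requestUs;
    }
}
```

Assuming 200 µs of overhead per request and 2 µs per record, one record per request yields roughly 4,950 records per second, while 100 records per request yields 250,000. The larger batch wins on throughput because the fixed overhead is shared, but each record may wait for the batch to form.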

Pick the regime your SLA demands, then tune within it. If you need both low latency and high throughput, scale horizontally with more partitions and consumer instances rather than fattening batches. Also minimise hops: avoid unnecessary intermediate topics, and process in place rather than republishing through extra stages.

Best Practices

Set linger.ms=0 (or 1-2 ms to protect the tail) and keep batch.size small on latency-critical producers.
Use acks=1 as the default low-latency setting; reserve acks=all for data you cannot lose, and acks=0 only for disposable streams.
Keep fetch.min.bytes=1 and lower fetch.max.wait.ms (e.g. 10 ms) so consumers never wait for a batch to fill.
Keep max.poll.records modest and processing fast so per-record latency stays predictable.
Avoid compression on small payloads — the CPU cost can outweigh any network gain on the hot path.
Always measure p99/p999 latency, not the mean; tail behaviour is where latency SLAs are won or lost.
Scale out with partitions instead of larger batches when you need both low latency and high volume.