Producer Compression

Compression is one of the highest-leverage settings on a Kafka producer: a single config line can often cut network egress and broker disk usage by 50-80% on text-heavy payloads like JSON or logs. The producer compresses each record batch before sending it over the wire, and the broker stores that compressed batch as-is, so the savings carry through to disk and replication traffic. The cost is CPU, paid mostly by the producer at write time (and by consumers when they decompress), which is why choosing the right codec matters in production.

How producer compression works

Compression in Kafka operates on a record batch, not on individual messages. When the producer accumulates records destined for a partition, it groups them into a batch (governed by batch.size and linger.ms), compresses the entire batch as a single unit, and ships it to the broker. Compressing many records together yields far better ratios than compressing each message in isolation, because the codec can exploit redundancy across records — repeated JSON keys, similar field values, shared headers, and so on.
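To see why batch-level compression beats per-message compression, the following minimal sketch compares the two using the JDK's built-in gzip. It does not use the Kafka client or Kafka's batch format; the class name, helper methods, and sample records are hypothetical and exist only for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPOutputStream;

public class BatchCompressionDemo {

    // Gzip a byte array and return the compressed size in bytes.
    public static int gzipSize(byte[] input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.size();
    }

    // Hypothetical sample data: small JSON records that share keys and structure.
    public static List<String> sampleRecords(int count) {
        List<String> records = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            records.add("{\"id\":" + i + ",\"status\":\"PLACED\",\"amount\":19.99}");
        }
        return records;
    }

    // Total size when every record is compressed on its own.
    public static int perRecordSize(List<String> records) {
        int total = 0;
        for (String record : records) {
            total += gzipSize(record.getBytes(StandardCharsets.UTF_8));
        }
        return total;
    }

    // Size when all records are concatenated and compressed as one unit.
    public static int batchSize(List<String> records) {
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        for (String record : records) {
            byte[] bytes = record.getBytes(StandardCharsets.UTF_8);
            raw.write(bytes, 0, bytes.length);
        }
        return gzipSize(raw.toByteArray());
    }

    public static void main(String[] args) {
        List<String> records = sampleRecords(500);
        System.out.println("per-record total: " + perRecordSize(records) + " bytes");
        System.out.println("single batch:     " + batchSize(records) + " bytes");
    }
}
```

On data like this the single batch should come out far smaller, because the repeated keys are encoded once per batch rather than once per record, and each standalone gzip stream also carries its own header and trailer.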

Crucially, with default settings the broker does not recompress the data. It receives the compressed batch, validates it (which can involve decompressing it to inspect the records), and appends the original compressed bytes to the log. This means:

Disk usage on the broker drops in proportion to the compression ratio.
Replication traffic between brokers stays compressed.
Consumers receive the compressed batch and decompress it client-side.

Because the broker stores what the producer sends, end-to-end compression is essentially free for the cluster once the producer pays the CPU cost. The exception is when compression.type is set on the topic/broker to a different codec than the producer used — then the broker must recompress, adding CPU on the server. Leave the broker-side compression.type at its default of producer to avoid this.
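To make the disk and replication savings concrete, here is a back-of-the-envelope estimator. It is a rough sketch under simplifying assumptions: it ignores batch headers, index files, and retention, and it assumes the broker stores the producer's batches unchanged. The class, method names, and input figures are hypothetical.

```java
public class CompressionSavings {

    // compressionRate = compressed size / uncompressed size (lower is better),
    // the same sense as the producer metric compression-rate-avg.

    // Bytes stored across the cluster: one compressed copy per replica.
    public static long storedBytes(long rawBytes, double compressionRate, int replicationFactor) {
        return Math.round(rawBytes * compressionRate * replicationFactor);
    }

    // Bytes copied from leaders to followers: (replicationFactor - 1) compressed copies.
    public static long replicationBytes(long rawBytes, double compressionRate, int replicationFactor) {
        return Math.round(rawBytes * compressionRate * (replicationFactor - 1));
    }

    public static void main(String[] args) {
        long rawBytes = 100_000_000_000L; // assumed: 100 GB of uncompressed records
        double rate = 0.25;               // assumed: batches shrink to 25% of original size
        int replicationFactor = 3;        // assumed replication factor

        System.out.println("stored bytes:      " + storedBytes(rawBytes, rate, replicationFactor));
        System.out.println("replication bytes: " + replicationBytes(rawBytes, rate, replicationFactor));
    }
}
```

With these assumed inputs, 100 GB of raw records occupies 75 GB across three replicas instead of 300 GB, and inter-broker replication moves 50 GB instead of 200 GB.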

Tip: Larger batches compress better. Pairing compression with a modest linger.ms (e.g. 5-20 ms) lets more records accumulate per batch, improving the ratio with little added latency.

Choosing a codec

The producer supports five values for compression.type. The trade-off is always ratio versus CPU versus throughput. The table below reflects typical behavior on structured text payloads (JSON, logs); your mileage varies with data shape.

`compression.type`	Compression ratio	CPU cost	Throughput	When to use
`none`	None	Zero	Highest	Already-compressed payloads (e.g. JPEG/PNG images, compressed Parquet files)
`gzip`	High	High	Low	Storage-bound, low message rate, max ratio matters
`snappy`	Medium	Low	High	Long-established option; fast, decent ratio
`lz4`	Medium	Very low	Very high	Latency-sensitive, high-throughput pipelines
`zstd`	High	Low-med	High	Best all-around: near-gzip ratio at snappy-like speed

For the vast majority of workloads, zstd is the right default. It delivers gzip-class compression ratios while staying close to snappy/lz4 in speed, and its compression level is tunable (recent client versions expose this as compression.zstd.level). If you are extremely latency-sensitive and CPU-constrained, lz4 is the better pick: it has the lowest CPU overhead of the four codecs. Reserve gzip for cases where shrinking storage is the dominant concern and you can spare the CPU. Use none only when your payloads are already compressed, since compressing compressed data wastes CPU and can even grow the batch.
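The guidance above can be written down as a small decision helper. This is only a sketch of the rule of thumb in this section, not part of any Kafka API; the class name, method, and boolean inputs are hypothetical, and real workloads should still be measured.

```java
public class CodecChooser {

    // Returns a compression.type value following the rule of thumb in this section.
    public static String chooseCodec(boolean payloadAlreadyCompressed,
                                     boolean latencyAndCpuConstrained,
                                     boolean storageDominantWithSpareCpu) {
        if (payloadAlreadyCompressed) {
            return "none";  // recompressing wastes CPU and can grow the batch
        }
        if (latencyAndCpuConstrained) {
            return "lz4";   // lowest CPU overhead of the four codecs
        }
        if (storageDominantWithSpareCpu) {
            return "gzip";  // maximum ratio when CPU is available
        }
        return "zstd";      // general-purpose default
    }

    public static void main(String[] args) {
        System.out.println(chooseCodec(false, false, false)); // zstd
        System.out.println(chooseCodec(false, true, false));  // lz4
    }
}
```

The order of the checks encodes the priorities: already-compressed data short-circuits everything else, and tight latency or CPU limits take precedence over storage savings.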

Configuration

For the plain Java client, set compression.type (and optionally tune batching) when constructing the producer:

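import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

Properties props = new Properties();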
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

// Compression + batching that work well together
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "zstd");
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 64 * 1024); // 64 KB batches
props.put(ProducerConfig.LINGER_MS_CONFIG, 10);         // wait up to 10 ms to fill a batch

try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
    producer.send(new ProducerRecord<>("orders", "order-42",
            "{\"id\":42,\"status\":\"PLACED\",\"amount\":19.99}"));
    producer.flush();
}

In Spring Boot, configure it declaratively in application.yaml:

spring:
  kafka:
    bootstrap-servers: broker1:9092,broker2:9092
    producer:
      key-serializer: org.apache.kafka.common.serialization.StringSerializer
      value-serializer: org.apache.kafka.common.serialization.StringSerializer
      compression-type: zstd
      batch-size: 65536
      properties:
        linger.ms: 10

You can verify the effect from the command line by producing data and inspecting the topic’s on-disk size, or by checking the effective codec setting on a topic. The --all flag, available in recent Kafka versions, includes default values; without it, only explicitly set overrides are listed, so a topic left at the default shows nothing:

kafka-configs.sh --bootstrap-server broker1:9092 \
  --entity-type topics --entity-name orders --describe --all

Output (trimmed to the relevant line):

All configs for topic orders are:
  compression.type=producer sensitive=false synonyms={...}

The value producer confirms that the topic keeps whatever codec the producer chose, with no server-side recompression.

Warning: zstd requires Kafka 2.1+ on both producers and brokers, and consumers must use a client that supports it. In a mixed-version fleet, confirm every consumer can decompress before switching to zstd, or older consumers will fail to read the topic.

Measuring the trade-off

Don’t guess; measure. Send a representative sample with each codec and compare producer-side metrics: compression-rate-avg (the average ratio of compressed to uncompressed batch size, so closer to 0 means better compression), request-size-avg, and record-send-rate. Also watch producer CPU. Often the ratio difference between zstd and lz4 is small while the CPU difference is meaningful, so the “best” choice is workload-specific.
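The shape of such a measurement can be sketched without a cluster. The example below uses the JDK's Deflater at two levels as stand-ins for a "fast" and a "high-ratio" codec; it does not exercise Kafka's actual zstd or lz4 implementations, and the class, methods, and sample data are hypothetical. It computes the same kind of ratio that compression-rate-avg reports, alongside elapsed time.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class CodecTradeoffSketch {

    // Compress the input at the given Deflater level and return the compressed size.
    public static int deflatedSize(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[8192];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    // Compressed size divided by uncompressed size; lower is better.
    public static double compressionRate(byte[] input, int level) {
        return (double) deflatedSize(input, level) / input.length;
    }

    // Hypothetical sample batch: concatenated JSON records with shared structure.
    public static byte[] sampleBatch(int records) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < records; i++) {
            sb.append("{\"id\":").append(i)
              .append(",\"status\":\"PLACED\",\"amount\":19.99}");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] batch = sampleBatch(5_000);
        int[] levels = {Deflater.BEST_SPEED, Deflater.BEST_COMPRESSION};
        for (int level : levels) {
            long start = System.nanoTime();
            double rate = compressionRate(batch, level);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.printf("level=%d rate=%.3f time=%d us%n", level, rate, micros);
        }
    }
}
```

A single timed run like this is noisy; for a real comparison, repeat the measurement on production-shaped data and read the metrics from a running producer.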

Best Practices

Default to zstd for text/JSON payloads; switch to lz4 when latency and CPU headroom are tighter than storage budget.
Never compress already-compressed data (images, compressed Parquet files); use none to avoid wasting CPU for no gain.
Increase batch.size and add a small linger.ms so batches are large enough for the codec to find redundancy.
Leave broker/topic compression.type at producer to avoid expensive server-side recompression.
Validate codec support across all producers, brokers, and consumers before rolling out zstd in a mixed-version cluster.
Treat compression as one knob in your throughput tuning — measure compression-rate-avg and producer CPU together rather than maximizing ratio blindly.