Producer Overview
The Kafka producer is the client responsible for publishing records to topics, and understanding its internal pipeline is the difference between a system that quietly drops data and one that delivers millions of messages per second reliably. A send() call is not a synchronous network round-trip; it is an enqueue into a background machine that serializes, partitions, batches, and ships records on your behalf. Knowing how that machine works lets you reason about latency, throughput, ordering, and durability instead of guessing. This page walks the full path a record takes from your code to a broker acknowledgement.
The send pipeline
When you call producer.send(record), the record flows through a fixed sequence of stages before it ever touches the network:
ProducerRecord
|
v
[ Serializer ] key + value -> byte[]
|
v
[ Partitioner ] choose target partition
|
v
[ Record Accumulator ] append to per-partition batch (in memory)
|
v
[ Sender (I/O) thread ] drain ready batches, group by broker
|
v
[ Broker ] append to log, replicate, return ack
|
v
[ Callback / Future ] RecordMetadata or exception
The key insight is that two threads are involved. Your application thread performs serialization and partitioning, then appends the record to an in-memory buffer and returns immediately. A separate background sender thread (the I/O thread) drains that buffer and talks to the brokers. This decoupling is what makes the producer fast: your code is never blocked on broker latency unless the buffer fills up.
ProducerRecord
A ProducerRecord is the unit you publish. At minimum it carries a topic and a value, but it can also carry a key, an explicit partition, a timestamp, and headers.
| Field | Required | Purpose |
|---|---|---|
topic | Yes | Destination topic name |
value | Yes (may be null for tombstones) | The payload |
key | No | Drives default partitioning and log compaction |
partition | No | Forces a specific partition, bypassing the partitioner |
timestamp | No | Event time; defaults to send time |
headers | No | Arbitrary metadata key/value pairs |
The key matters more than it first appears: records with the same key are routed to the same partition by the default partitioner, which is how Kafka preserves per-key ordering.
Serialization and partitioning
Before a record can be batched it must become bytes. The producer applies the configured key.serializer and value.serializer, then hands the serialized record to the partitioner. If you set a partition explicitly it is honored; otherwise the default partitioner hashes the key (or, for null keys, uses a sticky round-robin strategy that fills one batch at a time for better batching efficiency).
The record accumulator and sender thread
The record accumulator is a pool of memory (sized by buffer.memory, default 32 MB) organized as a queue of batches per topic-partition. New records append to the current open batch for their partition. A batch becomes eligible to send when it is full (batch.size) or when its linger.ms timer expires. The sender thread continuously drains ready batches, groups them by destination broker into a single request, and writes them to the network. This batching is the single biggest throughput lever in Kafka: fewer, larger requests amortize network and broker overhead across many records.
The producer trades a small amount of latency for throughput. Raising
linger.mslets more records accumulate per batch, increasing throughput at the cost of a few extra milliseconds of delay.
A basic producer
The following plain-client example sends three records asynchronously with a callback and then closes cleanly.
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import java.util.Properties;
public class OrderProducer {
public static void main(String[] args) {
Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.ACKS_CONFIG, "all");
try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
for (int i = 1; i <= 3; i++) {
var record = new ProducerRecord<>("orders", "order-" + i, "amount=" + (i * 100));
producer.send(record, (metadata, exception) -> {
if (exception != null) {
System.err.println("Send failed: " + exception.getMessage());
} else {
System.out.printf("Sent to %s-%d @ offset %d%n",
metadata.topic(), metadata.partition(), metadata.offset());
}
});
}
// try-with-resources calls close(), which flushes pending batches
}
}
}
Output:
Sent to orders-0 @ offset 14
Sent to orders-2 @ offset 9
Sent to orders-1 @ offset 22
Notice the callbacks may fire out of order across partitions, since each partition is acknowledged independently. The RecordMetadata returned in the callback tells you exactly where the record landed, which is invaluable for auditing and debugging.
Acknowledgements
The acks setting controls when a send is considered successful. It is the primary knob for the durability-versus-latency trade-off.
acks | Waits for | Durability | Latency |
|---|---|---|---|
0 | Nothing (fire and forget) | Lowest — may lose data | Lowest |
1 | Leader write only | Moderate — loses data if leader fails before replication | Low |
all | Leader + all in-sync replicas | Highest | Higher |
For most production systems acks=all combined with a replication factor of at least 3 and min.insync.replicas=2 is the durable baseline. Only the acknowledgement of the broker — not the send() return — confirms that a record is safely stored.
Calling
send()and ignoring the returnedFutureor callback is the most common cause of silent data loss. Always handle the result so failed sends surface instead of vanishing.
Best Practices
- Treat
send()as asynchronous: always supply a callback (or check theFuture) so failures are observed rather than swallowed. - Use a meaningful key when per-key ordering matters; records with the same key stay on the same partition.
- Tune
batch.sizeandlinger.mstogether to balance throughput against latency for your workload. - Use
acks=allwithreplication.factor>=3andmin.insync.replicas=2for durable, production-grade delivery. - Reuse a single
KafkaProducerinstance across threads — it is thread-safe and sharing it maximizes batching. - Always close the producer (or use try-with-resources) so buffered batches are flushed before the process exits.