Exactly-Once Semantics (EOS)
Exactly-once semantics (EOS) is Kafka’s strongest delivery guarantee: every record is processed, and its effects are applied to Kafka, once and only once, even when producers retry, brokers fail over, or consumers rebalance. It is not magic and it is not free. EOS is the combination of three cooperating mechanisms: the idempotent producer, transactions, and read_committed consumers. In production, EOS is what lets a stream-processing job consume from one topic, transform, and write to another without double-counting or dropping a message, which is essential for anything that touches money, inventory, or aggregates.
The two failure modes EOS eliminates
At-least-once delivery (the default with acks=all and retries) protects you from data loss but allows duplicates: a producer that times out waiting for an ack will resend, and the broker may have already committed the first copy. At-most-once avoids duplicates but risks loss. EOS removes both by giving each write a deduplication identity and by making the read-process-write cycle atomic.
| Concern | At-least-once | Exactly-once |
|---|---|---|
| Duplicate writes on producer retry | Possible | Prevented (idempotence) |
| Partial output on crash mid-batch | Possible | Prevented (transactions) |
| Consumer sees uncommitted/aborted data | Yes | No (read_committed) |
| Throughput cost | Baseline | ~3-20% overhead |
Layer 1: the idempotent producer
The idempotent producer guarantees that retries of the same record never create duplicates within a single producer session, per partition. The broker assigns the producer a Producer ID (PID) and tracks a monotonically increasing sequence number per partition; a resent record carrying an already-seen sequence number is silently deduplicated.
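To make the sequence-number check concrete, here is a minimal sketch of the deduplication idea in plain Java. It is a toy model and not Kafka's broker code: the class `DedupSketch`, its `append` method, and the string key are all hypothetical, and the real broker tracks sequences per batch rather than per record.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of broker-side deduplication; not Kafka's actual implementation.
class DedupSketch {
    // Last sequence number accepted for each (producerId, partition) pair.
    private final Map<String, Integer> lastSeq = new HashMap<>();
    int appended = 0; // number of records actually written to the toy log

    /** Returns true if the record was appended, false if it was dropped as a duplicate. */
    boolean append(long producerId, int partition, int sequence) {
        String key = producerId + "-" + partition;
        int last = lastSeq.getOrDefault(key, -1);
        if (sequence <= last) {
            return false; // already seen: a retry of a write that was persisted
        }
        if (sequence != last + 1) {
            // A gap means a record was lost in between; the toy model simply rejects it.
            throw new IllegalStateException("out-of-order sequence " + sequence);
        }
        lastSeq.put(key, sequence);
        appended++;
        return true;
    }

    public static void main(String[] args) {
        DedupSketch broker = new DedupSketch();
        broker.append(42L, 0, 0);                 // first record
        broker.append(42L, 0, 1);                 // second record
        boolean retry = broker.append(42L, 0, 1); // producer timed out and resent
        // prints "retry appended? false, log size = 2"
        System.out.println("retry appended? " + retry + ", log size = " + broker.appended);
    }
}
```

The key point the sketch carries is that the dedup identity is scoped to one producer ID and one partition, which is why the guarantee holds per session and per partition only.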
Enable it with one flag. Since Kafka 3.0 the client turns it on by default, provided the supporting configs below are not overridden with conflicting values:
    enable.idempotence=true
    acks=all
    retries=2147483647
    max.in.flight.requests.per.connection=5
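The supporting configs are constraints as much as defaults: idempotence needs acks=all, retries greater than 0, and at most 5 in-flight requests per connection. A small pre-flight check along these lines can catch a conflicting override before deployment. This is a hypothetical helper (`IdempotenceConfigCheck` is not part of the Kafka client, which performs its own validation), and the fallback values it uses assume the Kafka 3.x client defaults.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical pre-flight check; the Kafka client performs its own validation.
class IdempotenceConfigCheck {
    /** Returns a description of every setting that conflicts with idempotence. */
    static List<String> conflicts(Properties p) {
        List<String> problems = new ArrayList<>();
        // Fallbacks below assume the Kafka 3.x client defaults.
        String acks = p.getProperty("acks", "all");
        if (!acks.equals("all") && !acks.equals("-1")) {
            problems.add("acks must be all, found " + acks);
        }
        int retries = Integer.parseInt(p.getProperty("retries", "2147483647"));
        if (retries <= 0) {
            problems.add("retries must be greater than 0");
        }
        int inFlight = Integer.parseInt(
                p.getProperty("max.in.flight.requests.per.connection", "5"));
        if (inFlight > 5) {
            problems.add("max.in.flight.requests.per.connection must be at most 5");
        }
        return problems;
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty("enable.idempotence", "true");
        p.setProperty("acks", "1"); // conflicting override
        System.out.println(conflicts(p)); // prints "[acks must be all, found 1]"
    }
}
```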
Idempotence alone covers producer retries only. It does not make a multi-message, multi-partition write atomic, and it does not survive a process restart, because a restarted producer is assigned a new PID. For both you need transactions.
Layer 2: transactions and the read-process-write loop
Transactions let a producer write to multiple partitions atomically and, crucially, commit the consumer offsets in the same transaction via sendOffsetsToTransaction. This makes the classic consume-transform-produce loop exactly-once: either the output records and the input offsets commit together, or neither does.
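A stable transactional.id ties a producer’s transactional state together across restarts: when a new instance registers with the same id, the coordinator bumps the producer epoch and fences any zombie instance still using the old one.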
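    import java.time.Duration;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.*;
    import org.apache.kafka.clients.producer.*;
    import org.apache.kafka.common.KafkaException;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.common.errors.AuthorizationException;
    import org.apache.kafka.common.errors.OutOfOrderSequenceException;
    import org.apache.kafka.common.errors.ProducerFencedException;

    // buildConsumer() and enrich() are application-specific helpers that are not shown.
    Properties p = new Properties();
    p.put("bootstrap.servers", "broker:9092");
    p.put("enable.idempotence", "true");
    p.put("acks", "all");
    p.put("transactional.id", "order-processor-1");
    p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    p.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

    try (KafkaProducer<String, String> producer = new KafkaProducer<>(p);
         KafkaConsumer<String, String> consumer = buildConsumer()) {
        producer.initTransactions();
        consumer.subscribe(List.of("orders"));
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            if (records.isEmpty()) continue;
            producer.beginTransaction();
            try {
                Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
                for (ConsumerRecord<String, String> rec : records) {
                    producer.send(new ProducerRecord<>("orders-enriched", rec.key(), enrich(rec.value())));
                    // Commit the offset of the *next* record to read, hence + 1.
                    offsets.put(
                        new TopicPartition(rec.topic(), rec.partition()),
                        new OffsetAndMetadata(rec.offset() + 1));
                }
                // Commit offsets *inside* the transaction: this is the atomic link.
                producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
                producer.commitTransaction();
            } catch (ProducerFencedException | OutOfOrderSequenceException | AuthorizationException e) {
                // Fatal: this producer cannot abort or continue. Rethrow so that
                // try-with-resources closes the producer and the consumer.
                throw e;
            } catch (KafkaException e) {
                // Recoverable: abort, then rewind so the same batch is polled again.
                producer.abortTransaction();
                for (TopicPartition tp : records.partitions()) {
                    consumer.seek(tp, records.records(tp).get(0).offset());
                }
            }
        }
    }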
The consumer in this loop must not auto-commit, because the producer owns offset commits:
    enable.auto.commit=false
    isolation.level=read_committed
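The zombie fencing that a stable transactional.id enables can be pictured with a toy coordinator that keeps one epoch per id. This is a simplified sketch rather than Kafka's protocol: `FencingSketch`, `init`, and `write` are hypothetical names, and the real coordinator rejects a stale producer with a ProducerFencedException instead of returning false.

```java
import java.util.HashMap;
import java.util.Map;

// Toy transaction coordinator: one epoch per transactional.id. Not Kafka's real protocol.
class FencingSketch {
    private final Map<String, Integer> currentEpoch = new HashMap<>();

    /** Models initTransactions(): registers the id and bumps its epoch. */
    int init(String transactionalId) {
        return currentEpoch.merge(transactionalId, 1, Integer::sum);
    }

    /** Models a transactional write: only the holder of the newest epoch is accepted. */
    boolean write(String transactionalId, int epoch) {
        return currentEpoch.getOrDefault(transactionalId, 0) == epoch;
    }

    public static void main(String[] args) {
        FencingSketch coordinator = new FencingSketch();
        String id = "order-processor-1";
        int oldEpoch = coordinator.init(id); // original instance
        int newEpoch = coordinator.init(id); // replacement started after a crash
        // prints "zombie accepted? false"
        System.out.println("zombie accepted? " + coordinator.write(id, oldEpoch));
        // prints "replacement accepted? true"
        System.out.println("replacement accepted? " + coordinator.write(id, newEpoch));
    }
}
```

The sketch also shows why the id must be stable and unique per logical instance: two live instances sharing one id would keep fencing each other, while a fresh id on every restart would never fence anything.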
Layer 3: read_committed consumers
A transaction is only half the contract; downstream consumers must agree to ignore uncommitted and aborted data. Setting isolation.level=read_committed makes a consumer skip records from aborted transactions and never read past the last stable offset (LSO), which is the offset of the first record that belongs to a still-open transaction. The default, read_uncommitted, sees everything immediately and would resurrect the duplicates EOS is meant to remove.
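The two rules (stop at the last stable offset, skip aborted records) can be illustrated with a small in-memory model. Everything here is a hypothetical simplification: `ReadCommittedSketch`, its `Entry` record, and the `State` enum are invented for the example, offsets are assumed to run from 0 without gaps, and real consumers learn about aborted transactions from broker metadata rather than a map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model of read_committed filtering; not how the Kafka consumer is implemented.
class ReadCommittedSketch {
    enum State { OPEN, COMMITTED, ABORTED }

    record Entry(long offset, String txn, String value) {}

    /** Offset of the first record in a still-open transaction, or the log end if none is open. */
    static long lastStableOffset(List<Entry> log, Map<String, State> txns) {
        for (Entry e : log) {
            if (txns.get(e.txn()) == State.OPEN) {
                return e.offset();
            }
        }
        return log.size(); // assumes offsets are 0..n-1
    }

    /** Values visible to a read_committed consumer: below the LSO and committed. */
    static List<String> readCommitted(List<Entry> log, Map<String, State> txns) {
        long lso = lastStableOffset(log, txns);
        List<String> visible = new ArrayList<>();
        for (Entry e : log) {
            if (e.offset() < lso && txns.get(e.txn()) == State.COMMITTED) {
                visible.add(e.value());
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        List<Entry> log = List.of(
                new Entry(0, "t1", "a"), new Entry(1, "t2", "b"),
                new Entry(2, "t3", "c"), new Entry(3, "t4", "d"));
        Map<String, State> txns = Map.of(
                "t1", State.COMMITTED, "t2", State.ABORTED,
                "t3", State.OPEN, "t4", State.COMMITTED);
        // "b" is aborted, "c" is open, and "d" is committed but sits past the LSO.
        System.out.println(readCommitted(log, txns)); // prints "[a]"
    }
}
```

Record "d" is the interesting case: it is committed, yet it stays invisible until the open transaction ahead of it resolves, which is why long-running transactions delay read_committed readers.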
The Spring Boot way
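Spring for Apache Kafka removes most of the boilerplate. In Spring Boot, setting a transaction-id prefix makes the producer factory transactional and auto-configures a KafkaTransactionManager. The listener container then starts a Kafka transaction around each listener invocation, so the consumed offsets and any records sent through KafkaTemplate commit together.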
    spring:
      kafka:
        producer:
          transaction-id-prefix: tx-order-
          acks: all
        consumer:
          isolation-level: read_committed
          enable-auto-commit: false
    import org.springframework.kafka.annotation.KafkaListener;
    import org.springframework.kafka.core.KafkaTemplate;
    import org.springframework.stereotype.Component;

    public record OrderEnriched(String id, String region, long total) {}

    // Order (the input type) and enrich(Order) are application code that is not shown.
    @Component
    class OrderProcessor {
        private final KafkaTemplate<String, OrderEnriched> template;

        OrderProcessor(KafkaTemplate<String, OrderEnriched> template) {
            this.template = template;
        }

        @KafkaListener(topics = "orders", groupId = "order-processor")
        void process(Order order) {
            // Runs inside a Kafka transaction; the listener's offset commit
            // and this send commit atomically, or both roll back.
            template.send("orders-enriched", order.id(), enrich(order));
        }
    }
What it costs and when it is justified
EOS adds latency from transaction markers, the two-phase commit through the transaction coordinator, and read_committed buffering on consumers. Typical end-to-end overhead is single-digit to low-double-digit percent, and larger when transactions are tiny, so batch more records per transaction to amortize it. It is justified when duplicates are correctness bugs: billing, ledger aggregation, deduplicated joins, idempotent materialized views. It is overkill for fire-and-forget logs or metrics, where at-least-once plus a downstream dedup key is cheaper.
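The amortization argument is simple arithmetic: each transaction pays a roughly fixed commit cost, so its share of total time shrinks as the batch grows. The sketch below only illustrates that shape. The cost model and the numbers (10 ms per commit, 0.1 ms per record) are made-up assumptions, not measurements, and `TxnOverheadSketch` is a hypothetical name.

```java
// Back-of-the-envelope model of commit overhead; the inputs are illustrative, not measured.
class TxnOverheadSketch {
    /** Fraction of a transaction's time spent on the fixed commit cost. */
    static double overheadShare(double fixedCommitMs, double perRecordMs, int batchSize) {
        double work = perRecordMs * batchSize;
        return fixedCommitMs / (fixedCommitMs + work);
    }

    public static void main(String[] args) {
        double fixedCommitMs = 10.0; // assumed cost of one commit round trip
        double perRecordMs = 0.1;    // assumed processing cost per record
        for (int batch : new int[] {1, 10, 100, 1000}) {
            double share = overheadShare(fixedCommitMs, perRecordMs, batch);
            System.out.println("batch=" + batch + " commit share=" + share);
        }
    }
}
```

Under these assumed inputs the commit share falls from nearly all of the time at one record per transaction to under a tenth at a thousand, which is the trend to measure against your own workload.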
Gotcha: EOS is only end-to-end if every hop honors it. A read_uncommitted consumer or a sink that writes to an external system without its own idempotency key breaks the guarantee at that boundary.
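One common way to give an external sink its own idempotency key is to derive it from the record's coordinates (topic, partition, offset) and refuse to apply the same key twice. The following is a minimal in-memory sketch of that idea. `IdempotentSinkSketch` and its methods are hypothetical, and a real sink would store the key and the side effect in the same database transaction rather than in Java collections.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy external sink; a real one would persist the key and the write in one database transaction.
class IdempotentSinkSketch {
    private final Set<String> applied = new HashSet<>();
    private final Map<String, Long> balances = new HashMap<>();

    /** Applies a credit once per (topic, partition, offset); redeliveries are ignored. */
    boolean credit(String topic, int partition, long offset, String account, long amount) {
        String key = topic + "-" + partition + "-" + offset;
        if (!applied.add(key)) {
            return false; // this record was already applied
        }
        balances.merge(account, amount, Long::sum);
        return true;
    }

    long balance(String account) {
        return balances.getOrDefault(account, 0L);
    }

    public static void main(String[] args) {
        IdempotentSinkSketch sink = new IdempotentSinkSketch();
        sink.credit("orders-enriched", 0, 17, "acct-1", 100);
        sink.credit("orders-enriched", 0, 17, "acct-1", 100); // redelivery of the same record
        System.out.println(sink.balance("acct-1")); // prints "100"
    }
}
```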
Best Practices
- Set a stable, unique transactional.id per logical producer instance so the coordinator can fence zombies after restarts.
- Always pair transactional producers with isolation.level=read_committed consumers; otherwise downstream readers see aborted data.
- Disable enable.auto.commit in the loop; let sendOffsetsToTransaction commit offsets atomically with the output.
- Batch multiple records per transaction to amortize commit overhead, but keep transactions short enough to avoid blocking the last stable offset for read_committed readers.
- Tune transaction.timeout.ms below the broker’s transaction.max.timeout.ms, and keep it longer than your worst-case processing time.
- For pure stream processing, prefer Kafka Streams with processing.guarantee=exactly_once_v2, which manages all of this for you.
- Remember EOS guarantees apply to Kafka; external side effects still need their own idempotency.