Exactly-Once Semantics (EOS)

Exactly-once semantics (EOS) is Kafka’s strongest delivery guarantee: every record is processed and its effects applied to Kafka once and only once, even when producers retry, brokers fail over, or consumers rebalance. It is not magic and it is not free — EOS is the combination of three cooperating mechanisms: the idempotent producer, transactions, and read_committed consumers. In production, EOS is what lets a stream-processing job consume from one topic, transform, and write to another without ever double-counting or dropping a message, which is essential for anything that touches money, inventory, or aggregates.

The two failure modes EOS eliminates

At-least-once delivery (the default with acks=all and retries) protects you from data loss but allows duplicates: a producer that times out waiting for an ack will resend, and the broker may have already committed the first copy. At-most-once avoids duplicates but risks loss. EOS removes both by giving each write a deduplication identity and by making the read-process-write cycle atomic.

Concern	At-least-once	Exactly-once
Duplicate writes on producer retry	Possible	Prevented (idempotence)
Partial output on crash mid-batch	Possible	Prevented (transactions)
Consumer sees uncommitted/aborted data	Yes	No (`read_committed`)
Throughput cost	Baseline	~3-20% overhead

Layer 1: the idempotent producer

The idempotent producer guarantees that retries of the same record never create duplicates within a single producer session, per partition. The broker assigns the producer a Producer ID (PID) and tracks a monotonically increasing sequence number per partition; a resent record carrying an already-seen sequence number is silently deduplicated.

Enable it with one flag, which Kafka turns on automatically when you set the supporting configs:

enable.idempotence=true
acks=all
retries=2147483647
max.in.flight.requests.per.connection=5

Idempotence alone covers producer retries only. It does not make a multi-message, multi-partition write atomic, and it does not survive a process restart — for that you need transactions.

Layer 2: transactions and the read-process-write loop

Transactions let a producer write to multiple partitions atomically and, crucially, commit the consumer offsets in the same transaction via sendOffsetsToTransaction. This makes the classic consume-transform-produce loop exactly-once: either the output records and the input offsets commit together, or neither does.

A stable transactional.id is what ties a producer’s transactional state across restarts so the broker can fence zombie instances.

Properties p = new Properties();
p.put("bootstrap.servers", "broker:9092");
p.put("enable.idempotence", "true");
p.put("acks", "all");
p.put("transactional.id", "order-processor-1");
p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
p.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

try (KafkaProducer<String, String> producer = new KafkaProducer<>(p);
     KafkaConsumer<String, String> consumer = buildConsumer()) {

    producer.initTransactions();
    consumer.subscribe(List.of("orders"));

    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
        if (records.isEmpty()) continue;

        producer.beginTransaction();
        try {
            Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
            for (ConsumerRecord<String, String> rec : records) {
                producer.send(new ProducerRecord<>("orders-enriched", rec.key(), enrich(rec.value())));
                offsets.put(
                    new TopicPartition(rec.topic(), rec.partition()),
                    new OffsetAndMetadata(rec.offset() + 1));
            }
            // Commit offsets *inside* the transaction — this is the atomic link.
            producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
            producer.commitTransaction();
        } catch (KafkaException e) {
            producer.abortTransaction();
        }
    }
}

The consumer in this loop must not auto-commit, because the producer owns offset commits:

enable.auto.commit=false
isolation.level=read_committed

Layer 3: read_committed consumers

A transaction is only half the contract; downstream consumers must agree to ignore uncommitted and aborted data. Setting isolation.level=read_committed makes a consumer skip records from aborted transactions and never read past the last stable offset (the highest offset before any still-open transaction). The default, read_uncommitted, sees everything immediately and would resurrect the duplicates EOS is meant to remove.

The Spring Boot way

Spring for Apache Kafka removes the boilerplate. Setting a transaction-id prefix turns on a KafkaTransactionManager, and a @Transactional-style listener container chains consume and produce into one Kafka transaction automatically.

spring:
  kafka:
    producer:
      transaction-id-prefix: tx-order-
      acks: all
    consumer:
      isolation-level: read_committed
      enable-auto-commit: false

public record OrderEnriched(String id, String region, long total) {}

@Component
class OrderProcessor {

    private final KafkaTemplate<String, OrderEnriched> template;

    OrderProcessor(KafkaTemplate<String, OrderEnriched> template) {
        this.template = template;
    }

    @KafkaListener(topics = "orders", groupId = "order-processor")
    void process(Order order) {
        // Runs inside a Kafka transaction; the listener's offset commit
        // and this send commit atomically, or both roll back.
        template.send("orders-enriched", order.id(), enrich(order));
    }
}

What it costs and when it is justified

EOS adds latency from transaction markers, the two-phase commit to the transaction coordinator, and the read-committed buffering on consumers. Typical end-to-end overhead is single-digit to low-double-digit percent, larger when transactions are tiny (batch more records per transaction to amortize). It is justified when duplicates are correctness bugs — billing, ledger aggregation, deduplicated joins, idempotent materialized views. It is overkill for fire-and-forget logs or metrics where at-least-once plus a downstream dedup key is cheaper.

Gotcha: EOS is only end-to-end if every hop honors it. A read_uncommitted consumer or a sink that writes to an external system without its own idempotency key breaks the guarantee at that boundary.

Best Practices

Set a stable, unique transactional.id per logical producer instance so the coordinator can fence zombies after restarts.
Always pair transactional producers with isolation.level=read_committed consumers — otherwise downstream readers see aborted data.
Disable enable.auto.commit in the loop; let sendOffsetsToTransaction commit offsets atomically with the output.
Batch multiple records per transaction to amortize commit overhead, but keep transactions short enough to avoid blocking the last stable offset for read_committed readers.
Tune transaction.timeout.ms below the broker’s transaction.max.timeout.ms, and keep it longer than your worst-case processing time.
For pure stream processing, prefer Kafka Streams with processing.guarantee=exactly_once_v2, which manages all of this for you.
Remember EOS guarantees apply to Kafka; external side effects still need their own idempotency.