auto.offset.reset Explained

When a Kafka consumer starts reading a partition, it needs to know where to begin. Usually it resumes from the last committed offset stored in the __consumer_offsets topic. But what happens the very first time a consumer group runs, or when the committed offset has been deleted by retention? That is exactly the gap auto.offset.reset fills. Misunderstanding this single property is one of the most common causes of “my consumer skipped all the messages” and “my consumer reprocessed everything” incidents in production.

When does auto.offset.reset actually apply?

This is the part most people get wrong: auto.offset.reset is not consulted on every poll. It only kicks in when the consumer has no valid offset position for a partition. That happens in two situations:

The consumer group is reading a partition for the first time and has never committed an offset for it.
The previously committed offset is out of range — typically because the data it pointed at has already been deleted by the broker’s retention policy, leaving a gap between the last committed offset and the earliest available message.

If a valid committed offset exists, Kafka resumes from it and ignores auto.offset.reset entirely. Changing the property on an existing, actively committing group will have no visible effect until one of the two conditions above occurs.
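The decision described above can be sketched as a small pure-Java model. This is not Kafka client code: the class name StartPositionModel, the resolve method, and the use of IllegalStateException as a stand-in for the client's real exceptions are all assumptions made for illustration.

```java
import java.util.OptionalLong;

/** Toy model of how a consumer picks its starting position; not Kafka client code. */
final class StartPositionModel {

    enum ResetPolicy { EARLIEST, LATEST, NONE }

    /**
     * @param committed      last committed offset for the group, if any
     * @param logStartOffset oldest offset still retained in the partition
     * @param logEndOffset   offset the next produced record will receive
     */
    static long resolve(OptionalLong committed, long logStartOffset,
                        long logEndOffset, ResetPolicy policy) {
        // A committed offset inside [logStart, logEnd] is valid: the policy is never consulted.
        if (committed.isPresent()) {
            long offset = committed.getAsLong();
            if (offset >= logStartOffset && offset <= logEndOffset) {
                return offset;
            }
        }
        // No committed offset, or it is out of range: only now does the policy matter.
        switch (policy) {
            case EARLIEST: return logStartOffset;
            case LATEST:   return logEndOffset;
            default:
                // Stand-in for the exception the real client would throw.
                throw new IllegalStateException("no valid offset and reset policy is none");
        }
    }

    public static void main(String[] args) {
        // Partition currently retains offsets 100..499; the next record gets offset 500.
        System.out.println(resolve(OptionalLong.of(250), 100, 500, ResetPolicy.LATEST));   // 250
        System.out.println(resolve(OptionalLong.empty(), 100, 500, ResetPolicy.EARLIEST)); // 100
        System.out.println(resolve(OptionalLong.of(40), 100, 500, ResetPolicy.LATEST));    // 500
    }
}
```

The first call shows the key point: with a valid committed offset of 250, the policy argument makes no difference to the result.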

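Changing auto.offset.reset does not make an existing group “start over”, because the group still has valid committed offsets. To replay, reset offsets explicitly, either with kafka-consumer-groups.sh --reset-offsets (the group must have no active members while you run it) or programmatically with seek().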

The three values

Value	Behaviour when no valid offset	Typical use
`earliest`	Start from the beginning of the partition (oldest retained message).	Reprocessing, completeness-critical pipelines, ETL, replay.
`latest` (default)	Start from the end — only messages produced after the consumer joins.	Live dashboards, alerting, “now onwards” stream processing.
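`none`	Throw an exception instead of choosing a position: `NoOffsetForPartitionException` when no offset is committed, `OffsetOutOfRangeException` when the committed offset is out of range.	Strict pipelines that must never guess a starting point.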

earliest — completeness and reprocessing

Choose earliest when missing data is unacceptable. A fresh consumer group will read every message still retained in the topic. This is ideal for analytics ingestion, building materialized views, audit trails, or any job where you would rather process a record twice than skip it.

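spring:
  kafka:
    consumer:
      group-id: orders-etl
      auto-offset-reset: earliest
      enable-auto-commit: false
    listener:
      # Manual ack mode is required for the listener to receive an Acknowledgment argument
      ack-mode: manual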

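import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.support.Acknowledgment;
import org.springframework.stereotype.Component;

@Component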
public class OrdersEtlConsumer {

    @KafkaListener(topics = "orders", groupId = "orders-etl")
    public void consume(ConsumerRecord<String, String> record, Acknowledgment ack) {
        // First run with no committed offset -> starts at the oldest retained message
        process(record.value());
        ack.acknowledge();
    }

    private void process(String payload) { /* persist to warehouse */ }
}

Be careful: a brand-new group-id plus earliest on a topic with weeks of data means your consumer will replay the entire backlog on first start. Size your consumers and downstream systems for that initial surge.
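To get a feel for the size of that initial surge, a rough catch-up estimate can be sketched as follows. The class BacklogEstimate, the method catchUpTime, and all the numbers are hypothetical; the model assumes constant consume and produce rates, which real workloads rarely have.

```java
import java.time.Duration;
import java.util.Optional;

/** Back-of-the-envelope estimate of how long a new group needs to drain a backlog. */
final class BacklogEstimate {

    /**
     * @param backlogRecords records already retained in the topic at first start
     * @param consumePerSec  sustained consumer throughput (records/second)
     * @param producePerSec  rate at which new records keep arriving (records/second)
     * @return estimated catch-up time, or empty if the consumer never catches up
     */
    static Optional<Duration> catchUpTime(long backlogRecords,
                                          double consumePerSec,
                                          double producePerSec) {
        // The backlog only shrinks by the difference between the two rates.
        double netPerSec = consumePerSec - producePerSec;
        if (netPerSec <= 0) {
            return Optional.empty();
        }
        long seconds = (long) Math.ceil(backlogRecords / netPerSec);
        return Optional.of(Duration.ofSeconds(seconds));
    }

    public static void main(String[] args) {
        // Hypothetical: 10M retained records, 5,000 rec/s consumed, 1,000 rec/s still arriving.
        System.out.println(catchUpTime(10_000_000L, 5_000, 1_000)); // Optional[PT41M40S]
        // Consumer no faster than producers: the backlog is never drained.
        System.out.println(catchUpTime(10_000_000L, 1_000, 1_000)); // Optional.empty
    }
}
```

If the estimate comes back empty or far longer than you can tolerate, that is a signal to add consumer instances or partitions before the first start rather than after.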

latest — live-only consumption

latest is the default and suits use cases where only current events matter. A monitoring service that alerts on errors does not need yesterday’s logs; it cares about what is happening now. With latest, a new group ignores all existing history and begins at the high-water mark.

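import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

Properties props = new Properties();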
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "live-alerts");
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);

try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
    consumer.subscribe(List.of("metrics"));
    while (true) {
        var records = consumer.poll(Duration.ofMillis(500));
        records.forEach(r -> raiseAlertIfNeeded(r.value()));
    }
}

The hidden danger with latest is the gap window. If a consumer stays down long enough for its committed offsets to expire (or for the records they point at to be deleted by retention), the reset to latest on restart silently skips everything produced during the downtime. That is data loss with no error.
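The gap window can be made concrete with a little arithmetic. The sketch below is a simplified model, not broker code: the class GapWindow and its methods are hypothetical, the 10080-minute retention is only an illustrative value, and it assumes the group's committed offsets expire once the group has been empty for longer than the configured offsets retention.

```java
import java.time.Duration;

/** Simplified model of the "gap window" for a consumer using the latest policy. */
final class GapWindow {

    /** True if the group has been empty for longer than the offsets retention period. */
    static boolean offsetsExpired(Duration groupEmptyFor, long offsetsRetentionMinutes) {
        return groupEmptyFor.compareTo(Duration.ofMinutes(offsetsRetentionMinutes)) > 0;
    }

    /**
     * Records skipped on restart, assuming the reset policy is latest.
     * With surviving offsets the consumer resumes where it left off and skips nothing;
     * with expired offsets it jumps straight to the log end.
     */
    static long skippedOnRestart(long committedOffset, long logEndOffset, boolean expired) {
        return expired ? logEndOffset - committedOffset : 0L;
    }

    public static void main(String[] args) {
        long retentionMinutes = 10_080; // illustrative value: 7 days

        // Down for 2 days: offsets survive, nothing is skipped.
        boolean shortOutage = offsetsExpired(Duration.ofDays(2), retentionMinutes);
        System.out.println(skippedOnRestart(1_000, 51_000, shortOutage)); // 0

        // Down for 8 days: offsets expired, everything produced meanwhile is skipped.
        boolean longOutage = offsetsExpired(Duration.ofDays(8), retentionMinutes);
        System.out.println(skippedOnRestart(1_000, 51_000, longOutage)); // 50000
    }
}
```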

none — fail fast

Use none when a wrong guess is worse than an outage. With none, the consumer throws instead of silently choosing a position: NoOffsetForPartitionException when no offset has been committed, and OffsetOutOfRangeException when the committed offset is no longer in range (both extend InvalidOffsetException). This forces an operator to make an explicit decision: seed offsets deliberately, or seek() to a known point.

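props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "none");

try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
    consumer.subscribe(List.of("payments"));
    while (true) {
        try {
            var records = consumer.poll(Duration.ofMillis(500));
            // handlePayment is a placeholder for your own processing logic
            records.forEach(r -> handlePayment(r.value()));
        } catch (NoOffsetForPartitionException e) {
            // No committed offset: decide explicitly instead of guessing.
            // The next poll() starts from the position chosen here.
            consumer.seekToBeginning(e.partitions());
        }
    }
}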

If the exception is not caught, it surfaces as:

org.apache.kafka.clients.consumer.NoOffsetForPartitionException:
  Undefined offset with no reset policy for partitions: [payments-0, payments-1]

Choosing the right value

Need every record, even historical? Use earliest.
Only care about events from “now”? Use latest.
Cannot tolerate a silent skip or silent replay? Use none and handle the exception explicitly.

Best Practices

Treat auto.offset.reset as a first-start / offset-lost fallback, not a runtime “rewind” switch — use seek() or kafka-consumer-groups.sh --reset-offsets for deliberate replays.
Pair earliest with idempotent processing so the inevitable reprocessing on first run (or after an offset-out-of-range event) does not corrupt downstream state.
For latest, ensure your offset retention (the broker setting offsets.retention.minutes) comfortably exceeds your worst-case consumer downtime to avoid silently skipping messages.
Prefer none for financial or compliance pipelines where an accidental skip or replay is a serious incident, and wire up alerting on NoOffsetForPartitionException.
Always set group-id deliberately; a typo creates a new group that re-triggers the reset policy and reads from earliest/latest as if brand new.
Document the chosen reset policy per consumer group — the value is invisible during normal operation and only matters during the rare, high-stakes edge cases.
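As an illustration of the idempotent-processing practice above, here is a minimal sketch that applies each record at most once by keying on its topic, partition, and offset. The class IdempotentSink and its methods are hypothetical, and the in-memory set is only a stand-in: a real pipeline would more likely rely on a unique key or upsert in the downstream store.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy sink that ignores records it has already applied, so replays are harmless. */
final class IdempotentSink {

    // Stand-in for a unique constraint in the downstream system.
    private final Set<String> applied = new HashSet<>();
    private long total;

    /**
     * Applies the amount once per (topic, partition, offset).
     *
     * @return true if the record was applied, false if it was a replay
     */
    boolean apply(String topic, int partition, long offset, long amount) {
        String key = topic + "-" + partition + "@" + offset;
        if (!applied.add(key)) {
            return false; // already processed: skip without side effects
        }
        total += amount;
        return true;
    }

    long total() {
        return total;
    }

    public static void main(String[] args) {
        IdempotentSink sink = new IdempotentSink();
        sink.apply("orders", 0, 42, 10);
        sink.apply("orders", 0, 43, 5);
        // A reset to earliest replays offset 42; the total must not change.
        sink.apply("orders", 0, 42, 10);
        System.out.println(sink.total()); // 15
    }
}
```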