Consumer Group Management

Consumer groups are how Kafka scales and load-balances consumption, and keeping them healthy is a daily operational concern. The kafka-consumer-groups.sh tool is the canonical way to list groups, inspect their members and partition assignments, measure how far behind they are running (lag), and surgically rewind or fast-forward their committed offsets. Used correctly it is indispensable for incident response and replays; used carelessly on a live group it can silently drop or reprocess production data. This page walks through the safe, production-tested workflows.

Listing consumer groups

The first step is discovery: which groups exist on the cluster? Every command points at the cluster with --bootstrap-server; the tool talks to the brokers directly, so no ZooKeeper flag is involved, on KRaft clusters or otherwise.

kafka-consumer-groups.sh \
  --bootstrap-server broker1:9092 \
  --list

Output:

order-processing
payment-service
analytics-etl
console-consumer-48213

Names like console-consumer-* are auto-generated by ad-hoc CLI consumers. Stable application groups should always set an explicit group.id.

Describing a group

--describe is where you spend most of your time. By default it prints, per partition, the current committed offset, the log-end offset, and the resulting lag, along with the consumer instance and host that owns each partition.

kafka-consumer-groups.sh \
  --bootstrap-server broker1:9092 \
  --group order-processing \
  --describe

Output:

GROUP            TOPIC   PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG   CONSUMER-ID                                     HOST            CLIENT-ID
order-processing orders  0          150234          150234          0     consumer-order-processing-1-a1b2c3              /10.0.1.21      consumer-1
order-processing orders  1          148990          149512          522   consumer-order-processing-2-d4e5f6              /10.0.1.22      consumer-2
order-processing orders  2          151002          151002          0     consumer-order-processing-3-g7h8i9              /10.0.1.23      consumer-3

LAG is the single most important operational metric here, computed per partition as LAG = LOG-END-OFFSET - CURRENT-OFFSET. A small, stable lag is normal; lag that grows monotonically means your consumers cannot keep up with producers. A hyphen (-) in the CONSUMER-ID column means the partition is currently unassigned (no live member owns it), which often points at a rebalance in progress or too few consumer instances.
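
The lag arithmetic and the "grows monotonically" check can be sketched in a few lines of plain Java. This is a minimal illustration, not part of the Kafka tooling: the class name LagTrend and the sampled lag readings are hypothetical, and only the partition 1 offsets come from the sample output above.

```java
import java.util.List;

final class LagTrend {

    /** Lag for one partition: LOG-END-OFFSET minus CURRENT-OFFSET. */
    static long lag(long logEndOffset, long currentOffset) {
        return logEndOffset - currentOffset;
    }

    /** True when every sample is strictly larger than the one before it. */
    static boolean growsMonotonically(List<Long> lagSamples) {
        if (lagSamples.size() < 2) {
            return false; // not enough history to call it a trend
        }
        for (int i = 1; i < lagSamples.size(); i++) {
            if (lagSamples.get(i) <= lagSamples.get(i - 1)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Partition 1 from the sample output: 149512 - 148990.
        System.out.println(lag(149512L, 148990L)); // 522

        // Hypothetical lag readings taken at a fixed polling interval.
        System.out.println(growsMonotonically(List.of(522L, 610L, 745L))); // true
        System.out.println(growsMonotonically(List.of(522L, 480L, 510L))); // false
    }
}
```

A strict comparison is a deliberately simple choice; a real alert would probably tolerate small dips and look at a longer window.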

You can drill into specific views with sub-flags:

Flag                            What it shows
--describe (default)            Offsets, lag, and the owning member per partition
--describe --members            One row per member with the count of partitions it owns
--describe --members --verbose  Members plus the exact partition list each one is assigned
--describe --offsets            Explicit offset/lag view (same as default --describe)
--describe --state              Coordinator, assignment strategy, group state, and member count

kafka-consumer-groups.sh --bootstrap-server broker1:9092 \
  --group order-processing --describe --state

Output:

GROUP            COORDINATOR (ID)  ASSIGNMENT-STRATEGY  STATE   #MEMBERS
order-processing broker2:9092 (2)  cooperative-sticky   Stable  3

A STATE of Stable is healthy; PreparingRebalance or CompletingRebalance for an extended period indicates churn (often from session timeouts or repeated deploys).
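
One way to turn "for an extended period" into a check is to record when a group was first seen in a rebalance state and compare against a threshold. The sketch below assumes the caller polls --describe --state (or the Admin API) and tracks that timestamp itself, since the tool does not report how long a state has been held. The class name RebalanceWatch and the two-minute threshold are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;

final class RebalanceWatch {

    /** Hypothetical threshold; tune it to your session timeout and deploy cadence. */
    static final Duration MAX_REBALANCE = Duration.ofMinutes(2);

    static boolean isRebalancing(String state) {
        return state.equals("PreparingRebalance") || state.equals("CompletingRebalance");
    }

    /** True when the group has sat in a rebalance state longer than the threshold. */
    static boolean isChurning(String state, Instant stateSince, Instant now) {
        return isRebalancing(state)
            && Duration.between(stateSince, now).compareTo(MAX_REBALANCE) > 0;
    }

    public static void main(String[] args) {
        Instant firstSeen = Instant.parse("2026-06-01T00:00:00Z");
        // Five minutes in PreparingRebalance exceeds the two-minute threshold.
        System.out.println(isChurning("PreparingRebalance", firstSeen, firstSeen.plusSeconds(300))); // true
        // A Stable group is never flagged, however long it has been Stable.
        System.out.println(isChurning("Stable", firstSeen, firstSeen.plusSeconds(300))); // false
    }
}
```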

Resetting offsets

--reset-offsets rewrites the committed offsets for a group. This is the mechanism behind replaying a topic from the start, skipping past poison messages, or recovering after a bug. It targets every topic the group consumes (--all-topics), an entire topic (--topic orders), or specific partitions (--topic orders:0,1), and you choose the new position with exactly one reset option:

Reset option                      Effect
--to-earliest                     Rewind to the oldest retained offset (full replay)
--to-latest                       Skip to the end; consumer ignores the backlog
--to-offset <n>                   Set to an absolute offset
--shift-by <n>                    Move relative to the current offset (negative rewinds)
--to-datetime <ISO8601>           First offset at or after a timestamp
--by-duration <ISO8601 duration>  Set to the offset at the current time minus a duration, e.g. PT1H
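
To make the time-based and relative options concrete, the sketch below shows the arithmetic behind them using java.time. It is an illustration under stated assumptions, not the tool's implementation: the class ResetTargets is hypothetical, a --to-datetime value without a zone is assumed to be read as UTC, and --shift-by results are assumed to be kept inside the retained offset range.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

final class ResetTargets {

    /** Epoch millis for a --to-datetime style value, read as UTC (assumption). */
    static long toDatetimeMillis(String isoLocalDateTime) {
        return LocalDateTime.parse(isoLocalDateTime).toInstant(ZoneOffset.UTC).toEpochMilli();
    }

    /** Epoch millis a --by-duration style value points at: now minus the duration. */
    static long byDurationMillis(String isoDuration, Instant now) {
        return now.minus(Duration.parse(isoDuration)).toEpochMilli();
    }

    /** --shift-by arithmetic, kept inside the retained range [earliest, latest] (assumption). */
    static long shiftBy(long current, long delta, long earliest, long latest) {
        return Math.max(earliest, Math.min(latest, current + delta));
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2026-06-01T12:00:00Z");
        System.out.println(toDatetimeMillis("2026-06-01T00:00:00.000"));
        // PT1H is one hour, so the target timestamp is 11:00:00Z on the same day.
        System.out.println(Instant.ofEpochMilli(byDurationMillis("PT1H", now)));
        // Rewinding 500 from offset 100 would go below the earliest offset 20.
        System.out.println(shiftBy(100L, -500L, 20L, 300L)); // 20
    }
}
```

The timestamp produced by either time-based option is then resolved, per partition, to the first offset at or after it, which is why the resulting offsets differ across partitions.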

Warning: Reset is a destructive metadata operation. It does not delete messages, but it changes what the group will read next, which can cause large-scale reprocessing or data skipping. Always run with --dry-run first and capture the planned offsets before committing.

The two-step workflow is mandatory in production. Start with --dry-run, which only prints the offsets it would set:

kafka-consumer-groups.sh \
  --bootstrap-server broker1:9092 \
  --group analytics-etl \
  --topic events \
  --reset-offsets --to-datetime 2026-06-01T00:00:00.000 \
  --dry-run

Output:

GROUP          TOPIC   PARTITION  NEW-OFFSET
analytics-etl  events  0          98230
analytics-etl  events  1          97511
analytics-etl  events  2          99002
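
Capturing the plan is easier if the dry-run output is parsed into a structure you can store or diff. The sketch below is one possible approach, assuming the four-column layout shown above; the class name DryRunPlan and the "topic-partition" key format are hypothetical choices.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class DryRunPlan {

    /** Parses "GROUP TOPIC PARTITION NEW-OFFSET" rows into "topic-partition" -> offset. */
    static Map<String, Long> parse(String output) {
        Map<String, Long> plan = new LinkedHashMap<>();
        for (String line : output.split("\\R")) {
            String[] cols = line.trim().split("\\s+");
            // Skip blank lines, the header row, and anything that is not a 4-column row.
            if (cols.length != 4 || cols[0].equals("GROUP")) {
                continue;
            }
            plan.put(cols[1] + "-" + cols[2], Long.parseLong(cols[3]));
        }
        return plan;
    }

    public static void main(String[] args) {
        String output = String.join("\n",
            "GROUP          TOPIC   PARTITION  NEW-OFFSET",
            "analytics-etl  events  0          98230",
            "analytics-etl  events  1          97511",
            "analytics-etl  events  2          99002");
        // Prints one "events-N -> offset" line per partition, in input order.
        parse(output).forEach((tp, offset) -> System.out.println(tp + " -> " + offset));
    }
}
```

The parser throws NumberFormatException on a four-column row whose last field is not numeric, which is a reasonable failure mode for a pre-flight check.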

Once the plan looks correct, replace --dry-run with --execute to commit it:

kafka-consumer-groups.sh \
  --bootstrap-server broker1:9092 \
  --group analytics-etl \
  --topic events \
  --reset-offsets --to-datetime 2026-06-01T00:00:00.000 \
  --execute

Resetting a live group will fail (by design)

Kafka refuses to reset offsets for a group that has active members: a running consumer keeps committing its own positions, so its next commit would race against the reset and could silently overwrite it. Attempting --execute against an active group fails with an error.

Output:

Error: Assignments can only be reset if the group 'analytics-etl' is inactive,
but the current state is Stable.

The safe procedure is to stop every consumer instance in the group, confirm the group is Empty via --describe --state, then run --execute, and only then restart the consumers.
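
An automated runbook can encode the "confirm the group is Empty" step as a precondition. The sketch below is a hypothetical guard, not Kafka code: the class ResetGuard is invented for illustration, and treating Dead as resettable alongside Empty is an assumption about how the tool defines an inactive group.

```java
import java.util.Set;

final class ResetGuard {

    /** States treated as inactive; including "Dead" is an assumption. */
    static final Set<String> INACTIVE_STATES = Set.of("Empty", "Dead");

    static boolean canReset(String groupState) {
        return INACTIVE_STATES.contains(groupState);
    }

    /** Throws before any offsets are touched if the group still has members. */
    static void requireInactive(String group, String groupState) {
        if (!canReset(groupState)) {
            throw new IllegalStateException(
                "Refusing to reset '" + group + "': current state is " + groupState);
        }
    }

    public static void main(String[] args) {
        System.out.println(canReset("Empty"));  // true
        System.out.println(canReset("Stable")); // false
        requireInactive("analytics-etl", "Empty"); // passes silently
    }
}
```

Calling requireInactive with the state read from --describe --state immediately before --execute keeps the check and the reset close together, although a consumer could still join in between.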

Programmatic management with AdminClient

For automation and dashboards, prefer the Java Admin API over shelling out to the script. It exposes the same data through listConsumerGroups, describeConsumerGroups, and listConsumerGroupOffsets.

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

import java.util.Map;
import java.util.Properties;

public final class GroupLagInspector {

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");

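        // Admin is AutoCloseable; try-with-resources releases its network threads.
        try (Admin admin = Admin.create(props)) {
            // Committed offset per partition for the group.
            Map<TopicPartition, OffsetAndMetadata> committed = admin
                .listConsumerGroupOffsets("order-processing")
                .partitionsToOffsetAndMetadata()
                .get();

            committed.forEach((tp, offset) -> {
                // The value is null when the group has no committed offset for the partition.
                if (offset == null) {
                    System.out.printf("%s-%d has no committed offset%n",
                        tp.topic(), tp.partition());
                    return;
                }
                System.out.printf("%s-%d committed at %d%n",
                    tp.topic(), tp.partition(), offset.offset());
            });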
        }
    }
}

Pairing committed offsets with end offsets from KafkaConsumer.endOffsets(...) gives you the same lag figure the CLI reports, which you can then export to your monitoring system.
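
The pairing itself is a join on partition. The sketch below uses plain maps keyed by a "topic-partition" string as a stand-in for the TopicPartition-keyed maps the Kafka clients return, so it runs without the Kafka libraries; the class name LagJoin and the key format are hypothetical, and the offsets are the ones from the sample --describe output.

```java
import java.util.Map;
import java.util.TreeMap;

final class LagJoin {

    /**
     * Joins committed offsets with end offsets by partition key.
     * Partitions without a committed offset are left out of the result.
     */
    static Map<String, Long> lagByPartition(Map<String, Long> committed,
                                            Map<String, Long> endOffsets) {
        Map<String, Long> lag = new TreeMap<>();
        endOffsets.forEach((tp, end) -> {
            Long current = committed.get(tp);
            if (current != null) {
                lag.put(tp, end - current);
            }
        });
        return lag;
    }

    public static void main(String[] args) {
        Map<String, Long> committed = Map.of("orders-0", 150234L, "orders-1", 148990L);
        Map<String, Long> endOffsets = Map.of(
            "orders-0", 150234L, "orders-1", 149512L, "orders-2", 151002L);
        // orders-2 has no committed offset here, so it does not appear.
        System.out.println(lagByPartition(committed, endOffsets)); // {orders-0=0, orders-1=522}
    }
}
```

Because the two maps are fetched at slightly different moments, a reading can occasionally be off by a few records; smoothing over several samples avoids alerting on that noise.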

Best Practices

  • Always set an explicit, stable group.id per application so groups are identifiable and offsets survive restarts.
  • Treat lag as a first-class SLO: alert on sustained or growing lag rather than absolute values, since healthy systems carry some lag.
  • Never reset offsets on a running group; stop consumers, verify the state is Empty, reset, then restart.
  • Run every reset with --dry-run first, and archive both the group's current offsets (from --describe) and the planned offsets before --execute, so you can roll back to the recorded positions.
  • Prefer --to-datetime or --by-duration over raw offsets when replaying, as they are far less error-prone across partitions.
  • Use the Admin API for recurring inspection and CI checks; reserve the shell script for interactive and break-glass operations.
  • Watch for PreparingRebalance/CompletingRebalance states lingering, which usually signal session timeouts or noisy deploys.
Last updated June 1, 2026