Static Group Membership
Every time a consumer leaves a group, Kafka triggers a rebalance to redistribute partitions among the survivors. During normal operations like rolling restarts or brief network blips this is wasteful: the consumer comes back almost immediately, yet the group has already paid the cost of stopping all members, reassigning partitions, and resuming. Static group membership, introduced in Kafka 2.3 (KIP-345), lets a consumer keep a stable identity across restarts so that short absences no longer cause a rebalance. In production this dramatically reduces consumption pauses and stabilises latency-sensitive pipelines.
How dynamic membership behaves
In the default dynamic model, the group coordinator assigns each consumer an ephemeral member ID when it joins. When that consumer shuts down cleanly, it sends a LeaveGroup request, and the coordinator removes it and rebalances immediately. If it crashes or loses connectivity instead, it simply stops sending heartbeats, and the coordinator removes it once session.timeout.ms elapses. Either way, on restart the consumer is treated as a brand-new member with a fresh ID, triggering a second rebalance when it rejoins.
A rolling restart of a 10-instance deployment can therefore cause up to 20 rebalances: one when each instance leaves and one when it rejoins. Under the eager rebalance protocol, each rebalance is a stop-the-world event for the group: every consumer revokes its partitions and waits until assignment completes before processing resumes.
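The arithmetic behind that figure can be spelled out. The following is a back-of-the-envelope sketch rather than Kafka code: the class and method names are hypothetical, and it assumes the worst case in which instances restart one at a time and every leave and every rejoin triggers exactly one rebalance. The 3-second pause per rebalance is an assumed figure used only for illustration.

```java
/** Back-of-the-envelope model; the names are illustrative and not part of any Kafka API. */
final class RollingRestartMath {

    /** Worst case with dynamic membership: one rebalance on leave and one on rejoin, per instance. */
    static int worstCaseDynamicRebalances(int instances) {
        if (instances < 0) {
            throw new IllegalArgumentException("instances must be >= 0");
        }
        return 2 * instances;
    }

    /** Total group-wide stall if every rebalance pauses processing for the given time. */
    static long totalPauseMillis(int instances, long pausePerRebalanceMillis) {
        return worstCaseDynamicRebalances(instances) * pausePerRebalanceMillis;
    }

    public static void main(String[] args) {
        System.out.println(worstCaseDynamicRebalances(10)); // prints 20
        System.out.println(totalPauseMillis(10, 3_000L));   // prints 60000 (assumed 3 s per rebalance)
    }
}
```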
How static membership works
Static membership assigns each consumer a durable identity through the group.instance.id configuration. The coordinator maps this stable ID to the member's partition assignment and keeps that mapping reserved for up to session.timeout.ms after the member disconnects.
Two behaviours change as a result:
- Graceful shutdown does not send LeaveGroup. When a static member stops, the coordinator keeps its slot reserved instead of rebalancing.
- Rejoining reuses the previous assignment. As long as the member returns before session.timeout.ms expires, it reclaims its exact previous partitions with no rebalance.
If the member stays away longer than the session timeout, the coordinator finally expires it and rebalances normally — so static membership defers, but does not eliminate, rebalancing for genuine failures.
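The timeline described above can be illustrated with a toy model. This is only a sketch of the decision rule and not the broker's implementation: the class StaticMemberTracker and its methods are hypothetical, time is passed in explicitly as milliseconds, and a "rebalance" is reduced to a boolean.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of slot reservation for static members; not how the coordinator is actually written. */
final class StaticMemberTracker {
    private final long sessionTimeoutMs;
    // group.instance.id -> timestamp (ms) at which the member was last seen alive
    private final Map<String, Long> lastSeen = new HashMap<>();

    StaticMemberTracker(long sessionTimeoutMs) {
        this.sessionTimeoutMs = sessionTimeoutMs;
    }

    /** Drops every member whose last sighting is older than the session timeout. */
    boolean expireStale(long nowMs) {
        return lastSeen.values().removeIf(seen -> nowMs - seen > sessionTimeoutMs);
    }

    /**
     * Records a join or heartbeat for the given instance ID.
     * Returns true if the event needs a rebalance (the ID has no reserved slot),
     * false if the member simply reclaims its reserved slot.
     */
    boolean joinOrHeartbeat(String instanceId, long nowMs) {
        expireStale(nowMs);
        boolean slotReserved = lastSeen.containsKey(instanceId);
        lastSeen.put(instanceId, nowMs);
        return !slotReserved;
    }
}
```

With a 45-second timeout, a first join at t=0 reports a rebalance, a rejoin at t=20 s does not, and a rejoin at t=80 s (60 s after the last sighting) does again, because the slot has expired in the meantime.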
Configuration
Each consumer instance must have a unique, stable group.instance.id. If two live members use the same value, the coordinator fences the older one, which then fails with a fatal FencedInstanceIdException. Pair the ID with a session.timeout.ms large enough to cover your restart window.
group.id=order-processors
group.instance.id=order-processor-3
session.timeout.ms=45000
heartbeat.interval.ms=10000
max.poll.interval.ms=300000
With the plain Java client:
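import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "order-processors");
// Stable identity: must be unique per instance and identical across restarts
props.put(ConsumerConfig.GROUP_INSTANCE_ID_CONFIG, "order-processor-3");
// How long the coordinator keeps this member's slot reserved while it is away
props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 45000);
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
        StringDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
        StringDeserializer.class.getName());

try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
    consumer.subscribe(List.of("orders"));
    while (true) {
        ConsumerRecords<String, String> records =
                consumer.poll(Duration.ofMillis(500));
        // process(...) stands in for your own record handler
        records.forEach(r -> process(r.value()));
    }
}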
Spring Boot configuration
In Spring for Apache Kafka the property maps directly. The critical requirement is that each pod gets a distinct value — derive it from a stable ordinal such as a Kubernetes StatefulSet pod name.
spring:
  kafka:
    consumer:
      group-id: order-processors
      properties:
        group.instance.id: ${HOSTNAME}   # e.g. order-processor-3 in a StatefulSet
        session.timeout.ms: 45000
        heartbeat.interval.ms: 10000
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@Component
public class OrderListener {

    @KafkaListener(topics = "orders", groupId = "order-processors")
    public void onMessage(OrderEvent event) {
        // process the event; assignment survives a quick pod restart
    }
}

public record OrderEvent(String orderId, String status, long amountCents) {}
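One way to guard the distinct-value requirement is to validate the hostname at startup before using it as the instance ID. The sketch below is an assumption-laden illustration: the InstanceIds class is hypothetical, it assumes the pod name is exposed through the HOSTNAME environment variable, and it relies on the StatefulSet naming convention of a name followed by a numeric ordinal. The pattern check is a heuristic and cannot prove that the ID is unique.

```java
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical startup helper; not part of Kafka or Spring. */
final class InstanceIds {
    // StatefulSet pods are named <statefulset-name>-<ordinal>, e.g. order-processor-3
    private static final Pattern STATEFULSET_POD =
            Pattern.compile("^[a-z0-9]([-a-z0-9]*[a-z0-9])?-\\d+$");

    /** Returns the hostname as the instance ID, or fails fast if it does not look stable. */
    static String fromEnvironment(Map<String, String> env) {
        String hostname = env.get("HOSTNAME");
        if (hostname == null || hostname.isBlank()) {
            throw new IllegalStateException("HOSTNAME is not set");
        }
        if (!STATEFULSET_POD.matcher(hostname).matches()) {
            // Names with a random suffix (typical for Deployment pods) change on every restart
            throw new IllegalStateException(
                    "HOSTNAME does not end in a numeric ordinal: " + hostname);
        }
        return hostname;
    }
}
```

In an application this would be called as InstanceIds.fromEnvironment(System.getenv()), so that a pod with an unstable name fails at startup instead of silently joining with a throwaway identity.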
Tip: Set session.timeout.ms comfortably above your slowest restart. For container deployments, allow for image pull, JVM warm-up, and readiness checks. A common range is 30 s to 60 s. The broker only accepts values between group.min.session.timeout.ms and group.max.session.timeout.ms, so check those broker settings before raising the timeout.
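The sizing rule in the tip can be written down as a small calculation. This is a sketch under stated assumptions: the SessionTimeoutSizing class, the safety factor, and the example numbers are all hypothetical, and the broker limits are passed in as parameters because their actual values depend on your broker configuration.

```java
/** Hypothetical sizing helper; the inputs are your own measurements and broker settings. */
final class SessionTimeoutSizing {

    /** Slowest observed restart times a safety factor, checked against the broker's allowed range. */
    static long sessionTimeoutMs(long slowestRestartMs, double safetyFactor,
                                 long brokerMinMs, long brokerMaxMs) {
        long candidate = Math.round(slowestRestartMs * safetyFactor);
        if (candidate < brokerMinMs || candidate > brokerMaxMs) {
            throw new IllegalArgumentException(
                    "session timeout " + candidate + " ms is outside the broker range ["
                            + brokerMinMs + ", " + brokerMaxMs + "]");
        }
        return candidate;
    }

    /** Upper bound for the heartbeat interval: one third of the session timeout. */
    static long maxHeartbeatIntervalMs(long sessionTimeoutMs) {
        return sessionTimeoutMs / 3;
    }
}
```

For example, a slowest restart of 30 s with a safety factor of 1.5 gives a 45 s session timeout and a heartbeat interval of at most 15 s. A shorter interval, such as the 10 s used in the configuration above, is also fine.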
Static vs dynamic membership
| Aspect | Dynamic membership | Static membership |
|---|---|---|
| Identity | Ephemeral member ID per join | Stable group.instance.id |
| Graceful shutdown | Sends LeaveGroup, rebalances | No LeaveGroup, slot reserved |
| Rolling restart | Rebalance per stop and per start | No rebalance if back within session timeout |
| Transient disconnect | Rebalance after session timeout | Same assignment reclaimed if back within session timeout |
| Duplicate ID | N/A | FencedInstanceIdException |
| Best for | Elastic, frequently scaling groups | Stable, long-lived consumers |
Operational benefit
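The payoff is fewer, shorter consumption stalls. Verify membership type using the consumer-groups CLI: a static member's CONSUMER-ID begins with its group.instance.id, and recent versions of the tool also print a separate GROUP-INSTANCE-ID column when static members are present.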
kafka-consumer-groups.sh --bootstrap-server broker:9092 \
--describe --group order-processors --members --verbose
Output:
GROUP CONSUMER-ID HOST CLIENT-ID #PARTITIONS ASSIGNMENT
order-processors order-processor-3-a1b2c3... /10.0.0.13 ... 4 orders(0,1,2,3)
Restart that pod and re-run the command: the assignment column stays identical, and group lag barely moves instead of spiking during a rebalance.
Best practices
- Derive group.instance.id from a stable, unique source (StatefulSet pod name, ordinal index), never a random UUID, which would defeat the purpose.
- Size session.timeout.ms to exceed your realistic restart duration, but not so high that real failures go undetected for too long.
- Keep heartbeat.interval.ms at roughly one third of the session timeout so liveness is still detected promptly.
- Combine static membership with cooperative rebalancing for the smoothest behaviour when a genuine rebalance is unavoidable.
- Watch for FencedInstanceIdException in logs. It means two live instances share an ID; fix your ID assignment immediately.
- Remember that scaling the group up or down still rebalances; static membership only suppresses rebalances for transient absences.
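Several of these practices can be combined in one place. The following is a minimal sketch, assuming plain string property keys so that it compiles without the Kafka client on the classpath. The StaticCooperativeConfig class is hypothetical, and bootstrap servers and deserializers are deliberately left out. CooperativeStickyAssignor is the cooperative assignor shipped with the Java client; whether switching to it is safe for an existing group depends on your client version and upgrade path.

```java
import java.util.Properties;

/** Hypothetical factory that combines static membership with cooperative rebalancing. */
final class StaticCooperativeConfig {

    static Properties consumerProps(String groupId, String instanceId, int sessionTimeoutMs) {
        Properties props = new Properties();
        props.put("group.id", groupId);
        // Stable, unique identity per instance (for example a StatefulSet pod name)
        props.put("group.instance.id", instanceId);
        props.put("session.timeout.ms", Integer.toString(sessionTimeoutMs));
        // Heartbeat at one third of the session timeout
        props.put("heartbeat.interval.ms", Integer.toString(sessionTimeoutMs / 3));
        // Cooperative rebalancing for the rebalances that static membership cannot avoid
        props.put("partition.assignment.strategy",
                "org.apache.kafka.clients.consumer.CooperativeStickyAssignor");
        return props;
    }
}
```

The returned Properties object can be extended with connection and deserializer settings and then passed to a KafkaConsumer constructor.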