Core Concepts & Glossary
Kafka has a small but dense vocabulary, and every term carries operational weight: misunderstanding what an offset or an ISR actually means is how teams end up with data loss, stuck consumers, or under-replicated partitions in production. This page is a quick-reference glossary of the core concepts you will meet on every other page of these docs. Each term gets one or two precise sentences; skim it now, then bookmark it for the moment a config key or a log line stops making sense.
Cluster and node terms
These describe the physical and logical layout of a Kafka deployment.
| Term | Definition |
|---|---|
| Cluster | A group of one or more cooperating brokers that share the same metadata and together store all topics. |
| Broker | A single Kafka server process that stores partition data on disk and serves produce/fetch requests. Each broker has a unique numeric node.id. |
| Controller | The broker (or dedicated node) that owns cluster metadata — topic creation, partition assignment, leader election. In modern Kafka, controllers form a Raft quorum (KRaft). |
| KRaft | Kafka Raft: the built-in consensus protocol that stores metadata in an internal __cluster_metadata log, replacing ZooKeeper. Production-ready for new clusters since Kafka 3.3 and the only mode from 4.0 onward. |
| ZooKeeper | The legacy external coordination service Kafka used to store metadata before KRaft. Removed entirely in Kafka 4.0; you should not deploy it for new clusters. |
KRaft vs ZooKeeper is the single biggest architectural shift in Kafka’s history. If you are starting fresh, run KRaft. Only touch ZooKeeper concepts when migrating or maintaining a pre-4.0 cluster.
Topic and storage terms
This group covers how records are organized and durably stored.
| Term | Definition |
|---|---|
| Topic | A named, append-only category of records (e.g. orders). Topics are logical; the physical unit is the partition. |
| Partition | An ordered, immutable, append-only log that is the unit of parallelism and ordering. A topic is split into one or more partitions, each identified as topic-N. |
| Offset | A monotonically increasing 64-bit integer that uniquely identifies a record’s position within a partition. Offsets are per-partition, never global. |
| Replica | A copy of a partition stored on a broker. The replication.factor controls how many copies exist; replicas are how Kafka survives broker failure. |
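| Leader | The single replica of a partition that handles all writes and, by default, all reads at a given time. Producers always write to the leader; consumers read from it unless follower fetching (KIP-392) is configured. |
| Follower | A replica that fetches records from the leader to stay in sync and does not accept writes from producers. An in-sync follower is promoted to leader if the current leader fails. |
| ISR | In-Sync Replicas: the set of replicas (the leader plus any followers) that have kept up with the leader within replica.lag.time.max.ms. By default only ISR members are eligible to become leader; unclean.leader.election.enable=true relaxes this at the risk of data loss. |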
| High watermark | The highest offset that has been replicated to all ISR members. Consumers can only read up to the high watermark, which guarantees they never see unreplicated (potentially lost) data. |
| Retention | The policy that decides when old records are deleted, by time (retention.ms) or size (retention.bytes, applied per partition). |
| Log compaction | A retention mode (cleanup.policy=compact) that keeps at least the latest record per key, ideal for changelog/state topics rather than time-bounded event streams. |
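The relationship between the ISR and the high watermark can be sketched as a small calculation. This is a toy model, not broker code: the class `HighWatermarkSketch`, its method name, and the idea of passing offsets in as a map are all assumptions made for illustration. It treats each replica's log end offset as the next offset that replica would write, and takes the minimum across ISR members.

```java
import java.util.Map;
import java.util.Set;

/** Toy model (hypothetical names): derive a high watermark from replica log end offsets. */
final class HighWatermarkSketch {

    /**
     * @param logEndOffsets broker id -> next offset that replica would write
     * @param isr           broker ids currently in the in-sync replica set
     * @return the smallest log end offset among ISR members
     */
    static long highWatermark(Map<Integer, Long> logEndOffsets, Set<Integer> isr) {
        if (isr.isEmpty()) {
            throw new IllegalArgumentException("ISR must contain at least the leader");
        }
        long hw = Long.MAX_VALUE;
        for (int brokerId : isr) {
            Long leo = logEndOffsets.get(brokerId);
            if (leo == null) {
                throw new IllegalArgumentException("no offset known for broker " + brokerId);
            }
            // A lagging ISR member holds the high watermark back.
            hw = Math.min(hw, leo);
        }
        return hw;
    }
}
```

With log end offsets of 120, 118, and 95 on brokers 1, 2, and 3, the model returns 95 while all three are in the ISR and 118 once broker 3 drops out. In this model, records at offsets below the returned value are the ones consumers may read.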
You can inspect a partition’s leader and ISR directly:
kafka-topics.sh --bootstrap-server localhost:9092 \
--describe --topic orders
Output:
Topic: orders PartitionCount: 3 ReplicationFactor: 3
Topic: orders Partition: 0 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3
Topic: orders Partition: 1 Leader: 2 Replicas: 2,3,1 Isr: 2,3,1
Topic: orders Partition: 2 Leader: 3 Replicas: 3,1,2 Isr: 3,1
A partition whose Isr is smaller than its Replicas is under-replicated: a follower has fallen behind or its broker is down. Watch UnderReplicatedPartitions as a top-tier alert.
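The same check can be applied mechanically to the describe output. The sketch below assumes partition lines shaped like the sample above (`Partition: N ... Replicas: a,b,c ... Isr: a,b`); the class and method names are hypothetical, and real output may carry extra columns that this pattern simply ignores.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch (hypothetical names): flag partitions whose ISR is smaller than the replica list. */
final class UnderReplicatedSketch {

    // Matches per-partition lines; the topic summary line has no "Partition:" field.
    private static final Pattern PARTITION_LINE = Pattern.compile(
        "Partition:\\s*(\\d+)\\s+Leader:\\s*(-?\\d+)\\s+Replicas:\\s*([\\d,]+)\\s+Isr:\\s*([\\d,]*)");

    /** Returns the partition numbers that have fewer ISR members than replicas. */
    static List<Integer> underReplicated(List<String> describeLines) {
        List<Integer> result = new ArrayList<>();
        for (String line : describeLines) {
            Matcher m = PARTITION_LINE.matcher(line);
            if (!m.find()) {
                continue; // summary or unrelated line
            }
            int replicaCount = countIds(m.group(3));
            int isrCount = countIds(m.group(4));
            if (isrCount < replicaCount) {
                result.add(Integer.parseInt(m.group(1)));
            }
        }
        return result;
    }

    // Counts comma-separated broker ids; an empty list counts as zero.
    private static int countIds(String csv) {
        return csv.isEmpty() ? 0 : csv.split(",").length;
    }
}
```

Fed the four sample lines above, the method reports partition 2 only. For ad hoc checks, kafka-topics.sh also accepts `--under-replicated-partitions` alongside `--describe`, which filters on the broker side of the tool instead.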
Client terms
These describe the applications that read and write data, and how they coordinate.
| Term | Definition |
|---|---|
| Producer | A client that publishes records to topic partitions. Partition choice is driven by a hash of the record key or by a custom partitioner; records without a key are spread across partitions by the default partitioner. |
| Consumer | A client that subscribes to topics and reads records in offset order, committing its progress so it can resume after a restart. |
| Consumer group | A set of consumers sharing a group.id that cooperatively divide a topic’s partitions, with each partition consumed by exactly one member at a time. This is how you scale consumption horizontally. |
| Rebalance | The process of reassigning partitions across a consumer group’s members when a consumer joins, leaves, or fails. During a rebalance, consumption briefly pauses. |
| Lag | The difference between a partition’s log end offset (the offset the next record will receive) and a consumer group’s committed offset, i.e. how far behind the group is on that partition. The key health metric for any consumer. |
| acks | The producer durability setting: acks=0 (fire-and-forget), acks=1 (leader only), acks=all (all ISR confirmed). Use acks=all whenever you cannot afford to lose records. |
A minimal producer config showing the durability and identity keys above:
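import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

var props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ProducerConfig.ACKS_CONFIG, "all"); // wait for full ISR
props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true); // no duplicates on retry
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
// close() at the end of the try block flushes any buffered records
try (var producer = new KafkaProducer<String, String>(props)) {
    producer.send(new ProducerRecord<>("orders", "order-42", "{\"id\":42}")); // key picks the partition
}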
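To make the key-to-partition mapping concrete, here is a minimal sketch of hash-based partition selection. It is deliberately not Kafka's implementation: the default partitioner hashes the serialized key bytes with murmur2, whereas this stand-in uses `Arrays.hashCode`, so the partition numbers it produces will not match a real cluster. The class and method names are hypothetical. What it does show is the property that matters: the same key always lands on the same partition for a fixed partition count.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Sketch (hypothetical names): deterministic key -> partition mapping. */
final class KeyPartitionSketch {

    static int partitionFor(String key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        // Stand-in hash for illustration only; Kafka's default partitioner uses murmur2.
        int hash = Arrays.hashCode(keyBytes);
        // Clear the sign bit so the modulo result is never negative.
        return (hash & 0x7fffffff) % numPartitions;
    }
}
```

Because the result depends on `numPartitions`, adding partitions to a topic changes where existing keys map, which is why per-key ordering only holds while the partition count stays fixed.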
To check a consumer group’s lag from the CLI:
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
--describe --group order-processor
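The lag figure that command reports per partition is a simple subtraction, sketched below. The class, method names, and the map-based inputs are assumptions for illustration; in practice the offsets would come from the CLI output or the admin client rather than hand-built maps.

```java
import java.util.Map;
import java.util.TreeMap;

/** Sketch (hypothetical names): per-partition and total lag for one consumer group. */
final class LagSketch {

    /**
     * @param logEndOffsets    partition -> log end offset
     * @param committedOffsets partition -> offset committed by the group
     * @return partition -> lag, for partitions that have a committed offset
     */
    static Map<Integer, Long> lagPerPartition(Map<Integer, Long> logEndOffsets,
                                              Map<Integer, Long> committedOffsets) {
        Map<Integer, Long> lag = new TreeMap<>();
        for (Map.Entry<Integer, Long> entry : logEndOffsets.entrySet()) {
            Long committed = committedOffsets.get(entry.getKey());
            if (committed == null) {
                continue; // nothing committed yet, so lag is unknown here
            }
            // Clamp at zero in case the two readings were taken at slightly different times.
            lag.put(entry.getKey(), Math.max(0L, entry.getValue() - committed));
        }
        return lag;
    }

    /** Sums lag across partitions to give one number per group. */
    static long totalLag(Map<Integer, Long> lagByPartition) {
        long total = 0L;
        for (long value : lagByPartition.values()) {
            total += value;
        }
        return total;
    }
}
```

With log end offsets of 100, 250, and 80 on partitions 0 to 2 and committed offsets of 100 and 200 on partitions 0 and 1, the per-partition lag is 0 and 50, and the total is 50. Partition 2 is omitted because the group has not committed an offset for it.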
Best practices
- Treat offsets as per-partition: never assume ordering or numbering carries across partitions of a topic.
- Run acks=all plus min.insync.replicas=2 with replication.factor=3 for any topic you cannot afford to lose.
- Monitor consumer lag and under-replicated partitions as primary SLO signals; both surface problems before users notice.
- Keep rebalances rare and fast by using cooperative-sticky assignment and tuning session.timeout.ms / max.poll.interval.ms to your real workload.
- Choose retention vs. compaction deliberately: time/size retention for event streams, compaction for keyed state and changelogs.
- Deploy new clusters on KRaft; do not introduce ZooKeeper into any greenfield system.
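The arithmetic behind the acks=all, min.insync.replicas=2, replication.factor=3 recommendation can be written out as a short sketch. The class and method names are hypothetical, and the model is simplified: it assumes every failed broker hosts a replica of the partition and that the surviving replicas stay in the ISR.

```java
/** Sketch (hypothetical names): availability arithmetic for acks=all writes. */
final class DurabilitySketch {

    /** Number of replica brokers that can fail while acks=all writes are still accepted. */
    static int tolerableFailuresForWrites(int replicationFactor, int minInsyncReplicas) {
        if (minInsyncReplicas < 1 || minInsyncReplicas > replicationFactor) {
            throw new IllegalArgumentException(
                "min.insync.replicas must be between 1 and replication.factor");
        }
        return replicationFactor - minInsyncReplicas;
    }

    /** An acks=all write is accepted only while the ISR has at least min.insync.replicas members. */
    static boolean acceptsAcksAllWrite(int isrSize, int minInsyncReplicas) {
        return isrSize >= minInsyncReplicas;
    }
}
```

For replication.factor=3 and min.insync.replicas=2 the first method returns 1: one broker can be down and writes continue, because the ISR still has two members. If a second broker fails, the ISR shrinks to one, the second method returns false, and producers using acks=all receive a NotEnoughReplicasException instead of silently writing to a single copy.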