Event Streaming Concepts
Before you write a single producer or consumer, it pays to get the vocabulary right. Kafka is not a message queue with extra features bolted on — it is a distributed, durable, append-only commit log that you read and write with publish/subscribe semantics. Internalising a handful of core terms (event, stream, log, retention, replay) will make every later page — partitions, consumer groups, exactly-once — click into place instead of feeling like trivia. This page sets that conceptual foundation.
What is an event?
An event is an immutable record of something that already happened, captured at a point in time. “User 42 added SKU-9 to their cart at 14:03:22Z” is an event. Note the past tense: events describe facts, not requests. Once an event has occurred, it cannot be un-happened — which is why Kafka stores events as immutable records you never edit or delete in place.
In Kafka, an event is a key/value pair plus a timestamp and optional headers. Modelled as a Java record, an event is just data:
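import java.time.Instant;

// Value payload for an "order placed" event; records are immutable by construction.
public record OrderPlaced(
    String orderId,
    String customerId,
    long amountCents,    // integer cents avoid floating-point rounding
    Instant occurredAt   // when the fact occurred
) {}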
The key (often a business identifier like customerId) controls ordering and partition placement; the value carries the payload; headers carry metadata such as tracing or schema info.
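To make the key/value/timestamp/headers shape concrete, here is a minimal sketch in plain Java with no Kafka dependency. `EventEnvelopeSketch`, `Envelope`, `wrap`, `partitionFor`, and the `trace-id` header name are all hypothetical, introduced only for illustration. In particular, `partitionFor` uses `String.hashCode()` as a stand-in and is not the hash Kafka's clients apply; it only demonstrates that equal keys land on the same partition.

```java
import java.time.Instant;
import java.util.Map;

final class EventEnvelopeSketch {
    // Hypothetical envelope mirroring the four parts of a Kafka event.
    record Envelope<K, V>(K key, V value, Instant timestamp, Map<String, String> headers) {
        Envelope {
            headers = Map.copyOf(headers); // defensive copy keeps the envelope immutable
        }
    }

    record OrderPlaced(String orderId, String customerId, long amountCents, Instant occurredAt) {}

    // Key by customerId so all of one customer's events share a key.
    static Envelope<String, OrderPlaced> wrap(OrderPlaced order, String traceId) {
        return new Envelope<>(order.customerId(), order, order.occurredAt(),
                Map.of("trace-id", traceId));
    }

    // Stand-in for key-based placement: equal keys always map to the same partition.
    // Assumption: String.hashCode() is used only for illustration; it is not the
    // hash Kafka's clients apply to the serialized key.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        OrderPlaced order = new OrderPlaced("o-1", "c-42", 1999L,
                Instant.parse("2024-01-01T14:03:22Z"));
        Envelope<String, OrderPlaced> event = wrap(order, "t-123");
        System.out.println(event.key() + " -> partition " + partitionFor(event.key(), 3));
    }
}
```

Because placement depends only on the key, every event for customer `c-42` goes to one partition and is therefore read in the order it was written.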
Events vs. commands vs. messages
These three words are used loosely in conversation but mean different things, and mixing them up leads to bad designs.
| Term | Tense | Intent | Coupling | Example |
|---|---|---|---|---|
| Command | Imperative | “Do this” | Sender expects a specific handler to act | PlaceOrder |
| Event | Past | “This happened” | Producer doesn’t know or care who reacts | OrderPlaced |
| Message | Neutral | Transport-level envelope | The container, not the meaning | Any Kafka record |
A message is the generic transport unit — the bytes on the wire. An event is a message whose semantics are “a fact occurred.” A command is a message whose semantics are “please perform an action.” Kafka can carry all three, but its sweet spot is events: the producer fires a fact and any number of independent consumers decide what to do with it.
Prefer events over commands when integrating services. Commands create point-to-point coupling; events let you add new consumers later without touching the producer — the core benefit of event-driven architecture.
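The difference in coupling can be sketched in a few lines of plain Java. Everything here is hypothetical: `EventFanOut` is an in-process stand-in for a topic, and `handle` plays the single handler that a `PlaceOrder` command is addressed to. The point is that a second subscriber is added without changing the code that produces the event.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class CommandVsEventSketch {
    // Command: imperative, addressed to exactly one handler that must act.
    record PlaceOrder(String orderId) {}

    // Event: past tense, a fact announced to whoever is listening.
    record OrderPlaced(String orderId) {}

    // Hypothetical in-process fan-out standing in for a topic.
    static final class EventFanOut {
        private final List<Consumer<OrderPlaced>> subscribers = new ArrayList<>();

        void subscribe(Consumer<OrderPlaced> subscriber) { subscribers.add(subscriber); }

        void publish(OrderPlaced event) { subscribers.forEach(s -> s.accept(event)); }
    }

    // The one handler the command targets; it turns the request into a fact.
    static OrderPlaced handle(PlaceOrder command) {
        return new OrderPlaced(command.orderId());
    }

    public static void main(String[] args) {
        EventFanOut fanOut = new EventFanOut();
        List<String> reactions = new ArrayList<>();
        fanOut.subscribe(e -> reactions.add("billing:" + e.orderId()));
        // A consumer added later; handle() and publish() are unchanged.
        fanOut.subscribe(e -> reactions.add("shipping:" + e.orderId()));
        fanOut.publish(handle(new PlaceOrder("o-1")));
        System.out.println(reactions); // prints [billing:o-1, shipping:o-1]
    }
}
```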
The commit log
The single most important idea in Kafka is the immutable, append-only commit log. A log is an ordered sequence of records. New records are only ever appended to the end, and each gets a monotonically increasing offset — its position in the log.
offset:    0      1      2      3      4      5    <- append here
        +------+------+------+------+------+------+
records |  e0  |  e1  |  e2  |  e3  |  e4  | ...  |
        +------+------+------+------+------+------+
                         ^ reader A (offset 2)
                                       ^ reader B (offset 4)
Because records are never mutated, the log is a perfect, replayable history. Multiple independent readers track their own position, so a slow analytics consumer and a fast notification consumer read the same log without interfering. This decoupling of writes from reads — and of one reader from another — is what makes Kafka scale.
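The mechanics fit in a small in-memory model. This is a sketch under stated assumptions, not Kafka's implementation: `Log` and `Reader` are hypothetical names, the log is a single in-memory list, and nothing is persisted. It shows the two properties described above: appends return increasing offsets, and each reader advances its own position without affecting the log or other readers.

```java
import java.util.ArrayList;
import java.util.List;

final class CommitLogSketch {
    // Hypothetical single in-memory log; not Kafka's implementation.
    static final class Log<T> {
        private final List<T> records = new ArrayList<>();

        // Append-only: returns the offset assigned to the new record.
        long append(T record) {
            records.add(record);
            return records.size() - 1;
        }

        // Reading never removes anything; it returns a copy from `offset` onward.
        List<T> readFrom(long offset) {
            return List.copyOf(records.subList((int) offset, records.size()));
        }
    }

    // Each reader owns its position; the log knows nothing about readers.
    static final class Reader<T> {
        private final Log<T> log;
        private long position;

        Reader(Log<T> log, long start) {
            this.log = log;
            this.position = start;
        }

        // Returns everything not yet seen by this reader and advances its position.
        List<T> poll() {
            List<T> batch = log.readFrom(position);
            position += batch.size();
            return batch;
        }

        long position() { return position; }
    }

    public static void main(String[] args) {
        Log<String> log = new Log<>();
        for (String e : List.of("e0", "e1", "e2", "e3", "e4")) log.append(e);
        Reader<String> a = new Reader<>(log, 2);
        Reader<String> b = new Reader<>(log, 4);
        System.out.println(a.poll()); // prints [e2, e3, e4]
        System.out.println(b.poll()); // prints [e4]
    }
}
```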
Streams
An event stream is an unbounded, continuously updated sequence of events — the log viewed as something that never ends. Where a database table is a snapshot of current state, a stream is the full sequence of changes that produced that state. The two are dual: you can replay a stream to rebuild a table, and you can capture every table change as a stream (this duality powers change-data-capture and Kafka Streams’ KTable).
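One direction of that duality, replaying a stream to rebuild a table, can be sketched as a simple fold. The `QuantitySet` event and the convention that a quantity of 0 removes the row are assumptions made for this example only; this is plain Java and does not use Kafka Streams.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class StreamTableSketch {
    // Hypothetical change event: the cart quantity for a SKU was set.
    record QuantitySet(String sku, int quantity) {}

    // Replaying the stream in order folds it into the current-state table.
    static Map<String, Integer> replay(List<QuantitySet> stream) {
        Map<String, Integer> table = new LinkedHashMap<>();
        for (QuantitySet change : stream) {
            if (change.quantity() == 0) {
                table.remove(change.sku()); // assumed convention: 0 deletes the row
            } else {
                table.put(change.sku(), change.quantity());
            }
        }
        return table;
    }

    public static void main(String[] args) {
        List<QuantitySet> stream = List.of(
                new QuantitySet("SKU-9", 1),
                new QuantitySet("SKU-3", 2),
                new QuantitySet("SKU-9", 4),
                new QuantitySet("SKU-3", 0));
        System.out.println(replay(stream)); // prints {SKU-9=4}
    }
}
```

Replaying any prefix of the stream yields the table as it stood at that point, which is what a snapshot alone cannot give you.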
Publish/subscribe
Kafka uses a publish/subscribe model. Producers publish events to a named topic; consumers subscribe to topics and read events at their own pace. Producers and consumers never reference each other directly — the topic is the contract.
Topic: "orders"
Producer ---> [ e0 | e1 | e2 | e3 | ... ] ---> Consumer group "billing"
                                          \--> Consumer group "shipping"
                                          \--> Consumer group "analytics"
Critically, reading an event does not consume or remove it. Every subscribed consumer group sees every event. Compare that with a traditional queue:
| | Traditional queue | Kafka topic |
|---|---|---|
| Read semantics | Message removed on read | Event stays; offset advances |
| Consumers per message | Usually one | Any number, independently |
| History | Gone after delivery | Retained and replayable |
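The first row of the table can be demonstrated side by side. In this sketch a `java.util.ArrayDeque` stands in for a traditional queue and a hypothetical in-memory `Topic` class stands in for a Kafka topic; the group names are illustrative. Polling the queue removes the message, while polling the topic only moves that group's offset.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

final class QueueVsTopicSketch {
    // Hypothetical in-memory topic: records stay put, each group tracks an offset.
    static final class Topic {
        private final List<String> records;
        private final Map<String, Integer> groupOffsets = new HashMap<>();

        Topic(List<String> records) { this.records = List.copyOf(records); }

        // Returns the next record for this group, or null when it has caught up.
        String poll(String group) {
            int offset = groupOffsets.getOrDefault(group, 0);
            if (offset >= records.size()) return null;
            groupOffsets.put(group, offset + 1); // only the offset moves
            return records.get(offset);
        }

        int size() { return records.size(); }
    }

    public static void main(String[] args) {
        Queue<String> queue = new ArrayDeque<>(List.of("e0", "e1"));
        queue.poll();                                // delivery removes the message
        System.out.println(queue.size());            // prints 1

        Topic orders = new Topic(List.of("e0", "e1"));
        System.out.println(orders.poll("billing"));  // prints e0
        System.out.println(orders.poll("shipping")); // prints e0
        System.out.println(orders.size());           // prints 2
    }
}
```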
Retention and replay
Because events are not deleted when read, Kafka keeps them according to a configurable retention policy: by time, by size, or indefinitely. One topic might keep seven days of events; another might use log compaction, which keeps at least the latest value for each key indefinitely and discards older values for that key.
# Create "orders" with 7 days of time-based retention (7 * 24 * 60 * 60 * 1000 ms).
kafka-topics.sh --create --topic orders \
  --bootstrap-server localhost:9092 \
  --partitions 3 --replication-factor 3 \
  --config retention.ms=604800000
Output:
Created topic orders.
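The `retention.ms` value above is seven days expressed in milliseconds. A small sketch makes the arithmetic checkable and shows what a time-based cutoff means. `RetentionMsSketch`, `retentionMs`, and `isExpired` are hypothetical helpers, and the expiry check is deliberately simplified: brokers delete whole log segments, so individual records can outlive the exact boundary.

```java
import java.time.Duration;
import java.time.Instant;

final class RetentionMsSketch {
    // Converts a retention window to the millisecond value retention.ms expects.
    static long retentionMs(Duration window) {
        return window.toMillis();
    }

    // Simplified eligibility check. Assumption: real brokers delete whole log
    // segments, so a record can survive past this boundary.
    static boolean isExpired(Instant recordTime, Instant now, Duration retention) {
        return recordTime.plus(retention).isBefore(now);
    }

    public static void main(String[] args) {
        System.out.println(retentionMs(Duration.ofDays(7))); // prints 604800000
    }
}
```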
Retention enables replay: a consumer can reset its offset to the beginning (or any point) and reprocess history. This is transformative — you can deploy a new service and have it build state from months of past events, or fix a bug and replay to recompute results.
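# Rewind the "analytics" group to the earliest retained offset of "orders".
# The group must have no active members while its offsets are reset.
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --group analytics --topic orders \
  --reset-offsets --to-earliest --execute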
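The "fix a bug and replay" case can be sketched without a broker. Here a `List` stands in for the retained log, and `totalFrom` is a hypothetical consumer that folds every record from a given offset. The bug itself (dividing by 1000 and truncating) is invented for the example. After the fix, starting again from offset 0 recomputes the result from the same history.

```java
import java.util.List;
import java.util.function.ToLongFunction;

final class ReplaySketch {
    record OrderPlaced(String orderId, long amountCents) {}

    // Hypothetical consumer: folds every record from `fromOffset` to the end.
    static long totalFrom(List<OrderPlaced> log, int fromOffset,
                          ToLongFunction<OrderPlaced> logic) {
        long total = 0;
        for (OrderPlaced e : log.subList(fromOffset, log.size())) {
            total += logic.applyAsLong(e);
        }
        return total;
    }

    public static void main(String[] args) {
        List<OrderPlaced> log = List.of(
                new OrderPlaced("o-1", 1000),
                new OrderPlaced("o-2", 250),
                new OrderPlaced("o-3", 4000));
        // First deployment had an invented bug: it divided by 1000 and truncated.
        long buggy = totalFrom(log, 0, e -> e.amountCents() / 1000);
        // After the fix, start again from the earliest offset and recompute.
        long fixed = totalFrom(log, 0, OrderPlaced::amountCents);
        System.out.println(buggy + " -> " + fixed); // prints 5 -> 5250
    }
}
```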
Retention is a deliberate design decision, not just disk hygiene. Setting it too short throws away your ability to replay and onboard new consumers; too long needlessly grows storage. Choose it based on your replay requirements.
Best Practices
- Model your domain in past-tense events (OrderPlaced, not CreateOrder) to keep producers and consumers decoupled.
- Treat the log as the source of truth; derive read models, caches, and tables from it rather than the reverse.
- Set the event key intentionally — it drives partitioning and per-key ordering, which you cannot change after the fact without reprocessing.
- Model event payloads as immutable types such as Java records, and give them an explicit schema (Avro, Protobuf, or JSON Schema) so the event contract survives producer/consumer version skew.
- Pick retention based on your worst-case replay window, and use log compaction for keyed, “latest-state” topics.
- Never assume reading removes an event — design consumers to be idempotent, because replay and retries will deliver events more than once.
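The last practice, idempotent consumers, can be sketched as a handler that remembers which events it has already applied. `SpendTracker` is hypothetical, and it assumes `orderId` uniquely identifies an event. The in-memory set is also an assumption made for brevity: a real service would need to persist the processed IDs together with the state they protect.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class IdempotentConsumerSketch {
    record OrderPlaced(String orderId, String customerId, long amountCents) {}

    // Hypothetical handler: remembers processed orderIds so redelivery is a no-op.
    // Assumption: a real service would persist this set along with its state.
    static final class SpendTracker {
        private final Set<String> seenOrderIds = new HashSet<>();
        private final Map<String, Long> spendByCustomer = new HashMap<>();

        // Returns true if the event changed state, false if it was a duplicate.
        boolean handle(OrderPlaced event) {
            if (!seenOrderIds.add(event.orderId())) return false;
            spendByCustomer.merge(event.customerId(), event.amountCents(), Long::sum);
            return true;
        }

        long spend(String customerId) {
            return spendByCustomer.getOrDefault(customerId, 0L);
        }
    }

    public static void main(String[] args) {
        SpendTracker tracker = new SpendTracker();
        OrderPlaced event = new OrderPlaced("o-1", "c-42", 1999);
        tracker.handle(event);
        tracker.handle(event); // redelivered after a retry or replay
        System.out.println(tracker.spend("c-42")); // prints 1999
    }
}
```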