Event Sourcing

Event sourcing flips the usual database model on its head: instead of storing the current state of an entity and overwriting it on every change, you store every change as an immutable event. The current state is a derived value — a fold over the event history. Kafka is a natural fit because it is, at its core, a durable, ordered, replayable log. When the log is your system of record, you gain a perfect audit trail, the ability to rebuild any past state, and the freedom to materialize that history into as many read models as you need.

The log as the system of record

In a CRUD system, the row in your accounts table tells you the balance is $150, but it cannot tell you how it got there. In an event-sourced system the balance is never stored directly — it is computed from the ordered facts:

account-42  AccountOpened    { balance: 0 }
account-42  MoneyDeposited   { amount: 200 }
account-42  MoneyWithdrawn   { amount: 50 }
                            └─ replay ─▶ balance = 150

Each event is a verb in the past tense — something that happened and can never be un-happened. To correct a mistake you append a compensating event, never an UPDATE or DELETE. Using the entity id as the Kafka message key guarantees that all events for one aggregate land on the same partition and are therefore strictly ordered, which is essential for correct replay.

Events are facts. Once written they are immutable. If you need to change the meaning of historical data, evolve your schema (e.g. with Avro/Protobuf and a Schema Registry) and version your event types rather than mutating the log.

Writing and replaying events

A producer appends events; a consumer rebuilds state by reading the partition from the beginning. Below, an aggregate is reconstructed by folding the event stream.

public record AccountEvent(String accountId, String type, long amount) {}

public class AccountProjector {

    public long replay(Consumer<String, AccountEvent> consumer, String accountId) {
        long balance = 0;
        for (ConsumerRecord<String, AccountEvent> record : pollAll(consumer)) {
            if (!record.key().equals(accountId)) continue;
            AccountEvent e = record.value();
            balance = switch (e.type()) {
                case "MoneyDeposited" -> balance + e.amount();
                case "MoneyWithdrawn" -> balance - e.amount();
                default -> balance;
            };
        }
        return balance;
    }
}

To replay from the start, seek to the beginning before polling:

consumer.subscribe(List.of("account-events"));
consumer.poll(Duration.ZERO);                 // trigger partition assignment
consumer.seekToBeginning(consumer.assignment());

Snapshots: avoiding endless replay

Replaying millions of events on every startup is expensive. A snapshot is a cached materialization of an aggregate’s state at a known offset. On recovery you load the latest snapshot, seek to snapshot.offset + 1, and replay only the tail.

record Snapshot(String accountId, long balance, long lastOffset) {}

long balance = snapshot.balance();
consumer.seek(partition, snapshot.lastOffset() + 1);
// fold only the events after the snapshot...

Snapshots are an optimization, never the source of truth — you can always rebuild them by replaying the full log if they are lost or corrupted.

Compacted topics for current state

Kafka log compaction keeps at least the most recent value for each key, pruning superseded records. This makes a compacted topic an excellent place to publish current-state projections derived from your event stream: consumers (or a KTable in Kafka Streams) can read the compacted topic and get the latest state per key without replaying full history.

kafka-topics.sh --create --topic account-balances \
  --partitions 6 --replication-factor 3 \
  --config cleanup.policy=compact \
  --config min.cleanable.dirty.ratio=0.1 \
  --bootstrap-server localhost:9092

Concern	Event log topic	Compacted state topic
Cleanup policy	`delete` (long/infinite retention)	`compact`
Holds	Every event, forever	Latest value per key
Purpose	System of record, replay, audit	Fast current-state lookups
Ordering	Strict per partition	Latest-wins per key

Pairing with CQRS

Event sourcing pairs naturally with CQRS (Command Query Responsibility Segregation). Commands produce events to the log (the write side); independent consumers project those events into read-optimized stores — a SQL table for reporting, Elasticsearch for search, a compacted topic for lookups (the read side). Because every read model is just a function of the same log, you can add a brand-new view at any time by replaying history into it.

            ┌─────────────┐   events    ┌──────────────────┐
 Command ─▶ │ Write model │ ──────────▶ │  Kafka event log │
            └─────────────┘             └────────┬─────────┘
                                                 │ replay / stream
                          ┌──────────────────────┼──────────────────────┐
                          ▼                       ▼                       ▼
                   ┌────────────┐         ┌──────────────┐        ┌──────────────┐
                   │ SQL report │         │ Search index │        │ Compacted    │
                   │  (Query)   │         │   (Query)    │        │ state topic  │
                   └────────────┘         └──────────────┘        └──────────────┘

Trade-offs

Benefit	Cost
Complete audit trail and time-travel	Higher conceptual and operational complexity
Rebuild any read model by replaying	Eventual consistency between write and read sides
Natural fit for debugging and analytics	Schema evolution of historical events is hard
Decoupled, independently scalable views	No trivial `UPDATE`/`DELETE`; corrections need events

Event sourcing is not a default — reach for it when auditability, replay, or multiple divergent read models justify the added complexity. For simple CRUD, a database is simpler and correct.

Best Practices

Key every event by aggregate id so all events for one entity stay ordered on a single partition.
Treat the event log as immutable; correct mistakes with compensating events, never edits or deletes.
Use a Schema Registry (Avro/Protobuf) and version your event types so old events stay readable as schemas evolve.
Set long or infinite retention on the source-of-truth topic; never let it expire data you may need to replay.
Snapshot periodically to bound recovery time, but keep snapshots regenerable from the log.
Publish current state to a compacted topic or KTable rather than replaying full history on every query.
Make projections idempotent and track the last processed offset so replays and restarts are safe.