KStream, KTable & GlobalKTable

Every Kafka Streams topology is built on three abstractions over the same underlying topic data: KStream, KTable, and GlobalKTable. Choosing the right one is the single most important modelling decision you make, because it determines whether each record is treated as an independent fact or as an update to a keyed entity. Picking the wrong abstraction leads to duplicated processing, incorrect aggregates, or join failures in production. This page explains each abstraction, the stream–table duality that connects them, and exactly when to reach for each.

KStream: an unbounded stream of events

A KStream is an unbounded sequence of immutable records, ordered within each partition of the source topic. Each record is a self-contained fact: a click, a payment, a sensor reading. Two records with the same key are two separate events; nothing is overwritten. Think of it as an append-only log where every line matters.

StreamsBuilder builder = new StreamsBuilder();

KStream<String, String> clicks =
    builder.stream("user-clicks", Consumed.with(Serdes.String(), Serdes.String()));

clicks
    .filter((userId, url) -> url.startsWith("/checkout"))
    .mapValues(url -> "CHECKOUT_VIEW")
    .to("checkout-events", Produced.with(Serdes.String(), Serdes.String()));

If the same userId clicks checkout three times, you process three records and emit three events. That is exactly what you want for event streams, metrics, and audit logs.

KTable: a changelog view of the latest value per key

A KTable interprets the same topic as a changelog. Each record is an upsert: the latest record per key wins, and a record with a null value is a tombstone that deletes the key. The table therefore represents the current state of each entity — the materialized “now” of your data.

KTable<String, String> userProfiles =
    builder.table(
        "user-profiles",
        Consumed.with(Serdes.String(), Serdes.String()),
        Materialized.as("user-profiles-store"));

If user-42 is written three times, the KTable holds only the last value. Downstream operators receive updates to that state rather than independent events, and with record caching enabled, consecutive updates to the same key may be coalesced before they are forwarded. This makes KTable the right model for entity state: customer records, account balances, configuration, inventory counts.

A KTable is partitioned exactly like its source topic. To join against it on key, the other side must be co-partitioned: both topics need the same key, the same partition count, and the same partitioning strategy, so that records with equal keys land on the same task.
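To illustrate why the partition count matters, here is a minimal plain-Java sketch. The `partitionFor` helper is hypothetical and uses `String.hashCode()` as a stand-in; Kafka's default partitioner hashes the serialized key bytes with murmur2 instead, but the modulo step behaves the same way.

```java
public class CoPartitionCheck {
    // Hypothetical stand-in for a producer partitioner:
    // hash the key, then take it modulo the partition count.
    public static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // A keyed join only finds its match if the same key maps to the
    // same partition number on both sides.
    public static boolean landsOnSamePartition(String key, int leftPartitions, int rightPartitions) {
        return partitionFor(key, leftPartitions) == partitionFor(key, rightPartitions);
    }

    public static void main(String[] args) {
        String key = "u1";
        System.out.println("4 vs 4 partitions: " + landsOnSamePartition(key, 4, 4)); // true
        System.out.println("4 vs 6 partitions: " + landsOnSamePartition(key, 4, 6)); // false
    }
}
```

With mismatched partition counts, the stream record and the table record for the same key end up in differently numbered partitions, so the task that processes the stream record never sees the matching table entry.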

GlobalKTable: fully replicated reference data

A GlobalKTable is a table whose entire contents are replicated to every application instance, regardless of partitioning. Each instance reads all partitions of the source topic into a local store. This removes the co-partitioning requirement: a key selector maps each stream record to a table key, so you can join a KStream against a GlobalKTable on any attribute of the stream record, not just its key.

GlobalKTable<String, String> countryCodes =
    builder.globalTable(
        "country-reference",
        Consumed.with(Serdes.String(), Serdes.String()),
        Materialized.as("country-ref-store"));

KStream<String, Order> orders =
    builder.stream("orders", Consumed.with(Serdes.String(), orderSerde));

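orders.join(
    countryCodes,
    (orderKey, order) -> order.countryCode(),   // key selector picks ANY field
    (order, countryName) -> order.withCountry(countryName))
    // explicit serdes for the output; assumes withCountry(...) returns an Order
    .to("enriched-orders", Produced.with(Serdes.String(), orderSerde));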

The trade-off is memory and startup cost: every instance holds a full copy. Use GlobalKTable for small, slowly changing reference data — currency tables, feature flags, country codes — not for large transactional datasets.
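The lookup semantics of such a join can be sketched in plain Java, without the Kafka Streams API. The `Order` class and `enrich` method below are hypothetical names used only for illustration; the point is that each record derives its own lookup key and consults a complete local copy of the reference data, and that an inner join emits nothing when there is no match.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GlobalLookupJoin {
    // Hypothetical order type: the record key would be orderId,
    // but the join attribute is countryCode.
    public static final class Order {
        final String orderId;
        final String countryCode;

        public Order(String orderId, String countryCode) {
            this.orderId = orderId;
            this.countryCode = countryCode;
        }
    }

    // countryNames plays the role of the fully replicated local store.
    public static List<String> enrich(List<Order> orders, Map<String, String> countryNames) {
        List<String> enriched = new ArrayList<>();
        for (Order order : orders) {
            String lookupKey = order.countryCode;            // key selector: any field of the record
            String countryName = lookupKey == null ? null : countryNames.get(lookupKey);
            if (countryName != null) {                       // inner join: no match, no output
                enriched.add(order.orderId + " -> " + countryName);
            }
        }
        return enriched;
    }

    public static void main(String[] args) {
        Map<String, String> countryNames = new HashMap<>();
        countryNames.put("DE", "Germany");
        countryNames.put("FR", "France");

        List<Order> orders = new ArrayList<>();
        orders.add(new Order("o-1", "DE"));
        orders.add(new Order("o-2", "FR"));
        orders.add(new Order("o-3", "XX"));                  // unknown code, dropped by the inner join

        System.out.println(enrich(orders, countryNames));    // [o-1 -> Germany, o-2 -> France]
    }
}
```

Because every instance needs the whole map to answer any lookup, the memory cost scales with the size of the reference data on each instance, which is why this pattern suits small tables.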

The stream–table duality

The key insight behind Kafka Streams is that a stream and a table are two views of the same data. A table is the aggregation of a stream of changes; a stream is the sequence of changes that produced a table. You convert freely between them.

   stream of changes              materialized table
  ------------------             -------------------
  (k1, v1)  ---->  apply  ---->  k1 -> v1
  (k2, v9)  ---->  apply  ---->  k2 -> v9
  (k1, v5)  ---->  apply  ---->  k1 -> v5   (v1 overwritten)
  (k2, null)---->  apply  ---->  k2 deleted (tombstone)
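The "apply" step in the diagram can be sketched in plain Java as an upsert-or-delete loop over a map. This is an illustration of the semantics only, not how Kafka Streams implements its state stores; `materialize` and `rec` are hypothetical helper names.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ChangelogTable {
    // Small helper to build a (key, value) record; the value may be null (tombstone).
    public static Map.Entry<String, String> rec(String key, String value) {
        return new SimpleEntry<>(key, value);
    }

    // Applies changelog records in order: a non-null value upserts, a null value deletes.
    public static Map<String, String> materialize(List<Map.Entry<String, String>> changelog) {
        Map<String, String> table = new LinkedHashMap<>();
        for (Map.Entry<String, String> record : changelog) {
            if (record.getValue() == null) {
                table.remove(record.getKey());                 // tombstone: delete the key
            } else {
                table.put(record.getKey(), record.getValue()); // upsert: latest value wins
            }
        }
        return table;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> changelog = Arrays.asList(
            rec("k1", "v1"),
            rec("k2", "v9"),
            rec("k1", "v5"),
            rec("k2", null));
        System.out.println(materialize(changelog)); // {k1=v5}
    }
}
```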

// table -> stream: emit one record per change
KStream<String, String> changes = userProfiles.toStream();

// stream -> table: aggregate events per key into a continuously updated count
KTable<String, Long> counts = clicks
    .groupByKey()
    .count(Materialized.as("click-counts"));

groupByKey().count() turns an event stream into a continuously updated table, and toStream() turns a table back into its changelog. This duality is what lets you mix event processing and stateful aggregation in one topology.
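The round trip can be illustrated with a small plain-Java sketch, again outside the Kafka Streams API. The class and field names are hypothetical: `table` stands in for the count store and `updates` for the stream obtained from the table. The sketch forwards every update; in a real topology, record caching may coalesce consecutive updates to the same key.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CountDuality {
    // stream -> table: running count per key
    public final Map<String, Long> table = new LinkedHashMap<>();
    // table -> stream: one update record per change to the table
    public final List<String> updates = new ArrayList<>();

    // Processes one event: update the table, then emit the new row as a change record.
    public void onEvent(String key) {
        long newCount = table.merge(key, 1L, Long::sum);
        updates.add(key + "=" + newCount);
    }

    public static void main(String[] args) {
        CountDuality counts = new CountDuality();
        for (String userId : new String[] {"alice", "bob", "alice"}) {
            counts.onEvent(userId);
        }
        System.out.println(counts.table);   // {alice=2, bob=1}
        System.out.println(counts.updates); // [alice=1, bob=1, alice=2]
    }
}
```

Replaying `updates` through an upsert loop reproduces `table`, which is the duality in miniature.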

Choosing the right abstraction

Abstraction	Semantics	Partitioning	Typical use
`KStream`	Every record is an independent event	Per source topic	Clicks, payments, logs, metrics
`KTable`	Latest value per key (upsert + tombstone)	Per source topic; co-partitioned for joins	Entity state, balances, profiles
`GlobalKTable`	Latest value per key, replicated to all instances	Fully replicated	Small reference / lookup data

Reading a compacted topic as a KStream is a common bug: every record still in the log, including superseded values that compaction has not yet removed, is processed as an independent event. Compacted entity topics almost always belong in a KTable or GlobalKTable.
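A small plain-Java sketch shows how the two readings diverge on the same records. The account data and the method names are made up for illustration; the topic is imagined to hold balance updates keyed by account.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CompactedTopicRead {
    public static Map.Entry<String, Long> rec(String key, Long value) {
        return new SimpleEntry<>(key, value);
    }

    // Stream reading: every record counts as its own event, so stale balances are summed too.
    public static long totalAsStream(List<Map.Entry<String, Long>> records) {
        long total = 0;
        for (Map.Entry<String, Long> record : records) {
            if (record.getValue() != null) {
                total += record.getValue();
            }
        }
        return total;
    }

    // Table reading: keep only the latest balance per key, then sum.
    public static long totalAsTable(List<Map.Entry<String, Long>> records) {
        Map<String, Long> latest = new HashMap<>();
        for (Map.Entry<String, Long> record : records) {
            if (record.getValue() == null) {
                latest.remove(record.getKey());
            } else {
                latest.put(record.getKey(), record.getValue());
            }
        }
        long total = 0;
        for (long balance : latest.values()) {
            total += balance;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> records = Arrays.asList(
            rec("acct-1", 100L),
            rec("acct-2", 50L),
            rec("acct-1", 120L),
            rec("acct-1", 90L));
        System.out.println(totalAsStream(records)); // 360
        System.out.println(totalAsTable(records));  // 140
    }
}
```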

Best Practices

Model events (facts you never overwrite) as KStream, and entity state (latest value per key) as KTable.
Use GlobalKTable only for small, slow-moving reference data; its full replication cost grows with every instance and every topic byte.
Ensure KStream–KTable joins are co-partitioned (same key, same partition count, same partitioning strategy); reach for GlobalKTable when you must join on a non-key field instead.
Always supply explicit Serdes via Consumed, Produced, and Materialized rather than relying on global defaults — it makes schemas obvious and prevents serialization surprises.
Name your state stores (Materialized.as(...)) so changelog topics, metrics, and interactive queries are stable across restarts and redeploys.
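Use tombstones (null values) deliberately to delete keys from a KTable; avoid publishing them to a topic you also consume as a KStream, where a null value is just another event and some operators, such as joins and aggregations, skip it.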