Schema Registry
Kafka treats every message key and value as an opaque array of bytes; it has no idea whether the payload is Avro, Protobuf, JSON, or random noise. Confluent Schema Registry fills that gap by acting as a centralized, versioned store of schemas that producers and consumers can share. Instead of embedding a full schema in every record, clients embed a small integer ID, and the registry enforces compatibility rules so that an incompatible schema change from one team's producer is rejected at registration time instead of silently breaking another team's consumer. In production, it is the contract layer that lets independent services evolve their data formats safely.
Subjects and schema IDs
The registry is organized around two core concepts: subjects and schema IDs.
A subject is a named scope under which schemas are versioned. Each time you register a new schema under a subject, it gets a monotonically increasing version number (1, 2, 3, …), and compatibility checks are scoped to that subject. By default, a subject is derived from the topic name plus whether the schema describes the record key or the record value, e.g. orders-key and orders-value.
A schema ID is a globally unique integer assigned to a distinct schema string. The same schema registered under two different subjects shares one ID; the registry deduplicates. The ID — not the version — is what travels on the wire, because it is globally unique and cheap to look up.
| Concept | Scope | Example | Travels on wire? |
|---|---|---|---|
| Subject | Per topic + key/value | orders-value | No |
| Version | Per subject | 3 | No |
| Schema ID | Global | 42 | Yes |
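The relationship between subjects, versions, and IDs can be sketched with a toy in-memory model. This is only an illustration, not how the registry is implemented: the class name ToyRegistry, its methods, and the customers-value and orders-archive-value subjects are hypothetical, and the sketch deduplicates on the exact schema string, whereas a real registry does more work when comparing schemas.

```python
from collections import defaultdict


class ToyRegistry:
    """Hypothetical in-memory model of subjects, versions, and global schema IDs."""

    def __init__(self):
        self._id_by_schema = {}             # schema string -> global ID
        self._versions = defaultdict(list)  # subject -> schema IDs, in version order

    def register(self, subject, schema):
        # A distinct schema string gets one global ID, shared across subjects.
        schema_id = self._id_by_schema.setdefault(schema, len(self._id_by_schema) + 1)
        # Re-registering the same schema under a subject does not add a version.
        if schema_id not in self._versions[subject]:
            self._versions[subject].append(schema_id)
        return schema_id

    def version_of(self, subject, schema_id):
        # Versions are 1-based and scoped to the subject.
        return self._versions[subject].index(schema_id) + 1


order_v1 = '{"type":"record","name":"Order","fields":[{"name":"id","type":"string"}]}'
order_v2 = '{"type":"record","name":"Order","fields":[{"name":"id","type":"string"},{"name":"note","type":"string","default":""}]}'
customer = '{"type":"record","name":"Customer","fields":[{"name":"id","type":"string"}]}'

registry = ToyRegistry()
id_orders_v1 = registry.register("orders-value", order_v1)
id_customer = registry.register("customers-value", customer)
id_archive_v1 = registry.register("orders-archive-value", order_v1)
id_orders_v2 = registry.register("orders-value", order_v2)

# Same schema under two subjects shares one ID; version and ID are independent numbers.
print(id_orders_v1, id_archive_v1, id_orders_v2)
print(registry.version_of("orders-value", id_orders_v2))
```

In this run the second Order schema ends up as version 2 of orders-value while carrying a different global ID, which is why the ID rather than the version is written to the wire.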
The wire format
Serializers do not write raw Avro/Protobuf bytes. They prepend a small Confluent wire-format header so consumers know which schema to fetch.
 Byte 0    Bytes 1-4           Bytes 5+
+--------+-------------------+------------------------+
|  0x00  | schema ID (int32) | serialized payload     |
+--------+-------------------+------------------------+
  magic    big-endian          Avro/Protobuf/JSON bytes
  byte     4-byte ID
The first byte is always the magic byte 0x00 (a format version marker). The next four bytes are the big-endian schema ID. Everything after is the actual serialized payload. Protobuf adds a small message-index array after the ID, but the magic-byte-plus-ID prefix is identical across formats. A consumer reads the ID, fetches the matching schema (caching it locally), and deserializes the rest.
If you ever see Unknown magic byte! in a consumer, a producer wrote raw bytes without going through a Schema-Registry-aware serializer: the first byte was not 0x00.
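The five-byte header is simple enough to build and parse by hand. The following is a minimal sketch of the framing described above using only the Python standard library; the function names encode_frame and decode_frame are hypothetical, the payload is treated as opaque bytes, and the Protobuf message-index array is not handled.

```python
import struct

MAGIC_BYTE = 0
HEADER = ">bi"  # big-endian: 1-byte magic, then 4-byte signed schema ID


def encode_frame(schema_id, payload):
    """Prefix an already-serialized payload with the magic byte and schema ID."""
    return struct.pack(HEADER, MAGIC_BYTE, schema_id) + payload


def decode_frame(message):
    """Split a framed message into (schema_id, payload), validating the magic byte."""
    if len(message) < 5:
        raise ValueError("message is shorter than the 5-byte header")
    magic, schema_id = struct.unpack(HEADER, message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError("Unknown magic byte!")
    return schema_id, message[5:]


framed = encode_frame(42, b"payload")
print(framed[:5].hex())      # 000000002a
print(decode_frame(framed))  # (42, b'payload')

# Raw JSON written without a registry-aware serializer starts with '{' (0x7b), not 0x00.
try:
    decode_frame(b'{"id": "A-1"}')
except ValueError as error:
    print(error)
```

A check like this can help when inspecting a topic by hand: if the first byte of a record is not 0x00, the record was not produced by a registry-aware serializer.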
Register / lookup flow
PRODUCE                                                 CONSUME
-------                                                 -------
Producer                                                Consumer
   | 1. serialize record                                      | 4. read magic byte + ID
   |    (need schema ID)                                      |
   v                                                          v
Serializer --register schema--> Registry <--lookup ID-- Deserializer
   |       <----- returns ID 42 ----- ^                       |
   |                                  |                       |
   v                                  +--cache schema by ID---+
[0x00][42][payload] --> Kafka topic --> [0x00][42][payload]
On produce, the serializer registers (or looks up) the schema under the subject, gets the ID back, caches it, and writes the prefixed bytes. On consume, the deserializer reads the ID, fetches the schema once, caches it, and decodes every subsequent record with that ID without another call to the registry.
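The consume-side caching can be sketched as follows. This is an assumption-laden illustration, not the client library's code: CachingSchemaLookup and fake_registry are hypothetical names, and the fetch function stands in for the HTTP lookup a real deserializer would perform.

```python
import struct


class CachingSchemaLookup:
    """Hypothetical consume-side helper: fetch each schema ID at most once."""

    def __init__(self, fetch_schema):
        self._fetch_schema = fetch_schema  # callable: schema ID -> schema
        self._cache = {}
        self.fetch_count = 0               # number of simulated registry round trips

    def schema_for(self, message):
        magic, schema_id = struct.unpack(">bi", message[:5])
        if magic != 0:
            raise ValueError("Unknown magic byte!")
        if schema_id not in self._cache:
            self.fetch_count += 1
            self._cache[schema_id] = self._fetch_schema(schema_id)
        return self._cache[schema_id]


# Stand-in for the registry: a dict lookup instead of an HTTP call.
fake_registry = {42: "Order schema"}
lookup = CachingSchemaLookup(fake_registry.__getitem__)

messages = [struct.pack(">bi", 0, 42) + body for body in (b"a", b"b", b"c")]
schemas = [lookup.schema_for(message) for message in messages]

print(schemas)
print(lookup.fetch_count)  # 1: only the first record triggered a lookup
```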
REST API
The registry is driven by a REST API that listens on port 8081 by default. These calls are useful for CI checks and operations even when your apps use the client libraries.
# Register a schema under a subject (returns its global ID)
curl -s -X POST http://localhost:8081/subjects/orders-value/versions \
-H "Content-Type: application/vnd.schemaregistry.v1+json" \
-d '{"schema": "{\"type\":\"record\",\"name\":\"Order\",\"fields\":[{\"name\":\"id\",\"type\":\"string\"}]}"}'
# List versions, fetch the latest registered schema, and check the subject config
curl -s http://localhost:8081/subjects/orders-value/versions
curl -s http://localhost:8081/subjects/orders-value/versions/latest
curl -s http://localhost:8081/config/orders-value
Output:
{"id":42}
[1]
{"subject":"orders-value","version":1,"id":42,"schema":"{\"type\":\"record\",...}"}
{"compatibilityLevel":"BACKWARD"}
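One detail that is easy to miss in the curl call is that the schema travels as a JSON string inside the JSON body, so it is encoded twice. The sketch below builds the same registration request with Python's standard library; build_register_request is a hypothetical helper, and nothing is sent until the caller passes the request to urllib.request.urlopen against a running registry.

```python
import json
import urllib.parse
import urllib.request

CONTENT_TYPE = "application/vnd.schemaregistry.v1+json"


def build_register_request(base_url, subject, schema):
    """Build (but do not send) the POST that registers a schema under a subject."""
    # Inner dumps turns the schema into a string; outer dumps wraps it in the body.
    body = json.dumps({"schema": json.dumps(schema)}).encode("utf-8")
    quoted_subject = urllib.parse.quote(subject, safe="")
    return urllib.request.Request(
        url=f"{base_url}/subjects/{quoted_subject}/versions",
        data=body,
        headers={"Content-Type": CONTENT_TYPE},
        method="POST",
    )


order_schema = {
    "type": "record",
    "name": "Order",
    "fields": [{"name": "id", "type": "string"}],
}
request = build_register_request("http://localhost:8081", "orders-value", order_schema)

print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```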
You can also dry-run a compatibility check before deploying — a great CI gate:
curl -s -X POST \
http://localhost:8081/compatibility/subjects/orders-value/versions/latest \
-H "Content-Type: application/vnd.schemaregistry.v1+json" \
-d '{"schema": "..."}'
# -> {"is_compatible":true}
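For intuition about what a BACKWARD check protects against, here is a deliberately simplified local sketch for Avro records. It only looks at one rule: a field added in the new schema needs a default so that the new schema can still read data written with the old one. The function name is hypothetical, and the real registry check covers much more (type changes, unions, aliases, nested records), so this is not a substitute for the endpoint above.

```python
def added_fields_without_default(old_schema, new_schema):
    """Names of fields that exist only in new_schema and declare no default."""
    old_names = {field["name"] for field in old_schema["fields"]}
    return [
        field["name"]
        for field in new_schema["fields"]
        if field["name"] not in old_names and "default" not in field
    ]


order_v1 = {
    "type": "record",
    "name": "Order",
    "fields": [{"name": "id", "type": "string"}],
}
# Adds an optional field with a default: old records can still be read.
order_v2_ok = {
    "type": "record",
    "name": "Order",
    "fields": order_v1["fields"]
    + [{"name": "note", "type": ["null", "string"], "default": None}],
}
# Adds a required field with no default: old records have nothing to fill it with.
order_v2_bad = {
    "type": "record",
    "name": "Order",
    "fields": order_v1["fields"] + [{"name": "amount", "type": "double"}],
}

print(added_fields_without_default(order_v1, order_v2_ok))   # []
print(added_fields_without_default(order_v1, order_v2_bad))  # ['amount']
```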
Subject naming strategies
How a subject name is derived from a record controls how many record types can share a topic and how compatibility is scoped.
| Strategy | Subject format | Use when |
|---|---|---|
| TopicNameStrategy (default) | <topic>-key / <topic>-value | One record type per topic |
| RecordNameStrategy | <fully.qualified.RecordName> | Multiple record types across topics, scoped per type |
| TopicRecordNameStrategy | <topic>-<RecordName> | Multiple record types in one topic, scoped per topic+type |
RecordNameStrategy and TopicRecordNameStrategy let a single topic carry heterogeneous events (e.g. OrderCreated and OrderShipped) while still validating each event against its own schema history.
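The subject formats in the table can be written out as a small function. This is a sketch of the naming rules only, not the client library's implementation; subject_name is a hypothetical helper, and com.example.OrderCreated is an assumed fully qualified record name.

```python
def subject_name(strategy, topic, record_name, is_key=False):
    """Derive the subject for a record, following the formats in the table above."""
    if strategy == "TopicNameStrategy":
        return f"{topic}-{'key' if is_key else 'value'}"
    if strategy == "RecordNameStrategy":
        return record_name
    if strategy == "TopicRecordNameStrategy":
        return f"{topic}-{record_name}"
    raise ValueError(f"unknown strategy: {strategy}")


record = "com.example.OrderCreated"  # assumed fully qualified record name
for strategy in ("TopicNameStrategy", "RecordNameStrategy", "TopicRecordNameStrategy"):
    print(strategy, "->", subject_name(strategy, "orders", record))
```

With the default strategy every value schema on the orders topic lands in the single subject orders-value, which is why that strategy fits one record type per topic.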
Producer and consumer config
For plain kafka-clients, point the serializer at the registry URL:
# Producer
key.serializer=io.confluent.kafka.serializers.KafkaAvroSerializer
value.serializer=io.confluent.kafka.serializers.KafkaAvroSerializer
schema.registry.url=http://localhost:8081
auto.register.schemas=false
use.latest.version=true
value.subject.name.strategy=io.confluent.kafka.serializers.subject.RecordNameStrategy
# Consumer
key.deserializer=io.confluent.kafka.serializers.KafkaAvroDeserializer
value.deserializer=io.confluent.kafka.serializers.KafkaAvroDeserializer
schema.registry.url=http://localhost:8081
specific.avro.reader=true
In Spring Boot, the same keys go under the properties map of the producer and the consumer:
spring:
  kafka:
    bootstrap-servers: localhost:9092
    producer:
      key-serializer: org.apache.kafka.common.serialization.StringSerializer
      value-serializer: io.confluent.kafka.serializers.KafkaAvroSerializer
      properties:
        schema.registry.url: http://localhost:8081
        auto.register.schemas: false
        use.latest.version: true
    consumer:
      key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
      value-deserializer: io.confluent.kafka.serializers.KafkaAvroDeserializer
      properties:
        schema.registry.url: http://localhost:8081
        specific.avro.reader: true
Disable auto.register.schemas in production. Auto-registration lets any producer mutate the contract at runtime; instead, register schemas through a reviewed CI pipeline and have producers set use.latest.version=true.
Best Practices
- Turn off auto-registration in production and register schemas via a gated CI step so contract changes are reviewed, not accidental.
- Set a compatibility level per subject (commonly BACKWARD) and run /compatibility checks in CI before deploying producers.
- Cache aggressively: the default serializers cache schema↔ID mappings; size max.schemas.per.subject and JVM memory accordingly for high-cardinality topics.
- Choose a naming strategy deliberately; the default TopicNameStrategy is simplest, but use RecordNameStrategy or TopicRecordNameStrategy when one topic carries multiple event types.
- Secure the registry with TLS and authentication, and set the registry to read-only (mode=READONLY) for clusters that should only consume.
- Never reuse a subject for an unrelated schema: deleting and re-registering breaks version history and ID continuity for existing consumers.