Avro Serialization

Apache Avro is the long-standing default serialization format for the Confluent Kafka ecosystem, and for good reason: it produces compact binary records, enforces a strict schema at write time, and — paired with a Schema Registry — lets producers and consumers evolve independently over the years a topic stays alive. The schema is defined once in a .avsc file rather than embedded in every record, so payloads stay small while the contract stays explicit. This page shows how to define an Avro schema, generate typed classes, and wire up KafkaAvroSerializer/KafkaAvroDeserializer against a registry.

Defining an Avro schema

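An Avro schema is a JSON document, conventionally stored in a .avsc file under src/main/avro/. It declares a fully-qualified record type with strongly typed, named fields. Optional fields are modelled as a union with null (for example "type": ["null", "string"] with "default": null). A default on a newly added field keeps the change backward-compatible, because a reader on the new schema can fill in the value when it decodes records written before the field existed.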

{
  "type": "record",
  "name": "OrderPlaced",
  "namespace": "com.devcraftly.events",
  "fields": [
    { "name": "orderId", "type": "string" },
    { "name": "customerId", "type": "string" },
    { "name": "amountCents", "type": "long" },
    { "name": "currency", "type": "string", "default": "USD" },
    { "name": "placedAt", "type": { "type": "long", "logicalType": "timestamp-millis" } }
  ]
}

Generating Java classes

Avro ships a code generator that turns each .avsc file into a typed SpecificRecord class. With Gradle, the gradle-avro-plugin runs it automatically as part of compilation; with Maven, the avro-maven-plugin binds to the generate-sources phase.

<plugin>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-maven-plugin</artifactId>
  <version>1.12.0</version>
  <executions>
    <execution>
      <phase>generate-sources</phase>
      <goals><goal>schema</goal></goals>
      <configuration>
        <sourceDirectory>${project.basedir}/src/main/avro</sourceDirectory>
      </configuration>
    </execution>
  </executions>
</plugin>

This produces com.devcraftly.events.OrderPlaced, a builder-based class with typed getters (setters are also generated by default, so instances are not truly immutable), which you use directly in your producer and consumer code. Add the Confluent serializer dependency (io.confluent:kafka-avro-serializer) and the Confluent Maven repository (https://packages.confluent.io/maven/) to pull in KafkaAvroSerializer.

The wire format

Avro records on Kafka are not self-describing. The serializer registers the schema with the registry on first use (or, when auto-registration is disabled, looks up the existing registration), gets back an integer ID that it caches, and prefixes the binary payload with a 5-byte header. The consumer reads the ID, fetches (and caches) the matching schema, and decodes the rest.

[ magic byte: 0x00 ][ 4-byte schema ID (big-endian) ][ Avro binary payload ]

The magic byte (0x00) marks the Confluent wire format; the schema ID lets a consumer deserialize even when the writer used a different schema version than the one it was compiled against.
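The header layout above can be sketched with nothing but java.nio. This is a minimal illustration, not the Confluent implementation: the class name ConfluentWireFormat and its methods are hypothetical, and the payload bytes are placeholders rather than a real encoded OrderPlaced record.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper for illustration; not part of the Confluent client API.
final class ConfluentWireFormat {
    static final byte MAGIC_BYTE = 0x0;
    static final int HEADER_SIZE = 5; // 1 magic byte + 4-byte schema ID

    // Prefix an already-encoded Avro payload with the magic byte and schema ID.
    static byte[] frame(int schemaId, byte[] avroPayload) {
        return ByteBuffer.allocate(HEADER_SIZE + avroPayload.length) // ByteBuffer is big-endian by default
                .put(MAGIC_BYTE)
                .putInt(schemaId)
                .put(avroPayload)
                .array();
    }

    // Read the schema ID, rejecting records that do not start with the magic byte.
    static int schemaId(byte[] record) {
        if (record.length < HEADER_SIZE || record[0] != MAGIC_BYTE) {
            throw new IllegalArgumentException("Record is not in the Confluent wire format");
        }
        return ByteBuffer.wrap(record, 1, 4).getInt();
    }

    // Return the Avro binary payload that follows the 5-byte header.
    static byte[] payload(byte[] record) {
        schemaId(record); // validates the header first
        return Arrays.copyOfRange(record, HEADER_SIZE, record.length);
    }

    public static void main(String[] args) {
        byte[] framed = frame(42, new byte[] {0x02, 0x04}); // placeholder payload bytes
        System.out.println(schemaId(framed));       // prints 42
        System.out.println(payload(framed).length); // prints 2
    }
}
```

A check like the one in schemaId is also a quick way to spot records that were produced with a different serializer (plain JSON or raw Avro without the header) when a consumer fails on an unexpected first byte.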

Producer configuration

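A Spring Boot producer sets KafkaAvroSerializer as the value serializer and points it at the registry. In production, auto.register.schemas: false forces schemas to be registered deliberately (via CI), so an incompatible change is rejected by the registry's compatibility check at registration time instead of being published by a running producer. With this setting the serializer only looks up an already registered schema, and serialization fails if the schema has not been registered.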

spring:
  kafka:
    bootstrap-servers: localhost:9092
    producer:
      key-serializer: org.apache.kafka.common.serialization.StringSerializer
      value-serializer: io.confluent.kafka.serializers.KafkaAvroSerializer
      properties:
        schema.registry.url: http://localhost:8081
        auto.register.schemas: false
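For clients that are configured in code rather than through Spring Boot properties, the same settings can be expressed as a plain property map. The sketch below assumes the values from the YAML above; the class name AvroProducerProps and the build method are hypothetical, while the keys are the standard Kafka client and Confluent serializer property names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration: mirrors the YAML producer settings as raw client properties.
final class AvroProducerProps {

    static Map<String, Object> build(String bootstrapServers, String registryUrl) {
        Map<String, Object> props = new HashMap<>();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
        // Settings read by the Avro serializer itself.
        props.put("schema.registry.url", registryUrl);
        props.put("auto.register.schemas", false);
        return props;
    }

    public static void main(String[] args) {
        Map<String, Object> props = build("localhost:9092", "http://localhost:8081");
        System.out.println(props.get("value.serializer"));
    }
}
```

A map like this can be passed to the KafkaProducer constructor or to Spring's DefaultKafkaProducerFactory, both of which accept a Map<String, Object> of configuration values.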

The producer code uses the generated type directly:

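import java.time.Instant;

import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

import com.devcraftly.events.OrderPlaced;

@Service
public class OrderProducer {

    private final KafkaTemplate<String, OrderPlaced> kafkaTemplate;

    public OrderProducer(KafkaTemplate<String, OrderPlaced> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    public void publish(String orderId, String customerId, long amountCents) {
        OrderPlaced event = OrderPlaced.newBuilder()
                .setOrderId(orderId)
                .setCustomerId(customerId)
                .setAmountCents(amountCents)
                .setCurrency("USD")
                // timestamp-millis is generated as java.time.Instant (Avro 1.9+), not a raw long
                .setPlacedAt(Instant.now())
                .build();
        // The order ID is the record key, so events for one order land on the same partition.
        kafkaTemplate.send("orders", orderId, event);
    }
}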

Consumer configuration

The consumer uses KafkaAvroDeserializer. The critical setting is specific.avro.reader: true. Without it the deserializer returns a GenericRecord (a dynamic record whose fields are looked up by name) instead of your generated OrderPlaced class.

spring:
  kafka:
    consumer:
      group-id: order-service
      key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
      value-deserializer: io.confluent.kafka.serializers.KafkaAvroDeserializer
      properties:
        schema.registry.url: http://localhost:8081
        specific.avro.reader: true

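import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

import com.devcraftly.events.OrderPlaced;

@Component
public class OrderConsumer {

    // Receives the typed record because specific.avro.reader is enabled.
    @KafkaListener(topics = "orders", groupId = "order-service")
    public void onOrder(OrderPlaced event) {
        System.out.printf("Order %s for customer %s: %d %s%n",
                event.getOrderId(), event.getCustomerId(),
                event.getAmountCents(), event.getCurrency());
    }
}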

Output:

Order ord-9341 for customer cust-77: 4999 USD

Inspecting schemas from the CLI

You can confirm a schema is registered under the expected subject. The default subject for a topic’s value is <topic>-value under the TopicNameStrategy.

curl -s http://localhost:8081/subjects/orders-value/versions/latest | jq .
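The same lookup can be done from Java with the JDK's built-in HTTP client (Java 11+). This is a rough sketch under the assumptions already used on this page (registry at http://localhost:8081, topic orders, TopicNameStrategy); the class SubjectLookup and its method names are hypothetical, and topic names containing characters that are not URL-safe would need encoding first.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper for illustration; not part of any Confluent library.
final class SubjectLookup {

    // TopicNameStrategy: "<topic>-key" for keys, "<topic>-value" for values.
    static String subjectFor(String topic, boolean isKey) {
        return topic + (isKey ? "-key" : "-value");
    }

    // URI of the latest registered version of the topic's value schema.
    static URI latestVersionUri(String registryUrl, String topic) {
        return URI.create(registryUrl + "/subjects/" + subjectFor(topic, false) + "/versions/latest");
    }

    // Performs the GET request; requires a reachable Schema Registry.
    static String fetchLatest(String registryUrl, String topic) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(latestVersionUri(registryUrl, topic)).GET().build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) {
        // Prints the URI only; call fetchLatest(...) when a registry is running.
        System.out.println(latestVersionUri("http://localhost:8081", "orders"));
    }
}
```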