Quotas & Throttling
A single misbehaving client can starve an entire Kafka cluster. A batch job that produces at full line speed, or a runaway consumer with thousands of threads, will saturate network bandwidth and broker CPU, degrading latency for every other tenant. Quotas are Kafka’s built-in mechanism for fairness: they cap how much network bandwidth and request-handling capacity any one client may consume, and the brokers enforce the cap by transparently slowing offenders down. This page explains the two quota types, how throttling actually works, and how to set and inspect quotas with kafka-configs.sh.
The two kinds of quotas
Kafka enforces two independent quota dimensions, both applied per client rather than cluster-wide:
- Network bandwidth quotas limit the byte rate a client may produce or fetch, measured in bytes per second. There are two separate knobs: producer_byte_rate for writes and consumer_byte_rate for reads.
- Request-rate quotas limit the percentage of broker request-handler (I/O) and network-thread time a client may use. The value, request_percentage, is expressed as a percentage of one thread’s time, so the total budget on a broker is (num.io.threads + num.network.threads) × 100%. With 8 I/O threads and 3 network threads, that is 1100%.
Bandwidth quotas protect the network; request-rate quotas protect the CPU from clients that send many tiny or expensive requests (for example, aggressive metadata or list-offset calls) that consume little bandwidth but heavy processing.
| Quota type | Config key | Unit | Protects against |
|---|---|---|---|
| Produce bandwidth | producer_byte_rate | bytes/sec | A client flooding writes |
| Fetch bandwidth | consumer_byte_rate | bytes/sec | A client draining reads |
| Request rate | request_percentage | % of one thread | CPU-heavy / chatty clients |
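To make the request_percentage unit concrete, here is a minimal arithmetic sketch. It assumes the per-broker budget is (num.io.threads + num.network.threads) × 100%, as Kafka's documentation describes; the class and method names are hypothetical and exist only for illustration.

```java
// Hypothetical helper, not part of Kafka: illustrates the request_percentage unit.
class RequestBudget {
    /** Total budget in percent, assuming each I/O and network thread contributes 100%. */
    static int totalBudgetPercent(int numIoThreads, int numNetworkThreads) {
        return (numIoThreads + numNetworkThreads) * 100;
    }

    /** Fraction (0..1) of the broker's total thread time that a quota value represents. */
    static double shareOfBroker(int requestPercentage, int numIoThreads, int numNetworkThreads) {
        return (double) requestPercentage / totalBudgetPercent(numIoThreads, numNetworkThreads);
    }

    public static void main(String[] args) {
        // Kafka's default thread counts: num.io.threads=8, num.network.threads=3
        int total = totalBudgetPercent(8, 3);
        double share = shareOfBroker(200, 8, 3);
        System.out.printf("total budget = %d%%, a 200%% quota is %.1f%% of the broker%n",
                total, share * 100);
    }
}
```

Thinking in terms of "share of the broker" rather than raw percentages makes it easier to see how many clients can hit their quota at once before the threads are saturated.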
How quotas are matched to clients
Each broker tracks usage independently and matches a connection to a quota using the client’s authenticated user (principal) and its client-id. Quotas can target several entity scopes, and the broker picks the most specific match:
- (user, client-id): most specific.
- user: all client-ids for that user.
- client-id: that client-id across all users.
- A --entity-default for either type: the catch-all fallback.
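The "most specific match wins" rule can be sketched as a chain of lookups. This is a simplified model, not the broker's code: QuotaResolver and its methods are hypothetical, and Kafka's documented precedence has more levels than shown (for example, a user combined with a default client-id). The sketch does follow Kafka's documented ordering in checking user-level entries, including the user default, before client-id entries.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical, simplified model of quota resolution; not a Kafka class.
class QuotaResolver {
    private final Map<List<String>, Long> byUserAndClient = new HashMap<>();
    private final Map<String, Long> byUser = new HashMap<>();
    private final Map<String, Long> byClient = new HashMap<>();
    private Long userDefault;    // null means "not configured"
    private Long clientDefault;  // null means "not configured"

    void setUserAndClient(String user, String clientId, long bytesPerSec) {
        byUserAndClient.put(List.of(user, clientId), bytesPerSec);
    }
    void setUser(String user, long bytesPerSec) { byUser.put(user, bytesPerSec); }
    void setClient(String clientId, long bytesPerSec) { byClient.put(clientId, bytesPerSec); }
    void setUserDefault(long bytesPerSec) { userDefault = bytesPerSec; }
    void setClientDefault(long bytesPerSec) { clientDefault = bytesPerSec; }

    /** Returns the first configured quota, most specific first; empty means unlimited. */
    Optional<Long> resolve(String user, String clientId) {
        Long quota = byUserAndClient.get(List.of(user, clientId));
        if (quota == null) quota = byUser.get(user);
        if (quota == null) quota = userDefault;
        if (quota == null) quota = byClient.get(clientId);
        if (quota == null) quota = clientDefault;
        return Optional.ofNullable(quota);
    }

    public static void main(String[] args) {
        QuotaResolver quotas = new QuotaResolver();
        quotas.setClientDefault(10_485_760L);                            // 10 MiB/s fallback
        quotas.setUser("etl-job", 26_214_400L);                          // 25 MiB/s for the user
        quotas.setUserAndClient("etl-job", "reporting-client", 5_242_880L);

        System.out.println(quotas.resolve("etl-job", "reporting-client")); // (user, client-id) entry
        System.out.println(quotas.resolve("etl-job", "nightly-batch"));    // falls back to user entry
        System.out.println(quotas.resolve("someone-else", "adhoc"));       // falls back to client default
    }
}
```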
Because the quota is per-broker, a producer_byte_rate of 10 MB/s means 10 MB/s to each broker, not 10 MB/s across the cluster. A producer writing to partitions spread over five brokers effectively gets up to 50 MB/s in aggregate.
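The per-broker arithmetic works in both directions: multiplying to find the aggregate a client can reach, and dividing to pick a per-broker value for a desired cluster-wide ceiling. The sketch below uses hypothetical helper names and assumes the client's traffic is spread evenly across the brokers it touches, which real partition leadership may not guarantee.

```java
// Hypothetical helper, not part of Kafka: per-broker quota arithmetic.
class PerBrokerQuota {
    /** Aggregate rate a client can reach when it talks to several brokers. */
    static long aggregateCeiling(long perBrokerBytesPerSec, int brokersTouched) {
        requirePositive(brokersTouched);
        return perBrokerBytesPerSec * brokersTouched;
    }

    /** Per-broker value to configure for an intended cluster-wide ceiling (even spread assumed). */
    static long perBrokerFor(long clusterWideBytesPerSec, int brokersTouched) {
        requirePositive(brokersTouched);
        return clusterWideBytesPerSec / brokersTouched;
    }

    private static void requirePositive(int brokersTouched) {
        if (brokersTouched <= 0) {
            throw new IllegalArgumentException("brokersTouched must be positive");
        }
    }

    public static void main(String[] args) {
        long tenMiB = 10L * 1024 * 1024;
        System.out.println(aggregateCeiling(tenMiB, 5));   // aggregate bytes/sec over five brokers
        System.out.println(perBrokerFor(5 * tenMiB, 5));   // per-broker bytes/sec for a 50 MiB/s ceiling
    }
}
```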
How throttling works
Quotas are never enforced by dropping data or returning errors. Instead the broker makes the client wait. When a client exceeds its byte-rate or request-rate limit over the measurement window, the broker computes how long the client must pause to bring its average rate back under the limit. Brokers before Kafka 2.0 held the produce or fetch response for that duration. Since KIP-219 (Kafka 2.0) the broker returns the response immediately with the delay attached (a throttled fetch response carries no data) and then mutes the connection, ignoring further requests from that client until the delay has elapsed.
Clients from KIP-219 onward read the delay from the throttle_time_ms field in the response and hold back further requests to that broker for the same period. The delay is surfaced through the produce-throttle-time-avg and fetch-throttle-time-avg JMX metrics on the client. If you see those metrics climbing, the client is hitting its quota.
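The pause calculation described above can be approximated with one formula: delay = (observed rate − quota) / quota × window. The sketch below is a simplified model under that assumption, not the broker's implementation, which measures rates over sampled windows controlled by quota.window.num and quota.window.size.seconds. The class and method names are hypothetical.

```java
// Hypothetical, simplified model of the throttle-time calculation; not Kafka code.
class ThrottleDelay {
    /**
     * How long a client must pause so that its average rate over the window,
     * extended by the pause, falls back to the quota. Returns 0 when under quota.
     */
    static long throttleTimeMs(double observedRate, double quotaRate, long windowMs) {
        if (quotaRate <= 0) {
            throw new IllegalArgumentException("quotaRate must be positive");
        }
        if (observedRate <= quotaRate) {
            return 0L;
        }
        return Math.round((observedRate - quotaRate) / quotaRate * windowMs);
    }

    public static void main(String[] args) {
        // 30 MB/s observed against a 25 MB/s quota over a 10-second window:
        // 300 MB were sent, which at 25 MB/s should have taken 12 s, so pause 2 s.
        System.out.println(throttleTimeMs(30e6, 25e6, 10_000));
    }
}
```

The delay grows with how far the client overshoots, so a client slightly over quota sees short pauses while a heavy offender is slowed substantially.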
Throttling is silent by design — there is no exception in application logs. A producer that suddenly “got slow” with no errors is very often being quota-throttled. Always check the throttle-time client metrics before chasing phantom network issues.
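One way to check the throttle-time metrics from inside the application is to query the JVM's platform MBean server. The sketch below uses only the JDK's JMX API. It assumes the Kafka producer runs in the same JVM with its JMX reporter enabled and registers its metrics under the name pattern kafka.producer:type=producer-metrics,client-id=*; verify that pattern against your client version. The ThrottleMetrics class is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical helper: reads produce-throttle-time-avg for every producer in this JVM.
class ThrottleMetrics {
    // Assumed MBean name pattern for producer-level metrics.
    private static final String PRODUCER_PATTERN =
            "kafka.producer:type=producer-metrics,client-id=*";

    /** Maps client-id to its average produce throttle time in ms; empty if no producer is registered. */
    static Map<String, Double> produceThrottleAvg() {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        Map<String, Double> result = new LinkedHashMap<>();
        try {
            for (ObjectName name : server.queryNames(new ObjectName(PRODUCER_PATTERN), null)) {
                Object value = server.getAttribute(name, "produce-throttle-time-avg");
                if (value instanceof Number) {
                    result.put(name.getKeyProperty("client-id"), ((Number) value).doubleValue());
                }
            }
        } catch (JMException e) {
            throw new IllegalStateException("Could not read throttle metrics", e);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Double> throttle = produceThrottleAvg();
        if (throttle.isEmpty()) {
            System.out.println("No Kafka producer metrics registered in this JVM");
        }
        throttle.forEach((clientId, avgMs) ->
                System.out.println(clientId + " produce-throttle-time-avg=" + avgMs + " ms"));
    }
}
```

A value that stays above zero means the broker is throttling that client. The same approach should apply to fetch-throttle-time-avg on the consumer side, under the consumer's own MBean name.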
Setting quotas with kafka-configs.sh
Quotas are dynamic config stored in cluster metadata, so they take effect immediately with no restart. Use --entity-type users and/or --entity-type clients with --alter --add-config.
# Limit user "etl-job" to 25 MB/s produce, 50 MB/s consume
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter \
--add-config 'producer_byte_rate=26214400,consumer_byte_rate=52428800' \
--entity-type users --entity-name etl-job
# Cap a specific (user, client-id) pair on request CPU at 200% (two threads)
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter \
--add-config 'request_percentage=200' \
--entity-type users --entity-name etl-job \
--entity-type clients --entity-name reporting-client
# A safety-net default for every client-id that has no explicit quota
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter \
--add-config 'producer_byte_rate=10485760,consumer_byte_rate=10485760' \
--entity-type clients --entity-default
To inspect what is configured, use --describe:
bin/kafka-configs.sh --bootstrap-server localhost:9092 --describe \
--entity-type users --entity-name etl-job
Output:
Quota configs for user-principal 'etl-job' are:
consumer_byte_rate=52428800, producer_byte_rate=26214400
Remove a quota by listing its keys in --delete-config, which reverts the client to the next matching scope or to “unlimited”:
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter \
--delete-config 'producer_byte_rate,consumer_byte_rate' \
--entity-type users --entity-name etl-job
Tuning the client side
Quotas live entirely on the broker, but a well-behaved client should set a stable client.id so quotas can target it, and should be tuned to live within its budget. A producer near its byte-rate limit benefits from compression and reasonable batching: larger batches compress better, so each request delivers more records for the same number of bytes counted against the quota.
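import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ProducerConfig.CLIENT_ID_CONFIG, "reporting-client"); // matched by quotas
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4"); // fewer bytes per record on the wire
props.put(ProducerConfig.LINGER_MS_CONFIG, 20);           // wait up to 20 ms to fill a batch
// try-with-resources closes the producer, which flushes any pending records
try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
    producer.send(new ProducerRecord<>("events", "key", "payload"));
}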
In a Spring Boot application using Spring for Apache Kafka, the same client.id is set declaratively so the broker can attribute usage to your service.
spring:
  kafka:
    bootstrap-servers: localhost:9092
    producer:
      client-id: reporting-client
      compression-type: lz4
      properties:
        linger.ms: 20
Best Practices
- Always set a conservative --entity-default for clients so a brand-new or misconfigured client cannot run unbounded before you notice it.
- Quota by user (principal), not just client-id, in multi-tenant clusters: client-ids are easily spoofed, but authenticated principals are not.
- Remember quotas are per broker: divide your intended cluster-wide ceiling by the number of brokers a client touches when picking a value.
- Add request_percentage quotas alongside bandwidth quotas: a client can pin CPU with tiny chatty requests while staying well under any byte-rate limit.
- Monitor the client-side produce-throttle-time-avg and fetch-throttle-time-avg metrics so you can distinguish quota throttling from genuine slowdowns.
- Set a stable, meaningful client.id on every application so quotas and metrics map cleanly back to a team or service.