Rate Limiting
Rate limiting caps how many requests a client may make in a given window of time. It protects an API from abuse, accidental retry storms, and noisy neighbours starving everyone else, and it enforces fair-use tiers for paying customers. A widely used, burst-tolerant algorithm is the token bucket, and a popular Java implementation of it is Bucket4j, which plugs neatly into a Spring Boot filter or interceptor.
The token-bucket algorithm
Imagine a bucket that holds up to N tokens and refills at a steady rate. Each request removes one token; if the bucket is empty, the request is rejected. Because the bucket has capacity, it tolerates short bursts (spending saved-up tokens) while still bounding the long-run average rate. This is friendlier than a fixed-window counter, which lets a client spend its whole quota at the end of one window and again at the start of the next, briefly doubling the intended rate across the boundary.
capacity = 10 tokens, refill = 10 tokens / minute
t=0s bucket=10 burst of 10 requests -> all allowed, bucket=0
t=1s request #11 -> bucket empty -> 429 Too Many Requests
t=6s ~1 token refilled -> 1 request allowed
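The timeline above can be reproduced with a few lines of plain Java. This is a minimal sketch for illustration only, not Bucket4j's implementation: the class name `TokenBucketSketch`, its constructor parameters, and the injectable nanosecond clock are all assumptions made here so the behaviour can be checked without waiting in real time.

```java
import java.time.Duration;
import java.util.function.LongSupplier;

// Hypothetical illustration of the token-bucket idea; not Bucket4j code.
class TokenBucketSketch {
    private final long capacity;
    private final long nanosPerToken;   // time needed to earn one token
    private final LongSupplier clock;   // nanosecond clock, injectable for tests
    private long tokens;
    private long lastRefillNanos;

    TokenBucketSketch(long capacity, long refillTokens, Duration period, LongSupplier clock) {
        this.capacity = capacity;
        this.nanosPerToken = period.toNanos() / refillTokens;
        this.clock = clock;
        this.tokens = capacity;         // the bucket starts full
        this.lastRefillNanos = clock.getAsLong();
    }

    synchronized boolean tryConsume() {
        refill(clock.getAsLong());
        if (tokens == 0) {
            return false;               // empty: the caller should answer 429
        }
        tokens--;
        return true;
    }

    private void refill(long now) {
        long earned = (now - lastRefillNanos) / nanosPerToken;
        if (earned <= 0) {
            return;
        }
        if (tokens + earned >= capacity) {
            tokens = capacity;          // never exceed capacity
            lastRefillNanos = now;      // a full bucket discards leftover time
        } else {
            tokens += earned;
            lastRefillNanos += earned * nanosPerToken; // keep the partial interval
        }
    }

    public static void main(String[] args) {
        long[] now = {0L};
        long second = 1_000_000_000L;
        TokenBucketSketch bucket =
                new TokenBucketSketch(10, 10, Duration.ofMinutes(1), () -> now[0]);

        int allowed = 0;
        for (int i = 0; i < 10; i++) {
            if (bucket.tryConsume()) allowed++;
        }
        System.out.println("t=0s allowed " + allowed + " of 10"); // t=0s allowed 10 of 10

        now[0] = 1 * second;
        System.out.println("t=1s allowed=" + bucket.tryConsume()); // t=1s allowed=false

        now[0] = 6 * second;
        System.out.println("t=6s allowed=" + bucket.tryConsume()); // t=6s allowed=true
    }
}
```

Tokens are counted in whole units and earned one every six seconds, which matches the "10 tokens / minute" refill in the timeline. A production library handles more cases (multiple limits, lock-free updates, distributed state), so treat this only as a model of the arithmetic.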
Adding Bucket4j
<dependency>
    <groupId>com.bucket4j</groupId>
    <artifactId>bucket4j-core</artifactId>
    <version>8.10.1</version>
</dependency>
Define a reusable bucket configuration:
import io.github.bucket4j.Bandwidth;
import io.github.bucket4j.Bucket;
import java.time.Duration;

public class Buckets {
    public static Bucket newBucket() {
        Bandwidth limit = Bandwidth.builder()
                .capacity(10)
                .refillGreedy(10, Duration.ofMinutes(1))
                .build();
        return Bucket.builder().addLimit(limit).build();
    }
}
A per-client filter
A servlet Filter runs before the controller, so it’s the natural place to enforce limits. Identify the client (here by API key; in practice an authenticated user id is better than a spoofable IP) and keep one bucket per client.
import io.github.bucket4j.Bucket;
import io.github.bucket4j.ConsumptionProbe;
import jakarta.servlet.*;
import jakarta.servlet.http.*;
import org.springframework.stereotype.Component;
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

@Component
public class RateLimitFilter implements Filter {

    // One bucket per client id, created lazily on the first request.
    private final Map<String, Bucket> buckets = new ConcurrentHashMap<>();

    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest request = (HttpServletRequest) req;
        HttpServletResponse response = (HttpServletResponse) res;

        // Prefer the API key; fall back to the remote address for anonymous callers.
        String clientId = request.getHeader("X-API-Key");
        if (clientId == null) clientId = request.getRemoteAddr();

        Bucket bucket = buckets.computeIfAbsent(clientId, k -> Buckets.newBucket());
        ConsumptionProbe probe = bucket.tryConsumeAndReturnRemaining(1);

        if (probe.isConsumed()) {
            response.addHeader("X-Rate-Limit-Remaining", String.valueOf(probe.getRemainingTokens()));
            chain.doFilter(req, res);
        } else {
            // Round up so a wait of less than one second is never reported as 0.
            long waitSeconds = (probe.getNanosToWaitForRefill() + 999_999_999L) / 1_000_000_000L;
            response.setStatus(429); // Too Many Requests
            response.addHeader("Retry-After", String.valueOf(waitSeconds));
            response.setContentType("application/json");
            response.getWriter().write(
                    "{\"error\":\"rate_limit_exceeded\",\"retryAfter\":" + waitSeconds + "}");
        }
    }
}
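Output (an 11th request arriving about one second after a burst of 10; with greedy refill the next token is due six seconds after the burst):
HTTP/1.1 429 Too Many Requests
Retry-After: 5
Content-Type: application/json

{"error":"rate_limit_exceeded","retryAfter":5}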
Returning 429 with a Retry-After header is the correct HTTP contract — well-behaved clients read it and back off. You could equally implement this as an interceptor if you need access to the resolved handler method.
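On the client side, "read it and back off" might look like the sketch below. It is an illustration under assumptions: the helper name `RetryAfter.parse`, its fallback parameter, and the decision to clamp negative values to zero are choices made here, not part of any library. HTTP allows the header to carry either a number of seconds or an HTTP-date, so the sketch tries both.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical client-side helper for interpreting a Retry-After header value.
class RetryAfter {

    /** Returns how long to wait, or the fallback if the header is absent or unparseable. */
    static Duration parse(String headerValue, Instant now, Duration fallback) {
        if (headerValue == null || headerValue.isBlank()) {
            return fallback;
        }
        String value = headerValue.trim();

        // Form 1: delay in seconds, e.g. "5" (what the filter above sends).
        try {
            return Duration.ofSeconds(Math.max(0L, Long.parseLong(value)));
        } catch (NumberFormatException notSeconds) {
            // Not a number; try the date form next.
        }

        // Form 2: an HTTP-date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
        try {
            Instant when = ZonedDateTime
                    .parse(value, DateTimeFormatter.RFC_1123_DATE_TIME)
                    .toInstant();
            Duration wait = Duration.between(now, when);
            return wait.isNegative() ? Duration.ZERO : wait;
        } catch (DateTimeParseException notADate) {
            return fallback;
        }
    }
}
```

A caller would sleep for the returned duration (ideally with a little random jitter) before retrying, for example `Thread.sleep(RetryAfter.parse(header, Instant.now(), Duration.ofSeconds(1)).toMillis())`.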
Warning: The in-memory ConcurrentHashMap of buckets grows forever and lives in one JVM only. It’s fine for a single instance and demos, but it both leaks memory (no eviction of idle clients) and gives each instance its own independent limit.
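One way to address the memory leak on a single instance is to remember when each client was last seen and periodically drop idle entries. The sketch below is an assumption-laden illustration: `ExpiringBuckets`, its generic bucket type `B`, and the injectable clock are hypothetical names introduced here, and the class does nothing about the separate problem of sharing limits across JVMs.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical per-client bucket registry that forgets idle clients.
class ExpiringBuckets<B> {

    private static final class Entry<B> {
        final B bucket;
        volatile long lastAccessNanos;

        Entry(B bucket, long now) {
            this.bucket = bucket;
            this.lastAccessNanos = now;
        }
    }

    private final Map<String, Entry<B>> entries = new ConcurrentHashMap<>();
    private final Supplier<B> factory;
    private final long maxIdleNanos;
    private final LongSupplier clock; // nanosecond clock, injectable for tests

    ExpiringBuckets(Supplier<B> factory, long maxIdleNanos, LongSupplier clock) {
        this.factory = factory;
        this.maxIdleNanos = maxIdleNanos;
        this.clock = clock;
    }

    /** Returns the client's bucket, creating it if needed, and records the access time. */
    B get(String clientId) {
        long now = clock.getAsLong();
        return entries.compute(clientId, (id, existing) -> {
            if (existing == null) {
                return new Entry<>(factory.get(), now);
            }
            existing.lastAccessNanos = now;
            return existing;
        }).bucket;
    }

    /** Removes every bucket that has not been used for longer than maxIdleNanos. */
    void evictIdle() {
        long now = clock.getAsLong();
        entries.values().removeIf(e -> now - e.lastAccessNanos > maxIdleNanos);
    }

    int size() {
        return entries.size();
    }
}
```

In the filter above this could be created as `new ExpiringBuckets<>(Buckets::newBucket, Duration.ofMinutes(10).toNanos(), System::nanoTime)`, with `evictIdle()` called from a scheduled task. Choosing an idle timeout no shorter than the refill period matters: a bucket idle for that long would have refilled completely anyway, so recreating it later hands out no extra tokens. A request that races with the sweep may occasionally land on a fresh bucket, which this sketch accepts.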
Distributed buckets with Redis
When you run multiple instances behind a load balancer, every instance must share the same bucket; otherwise each instance enforces the limit independently and a client can get up to the configured limit multiplied by the number of instances. Bucket4j can keep bucket state in a shared backend, commonly Redis via the Lettuce integration.
<dependency>
    <groupId>com.bucket4j</groupId>
    <artifactId>bucket4j_jdk17-redis-common</artifactId>
    <version>8.10.1</version>
</dependency>
<dependency>
    <groupId>com.bucket4j</groupId>
    <artifactId>bucket4j_jdk17-lettuce</artifactId>
    <version>8.10.1</version>
</dependency>
spring:
  data:
    redis:
      host: localhost
      port: 6379
You build a ProxyManager over the Redis client and resolve a distributed bucket per client key. The token accounting then happens atomically in Redis, so all instances see one shared limit. You can reuse a Redis deployment you already run for caching.
| Backend | Shared across instances | Survives restart | Best for |
|---|---|---|---|
| In-memory map | No | No | Single instance, dev |
| Redis (Lettuce) | Yes | Yes (with persistence) | Multi-instance production |
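To make the difference between the two rows concrete, the following sketch simulates a burst spread round-robin over three instances, first with one bucket per instance and then with a single shared bucket. Everything here is hypothetical scaffolding (`PerInstanceLimitDemo`, `Counter`, `allowed`); the shared `Counter` merely stands in for state held in a backend such as Redis, and refill is ignored because the burst is treated as instantaneous.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical simulation: per-instance buckets versus one shared bucket.
class PerInstanceLimitDemo {

    /** A bucket reduced to an atomic token counter (no refill during the burst). */
    static final class Counter {
        private final AtomicInteger tokens;

        Counter(int capacity) {
            this.tokens = new AtomicInteger(capacity);
        }

        boolean tryConsume() {
            // Atomically take one token unless none are left.
            return tokens.getAndUpdate(t -> t > 0 ? t - 1 : 0) > 0;
        }
    }

    /** Sends the requests round-robin to the instances and counts how many pass. */
    static int allowed(List<Counter> bucketPerInstance, int requests) {
        int passed = 0;
        for (int i = 0; i < requests; i++) {
            Counter bucket = bucketPerInstance.get(i % bucketPerInstance.size());
            if (bucket.tryConsume()) passed++;
        }
        return passed;
    }

    public static void main(String[] args) {
        int capacity = 10, instances = 3, requests = 100;

        // Each instance owns its own in-memory bucket.
        List<Counter> independent = new ArrayList<>();
        for (int i = 0; i < instances; i++) independent.add(new Counter(capacity));

        // Every instance consults the same bucket, as with a shared backend.
        Counter sharedBucket = new Counter(capacity);
        List<Counter> shared = new ArrayList<>();
        for (int i = 0; i < instances; i++) shared.add(sharedBucket);

        System.out.println("independent buckets: " + allowed(independent, requests)); // 30
        System.out.println("shared bucket: " + allowed(shared, requests));            // 10
    }
}
```

With independent buckets the client gets 30 requests through instead of the intended 10, which is the multiplication effect a shared store removes.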
Alternatives at the gateway layer
Rate limiting is often pushed out of the application to the edge, so abusive traffic never reaches your service at all:
- Spring Cloud Gateway ships a built-in RequestRateLimiter filter (Redis-backed token bucket) you configure per route.
- API gateways / reverse proxies (NGINX, Kong, AWS API Gateway, Envoy) enforce limits centrally for many services.
Tip: Edge rate limiting protects the whole system and frees app instances from the work, while in-app limiting gives you per-endpoint or per-business-rule granularity. Many teams use both — a coarse edge limit plus fine-grained in-app limits.
Best Practices
- Use the token-bucket algorithm to allow bursts while bounding the average rate.
- Key buckets on an authenticated identity, not a spoofable IP, where possible.
- Always return 429 with a Retry-After header so clients can back off correctly.
- Use a Redis-backed store for any multi-instance deployment; the in-memory map is per-JVM.
- Prefer the gateway for coarse, system-wide protection; use in-app limits for fine-grained rules.