Rate Limiting

Rate limiting caps how many requests a client may make in a window of time. It protects an API from abuse, accidental retry storms, and noisy neighbours that starve everyone else, and it lets you enforce fair-use tiers for paying customers. A common and well-behaved algorithm is the token bucket, and a widely used Java library for it is Bucket4j, which plugs neatly into a Spring Boot filter or interceptor.

The token-bucket algorithm

Imagine a bucket that holds up to N tokens and refills at a steady rate. Each request removes one token; if the bucket is empty, the request is rejected. Because the bucket has capacity, it tolerates short bursts (spending saved-up tokens) while still bounding the long-run average rate. This is friendlier than a fixed window, which lets a client spend one window's whole quota just before the boundary and the next window's quota just after it, briefly doubling the intended rate.

capacity = 10 tokens, refill = 10 tokens / minute
t=0s   bucket=10   burst of 10 requests -> all allowed, bucket=0
t=1s   request #11 -> bucket empty -> 429 Too Many Requests
t=6s   ~1 token refilled -> 1 request allowed
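
The timeline above can be replayed with a small, self-contained sketch. It is not how Bucket4j is implemented: the class name SimpleTokenBucket and the injectable clock are assumptions made for illustration, and integer "units" are used instead of fractional tokens so that the t=6s step lands exactly on one token.

```java
import java.util.function.LongSupplier;

/** Illustrative token bucket with continuous refill; a sketch, not Bucket4j's implementation. */
final class SimpleTokenBucket {
    private final long costPerToken;      // one token costs periodNanos units
    private final long refillTokens;      // units added per elapsed nanosecond
    private final long maxUnits;          // capacity expressed in units
    private final LongSupplier nanoClock; // injectable clock so a timeline can be replayed
    private long units;
    private long lastNanos;

    SimpleTokenBucket(long capacity, long refillTokens, long periodNanos, LongSupplier nanoClock) {
        this.costPerToken = periodNanos;
        this.refillTokens = refillTokens;
        this.maxUnits = capacity * periodNanos;
        this.nanoClock = nanoClock;
        this.units = maxUnits;             // start full
        this.lastNanos = nanoClock.getAsLong();
    }

    /** Refills for the elapsed time, then spends one token if one is available. */
    synchronized boolean tryConsume() {
        long now = nanoClock.getAsLong();
        // Sketch-level assumption: elapsed * refillTokens fits in a long.
        units = Math.min(maxUnits, units + (now - lastNanos) * refillTokens);
        lastNanos = now;
        if (units < costPerToken) {
            return false;
        }
        units -= costPerToken;
        return true;
    }

    public static void main(String[] args) {
        long[] now = {0L}; // fake clock, in nanoseconds
        SimpleTokenBucket bucket = new SimpleTokenBucket(10, 10, 60_000_000_000L, () -> now[0]);
        int allowed = 0;
        for (int i = 0; i < 10; i++) {
            if (bucket.tryConsume()) allowed++;
        }
        System.out.println("t=0s: " + allowed + " of 10 allowed");   // 10 of 10
        now[0] = 1_000_000_000L;
        System.out.println("t=1s: allowed=" + bucket.tryConsume()); // false
        now[0] = 6_000_000_000L;
        System.out.println("t=6s: allowed=" + bucket.tryConsume()); // true
    }
}
```

Passing the clock in as a LongSupplier keeps the example deterministic; production code would normally use System::nanoTime.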

Adding Bucket4j

<dependency>
    <groupId>com.bucket4j</groupId>
    <artifactId>bucket4j-core</artifactId>
    <version>8.10.1</version>
</dependency>

Define a reusable bucket configuration:

import io.github.bucket4j.Bandwidth;
import io.github.bucket4j.Bucket;
import java.time.Duration;

public class Buckets {
    public static Bucket newBucket() {
        Bandwidth limit = Bandwidth.builder()
                .capacity(10)
                .refillGreedy(10, Duration.ofMinutes(1))
                .build();
        return Bucket.builder().addLimit(limit).build();
    }
}

A per-client filter

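A servlet Filter runs before the controller, so it is the natural place to enforce limits. Identify the client and keep one bucket per client. The example keys on the X-API-Key header and falls back to the remote address. In practice, key on an authenticated identity: an unvalidated header lets a caller obtain a fresh bucket just by sending a different value, and behind a proxy the remote address is the proxy's rather than the client's.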

import io.github.bucket4j.Bucket;
import io.github.bucket4j.ConsumptionProbe;
import jakarta.servlet.*;
import jakarta.servlet.http.*;
import org.springframework.stereotype.Component;
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

@Component
public class RateLimitFilter implements Filter {

    private final Map<String, Bucket> buckets = new ConcurrentHashMap<>();

    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest request = (HttpServletRequest) req;
        HttpServletResponse response = (HttpServletResponse) res;

        String clientId = request.getHeader("X-API-Key");
        if (clientId == null) clientId = request.getRemoteAddr();

        Bucket bucket = buckets.computeIfAbsent(clientId, k -> Buckets.newBucket());
        ConsumptionProbe probe = bucket.tryConsumeAndReturnRemaining(1);

        if (probe.isConsumed()) {
            response.addHeader("X-Rate-Limit-Remaining", String.valueOf(probe.getRemainingTokens()));
            chain.doFilter(req, res);
        } else {
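            // Round up so a client that honours Retry-After never retries before a token is available.
            long waitSeconds = (probe.getNanosToWaitForRefill() + 999_999_999L) / 1_000_000_000L;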
            response.setStatus(429);                       // Too Many Requests
            response.addHeader("Retry-After", String.valueOf(waitSeconds));
            response.setContentType("application/json");
            response.getWriter().write(
                "{\"error\":\"rate_limit_exceeded\",\"retryAfter\":" + waitSeconds + "}");
        }
    }
}

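Output (an 11th request sent about one second after a burst of 10; with greedy refill the next token arrives 6 seconds after the burst, and the exact value depends on timing):

HTTP/1.1 429 Too Many Requests
Retry-After: 5
Content-Type: application/json

{"error":"rate_limit_exceeded","retryAfter":5}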

Returning 429 with a Retry-After header is the correct HTTP contract: well-behaved clients read it and back off. You could equally implement this as a Spring MVC HandlerInterceptor if you need access to the resolved handler method.
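
On the client side, "reading it and backing off" might look like the sketch below. The RetryAfter class and its parse method are hypothetical helper names, not part of any library. The sketch assumes the header arrives in one of the two forms HTTP defines for Retry-After: a delay in whole seconds (what the filter above sends) or an HTTP-date.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;

/** Hypothetical client-side helper that turns a Retry-After header value into a wait time. */
final class RetryAfter {
    private RetryAfter() {}

    /** Returns how long to wait, or empty if the header is missing or unparseable. */
    static Optional<Duration> parse(String headerValue, Instant now) {
        if (headerValue == null || headerValue.isBlank()) {
            return Optional.empty();
        }
        String value = headerValue.trim();
        try {
            // Form 1: delay in whole seconds, e.g. "5"
            long seconds = Long.parseLong(value);
            return seconds >= 0 ? Optional.of(Duration.ofSeconds(seconds)) : Optional.empty();
        } catch (NumberFormatException notSeconds) {
            // Not a number; try the date form next.
        }
        try {
            // Form 2: HTTP-date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
            Instant retryAt = ZonedDateTime.parse(value, DateTimeFormatter.RFC_1123_DATE_TIME).toInstant();
            Duration wait = Duration.between(now, retryAt);
            return Optional.of(wait.isNegative() ? Duration.ZERO : wait); // date already passed: retry now
        } catch (DateTimeParseException notADate) {
            return Optional.empty();
        }
    }
}
```

A caller would sleep for the returned duration before retrying and fall back to its own backoff policy when the result is empty.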

Warning: The in-memory ConcurrentHashMap of buckets grows forever and lives in one JVM only. It’s fine for a single instance and demos, but it both leaks memory (no eviction of idle clients) and gives each instance its own independent limit.
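
One way to address the memory leak (but not the per-instance problem) is to evict buckets that have been idle for a while. The sketch below uses only the JDK; IdleEvictingRegistry, its method names, and the injectable clock are assumptions made for illustration, and a caching library with time-based expiry would serve the same purpose.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Illustrative per-client registry that forgets entries not used for maxIdleNanos. */
final class IdleEvictingRegistry<V> {
    private static final class Entry<T> {
        final T value;
        volatile long lastAccessNanos;
        Entry(T value, long now) { this.value = value; this.lastAccessNanos = now; }
    }

    private final Map<String, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long maxIdleNanos;
    private final LongSupplier nanoClock; // injectable for tests; System::nanoTime in real use

    IdleEvictingRegistry(long maxIdleNanos, LongSupplier nanoClock) {
        this.maxIdleNanos = maxIdleNanos;
        this.nanoClock = nanoClock;
    }

    /** Returns the client's value, creating it on first use, and records the access time. */
    V get(String clientId, Supplier<V> factory) {
        long now = nanoClock.getAsLong();
        return entries.compute(clientId, (key, old) -> {
            if (old == null) return new Entry<>(factory.get(), now);
            old.lastAccessNanos = now;
            return old;
        }).value;
    }

    /** Drops idle entries; intended to be called periodically, e.g. from a scheduled task. */
    void evictIdle() {
        long now = nanoClock.getAsLong();
        // A client touched concurrently may still be evicted; it then simply gets a fresh value.
        entries.values().removeIf(e -> now - e.lastAccessNanos > maxIdleNanos);
    }

    int size() { return entries.size(); }
}
```

In the filter, the lookup would become something like registry.get(clientId, Buckets::newBucket). If the idle timeout is at least as long as a full refill (one minute for the configuration above), eviction loses nothing, because a bucket idle that long would be full again anyway.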

Distributed buckets with Redis

When you run multiple instances behind a load balancer, every instance must share the same bucket. Otherwise each instance enforces its own limit, and a client can get up to the configured limit multiplied by the number of instances. Bucket4j can keep bucket state in a shared backend, commonly Redis via the Lettuce integration.

<dependency>
    <groupId>com.bucket4j</groupId>
    <artifactId>bucket4j_jdk17-redis-common</artifactId>
    <version>8.10.1</version>
</dependency>
<dependency>
    <groupId>com.bucket4j</groupId>
    <artifactId>bucket4j_jdk17-lettuce</artifactId>
    <version>8.10.1</version>
</dependency>

spring:
  data:
    redis:
      host: localhost
      port: 6379

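You build a ProxyManager over the Redis client and resolve a distributed bucket per client key. Token accounting then happens atomically in Redis, so all instances see one shared limit, and you can reuse the Redis you may already run for caching. Bucket4j's Redis artifact names and builder entry points have changed between releases, so check them against the documentation for the version you use.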

Backend	Shared across instances	Survives restart	Best for
In-memory map	No	No	Single instance, dev
Redis (Lettuce)	Yes	Yes (app restarts; Redis restarts need persistence)	Multi-instance production
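
To see why per-instance buckets multiply the effective limit, the following sketch sends a burst of 30 requests round-robin across three instances. It is a simulation only: plain AtomicInteger counters stand in for buckets, refill is ignored, a single shared counter stands in for state held in Redis, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative comparison of per-instance limits and one shared limit. */
final class SharedVersusLocalLimit {
    private SharedVersusLocalLimit() {}

    /** Sends requests round-robin across the instances' token counters and counts the allowed ones. */
    static int allowed(List<AtomicInteger> bucketPerInstance, int requests) {
        int allowed = 0;
        for (int i = 0; i < requests; i++) {
            AtomicInteger bucket = bucketPerInstance.get(i % bucketPerInstance.size());
            // Take a token only if one is left; the previous value tells us whether we got it.
            if (bucket.getAndUpdate(t -> t > 0 ? t - 1 : 0) > 0) {
                allowed++;
            }
        }
        return allowed;
    }

    public static void main(String[] args) {
        int instances = 3, capacity = 10, requests = 30;

        // Each instance owns its bucket, as with the in-memory map.
        List<AtomicInteger> local = new ArrayList<>();
        for (int i = 0; i < instances; i++) local.add(new AtomicInteger(capacity));

        // Every instance points at the same counter, standing in for shared state.
        AtomicInteger sharedBucket = new AtomicInteger(capacity);
        List<AtomicInteger> shared = new ArrayList<>();
        for (int i = 0; i < instances; i++) shared.add(sharedBucket);

        System.out.println("per-instance buckets: " + allowed(local, requests) + " allowed"); // 30
        System.out.println("shared bucket: " + allowed(shared, requests) + " allowed");       // 10
    }
}
```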

Alternatives at the gateway layer

Rate limiting is often pushed out of the application to the edge, so abusive traffic never reaches your service at all:

Spring Cloud Gateway ships a built-in RequestRateLimiter filter (Redis-backed token bucket) you configure per route.
API gateways / reverse proxies (NGINX, Kong, AWS API Gateway, Envoy) enforce limits centrally for many services.

Tip: Edge rate limiting protects the whole system and frees app instances from the work, while in-app limiting gives you per-endpoint or per-business-rule granularity. Many teams use both — a coarse edge limit plus fine-grained in-app limits.

Best practices

Use the token-bucket algorithm to allow bursts while bounding the average rate.
Key buckets on an authenticated identity, not a spoofable IP, where possible.
Always return 429 with a Retry-After header so clients can back off correctly.
Use a Redis-backed store for any multi-instance deployment; the in-memory map is per-JVM.
Prefer the gateway for coarse, system-wide protection; use in-app limits for fine-grained rules.