REST API Throttling and Rate Limiting
Protect your API from abuse and accidental overload using token buckets, leaky buckets, and standard rate-limit headers that clients can actually respect.
What you'll learn
- The difference between throttling and rate limiting
- Token bucket vs leaky bucket algorithms
- Standard rate-limit headers
- Per-user vs per-IP vs per-key strategies
- Communicating limits to clients
Prerequisites
- Familiar with HTTP
What and Why
Every public API gets hammered eventually. A buggy client retries in a tight loop, a scraper goes wide, or a free-tier user discovers your most expensive endpoint. Rate limiting is the shield: a policy that caps how many requests a client can send in a window.
Throttling is related but slightly different. Throttling smooths traffic by delaying or queueing requests; rate limiting outright rejects requests over the cap. Most production systems combine both.
The goal is fairness and stability. One noisy client should not degrade everyone else.
Mental Model
The two classic algorithms are token bucket and leaky bucket.
A token bucket holds up to N tokens. Each request consumes one. Tokens refill at a constant rate until the bucket is full again. If the bucket is empty, the request is rejected. This allows short bursts of up to N requests (the bucket fills back up between bursts) while bounding the long-run average rate to the refill rate.
A leaky bucket queues requests and processes them at a fixed rate. Bursts are smoothed rather than rejected.
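To make the contrast concrete, here is a minimal in-memory sketch of the leaky bucket described above. The class name, the `offer`/`drain` methods, and the bounded queue size are illustrative assumptions, not part of any standard API. Timestamps are passed in explicitly so the behavior is easy to follow.

```python
from collections import deque


class LeakyBucket:
    """Bounded queue that releases requests at a fixed rate (illustrative)."""

    def __init__(self, leak_per_sec: float, queue_size: int):
        self.leak_per_sec = leak_per_sec
        self.queue_size = queue_size
        self.queue = deque()
        self.last_leak = 0.0

    def offer(self, request_id: str, now: float) -> bool:
        """Enqueue a request; return False when the queue is full."""
        if not self.queue:
            # Idle time must not bank up credit for a later burst.
            self.last_leak = now
        if len(self.queue) >= self.queue_size:
            return False
        self.queue.append(request_id)
        return True

    def drain(self, now: float) -> list:
        """Release the requests that are due by `now` at the leak rate."""
        due = int((now - self.last_leak) * self.leak_per_sec)
        released = []
        for _ in range(min(due, len(self.queue))):
            released.append(self.queue.popleft())
        if due > 0:
            # Advance by whole leak intervals so fractional time carries over.
            self.last_leak += due / self.leak_per_sec
        return released


bucket = LeakyBucket(leak_per_sec=2, queue_size=5)
accepted = [bucket.offer(f"r{i}", now=0.0) for i in range(6)]
print(accepted)          # five requests queued, the sixth overflows
print(bucket.drain(0.5)) # one request released after half a second
```

A burst of six requests is not rejected outright: five wait in the queue and leave at two per second, and only the overflow is turned away.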
Token bucket trace (refill: +1 token / sec, capacity = 10)
time  bucket        request   result
0s    [##########]  req  ->   [######### ]  OK
0s    [######### ]  req  ->   [########  ]  OK
...   10 quick requests -> bucket empty
0s    [          ]  req  ->   REJECTED 429
1s    [#         ]  req  ->   [          ]  OK
2s    [#         ]  req  ->   [          ]  OK
Token bucket is the default for most REST APIs because it is simple and forgives small bursts.
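The trace above can be reproduced with a few lines of code. This is a single-process sketch under stated assumptions: the `TokenBucket` class and its `allow` method are hypothetical names, and the caller supplies the timestamp (a real server would likely pass something like `time.monotonic()`).

```python
class TokenBucket:
    """In-memory token bucket (illustrative, not thread-safe)."""

    def __init__(self, capacity: int, refill_per_sec: float, now: float = 0.0):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)  # start with a full bucket
        self.updated_at = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill for the elapsed time, never beyond capacity.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated_at = now
        if self.tokens < cost:
            return False  # caller would answer 429
        self.tokens -= cost
        return True


bucket = TokenBucket(capacity=10, refill_per_sec=1)
burst = [bucket.allow(now=0.0) for _ in range(11)]
print(burst.count(True), burst[-1])  # 10 False
print(bucket.allow(now=1.0))         # True
```

Ten requests at 0s drain the bucket, the eleventh is rejected, and one second later a single token is available again.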
Hands-on Example
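Imagine /v1/messages allows 60 requests per minute per API key. A token bucket with capacity 60 and refill of 1/sec works well. Keep in mind that a client starting with a full bucket can spend its 60 tokens at once and then use the refill, so it may send close to 120 requests in its first minute. If you need a hard per-minute cap, use a smaller capacity.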
When the client is within the limit, return the resource plus standard headers:
HTTP/1.1 200 OK
RateLimit-Limit: 60
RateLimit-Remaining: 42
RateLimit-Reset: 18
The IETF draft RateLimit headers are gaining adoption. The older X-RateLimit-* variants are still common; pick one and document it.
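One way to derive these header values from token bucket state is sketched below. The function name is hypothetical, and treating the reset value as "seconds until the bucket is full again" is an assumption; other interpretations of the reset value are possible, so document whichever you choose.

```python
import math


def rate_limit_headers(limit: int, tokens: float, refill_per_sec: float) -> dict:
    """Build RateLimit-* headers from the current bucket state (illustrative)."""
    remaining = max(0, math.floor(tokens))
    # Assumption: "reset" means seconds until the bucket is completely refilled.
    reset = math.ceil((limit - tokens) / refill_per_sec) if tokens < limit else 0
    return {
        "RateLimit-Limit": str(limit),
        "RateLimit-Remaining": str(remaining),
        "RateLimit-Reset": str(reset),
    }


print(rate_limit_headers(limit=60, tokens=42, refill_per_sec=1))
# {'RateLimit-Limit': '60', 'RateLimit-Remaining': '42', 'RateLimit-Reset': '18'}
```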
When the client exceeds the limit, return 429 Too Many Requests with Retry-After:
HTTP/1.1 429 Too Many Requests
RateLimit-Limit: 60
RateLimit-Remaining: 0
RateLimit-Reset: 12
Retry-After: 12
Content-Type: application/json
{
"error": {
"code": "rate_limited",
"message": "Too many requests. Retry in 12 seconds."
}
}
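On the client side, a well-behaved consumer reads Retry-After before trying again. The helper below is a sketch with a hypothetical name; it assumes the response headers arrive as a plain dict with this exact casing. HTTP allows Retry-After to be either a number of seconds or an HTTP date, so both forms are handled.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay_seconds(headers: dict, now: datetime = None, default: float = 1.0) -> float:
    """Return how long to wait before retrying a 429 (illustrative)."""
    value = headers.get("Retry-After")
    if value is None:
        return default
    value = value.strip()
    if value.isdigit():
        return float(value)  # delta-seconds form, e.g. "12"
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


print(retry_delay_seconds({"Retry-After": "12"}))  # 12.0
```

Falling back to a default delay when the header is missing or malformed keeps the client from retrying in a tight loop.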
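A minimal Redis-backed token bucket in Python, using the redis-py client:
import time
import redis

r = redis.Redis(decode_responses=True)  # return str instead of bytes

def allow(key, capacity=60, refill_per_sec=1):
    """Return (allowed, tokens_left). Not atomic across concurrent callers."""
    now = time.time()
    # A missing key means a new client: start with a full bucket.
    state = r.hgetall(key) or {"tokens": capacity, "ts": now}
    # Refill for the time since the last update, capped at capacity.
    elapsed = now - float(state["ts"])
    tokens = min(capacity, float(state["tokens"]) + elapsed * refill_per_sec)
    if tokens < 1:
        return False, tokens
    tokens -= 1
    r.hset(key, mapping={"tokens": tokens, "ts": now})
    return True, tokens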
Use a Lua script in production to make the read-modify-write atomic.
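A possible shape for that script is sketched below. It assumes a redis-py style client (anything exposing `register_script`), takes the current time from the application server rather than from Redis, and adds a key expiry so idle buckets are cleaned up. The function name `make_allow` and the expiry of twice the full-refill time are illustrative choices, and the script has not been tuned for production.

```python
import time

# Runs atomically inside Redis: refill, check, deduct, and save in one step.
TOKEN_BUCKET_LUA = """
local capacity = tonumber(ARGV[1])
local refill = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local state = redis.call('HMGET', KEYS[1], 'tokens', 'ts')
local tokens = tonumber(state[1]) or capacity
local ts = tonumber(state[2]) or now
tokens = math.min(capacity, tokens + math.max(0, now - ts) * refill)
local allowed = 0
if tokens >= 1 then
  tokens = tokens - 1
  allowed = 1
end
redis.call('HSET', KEYS[1], 'tokens', tokens, 'ts', now)
redis.call('EXPIRE', KEYS[1], math.ceil(capacity / refill) * 2)
return {allowed, tostring(tokens)}
"""


def make_allow(client, capacity=60, refill_per_sec=1.0, clock=time.time):
    """Return an allow(key) function backed by the Lua script (illustrative)."""
    script = client.register_script(TOKEN_BUCKET_LUA)

    def allow(key):
        allowed, tokens = script(keys=[key], args=[capacity, refill_per_sec, clock()])
        # tokens comes back as a string because Redis truncates Lua floats.
        return bool(allowed), float(tokens)

    return allow
```

With a real client this would be used as `allow = make_allow(redis.Redis())` followed by `allow("ratelimit:some-api-key")`; the key naming is up to you. If application servers have skewed clocks, reading the time inside Redis is an alternative worth considering.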
Common Pitfalls
Limiting by IP only. NAT, corporate proxies, and mobile carriers share IPs. Whole organizations get blocked. Prefer API key or user ID with IP as a secondary signal.
Ignoring write vs read cost. A write or a search query may cost ten times a simple GET. Charge tokens in proportion to the work done, not a flat one per request.
Forgetting to limit unauthenticated traffic. Login and signup endpoints are favorite targets. Add stricter limits before authentication.
Silent throttling. Slowing requests without telling the client looks like a server bug. Always communicate via headers or status codes.
Global limits only. A single shared bucket across users means one user can starve another. Almost always layer per-key and global limits together.
Wrong status code. 429 is the correct response. 503 means the whole service is unavailable; reserve it for that.
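Two of the pitfalls above, flat per-request pricing and relying on a single bucket, can be addressed together. The sketch below charges a per-endpoint cost against a per-key bucket and a global bucket, deducting from both or from neither. The cost table, bucket sizes, and names are all hypothetical.

```python
class Bucket:
    """Minimal token bucket state with explicit timestamps (illustrative)."""

    def __init__(self, capacity: float, refill_per_sec: float):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.updated_at = 0.0

    def refill(self, now: float) -> None:
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated_at = now


def try_consume(buckets, cost: float, now: float) -> bool:
    """Deduct `cost` from every bucket, or from none if any would go short."""
    for b in buckets:
        b.refill(now)
    if any(b.tokens < cost for b in buckets):
        return False
    for b in buckets:
        b.tokens -= cost
    return True


# Hypothetical cost table: a search is priced at ten simple reads.
COSTS = {"GET /v1/messages": 1, "GET /v1/search": 10}

per_key = Bucket(capacity=60, refill_per_sec=1)
shared = Bucket(capacity=1000, refill_per_sec=100)
results = [try_consume([per_key, shared], COSTS["GET /v1/search"], now=0.0) for _ in range(7)]
print(results.count(True))  # 6
```

Checking every bucket before deducting matters: otherwise a request rejected by the global limit would still drain the caller's own budget.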
Practical Tips
- Implement multiple tiers: per-key, per-IP, and global. Reject on the first one that trips.
- Use sliding windows or token buckets, not fixed windows. A fixed window lets a client send up to double the limit in a short span by bursting at the end of one window and again at the start of the next.
- Expose limits in your docs and on the response. Clients that know their budget behave better.
- Add jitter to Retry-After suggestions so clients do not all retry at the same instant.
- Test your limiter under load. A buggy limiter that calls Redis on every request becomes the bottleneck.
- Whitelist health checks and internal services so monitoring does not eat the budget.
- Track 429 rates as an SLO. A spike often signals a real client bug worth investigating.
Wrap-up
Rate limiting is one of those features users only notice when it is missing or broken. Choose a token bucket for most APIs, return 429 with Retry-After and RateLimit-* headers, and key your limits by API key plus IP.
Done right, rate limiting is invisible to good citizens and decisive against bad ones. That is the whole job.