Protect Your APIs with Token Bucket Rate Limiting
APIs power everything from mobile apps to enterprise platforms, quietly handling millions of requests per day. Without safeguards, a single misconfigured client or a burst of automated traffic can overwhelm your service, degrading performance for everyone.
Rate limiting prevents this. It controls how many requests a client can make within a given timeframe, protecting your infrastructure from both intentional abuse and accidental overload. Among the several algorithms used for rate limiting, the Token Bucket stands out for its balance of simplicity and flexibility.
Why Token Bucket?
Unlike fixed window counters that reset abruptly, the Token Bucket allows short bursts of traffic while still enforcing a sustainable long-term rate. This makes it a practical choice for APIs where clients occasionally need to send a quick flurry of requests without being penalized.
Key Features of Token Bucket
- Capacity: Maximum burst size (e.g., 10 tokens = 10 requests at once)
- Refill Rate: Sustained throughput (e.g., 2 tokens/sec = 2 requests/sec long-term)
- Refill Interval: Granularity of refill (e.g., 1.0 sec = tokens added every second)
Together, these three parameters give you precise control over API behavior while maintaining performance for legitimate users.
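To make the interaction between the parameters concrete, here is a small arithmetic sketch. The helper `max_requests` is a hypothetical name introduced for illustration, and it assumes the bucket starts full with its refill timer starting at the beginning of the window.

```python
import math


def max_requests(capacity: int, refill_rate: int, interval: float, window: float) -> int:
    """Upper bound on requests admitted during `window` seconds.

    Assumes a full bucket at the start of the window and a refill timer
    that starts at the same moment (an assumption for this illustration).
    """
    refills = math.floor(window / interval)  # completed refill intervals
    return capacity + refills * refill_rate


# Example values from the list above: capacity 10, 2 tokens every 1.0 sec.
print(max_requests(10, 2, 1.0, 0))    # 10  (the initial burst)
print(max_requests(10, 2, 1.0, 60))   # 130 (10 burst + 60 refills * 2)
print(2 / 1.0)                        # 2.0 requests/sec sustained
```

Over long windows the burst term becomes negligible and throughput approaches refill rate divided by refill interval.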
Implementing Token Bucket in FastAPI
In this guide, you’ll build a Token Bucket rate limiter from scratch as a Python class, wire it into FastAPI as middleware with per-user tracking, add standard rate limit headers to your responses, and test everything with a simple script.
Step 1: Project Setup
Ensure you have Python 3.9+ installed. Create a project directory and install dependencies:
mkdir fastapi-ratelimit && cd fastapi-ratelimit
python -m venv venv
source venv/bin/activate  # on Windows: venv\Scripts\activate
pip install fastapi uvicorn
Step 2: Token Bucket Class
Implement the algorithm with thread-safe operations in ratelimiter.py:
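import threading
import time


class TokenBucket:
    """Token bucket that adds refill_rate tokens every interval seconds."""

    def __init__(self, max_tokens: int, refill_rate: int, interval: float):
        self.max_tokens = max_tokens    # bucket capacity (maximum burst size)
        self.refill_rate = refill_rate  # tokens added per refill
        self.interval = interval        # seconds between refills
        self.tokens = max_tokens        # start with a full bucket
        # Monotonic clock: unaffected by system clock adjustments.
        self.refilled_at = time.monotonic()
        self.lock = threading.Lock()

    def _refill(self):
        # Caller must hold self.lock.
        now = time.monotonic()
        elapsed = now - self.refilled_at
        num_refills = int(elapsed // self.interval)
        self.tokens = min(
            self.max_tokens,
            self.tokens + num_refills * self.refill_rate
        )
        # Advance by whole intervals only, so a partially elapsed
        # interval still counts toward the next refill.
        self.refilled_at += num_refills * self.interval

    def allow_request(self, tokens: int = 1) -> bool:
        with self.lock:
            self._refill()
            if self.tokens >= tokens:
                self.tokens -= tokens
                return True
            return False

Testing Your Rate Limiter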
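Before involving FastAPI at all, the bucket logic can be checked on its own. The sketch below is one way to do that without real sleeps. It uses a variant of the class that accepts a `clock` argument, and both that argument and the `FakeClock` helper are assumptions added for this illustration, not part of the class in Step 2.

```python
import threading


class FakeClock:
    """Manually advanced clock (hypothetical test helper)."""

    def __init__(self):
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


class TokenBucket:
    """Same algorithm as Step 2, but with an injectable clock for testing."""

    def __init__(self, max_tokens: int, refill_rate: int, interval: float, clock):
        self.max_tokens = max_tokens
        self.refill_rate = refill_rate
        self.interval = interval
        self.clock = clock
        self.tokens = max_tokens
        self.refilled_at = clock()
        self.lock = threading.Lock()

    def allow_request(self, tokens: int = 1) -> bool:
        with self.lock:
            elapsed = self.clock() - self.refilled_at
            num_refills = int(elapsed // self.interval)
            self.tokens = min(self.max_tokens, self.tokens + num_refills * self.refill_rate)
            self.refilled_at += num_refills * self.interval
            if self.tokens >= tokens:
                self.tokens -= tokens
                return True
            return False


clock = FakeClock()
bucket = TokenBucket(max_tokens=10, refill_rate=2, interval=1.0, clock=clock)

# A burst of 12 requests: the first 10 drain the bucket, the last 2 are rejected.
burst = [bucket.allow_request() for _ in range(12)]
print(burst.count(True), burst.count(False))  # 10 2

# One interval later, exactly refill_rate tokens are available again.
clock.advance(1.0)
after_one = [bucket.allow_request() for _ in range(3)]
print(after_one)  # [True, True, False]

# After a long idle period the bucket is capped at max_tokens.
clock.advance(60.0)
after_idle = sum(bucket.allow_request() for _ in range(20))
print(after_idle)  # 10
```

Injecting the clock keeps the check deterministic. With the real class, the same checks would need `time.sleep` calls between steps.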
Wire the limiter into FastAPI as middleware, then exercise it with a simple script. A correct implementation will:
- Reject requests when the bucket is empty (429 Too Many Requests)
- Allow bursts up to capacity
- Refill tokens at the defined rate
Where to Use This Pattern
Token Bucket rate limiting is ideal for:
- Public APIs with unpredictable traffic patterns
- Microservices architectures needing per-user limits
- Rate-limited endpoints requiring burst tolerance
Conclusion
By implementing Token Bucket rate limiting in FastAPI, you gain a powerful tool to protect your services from abuse while maintaining flexibility for legitimate users. Major platforms such as AWS and Stripe use token bucket rate limiting, which suggests the pattern holds up well in production environments.
Ready to secure your API? Start with the code examples in this guide and adapt them to your specific requirements. For more advanced use cases, consider adding Redis-based storage for distributed systems or integrating with monitoring tools to track API usage patterns.







