Token Bucket Rate Limiting with FastAPI: A Practical Guide

Protect Your APIs with Token Bucket Rate Limiting

APIs power everything from mobile apps to enterprise platforms, quietly handling millions of requests per day. Without safeguards, a single misconfigured client or a burst of automated traffic can overwhelm your service, degrading performance for everyone.

Rate limiting prevents this. It controls how many requests a client can make within a given timeframe, protecting your infrastructure from both intentional abuse and accidental overload. Among the several algorithms used for rate limiting, the Token Bucket stands out for its balance of simplicity and flexibility.

Why Token Bucket?

Unlike fixed window counters that reset abruptly, the Token Bucket allows short bursts of traffic while still enforcing a sustainable long-term rate. This makes it a practical choice for APIs where clients occasionally need to send a quick flurry of requests without being penalized.

Key Features of Token Bucket

Capacity: Maximum burst size (e.g., 10 tokens = 10 requests at once)
Refill Rate: Sustained throughput (e.g., 2 tokens/sec = 2 requests/sec long-term)
Refill Interval: Granularity of refill (e.g., 1.0 sec = tokens added every second)

Together, these three parameters give you precise control over burst size and sustained throughput while keeping the API responsive for legitimate users.
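To make the interplay of the parameters concrete, the sketch below computes an upper bound on how many requests one client can get through in a given window. It assumes the bucket starts full and every request costs one token. The function name `max_requests_in_window` is hypothetical and exists only for illustration.

```python
import math


def max_requests_in_window(capacity: int, refill_rate: int,
                           interval: float, window: float) -> int:
    """Upper bound on accepted requests within `window` seconds.

    Assumes the bucket starts full and each request costs one token.
    """
    # Number of whole refill intervals that complete inside the window.
    refills = math.floor(window / interval)
    # Initial burst plus every token added by those refills.
    return capacity + refills * refill_rate


# Capacity 10, 2 tokens per 1.0 s interval (the values from the list above).
print(max_requests_in_window(10, 2, 1.0, 1.0))   # 12
print(max_requests_in_window(10, 2, 1.0, 60.0))  # 130
```

Over a short window the capacity dominates (the burst), while over a long window the refill rate dominates (the sustained rate).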

Implementing Token Bucket in FastAPI

In this guide, you’ll build a Token Bucket rate limiter from scratch as a Python class, wire it into FastAPI as middleware with per-user tracking, add standard rate limit headers to your responses, and test everything with a simple script.

Step 1: Project Setup

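Ensure you have Python 3.9+ installed. Create a project directory, set up and activate a virtual environment, and install the dependencies:

mkdir fastapi-ratelimit && cd fastapi-ratelimit
python -m venv venv
source venv/bin/activate  # on Windows: venv\Scripts\activate
pip install fastapi uvicorn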

Step 2: Token Bucket Class

Implement the algorithm with thread-safe operations in ratelimiter.py:

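import threading
import time


class TokenBucket:
    """Thread-safe token bucket that refills in discrete intervals."""

    def __init__(self, max_tokens: int, refill_rate: int, interval: float):
        self.max_tokens = max_tokens    # capacity: largest possible burst
        self.refill_rate = refill_rate  # tokens added per interval
        self.interval = interval        # seconds between refills
        self.tokens = max_tokens        # start with a full bucket
        # Monotonic clock: unaffected by system clock adjustments.
        self.refilled_at = time.monotonic()
        self.lock = threading.Lock()

    def _refill(self):
        # Caller must hold self.lock.
        now = time.monotonic()
        elapsed = now - self.refilled_at
        # Only whole intervals earn tokens.
        num_refills = int(elapsed // self.interval)
        self.tokens = min(
            self.max_tokens,
            self.tokens + num_refills * self.refill_rate,
        )
        # Advance by whole intervals so leftover time carries over.
        self.refilled_at += num_refills * self.interval

    def allow_request(self, tokens: int = 1) -> bool:
        """Consume `tokens` if available and report whether the request may proceed."""
        with self.lock:
            self._refill()
            if self.tokens >= tokens:
                self.tokens -= tokens
                return True
            return False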
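One way to wire a bucket into FastAPI with per-user tracking is a small pure ASGI middleware, sketched below under a few assumptions. The class name `RateLimitMiddleware`, the `bucket_factory` and `retry_after` parameters, and the use of an `X-API-Key` header to identify clients are all illustrative choices, not something prescribed by FastAPI. The middleware only requires that each bucket expose an `allow_request()` method like the class above.

```python
import math
from typing import Any, Callable, Dict


class RateLimitMiddleware:
    """Pure ASGI middleware that keeps one bucket per client.

    `bucket_factory` is any zero-argument callable returning an object with
    an `allow_request()` method, e.g. `lambda: TokenBucket(10, 2, 1.0)`.
    """

    def __init__(self, app, bucket_factory: Callable[[], Any],
                 retry_after: float = 1.0):
        self.app = app
        self.bucket_factory = bucket_factory
        self.retry_after = retry_after  # seconds advertised in Retry-After
        self.buckets: Dict[str, Any] = {}

    def _client_id(self, scope) -> str:
        # Assumption: clients identify themselves with an X-API-Key header;
        # fall back to the remote address when it is missing.
        for name, value in scope.get("headers", []):
            if name == b"x-api-key":
                return value.decode("latin-1")
        client = scope.get("client")
        return client[0] if client else "anonymous"

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            # Pass websocket and lifespan events through untouched.
            await self.app(scope, receive, send)
            return

        client_id = self._client_id(scope)
        bucket = self.buckets.get(client_id)
        if bucket is None:
            bucket = self.buckets[client_id] = self.bucket_factory()

        if bucket.allow_request():
            await self.app(scope, receive, send)
            return

        # Bucket is empty: answer 429 without calling the application.
        body = b'{"detail": "Too Many Requests"}'
        await send({
            "type": "http.response.start",
            "status": 429,
            "headers": [
                (b"content-type", b"application/json"),
                (b"content-length", str(len(body)).encode()),
                (b"retry-after", str(math.ceil(self.retry_after)).encode()),
            ],
        })
        await send({"type": "http.response.body", "body": body})


# Hypothetical wiring in main.py (requires fastapi and ratelimiter.py):
#
#   from fastapi import FastAPI
#   from ratelimiter import TokenBucket
#
#   app = FastAPI()
#   app.add_middleware(
#       RateLimitMiddleware,
#       bucket_factory=lambda: TokenBucket(max_tokens=10, refill_rate=2, interval=1.0),
#   )
```

Note that the `buckets` dictionary in this sketch grows with every new client and lives in a single process. A production setup would likely evict idle entries or move the state to shared storage.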

Testing Your Rate Limiter

With the limiter wired into FastAPI as middleware, test it with a simple script. A correct implementation will:

Reject requests when the bucket is empty (429 Too Many Requests)
Allow bursts up to capacity
Refill tokens at the defined rate
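These three behaviors can be checked without a running server. The sketch below uses a condensed copy of the bucket logic with an injectable `clock` argument so that time can be advanced manually instead of sleeping. `FakeClock` and `ClockedTokenBucket` are hypothetical names introduced for this test and are not part of ratelimiter.py.

```python
import threading


class FakeClock:
    """Manually advanced clock, used here in place of a real timer."""

    def __init__(self):
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


class ClockedTokenBucket:
    """Condensed bucket with the same refill logic and an injectable clock."""

    def __init__(self, max_tokens, refill_rate, interval, clock):
        self.max_tokens = max_tokens
        self.refill_rate = refill_rate
        self.interval = interval
        self.clock = clock
        self.tokens = max_tokens
        self.refilled_at = clock()
        self.lock = threading.Lock()

    def allow_request(self, tokens: int = 1) -> bool:
        with self.lock:
            # Credit tokens for every whole interval that has elapsed.
            num_refills = int((self.clock() - self.refilled_at) // self.interval)
            self.tokens = min(self.max_tokens,
                              self.tokens + num_refills * self.refill_rate)
            self.refilled_at += num_refills * self.interval
            if self.tokens >= tokens:
                self.tokens -= tokens
                return True
            return False


clock = FakeClock()
bucket = ClockedTokenBucket(max_tokens=10, refill_rate=2, interval=1.0, clock=clock)

# 1. A burst up to capacity is allowed.
burst = [bucket.allow_request() for _ in range(10)]
assert all(burst)

# 2. The next request is rejected (a middleware would answer 429 here).
assert bucket.allow_request() is False

# 3. After one interval, exactly refill_rate tokens are available again.
clock.now += 1.0
after_refill = [bucket.allow_request() for _ in range(3)]
assert after_refill == [True, True, False]

print("all checks passed")
```

Injecting the clock keeps the test deterministic and fast. The same checks could be run against a live server with an HTTP client, at the cost of real waiting time between bursts.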

Where to Use This Pattern

Token Bucket rate limiting is ideal for:

Public APIs with unpredictable traffic patterns
Microservices architectures needing per-user limits
Rate-limited endpoints requiring burst tolerance

Conclusion

By implementing Token Bucket rate limiting in FastAPI, you gain a practical way to protect your services from abuse while leaving room for legitimate bursts. Large platforms such as AWS (in API Gateway throttling) and Stripe have publicly described using the same algorithm in production.

Ready to secure your API? Start with the code examples in this guide and adapt them to your specific requirements. For more advanced use cases, consider adding Redis-based storage for distributed systems or integrating with monitoring tools to track API usage patterns.
