Built-in per-IP rate limiting with token bucket algorithm for production abuse protection.
## Overview
Pounce includes production-grade rate limiting to protect your server from:
- **Abusive clients** - Block excessive requests from single IPs
- **DDoS attacks** - Shed load during traffic spikes
- **API abuse** - Enforce fair usage policies
- **Resource exhaustion** - Prevent server overload
## Token Bucket Algorithm
Classic token bucket rate limiting:
- Tokens refill at a constant rate (requests per second)
- Bucket has maximum capacity (burst size)
- Each request consumes one token
- Requests are denied when bucket is empty
- Each client IP has its own bucket
This allows burst traffic while enforcing sustained rate limits.
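The mechanics above can be sketched in a few lines. This is an illustrative model, not Pounce's actual implementation:

```python
import time

class TokenBucket:
    """Minimal token bucket: refills at `rate` tokens/s, capped at `burst`."""

    def __init__(self, rate: float, burst: int):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)      # new buckets start full
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at burst capacity
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0          # each request consumes one token
            return True
        return False

bucket = TokenBucket(rate=10.0, burst=5)
print([bucket.allow() for _ in range(6)])  # five pass on the burst, the sixth is denied
```

One such bucket per client IP gives each client an independent allowance.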
## Quick Start

### Basic Configuration

Enable rate limiting with default settings (100 req/s per IP):
```python
from pounce import ServerConfig

config = ServerConfig(
    rate_limit_enabled=True,
)
```
### Custom Limits
Configure custom rate limits and burst size:
```python
config = ServerConfig(
    rate_limit_enabled=True,
    rate_limit_requests_per_second=50.0,  # 50 req/s per IP
    rate_limit_burst=100,                 # Allow bursts up to 100
)
```
## Configuration Options

| Parameter | Type | Default | Description |
|---|---|---|---|
| `rate_limit_enabled` | `bool` | `False` | Enable per-IP rate limiting |
| `rate_limit_requests_per_second` | `float` | `100.0` | Sustained rate limit per IP |
| `rate_limit_burst` | `int` | `200` | Maximum burst capacity per IP |
## How It Works

### Per-IP Tracking
Rate limits are enforced per client IP address:
- Each IP gets its own token bucket
- Limits are independent across IPs
- IPv4 and IPv6 are tracked separately
### Token Refill
Tokens refill at a constant rate:
```python
refill_rate = rate_limit_requests_per_second
time_between_tokens = 1.0 / refill_rate
```
For 100 req/s:
- New token every 10ms
- 10 tokens per 100ms
- 1000 tokens per 10s
### Burst Handling

Burst capacity allows temporary spikes:
- New clients start with a full bucket
- Can immediately consume up to `burst` tokens
- Then limited to the sustained rate
Example:
- Rate: 10 req/s
- Burst: 50
Client can make:
- 50 requests instantly (burst)
- Then 10 req/s sustained (rate)
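That burst-then-sustained behavior can be checked with a quick simulation using the example's numbers (a simulated clock rather than wall time, so the result is deterministic):

```python
rate, burst = 10.0, 50          # the example's limits
tokens = float(burst)           # a new client starts with a full bucket
allowed = 0
dt = 0.001                      # advance the simulated clock in 1 ms steps
for _ in range(2000):           # 2 simulated seconds of nonstop requests
    tokens = min(burst, tokens + rate * dt)   # refill, capped at burst
    if tokens >= 1.0:
        tokens -= 1.0           # one token per accepted request
        allowed += 1
print(allowed)  # ~70: the 50-token burst plus roughly 10 req/s sustained
```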
### Memory Management

Automatic cleanup prevents memory leaks:
- Inactive buckets (full capacity) are cleaned up every 5 minutes
- Stale IP tracking is removed automatically
- Memory usage scales with active clients only
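The cleanup idea can be sketched as follows (a hypothetical helper and bucket layout, not Pounce's internal data structures): a bucket whose tokens would be back at capacity carries no useful state, so it can be dropped and lazily recreated on the client's next request.

```python
import time

def sweep(buckets, rate, burst, now=None):
    """Drop buckets that would be full if refilled now.

    `buckets` maps client IP -> (tokens, last_refill_time).
    """
    if now is None:
        now = time.monotonic()
    stale = [ip for ip, (tokens, last) in buckets.items()
             if tokens + (now - last) * rate >= burst]
    for ip in stale:
        del buckets[ip]

buckets = {
    "198.51.100.7": (0.0, 0.0),    # drained long ago: fully refilled by now
    "203.0.113.9": (0.0, 999.9),   # drained moments ago: still active
}
sweep(buckets, rate=10.0, burst=200, now=1000.0)
print(sorted(buckets))  # only the recently active IP remains
```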
## Response Codes

### 429 Too Many Requests

Rate-limited requests receive:
```http
HTTP/1.1 429 Too Many Requests
Content-Type: text/plain
Retry-After: 1

Too Many Requests
```
The `Retry-After` header tells clients when to retry (in seconds).
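For reference, producing that response from an ASGI app looks roughly like this (a hypothetical `send_429` helper, not Pounce's internals):

```python
import asyncio

async def send_429(send, retry_after=1):
    """Emit a minimal 429 response over an ASGI send callable."""
    await send({
        "type": "http.response.start",
        "status": 429,
        "headers": [
            (b"content-type", b"text/plain"),
            (b"retry-after", str(retry_after).encode()),
        ],
    })
    await send({"type": "http.response.body", "body": b"Too Many Requests"})

# Capture the messages that would go over the wire
messages = []

async def capture(message):
    messages.append(message)

asyncio.run(send_429(capture))
print(messages[0]["status"], messages[1]["body"])  # 429 b'Too Many Requests'
```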
## Examples

### API Server
```python
from pounce import run, ServerConfig

config = ServerConfig(
    rate_limit_enabled=True,
    rate_limit_requests_per_second=100.0,
    rate_limit_burst=200,
)

run("myapi:app", config=config)
```
### High-Traffic Service

```python
config = ServerConfig(
    rate_limit_enabled=True,
    rate_limit_requests_per_second=50.0,
    rate_limit_burst=100,
    max_connections=5000,
)
```
### Microservice

```python
config = ServerConfig(
    rate_limit_enabled=True,
    rate_limit_requests_per_second=1000.0,
    rate_limit_burst=5000,
)
```
## Best Practices

### Choosing Rate Limits
**Conservative** (public APIs):
- Rate: 10-50 req/s per IP
- Burst: 2-5x rate

**Moderate** (web apps):
- Rate: 50-100 req/s per IP
- Burst: 2x rate

**Lenient** (internal services):
- Rate: 100-1000 req/s per IP
- Burst: 5-10x rate
### Monitoring
Track rate limiting effectiveness with Prometheus metrics:
```
http_requests_total{status="429"}  # Rate limited requests
```
### Client Handling

Teach clients to respect rate limits.

**Parse `Retry-After`:**
```python
import time

import requests

response = requests.get("https://api.example.com/users")
if response.status_code == 429:
    # Honor the server's Retry-After header before retrying
    retry_after = int(response.headers.get("Retry-After", 1))
    time.sleep(retry_after)
    # Retry the request
```
**Exponential Backoff:**

```python
import time

import requests

def make_request_with_backoff(url, max_retries=3):
    """GET with retries, backing off exponentially when rate limited."""
    for attempt in range(max_retries):
        response = requests.get(url)
        if response.status_code != 429:
            return response
        retry_after = int(response.headers.get("Retry-After", 1))
        time.sleep(retry_after * (2 ** attempt))  # e.g. 1s, 2s, 4s, ...
    raise Exception("Rate limited after retries")
```
## Advanced Usage

### Proxy Considerations

When running behind a proxy (nginx, HAProxy), rate limiting may see the proxy's IP instead of the real client IP.
**Solution:** Use `trusted_hosts` to extract the real client IP:
```python
config = ServerConfig(
    rate_limit_enabled=True,
    trusted_hosts=frozenset({"127.0.0.1", "10.0.0.0/8"}),
)
```
**Security:** Only enable `trusted_hosts` when you control the proxy!
### Per-Route Limits

For different limits per route, use custom middleware:
```python
from pounce._rate_limiter import RateLimiter

# Strict limits for expensive endpoints
strict_limiter = RateLimiter(rate=10.0, burst=20)

# Lenient limits for cheap endpoints
lenient_limiter = RateLimiter(rate=100.0, burst=200)

async def _reject(send):
    """Answer with 429 Too Many Requests."""
    await send({
        "type": "http.response.start",
        "status": 429,
        "headers": [(b"content-type", b"text/plain"), (b"retry-after", b"1")],
    })
    await send({"type": "http.response.body", "body": b"Too Many Requests"})

async def rate_limit_middleware(scope, receive, send):
    client_ip = scope["client"][0]
    if scope["path"].startswith("/api/expensive"):
        if not strict_limiter.check_rate_limit(client_ip):
            await _reject(send)
            return
    elif scope["path"].startswith("/api/"):
        if not lenient_limiter.check_rate_limit(client_ip):
            await _reject(send)
            return
    # Process the request (`app` is your ASGI application)
    await app(scope, receive, send)
```
### Distributed Rate Limiting

Pounce's built-in rate limiting is per-server. For multi-server deployments, consider:
- **Redis-based rate limiting** - Shared state across servers
- **Sticky sessions** - Route the same IP to the same server
- **Per-server limits** - Each server enforces independently
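As a sketch of the shared-state option: a fixed-window counter over any store with redis-style `incr`/`expire` makes the limit global, because every server consults the same counters. The `MemoryStore` stand-in below is hypothetical and for illustration only; in production it would be a real Redis client.

```python
import time

def allow_request(store, ip, limit=100, window=1, now=None):
    """Fixed-window rate limit: at most `limit` requests per `window` seconds."""
    if now is None:
        now = time.time()
    key = f"ratelimit:{ip}:{int(now // window)}"  # one counter per IP per window
    count = store.incr(key)
    if count == 1:
        store.expire(key, window * 2)  # let old windows expire automatically
    return count <= limit

class MemoryStore:
    """In-memory stand-in for a Redis client."""
    def __init__(self):
        self.counts = {}
    def incr(self, key):
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key]
    def expire(self, key, seconds):
        pass  # a real Redis store would set a TTL here

store = MemoryStore()
results = [allow_request(store, "1.2.3.4", limit=3, now=0) for _ in range(5)]
print(results)  # [True, True, True, False, False]
```

Note the tradeoff: a fixed window is simpler than a token bucket but allows up to 2x the limit across a window boundary.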
## Performance Impact

Rate limiting adds minimal overhead:
- **~5-10 µs per request** - Token bucket check
- **Thread-safe** - Lock-based synchronization
- **Memory efficient** - ~100 bytes per active IP
- **Auto-cleanup** - Stale buckets removed every 5 minutes
For 10,000 active IPs:
- Memory: ~1 MB
- CPU: <0.1% additional load
## Troubleshooting

### False Positives

If legitimate users are rate limited:
- **Check burst size** - May be too low for bursty traffic
- **Increase the rate** - May be too conservative
- **Check proxy config** - Multiple users may share one proxy IP
- **Monitor patterns** - Use metrics to identify issues
### No Rate Limiting

If rate limiting isn't working:
- **Check config** - Ensure `rate_limit_enabled=True`
- **Verify integration** - Check server logs for "Rate limiting enabled"
- **Test limits** - Send rapid requests to trigger the limit
- **Check client IP** - Ensure `scope["client"]` is present
## See Also

- **Request Queueing** — Global load shedding
- **Observability** — Monitor rate limiting effectiveness
- **Graceful Shutdown** — Handle in-flight rate-limited requests