Pounce provides production-grade graceful shutdown with automatic connection draining, making it safe for Kubernetes, AWS, and other orchestration platforms that rely on clean SIGTERM handling.
## Overview
When pounce receives a shutdown signal (SIGTERM or SIGINT), it:

- Stops accepting new connections immediately
- Finishes processing active connections (up to `shutdown_timeout`)
- Force-terminates workers that don't drain in time
- Exits cleanly with appropriate status codes

This ensures zero dropped requests during rolling deployments, scaling operations, and graceful shutdowns.
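The sequence above boils down to a timeout-enforced drain. A minimal sketch of that idea (illustrative only, not pounce's actual supervisor code):

```python
import asyncio

async def drain_or_force(drain, shutdown_timeout: float) -> bool:
    """Wait for in-flight work to finish draining.

    Returns True if the worker drained cleanly within the timeout,
    False if the caller should force-terminate it.
    """
    try:
        await asyncio.wait_for(drain, timeout=shutdown_timeout)
        return True
    except asyncio.TimeoutError:
        return False

async def demo():
    fast = asyncio.sleep(0.01)  # connections that finish quickly
    slow = asyncio.sleep(10)    # a request that outlives the timeout
    return (
        await drain_or_force(fast, shutdown_timeout=1.0),
        await drain_or_force(slow, shutdown_timeout=0.05),
    )
```

The fast case drains cleanly; the slow case hits the timeout and would be force-terminated.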
## Quick Start

### Basic Configuration

```python
from pounce import ServerConfig

config = ServerConfig(
    host="0.0.0.0",
    port=8000,
    shutdown_timeout=30.0,  # Wait up to 30s for connections to drain
)
```
### Kubernetes Deployment

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pounce-app
spec:
  replicas: 3
  template:
    spec:
      containers:
        - name: app
          image: myapp:latest
          ports:
            - containerPort: 8000
          # Kubernetes graceful shutdown configuration
          lifecycle:
            preStop:
              exec:
                # Optional: Send custom shutdown signal or delay
                command: ["sh", "-c", "sleep 5"]
      # Give pounce time to drain connections
      terminationGracePeriodSeconds: 40
```
Key settings:

- `terminationGracePeriodSeconds`: should be greater than `shutdown_timeout` plus a safety margin
- `preStop` hook (optional): adds a delay for load balancer de-registration
## How It Works

### Shutdown Sequence

#### 1. Signal Reception

When the supervisor receives SIGTERM:

```python
# Supervisor receives signal
logger.info("Received SIGTERM — initiating shutdown")
shutdown_event.set()
```
#### 2. Connection Draining

Workers immediately:

- Reject new connections with a 503 status
- Continue processing existing requests
- Log drain progress

```
Worker 1 draining 3 active connection(s)...
Worker 2 shutting down (no active connections)
```
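The per-worker drain behavior can be modeled roughly like this (a toy sketch; pounce's internals may differ):

```python
class DrainingWorker:
    """Toy model of a worker's drain state; status codes mirror the docs."""

    def __init__(self) -> None:
        self.active = 0
        self.draining = False

    def accept(self) -> int:
        if self.draining:
            return 503        # reject new connections during drain
        self.active += 1
        return 200

    def finish_one(self) -> None:
        self.active -= 1      # an in-flight request completed

    def begin_drain(self) -> bool:
        self.draining = True
        return self.active == 0  # True: worker can exit immediately
```

A worker with no active connections exits at once; otherwise it keeps serving existing requests while rejecting new ones.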
#### 3. Timeout Enforcement

After `shutdown_timeout`, workers that haven't exited are force-terminated:

```
Worker 1 did not stop within shutdown_timeout (30.0s) — force terminating
```
## Configuration

### shutdown_timeout

Maximum time to wait for connections to drain before force-terminating workers.

```python
config = ServerConfig(
    shutdown_timeout=30.0,  # seconds
)
```
Recommendations:
- Development: 5-10 seconds
- Production: 30-60 seconds
- Long-running requests: Match your longest expected request duration + buffer
### Validation

The timeout must be positive:

```python
# ✅ Valid
ServerConfig(shutdown_timeout=30.0)

# ❌ Invalid
ServerConfig(shutdown_timeout=0.0)   # ValueError
ServerConfig(shutdown_timeout=-5.0)  # ValueError
```
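The check itself is straightforward. A hypothetical stand-in (the class name here is illustrative, not pounce's):

```python
from dataclasses import dataclass

@dataclass
class ShutdownSettings:
    """Hypothetical stand-in demonstrating the positivity rule."""
    shutdown_timeout: float = 30.0

    def __post_init__(self) -> None:
        if self.shutdown_timeout <= 0:
            raise ValueError("shutdown_timeout must be positive")
```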
## Kubernetes Best Practices

### 1. terminationGracePeriodSeconds

Set this higher than your `shutdown_timeout`:

```yaml
spec:
  terminationGracePeriodSeconds: 40  # shutdown_timeout (30) + 10s buffer
```
If Kubernetes's grace period expires before pounce finishes draining, Kubernetes sends SIGKILL and forcibly terminates the pod, potentially dropping connections.
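Since the grace period must cover any preStop delay plus the drain window, a small helper (hypothetical, for sizing only) makes the arithmetic explicit:

```python
import math

def required_grace_period(shutdown_timeout: float,
                          pre_stop_delay: float = 0.0,
                          buffer: float = 10.0) -> int:
    """terminationGracePeriodSeconds must cover preStop + drain + a buffer,
    because Kubernetes sends SIGKILL when the grace period expires."""
    return math.ceil(pre_stop_delay + shutdown_timeout + buffer)
```

With a 30-second `shutdown_timeout` and a 10-second buffer, this yields 40, matching the manifest above.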
### 2. preStop Hook (Optional)

Add a delay to allow load balancers to de-register the pod before shutdown starts:

```yaml
lifecycle:
  preStop:
    exec:
      command: ["sh", "-c", "sleep 5"]
```

This prevents new connections from being routed to the pod after shutdown begins.
### 3. Readiness Probe

Use a readiness probe to stop receiving traffic before shutdown:

```yaml
readinessProbe:
  httpGet:
    path: /health
    port: 8000
  initialDelaySeconds: 5
  periodSeconds: 5
```

Pounce's built-in health check:

```python
config = ServerConfig(
    health_check_path="/health",  # Automatic 200 OK endpoint
)
```
### 4. Complete Example

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pounce-app
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
  template:
    metadata:
      labels:
        app: pounce-app
    spec:
      containers:
        - name: app
          image: mycompany/myapp:v1.2.3
          ports:
            - containerPort: 8000
              name: http
          env:
            - name: SHUTDOWN_TIMEOUT
              value: "30"
          readinessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 5
            periodSeconds: 3
            failureThreshold: 2
          livenessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 15
            periodSeconds: 10
          lifecycle:
            preStop:
              exec:
                command: ["sh", "-c", "sleep 5"]
          resources:
            requests:
              memory: "256Mi"
              cpu: "100m"
            limits:
              memory: "512Mi"
              cpu: "500m"
      terminationGracePeriodSeconds: 40
---
apiVersion: v1
kind: Service
metadata:
  name: pounce-app
spec:
  type: ClusterIP
  ports:
    - port: 80
      targetPort: 8000
  selector:
    app: pounce-app
```
## Docker Best Practices

### 1. Handle Signals Properly

Use the exec form of `CMD` to ensure signals reach pounce:

```dockerfile
# ✅ Good - signals reach pounce directly
CMD ["pounce", "myapp:app"]

# ❌ Bad - shell doesn't forward signals
CMD pounce myapp:app
```
### 2. Run as Non-Root

```dockerfile
FROM python:3.14-slim

RUN useradd -m -u 1000 app
USER app

# Install deps
WORKDIR /app
COPY --chown=app:app . .
RUN pip install --user pounce

# Run server
CMD ["pounce", "myapp:app", "--host", "0.0.0.0"]
```
### 3. Health Checks

```dockerfile
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:8000/health || exit 1
```
## AWS ECS / Fargate

### Task Definition

```json
{
  "family": "pounce-app",
  "containerDefinitions": [
    {
      "name": "app",
      "image": "mycompany/myapp:latest",
      "portMappings": [
        {
          "containerPort": 8000,
          "protocol": "tcp"
        }
      ],
      "environment": [
        {
          "name": "SHUTDOWN_TIMEOUT",
          "value": "30"
        }
      ],
      "healthCheck": {
        "command": ["CMD-SHELL", "curl -f http://localhost:8000/health || exit 1"],
        "interval": 30,
        "timeout": 5,
        "retries": 3,
        "startPeriod": 60
      },
      "stopTimeout": 40
    }
  ]
}
```
Key setting: `stopTimeout` should be greater than `shutdown_timeout`.
## Systemd Service

```ini
[Unit]
Description=Pounce ASGI Server
After=network.target

[Service]
Type=notify
User=www-data
Group=www-data
WorkingDirectory=/var/www/myapp
Environment="PATH=/var/www/myapp/venv/bin"
ExecStart=/var/www/myapp/venv/bin/pounce myapp:app --host 0.0.0.0 --port 8000
Restart=on-failure
RestartSec=5s

# Graceful shutdown
KillMode=mixed
KillSignal=SIGTERM
TimeoutStopSec=40s

[Install]
WantedBy=multi-user.target
```
Key settings:

- `KillSignal=SIGTERM`: proper shutdown signal
- `TimeoutStopSec`: greater than `shutdown_timeout`
- `KillMode=mixed`: SIGTERM to the main process, SIGKILL to children after the timeout
## Monitoring Shutdown

### Logs

Pounce logs the complete shutdown sequence:

```
[2026-02-12 10:15:30] Received SIGTERM — initiating shutdown
[2026-02-12 10:15:30] Shutting down 4 worker(s)...
[2026-02-12 10:15:30] Worker 1 draining 2 active connection(s)...
[2026-02-12 10:15:30] Worker 2 shutting down (no active connections)
[2026-02-12 10:15:32] Worker 2 stopped
[2026-02-12 10:15:33] Worker 1 stopped
[2026-02-12 10:15:33] All workers stopped
```
### Metrics (with OpenTelemetry)

Track drain metrics in your observability platform:

```python
from pounce import ServerConfig

config = ServerConfig(
    otel_endpoint="http://localhost:4318",
    otel_service_name="myapp",
    shutdown_timeout=30.0,
)
```
See OpenTelemetry Integration for details.
## Troubleshooting

### Workers Not Draining in Time

Symptom: Logs show force termination:

```
Worker 1 did not stop within shutdown_timeout (30.0s) — force terminating
```
Solutions:

1. Increase the timeout:

   ```python
   config = ServerConfig(shutdown_timeout=60.0)
   ```

2. Reduce request duration: optimize slow endpoints

3. Add a request timeout:

   ```python
   config = ServerConfig(
       request_timeout=25.0,  # Cancel slow requests
       shutdown_timeout=30.0,
   )
   ```
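Why a request timeout helps can be sketched with `asyncio.wait_for`: a handler capped below `shutdown_timeout` can never keep a worker draining past the deadline. This is an illustration of the interaction, not pounce's implementation:

```python
import asyncio

async def run_with_request_timeout(handler, request_timeout: float):
    """Cancel a handler that exceeds request_timeout so it cannot
    hold a connection open past the drain deadline (sketch only)."""
    try:
        return await asyncio.wait_for(handler, timeout=request_timeout)
    except asyncio.TimeoutError:
        return 504  # timed-out request; the connection can now drain
```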
### Kubernetes SIGKILL Before Drain Complete

Symptom: Pods are terminated before connections finish draining.

Solution: Increase `terminationGracePeriodSeconds`:

```yaml
spec:
  terminationGracePeriodSeconds: 60  # Must be > shutdown_timeout
```
### Load Balancer Still Routing to Draining Pod

Symptom: New requests arrive at shutting-down pods.

Solution: Add a preStop delay:

```yaml
lifecycle:
  preStop:
    exec:
      command: ["sh", "-c", "sleep 10"]
```
This allows time for load balancer health checks to fail and de-register the pod.
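How long the delay should be follows from the probe settings: it must span enough probe periods for the checks to fail `failureThreshold` consecutive times. A hypothetical sizing helper:

```python
def pre_stop_delay_seconds(period_seconds: float,
                           failure_threshold: int,
                           margin: float = 1.0) -> float:
    """Delay long enough for health checks to fail failure_threshold
    consecutive times, plus a small margin (hypothetical helper)."""
    return period_seconds * failure_threshold + margin
```

With the complete example's readiness probe (`periodSeconds: 3`, `failureThreshold: 2`), this suggests roughly 7 seconds.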
### Dropped Connections During Rolling Update

Symptom: Intermittent 503 errors during deployments.

Checklist:

- ✅ `terminationGracePeriodSeconds` > `shutdown_timeout`
- ✅ Readiness probe configured
- ✅ preStop delay added (5-10s)
- ✅ Rolling update strategy configured:

  ```yaml
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0  # Keep all pods running during update
      maxSurge: 1
  ```
## Testing Graceful Shutdown

### Local Testing

```bash
# Start server
pounce myapp:app &
PID=$!

# Make a long-running request in background
curl http://localhost:8000/slow-endpoint &

# Send SIGTERM
kill -TERM $PID

# Watch logs - should see:
# "Worker 1 draining 1 active connection(s)..."
# "Worker 1 stopped"
```
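The same check can be automated. This self-contained harness uses a stand-in child process (not pounce) to show the pattern of asserting a clean exit after SIGTERM; it assumes a POSIX platform:

```python
import signal
import subprocess
import sys
import time

# Stand-in for a server: traps SIGTERM, "drains" briefly, then exits 0
CHILD = """
import signal, sys, time
signal.signal(signal.SIGTERM, lambda s, f: (time.sleep(0.1), sys.exit(0)))
time.sleep(30)
"""

def sigterm_exits_cleanly() -> bool:
    proc = subprocess.Popen([sys.executable, "-c", CHILD])
    time.sleep(0.5)                   # give the child time to install its handler
    proc.send_signal(signal.SIGTERM)
    return proc.wait(timeout=5) == 0  # exit code 0 = graceful shutdown
```

Replacing the stand-in with a real pounce process (and a real in-flight request) turns this into an end-to-end shutdown test.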
### Load Testing

Use `wrk` or `hey` to test under load:

```bash
# Terminal 1: Start server
pounce myapp:app

# Terminal 2: Generate load
hey -z 60s http://localhost:8000/

# Terminal 3: Send SIGTERM after 10s
sleep 10 && pkill -TERM pounce

# Check: No connection errors in hey output
```
## Comparison with Other Servers
| Server | SIGTERM Handling | Connection Draining | Force Timeout | Kubernetes-Ready |
|---|---|---|---|---|
| pounce | ✅ Built-in | ✅ Automatic | ✅ Configurable | ✅ Yes |
| Uvicorn | ✅ Built-in | ⚠️ Basic | ❌ No timeout | ⚠️ Partial |
| Gunicorn | ✅ Built-in | ⚠️ Basic | ✅ Fixed (30s) | ⚠️ Partial |
| Hypercorn | ✅ Built-in | ✅ Good | ✅ Configurable | ✅ Yes |
Pounce provides production-grade shutdown with clear logging, configurable timeouts, and Kubernetes best practices out of the box.
## See Also

- Health Checks — Built-in `/health` endpoint
- OpenTelemetry — Monitor shutdown metrics
- Production Deployment — Complete production guide
- Graceful Reload — Zero-downtime code updates