# Graceful Shutdown

URL: /docs/features/graceful-shutdown/
Section: features

---

# Graceful Shutdown and Connection Draining

Pounce provides production-grade graceful shutdown with automatic connection draining, making it safe for Kubernetes, AWS, and other orchestration platforms that rely on clean SIGTERM handling.

## Overview

When pounce receives a shutdown signal (SIGTERM or SIGINT), it:

1. Stops accepting new connections immediately
2. Finishes processing active connections (up to `shutdown_timeout`)
3. Force-terminates workers that don't drain in time
4. Exits cleanly with appropriate status codes

This ensures zero dropped requests during rolling deployments, scale-downs, and routine restarts.

## Quick Start

### Basic Configuration

```python
from pounce import ServerConfig

config = ServerConfig(
    host="0.0.0.0",
    port=8000,
    shutdown_timeout=30.0,  # Wait up to 30s for connections to drain
)
```

### Kubernetes Deployment

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pounce-app
spec:
  replicas: 3
  template:
    spec:
      containers:
        - name: app
          image: myapp:latest
          ports:
            - containerPort: 8000
          # Kubernetes graceful shutdown configuration
          lifecycle:
            preStop:
              exec:
                # Optional: Send custom shutdown signal or delay
                command: ["sh", "-c", "sleep 5"]  # Give pounce time to drain connections
      terminationGracePeriodSeconds: 40
```

Key settings:

- `terminationGracePeriodSeconds`: should be greater than `shutdown_timeout` plus a safety margin
- `preStop` hook (optional): adds a delay for load balancer de-registration

## How It Works

### Shutdown Sequence

```mermaid
sequenceDiagram
    participant K8s as Kubernetes
    participant Supervisor as Pounce Supervisor
    participant Worker as Worker Process
    participant Client as HTTP Client

    K8s->>Supervisor: SIGTERM
    Supervisor->>Worker: Shutdown Signal
    Worker->>Worker: Stop accepting new connections
    Client->>Worker: New connection attempt
    Worker->>Client: 503 Service Unavailable
    Worker->>Worker: Finish active connections
    Worker->>Supervisor: Exit (clean)
    Supervisor->>K8s: Exit 0
```

### 1. Signal Reception

When the supervisor receives SIGTERM:

```python
# Supervisor receives signal
logger.info("Received SIGTERM — initiating shutdown")
shutdown_event.set()
```

### 2. Connection Draining

Workers immediately:

- Reject new connections with a 503 status
- Continue processing existing requests
- Log drain progress

```
Worker 1 draining 3 active connection(s)...
Worker 2 shutting down (no active connections)
```

### 3. Timeout Enforcement

After `shutdown_timeout`, workers that haven't exited are force-terminated:

```
Worker 1 did not stop within shutdown_timeout (30.0s) — force terminating
```

## Configuration

### shutdown_timeout

Maximum time to wait for connections to drain before force-terminating workers.

```python
config = ServerConfig(
    shutdown_timeout=30.0,  # seconds
)
```

Recommendations:

- Development: 5-10 seconds
- Production: 30-60 seconds
- Long-running requests: match your longest expected request duration, plus a buffer

### Validation

The timeout must be positive:

```python
# ✅ Valid
ServerConfig(shutdown_timeout=30.0)

# ❌ Invalid
ServerConfig(shutdown_timeout=0.0)   # ValueError
ServerConfig(shutdown_timeout=-5.0)  # ValueError
```

## Kubernetes Best Practices

### 1. terminationGracePeriodSeconds

Set this higher than your `shutdown_timeout`:

```yaml
spec:
  terminationGracePeriodSeconds: 40  # shutdown_timeout (30) + 10s buffer
```

If the grace period expires before pounce finishes draining, Kubernetes sends SIGKILL and forcibly terminates the pod, potentially dropping connections.

### 2. preStop Hook (Optional)

Add a delay to allow load balancers to de-register the pod before shutdown starts:

```yaml
lifecycle:
  preStop:
    exec:
      command: ["sh", "-c", "sleep 5"]
```

This prevents new connections from being routed to the pod after shutdown begins.

### 3. Readiness Probe

Use a readiness probe so the pod stops receiving traffic before shutdown:

```yaml
readinessProbe:
  httpGet:
    path: /health
    port: 8000
  initialDelaySeconds: 5
  periodSeconds: 5
```

Pounce's built-in health check:

```python
config = ServerConfig(
    health_check_path="/health",  # Automatic 200 OK endpoint
)
```

### 4. Complete Example

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pounce-app
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
  template:
    metadata:
      labels:
        app: pounce-app
    spec:
      containers:
        - name: app
          image: mycompany/myapp:v1.2.3
          ports:
            - containerPort: 8000
              name: http
          env:
            - name: SHUTDOWN_TIMEOUT
              value: "30"
          readinessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 5
            periodSeconds: 3
            failureThreshold: 2
          livenessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 15
            periodSeconds: 10
          lifecycle:
            preStop:
              exec:
                command: ["sh", "-c", "sleep 5"]
          resources:
            requests:
              memory: "256Mi"
              cpu: "100m"
            limits:
              memory: "512Mi"
              cpu: "500m"
      terminationGracePeriodSeconds: 40
---
apiVersion: v1
kind: Service
metadata:
  name: pounce-app
spec:
  type: ClusterIP
  ports:
    - port: 80
      targetPort: 8000
  selector:
    app: pounce-app
```

## Docker Best Practices

### 1. Handle Signals Properly

Use the exec form of CMD so signals reach pounce:

```dockerfile
# ✅ Good - signals reach pounce directly
CMD ["pounce", "myapp:app"]

# ❌ Bad - shell doesn't forward signals
CMD pounce myapp:app
```

### 2. Run as Non-Root

```dockerfile
FROM python:3.14-slim

RUN useradd -m -u 1000 app
USER app

# Install deps
WORKDIR /app
COPY --chown=app:app . .
RUN pip install --user pounce

# Run server
CMD ["pounce", "myapp:app", "--host", "0.0.0.0"]
```

### 3. Health Checks

```dockerfile
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD curl -f http://localhost:8000/health || exit 1
```

## AWS ECS / Fargate

### Task Definition

```json
{
  "family": "pounce-app",
  "containerDefinitions": [
    {
      "name": "app",
      "image": "mycompany/myapp:latest",
      "portMappings": [
        {
          "containerPort": 8000,
          "protocol": "tcp"
        }
      ],
      "environment": [
        {
          "name": "SHUTDOWN_TIMEOUT",
          "value": "30"
        }
      ],
      "healthCheck": {
        "command": ["CMD-SHELL", "curl -f http://localhost:8000/health || exit 1"],
        "interval": 30,
        "timeout": 5,
        "retries": 3,
        "startPeriod": 60
      },
      "stopTimeout": 40
    }
  ]
}
```

Key setting: `stopTimeout` should be greater than `shutdown_timeout`.
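Kubernetes' `terminationGracePeriodSeconds` and ECS's `stopTimeout` express the same invariant: the platform's stop timeout must exceed `shutdown_timeout` plus a buffer. A small helper can make that check explicit, for example in a pre-deployment smoke test. This is a hypothetical sketch, not part of pounce; `check_stop_timeout` and its defaults are our own.

```python
# Hypothetical helper (not a pounce API): verify that the platform's stop
# timeout leaves headroom over shutdown_timeout before you deploy.

def check_stop_timeout(shutdown_timeout: float, stop_timeout: float,
                       buffer: float = 5.0) -> None:
    """Raise ValueError when the orchestrator could SIGKILL pounce mid-drain."""
    if stop_timeout < shutdown_timeout + buffer:
        raise ValueError(
            f"platform stop timeout ({stop_timeout}s) must be at least "
            f"shutdown_timeout ({shutdown_timeout}s) + {buffer}s buffer"
        )

# Matches the examples above: shutdown_timeout=30, grace period / stopTimeout=40
check_stop_timeout(shutdown_timeout=30.0, stop_timeout=40.0)  # OK: 10s headroom
```

Running this against your rendered manifests catches the most common graceful-shutdown misconfiguration before it reaches production.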
## Systemd Service

```ini
[Unit]
Description=Pounce ASGI Server
After=network.target

[Service]
Type=notify
User=www-data
Group=www-data
WorkingDirectory=/var/www/myapp
Environment="PATH=/var/www/myapp/venv/bin"
ExecStart=/var/www/myapp/venv/bin/pounce myapp:app --host 0.0.0.0 --port 8000
Restart=on-failure
RestartSec=5s

# Graceful shutdown
KillMode=mixed
KillSignal=SIGTERM
TimeoutStopSec=40s

[Install]
WantedBy=multi-user.target
```

Key settings:

- `KillSignal=SIGTERM`: proper shutdown signal
- `TimeoutStopSec`: greater than `shutdown_timeout`
- `KillMode=mixed`: SIGTERM to the main process, SIGKILL to remaining children after the timeout

## Monitoring Shutdown

### Logs

Pounce logs the complete shutdown sequence:

```
[2026-02-12 10:15:30] Received SIGTERM — initiating shutdown
[2026-02-12 10:15:30] Shutting down 4 worker(s)...
[2026-02-12 10:15:30] Worker 1 draining 2 active connection(s)...
[2026-02-12 10:15:30] Worker 2 shutting down (no active connections)
[2026-02-12 10:15:32] Worker 2 stopped
[2026-02-12 10:15:33] Worker 1 stopped
[2026-02-12 10:15:33] All workers stopped
```

### Metrics (with OpenTelemetry)

Track drain metrics in your observability platform:

```python
from pounce import ServerConfig

config = ServerConfig(
    otel_endpoint="http://localhost:4318",
    otel_service_name="myapp",
    shutdown_timeout=30.0,
)
```

See OpenTelemetry Integration for details.
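The drain-then-force-terminate policy visible in these logs can be modelled in a few lines of asyncio. The sketch below is illustrative only, not pounce's internal implementation: wait up to the timeout for in-flight tasks, then cancel whatever remains.

```python
import asyncio

async def drain(active: set, timeout: float) -> tuple:
    """Wait up to `timeout` seconds for in-flight connection tasks, then
    cancel the stragglers. Returns (finished, force_cancelled) counts.
    Illustrative model of the documented behaviour, not pounce internals."""
    if not active:
        return 0, 0
    done, pending = await asyncio.wait(active, timeout=timeout)
    for task in pending:
        task.cancel()  # mirrors "force terminating" after shutdown_timeout
    # Reap the cancelled tasks so nothing is left dangling
    await asyncio.gather(*pending, return_exceptions=True)
    return len(done), len(pending)

async def demo() -> tuple:
    quick = asyncio.create_task(asyncio.sleep(0.01))  # finishes within the timeout
    slow = asyncio.create_task(asyncio.sleep(60))     # would outlive the timeout
    return await drain({quick, slow}, timeout=0.1)

print(asyncio.run(demo()))  # (1, 1): one connection drained, one force-cancelled
```

The same two numbers are what pounce's drain logs report: connections that finished in time versus workers that had to be force-terminated.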
## Troubleshooting

### Workers Not Draining in Time

Symptom: logs show force termination:

```
Worker 1 did not stop within shutdown_timeout (30.0s) — force terminating
```

Solutions:

1. Increase the timeout:

   ```python
   config = ServerConfig(shutdown_timeout=60.0)
   ```

2. Reduce request duration: optimize slow endpoints

3. Add a request timeout:

   ```python
   config = ServerConfig(
       request_timeout=25.0,  # Cancel slow requests
       shutdown_timeout=30.0,
   )
   ```

### Kubernetes SIGKILL Before Drain Completes

Symptom: pods are terminated before connections finish

Solution: increase `terminationGracePeriodSeconds`:

```yaml
spec:
  terminationGracePeriodSeconds: 60  # Must be > shutdown_timeout
```

### Load Balancer Still Routing to Draining Pod

Symptom: new requests reach shutting-down pods

Solution: add a `preStop` delay:

```yaml
lifecycle:
  preStop:
    exec:
      command: ["sh", "-c", "sleep 10"]
```

This allows time for load balancer health checks to fail and de-register the pod.

### Dropped Connections During Rolling Update

Symptom: intermittent 503 errors during deployments

Checklist:

- ✅ `terminationGracePeriodSeconds` > `shutdown_timeout`
- ✅ Readiness probe configured
- ✅ `preStop` delay added (5-10s)
- ✅ Rolling update strategy configured:

```yaml
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 0  # Keep all pods running during update
    maxSurge: 1
```

## Testing Graceful Shutdown

### Local Testing

```bash
# Start server
pounce myapp:app &
PID=$!

# Make a long-running request in background
curl http://localhost:8000/slow-endpoint &

# Send SIGTERM
kill -TERM $PID

# Watch logs - should see:
# "Worker 1 draining 1 active connection(s)..."
# "Worker 1 stopped"
```

### Load Testing

Use wrk or hey to test under load:

```bash
# Terminal 1: Start server
pounce myapp:app

# Terminal 2: Generate load
hey -z 60s http://localhost:8000/

# Terminal 3: Send SIGTERM after 10s
sleep 10 && pkill -TERM pounce

# Check: No connection errors in hey output
```

## Comparison with Other Servers

| Server    | SIGTERM Handling | Connection Draining | Force Timeout   | Kubernetes-Ready |
|-----------|------------------|---------------------|-----------------|------------------|
| pounce    | ✅ Built-in      | ✅ Automatic        | ✅ Configurable | ✅ Yes           |
| Uvicorn   | ✅ Built-in      | ⚠️ Basic            | ❌ No timeout   | ⚠️ Partial       |
| Gunicorn  | ✅ Built-in      | ⚠️ Basic            | ✅ Fixed (30s)  | ⚠️ Partial       |
| Hypercorn | ✅ Built-in      | ✅ Good             | ✅ Configurable | ✅ Yes           |

Pounce provides production-grade shutdown with clear logging, configurable timeouts, and Kubernetes best practices out of the box.

## See Also

- Health Checks — Built-in `/health` endpoint
- OpenTelemetry — Monitor shutdown metrics
- Production Deployment — Complete production guide
- Graceful Reload — Zero-downtime code updates
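The manual SIGTERM test in "Testing Graceful Shutdown" can also run unattended in CI. The harness below is a generic, hypothetical sketch (`assert_graceful_exit` is not a pounce API, and it assumes a POSIX platform): it launches a command, sends SIGTERM, and fails unless the process exits 0 within the grace period. In real use, substitute your server command, such as `["pounce", "myapp:app"]`, for the stand-in child.

```python
import signal
import subprocess
import sys
import time

def assert_graceful_exit(cmd: list, grace: float = 5.0) -> int:
    """Start `cmd`, send SIGTERM, and fail unless it exits 0 within `grace`
    seconds. Generic CI harness (POSIX only), not pounce tooling."""
    proc = subprocess.Popen(cmd)
    time.sleep(0.5)  # give the process time to install its signal handlers
    proc.send_signal(signal.SIGTERM)
    try:
        code = proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        proc.kill()
        raise AssertionError(f"process did not drain within {grace}s")
    assert code == 0, f"expected clean exit 0, got {code}"
    return code

# Stand-in child that exits cleanly on SIGTERM; replace with your server command.
child = [sys.executable, "-c",
         "import signal, sys, time;"
         "signal.signal(signal.SIGTERM, lambda *a: sys.exit(0));"
         "time.sleep(30)"]
assert_graceful_exit(child)  # passes: child exits 0 on SIGTERM
```

Pointing the harness at your real server image in CI catches regressions in signal handling (for example, a Dockerfile switching back to the shell form of CMD) before they reach a cluster.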