Graceful Reload

Pounce supports zero-downtime code reloads via the SIGHUP signal, letting you deploy new code without dropping in-flight requests.

Overview

When you send SIGHUP to a running Pounce server, it performs a rolling restart in four steps:

Keep serving: Old workers continue handling existing requests
Spawn new generation: New workers start with fresh code
Drain old workers: Old workers finish current requests, reject new ones
Seamless handoff: Once drained, old workers shut down

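In thread mode, no requests are dropped and clients see no connection-refused errors. Process mode incurs brief downtime (see Thread Mode vs Process Mode).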

Basic Usage

Send SIGHUP Signal

# Find the pounce process ID
ps aux | grep pounce

# Send SIGHUP to trigger reload
kill -HUP <pid>

With systemd

# Reload the service
systemctl reload pounce

Your pounce.service file should use ExecReload:

[Service]
Type=notify
ExecStart=/usr/bin/pounce myapp:app --workers=4
ExecReload=/bin/kill -HUP $MAINPID

With Supervisor

[program:pounce]
command=/usr/bin/pounce myapp:app --workers=4
autorestart=true
killasgroup=true

# Reload via supervisor
supervisorctl signal HUP pounce

Configuration

Control the drain timeout with reload_timeout (default: 30 seconds):

from pounce import ServerConfig

config = ServerConfig(
    reload_timeout=60.0,  # Allow up to 60s for workers to drain
    workers=4,
)

If workers haven't drained after reload_timeout, they are force-stopped.
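
The timeout rule above can be illustrated with a minimal sketch. It assumes workers are plain threads and that a single deadline is shared across all of them; the helper name wait_for_drain and the stop event are hypothetical and are not part of Pounce's API.

```python
import threading
import time


def wait_for_drain(workers: list[threading.Thread], reload_timeout: float) -> list[threading.Thread]:
    """Join workers against one shared deadline; return those still running."""
    deadline = time.monotonic() + reload_timeout
    for worker in workers:
        # Each join only gets whatever is left of the overall budget.
        remaining = max(0.0, deadline - time.monotonic())
        worker.join(timeout=remaining)
    return [w for w in workers if w.is_alive()]


stop = threading.Event()  # stands in for whatever "force-stop" means in a real server
fast = threading.Thread(target=time.sleep, args=(0.05,), name="fast")
slow = threading.Thread(target=stop.wait, name="slow")  # never finishes by itself
for t in (fast, slow):
    t.start()

stragglers = wait_for_drain([fast, slow], reload_timeout=0.3)
straggler_names = [t.name for t in stragglers]

stop.set()   # "force-stop" the straggler so the demo exits cleanly
slow.join()
print(straggler_names)
```

Sharing one deadline keeps the total wait bounded by reload_timeout no matter how many workers are draining, which is why the sketch recomputes the remaining budget before each join.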

Thread Mode vs Process Mode

Thread Mode (Python 3.14t, nogil) — Recommended

Zero-downtime rolling restart fully supported:

config = ServerConfig(workers=4)  # Uses threads on nogil Python

Old and new workers run simultaneously
True zero-downtime reload
Automatic code reimport

Process Mode (GIL builds) — Limited

Falls back to hard restart (brief downtime):

All workers stop before new ones start
~100-500ms of downtime depending on drain speed
Still safer than kill+restart

Recommendation: Use thread mode (Python 3.14t) for production zero-downtime reloads.

How It Works

Rolling Restart Flow

Time 0s:   [Worker-0] [Worker-1] [Worker-2] [Worker-3]  (Generation 0)
           ↓ SIGHUP received

Time 0.1s: [Worker-0] [Worker-1] [Worker-2] [Worker-3]  (Gen 0, draining)
           [Worker-4] [Worker-5] [Worker-6] [Worker-7]  (Gen 1, accepting)

Time 2s:   Worker-0, Worker-1 finish requests and exit
           [Worker-2] [Worker-3]  (Gen 0, draining)
           [Worker-4] [Worker-5] [Worker-6] [Worker-7]  (Gen 1, accepting)

Time 5s:   Worker-2, Worker-3 finish and exit
           [Worker-4] [Worker-5] [Worker-6] [Worker-7]  (Gen 1, accepting)
           Reload complete!
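
The timeline above can be modelled as simple bookkeeping. The sketch below is an illustration only, not Pounce's internals: the Supervisor and Worker classes are hypothetical, and real workers would be threads rather than records.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Worker:
    id: int
    generation: int
    state: str = "accepting"  # "accepting" or "draining"


@dataclass
class Supervisor:
    size: int
    generation: int = 0
    workers: list[Worker] = field(default_factory=list)
    _ids: count = field(default_factory=count)

    def __post_init__(self) -> None:
        self._spawn()  # generation 0

    def _spawn(self) -> None:
        self.workers += [Worker(next(self._ids), self.generation) for _ in range(self.size)]

    def reload(self) -> None:
        """SIGHUP: start the next generation, then put the old one in drain mode."""
        old = list(self.workers)
        self.generation += 1
        self._spawn()  # new workers exist before any old worker stops accepting
        for worker in old:
            worker.state = "draining"

    def worker_idle(self, worker_id: int) -> None:
        """A draining worker finished its last request and exits."""
        self.workers = [w for w in self.workers
                        if not (w.id == worker_id and w.state == "draining")]

    def accepting(self) -> list[int]:
        return [w.id for w in self.workers if w.state == "accepting"]


sup = Supervisor(size=4)
sup.reload()
during = sup.accepting()         # IDs of generation 1 workers
total_during = len(sup.workers)  # both generations are alive at this point
for wid in (0, 1, 2, 3):
    sup.worker_idle(wid)
after = [(w.id, w.generation) for w in sup.workers]
print(during, total_during, after)
```

The ordering inside reload() is the point of the sketch: the new generation is spawned before the old one is marked as draining, so there is never a moment with no accepting worker.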

Drain Mode

When a worker enters drain mode:

Stops accepting new connections
Finishes all in-flight requests
Waits for active connections to complete
Shuts down once idle (or after timeout)
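
A minimal asyncio sketch of the four drain steps listed above. The DrainableWorker class and its method names are assumptions made for illustration; Pounce's actual worker implementation may differ.

```python
import asyncio


class DrainableWorker:
    """Toy worker that tracks in-flight requests; not Pounce's implementation."""

    def __init__(self) -> None:
        self.draining = False
        self.in_flight = 0
        self._idle = asyncio.Event()
        self._idle.set()  # no requests yet, so the worker starts idle

    def try_accept(self) -> bool:
        """Admit a request unless the worker is draining."""
        if self.draining:
            return False
        self.in_flight += 1
        self._idle.clear()
        return True

    def finish(self) -> None:
        """Mark one in-flight request as complete."""
        self.in_flight -= 1
        if self.in_flight == 0:
            self._idle.set()

    async def drain(self, timeout: float) -> bool:
        """Stop accepting, then wait until idle. Returns False on timeout."""
        self.draining = True
        try:
            await asyncio.wait_for(self._idle.wait(), timeout)
            return True
        except asyncio.TimeoutError:
            return False


async def demo() -> tuple[bool, bool]:
    worker = DrainableWorker()
    worker.try_accept()                   # one request admitted before the drain
    loop = asyncio.get_running_loop()
    loop.call_later(0.05, worker.finish)  # that request completes shortly after
    drained = await worker.drain(timeout=1.0)
    accepted_after = worker.try_accept()  # refused, because the worker is draining
    return drained, accepted_after


drained, accepted_after = asyncio.run(demo())
print(drained, accepted_after)
```

Setting the draining flag before waiting means no new request can slip in while the in-flight count is falling to zero.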

Deployment Strategies

Blue-Green Style (Zero Risk)

# 1. Deploy new code to new version directory
cp -r /app/v1 /app/v2

# 2. Reload pounce to pick up new code
kill -HUP $(cat /var/run/pounce.pid)

# 3. Old workers drain, new workers serve
# 4. No downtime, no connection errors
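
For deploy scripts written in Python, step 2 can be expressed without shelling out. The following is a POSIX-only sketch; send_reload is a hypothetical helper, and the demo signals the current process through a temporary PID file instead of touching a real server.

```python
import os
import signal
import tempfile
from pathlib import Path


def send_reload(pid_file: str) -> int:
    """Read a PID from pid_file and send that process SIGHUP. Returns the PID."""
    pid = int(Path(pid_file).read_text().strip())
    os.kill(pid, signal.SIGHUP)
    return pid


# Demo: install a SIGHUP handler here and signal ourselves via a temp PID file.
received = []
signal.signal(signal.SIGHUP, lambda signum, frame: received.append(signum))

with tempfile.TemporaryDirectory() as tmp:
    pid_file = Path(tmp) / "demo.pid"
    pid_file.write_text(f"{os.getpid()}\n")
    target = send_reload(str(pid_file))

print(target == os.getpid(), received == [signal.SIGHUP])
```

In a real deployment the argument would be the server's PID file, such as the /var/run/pounce.pid path used in the shell example.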

Container Deployments (Kubernetes, Docker)

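# kubernetes deployment
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0  # Zero downtime
  template:
    spec:
      # Must exceed the preStop sleep; the Kubernetes default is 30s
      terminationGracePeriodSeconds: 45
      containers:
      - name: pounce
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "kill -HUP 1; sleep 35"]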

Pounce receives SIGHUP before SIGTERM, allowing graceful drain before pod termination.

Monitoring Reload

Log Output

[INFO] Received SIGHUP — triggering graceful reload
[INFO] Successfully reimported app from myapp:app
[INFO] Spawning 4 new worker(s) (generation 1)...
[INFO] New workers spawned. Draining old workers (generation 0)...
[INFO] Worker 0 (generation 0) is idle
[INFO] Worker 1 (generation 0) is idle
[INFO] Worker 2 (generation 0) is idle
[INFO] Worker 3 (generation 0) is idle
[INFO] Graceful reload complete. Running 4 worker(s) on generation 1

Health Checks

Configure a health endpoint to verify reload success:

from pounce import ServerConfig

config = ServerConfig(
    health_check_path="/health",
    workers=4,
)

After SIGHUP, your load balancer can verify the new generation is healthy before routing traffic.
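
A deploy script can run the same check itself. The sketch below polls a health URL until it returns 200 or a timeout elapses. wait_until_healthy is a hypothetical helper, and the stub HTTP server only stands in for a Pounce instance so the example is self-contained.

```python
import threading
import time
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def wait_until_healthy(url: str, timeout: float = 10.0, interval: float = 0.2) -> bool:
    """Poll url until it answers 200 or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval + 1.0) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # not ready yet, or an error status; try again
        time.sleep(interval)
    return False


class StubHealth(BaseHTTPRequestHandler):
    """Stand-in for a server exposing /health; not Pounce itself."""

    def do_GET(self):
        self.send_response(200 if self.path == "/health" else 404)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet


server = HTTPServer(("127.0.0.1", 0), StubHealth)  # port 0 picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
healthy = wait_until_healthy(f"http://127.0.0.1:{server.server_port}/health", timeout=5.0)
server.shutdown()
server.server_close()
print("healthy:", healthy)
```

Using a monotonic deadline keeps the total wait bounded even if individual requests are slow.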

Troubleshooting

Workers Not Draining

Problem: Workers stay active past reload_timeout

Cause: Long-running requests (uploads, WebSocket, streaming)

Solution:

Increase reload_timeout:

   ServerConfig(reload_timeout=120.0)  # 2 minutes

Implement graceful WebSocket close in your app:

   # Close WebSocket connections on worker shutdown signal
   @app.on_event("pounce.worker.shutdown")
   async def close_websockets():
       await websocket_manager.close_all()

Module Reload Failures

Problem: ImportError or AttributeError after reload

Cause: Code changes break module imports

Solution: Fix the code error. Pounce will log the exception and continue with the old version:

[ERROR] Reload failed — continuing with previous version

No downtime from bad deploys!
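
The fallback behaviour can be sketched with importlib. This shows the general pattern under stated assumptions and is not Pounce's code: load_app, reload_or_keep, and the throwaway demo_app module are all hypothetical.

```python
import importlib
import logging
import sys
import tempfile
from pathlib import Path

log = logging.getLogger("reload-sketch")


def load_app(spec: str, *, reload: bool = False):
    """Resolve a "module:attribute" string such as "myapp:app"."""
    module_name, _, attr = spec.partition(":")
    module = importlib.import_module(module_name)
    if reload:
        # Re-execute the module source so edited code takes effect.
        module = importlib.reload(module)
    return getattr(module, attr)


def reload_or_keep(current_app, spec: str):
    """Return the freshly imported app, or current_app if the import fails."""
    try:
        return load_app(spec, reload=True)
    except Exception:
        log.exception("Reload failed, continuing with previous version")
        return current_app


# Demo with a throwaway module on a temporary path.
demo_dir = Path(tempfile.mkdtemp())
sys.path.insert(0, str(demo_dir))
sys.dont_write_bytecode = True  # keep the demo independent of .pyc caching
(demo_dir / "demo_app.py").write_text("app = 'version 1'\n")
importlib.invalidate_caches()

app = load_app("demo_app:app")

# Simulate a bad deploy: the new source fails at import time.
(demo_dir / "demo_app.py").write_text("import module_that_does_not_exist\n")
app = reload_or_keep(app, "demo_app:app")
print(app)
```

Because the caller keeps its reference to the previously loaded object, a failed reimport leaves the running application untouched.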

Process Mode Downtime

Problem: Brief (100-500ms) connection errors during reload

Cause: Process mode uses hard restart (all workers stop before new ones start)

Solution: Upgrade to Python 3.14t (nogil) for thread-based zero-downtime reloads.

Best Practices

Use Thread Mode: Python 3.14t with nogil for true zero-downtime
Set Generous Timeout: reload_timeout should exceed your longest request
Test Locally: Verify reload works with kill -HUP before deploying
Monitor Logs: Watch for "Graceful reload complete" confirmation
Health Checks: Use a /health endpoint to validate the new generation
Automate: Integrate SIGHUP into your CI/CD pipeline

Comparison with Other Servers

Server	Zero-Downtime Reload	Method
pounce	Yes (thread mode)	SIGHUP rolling restart
Uvicorn	No	Must use external orchestrator
Gunicorn	Partial (brief downtime)	HUP gracefully restarts workers; the master keeps running
Hypercorn	No	Manual stop/start required