Server Lifecycle

Graceful reload, hot deploy, and shutdown with connection draining


Pounce handles three lifecycle signals for zero-downtime operations: SIGHUP (reload), SIGUSR1 (hot deploy), and SIGTERM (shutdown). All three use the same drain-then-replace pattern.

Graceful Reload (SIGHUP)

Send SIGHUP to perform a rolling restart with fresh code:

kill -HUP <pid>
# or with systemd:
systemctl reload pounce

What Happens

  1. Old workers continue handling existing requests
  2. App code is reimported and new workers spawn (generation N+1)
  3. Old workers enter drain mode (finish active requests, reject new ones)
  4. Once drained (or after reload_timeout expires), old workers exit

Time 0s:   [Worker-0] [Worker-1] [Worker-2] [Worker-3]  (Gen 0)
           SIGHUP received
Time 0.1s: [Worker-0..3 draining] [Worker-4..7 accepting]  (Gen 0+1)
Time 5s:   [Worker-4] [Worker-5] [Worker-6] [Worker-7]  (Gen 1 only)

If the reimport fails, pounce logs the error and continues with the old code -- no downtime from bad deploys.

Configuration

config = ServerConfig(
    reload_timeout=60.0,  # Max drain time (default: 30s)
    workers=4,
)

systemd

[Service]
Type=notify
ExecStart=/usr/bin/pounce serve myapp:app --workers=4
ExecReload=/bin/kill -HUP $MAINPID

Hot Deploy (SIGUSR1)

SIGUSR1 triggers the same rolling restart as SIGHUP. Use whichever signal fits your deployment tooling.

kill -SIGUSR1 <pid>

On Linux with SO_REUSEPORT, old and new workers bind to the same port simultaneously. On macOS/Windows (no SO_REUSEPORT), the AcceptDistributor handles the handoff via a shared queue.
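The overlapping-bind mechanism can be sketched with the standard socket module. bind_listener is a hypothetical helper, not Pounce's API; the point is that two worker generations can each call it with the same port:

```python
import socket


def bind_listener(host: str, port: int) -> socket.socket:
    """Bind a listening socket. With SO_REUSEPORT (Linux/BSD), old and
    new worker generations can hold sockets on the same port at once."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if hasattr(socket, "SO_REUSEPORT"):
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    sock.listen(128)
    return sock
```

On platforms without SO_REUSEPORT the second bind would fail, which is why a single accept loop handing connections to workers over a queue is used instead.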

File Watching (Development)

For development, enable auto-reload on file changes:

config = ServerConfig(
    reload=True,
    reload_include=(".html", ".css"),  # Extra extensions
    reload_dirs=("templates",),        # Extra directories
)

Graceful Shutdown (SIGTERM)

On SIGTERM or SIGINT, pounce drains connections then exits:

  1. Stops accepting new connections immediately
  2. Finishes active requests (up to shutdown_timeout)
  3. Force-terminates workers that exceed the timeout
  4. Exits with status 0

config = ServerConfig(
    shutdown_timeout=30.0,  # Per-worker drain time (default: 10s)
)
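Steps 1-3 amount to a bounded wait on an in-flight request counter. A minimal sketch, assuming a hypothetical Drainer class (not Pounce's internals):

```python
import threading


class Drainer:
    """Track in-flight requests and wait for them to finish on shutdown."""

    def __init__(self, shutdown_timeout: float = 10.0):
        self.shutdown_timeout = shutdown_timeout
        self.accepting = True
        self._active = 0
        self._idle = threading.Condition()

    def start_request(self) -> bool:
        """Returns False once draining has begun (new work is rejected)."""
        if not self.accepting:
            return False
        with self._idle:
            self._active += 1
        return True

    def finish_request(self) -> None:
        with self._idle:
            self._active -= 1
            if self._active == 0:
                self._idle.notify_all()

    def drain(self) -> bool:
        """Stop accepting, then wait up to shutdown_timeout for in-flight
        requests. True = clean drain; False = caller force-terminates."""
        self.accepting = False
        with self._idle:
            return self._idle.wait_for(
                lambda: self._active == 0, timeout=self.shutdown_timeout
            )
```

A False return from drain corresponds to step 3: the worker exceeded the timeout and gets force-terminated.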

Kubernetes

spec:
  containers:
  - name: app
    lifecycle:
      preStop:
        exec:
          command: ["sh", "-c", "sleep 5"]  # LB de-registration delay
    readinessProbe:
      httpGet:
        path: /health
        port: 8000
  terminationGracePeriodSeconds: 40  # > shutdown_timeout + preStop

Key: terminationGracePeriodSeconds must exceed shutdown_timeout + the preStop delay, or Kubernetes sends SIGKILL before the drain completes.
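The budget math for the manifest above works out as follows (values taken from the examples on this page):

```python
# Values from the manifest and ServerConfig above.
shutdown_timeout = 30.0   # ServerConfig(shutdown_timeout=30.0)
pre_stop_delay = 5.0      # preStop hook: sleep 5
termination_grace = 40    # terminationGracePeriodSeconds

# The pod must outlive the full drain sequence, or Kubernetes SIGKILLs it.
drain_budget = pre_stop_delay + shutdown_timeout  # 35.0s worst case
assert termination_grace > drain_budget
```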

Docker

Use exec form so signals reach pounce directly:

CMD ["pounce", "serve", "myapp:app", "--host", "0.0.0.0"]

systemd

[Service]
Type=notify
KillSignal=SIGTERM
KillMode=mixed
TimeoutStopSec=40s

Thread Mode vs Process Mode

                 Thread Mode (3.14t)                       Process Mode (GIL)
Reload           True zero-downtime (old + new overlap)    Brief downtime (~100-500ms)
Shutdown         Drain per-thread                          Drain per-process
Recommendation   Production                                Acceptable for dev

Thread mode requires Python 3.14t (free-threading). Process mode falls back to stop-all-then-start.

Troubleshooting

Workers not draining: Increase reload_timeout or shutdown_timeout. Check for long-lived connections (WebSocket, streaming). Set request_timeout to cap individual requests.

SIGKILL before drain completes (Kubernetes): Increase terminationGracePeriodSeconds so it exceeds shutdown_timeout + the preStop delay.

Module reload failures: Pounce logs the import error and continues with the previous version.