Server Lifecycle

Graceful reload and shutdown with connection draining

Pounce handles SIGHUP for graceful reload on supported multi-worker paths and SIGTERM / SIGINT for graceful shutdown. Both paths use connection draining: active requests get time to finish, while workers that are leaving service reject new connections.
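As a rough illustration of the signal split described above, the following sketch uses only the Python standard library to route SIGHUP to one handler and SIGTERM / SIGINT to another. It is not Pounce's implementation: the handler names and the `events` list are hypothetical, and the handlers only record which path would be taken. It assumes a POSIX platform and must run in the main thread.

```python
import os
import signal

# Hypothetical record of which lifecycle path each signal selected.
events: list[str] = []


def on_reload(signum, frame):
    # SIGHUP: a real server would start a rolling restart here.
    events.append("reload")


def on_shutdown(signum, frame):
    # SIGTERM / SIGINT: a real server would start draining and then exit.
    events.append("shutdown")


signal.signal(signal.SIGHUP, on_reload)
signal.signal(signal.SIGTERM, on_shutdown)
signal.signal(signal.SIGINT, on_shutdown)

# Send the signals to this process to exercise both paths.
os.kill(os.getpid(), signal.SIGHUP)
os.kill(os.getpid(), signal.SIGTERM)
print(events)
```

Keeping the handlers this small mirrors a common design choice: the handler only flags the requested action, and the actual drain work happens outside signal context.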

Graceful Reload (SIGHUP)

On supported multi-worker thread and subinterpreter paths, send SIGHUP to perform a rolling restart with fresh code:

kill -HUP <pid>
# or with systemd:
systemctl reload pounce

What Happens

  1. Old workers continue handling existing requests
  2. App code is reimported and new workers spawn (generation N+1)
  3. Old workers enter drain mode (finish active requests, reject new ones)
  4. Once drained (or after reload_timeout), old workers exit

Time 0s:   [Worker-0] [Worker-1] [Worker-2] [Worker-3]  (Gen 0)
           SIGHUP received
Time 0.1s: [Worker-0..3 draining] [Worker-4..7 accepting]  (Gen 0+1)
Time 5s:   [Worker-4] [Worker-5] [Worker-6] [Worker-7]  (Gen 1 only)
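
The timeline above can be modelled with a small toy simulation. This is a sketch of the bookkeeping only, under the assumption that each worker tracks a generation number, an in-flight request count, and a draining flag. `Worker`, `start_reload`, and `reap` are hypothetical names and not part of Pounce.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    worker_id: int
    generation: int
    active_requests: int = 0
    draining: bool = False


def start_reload(pool: list[Worker]) -> list[Worker]:
    """Spawn generation N+1 beside the old workers, then put the old ones in drain mode."""
    old = [w for w in pool if not w.draining]
    next_gen = max(w.generation for w in pool) + 1
    next_id = max(w.worker_id for w in pool) + 1
    fresh = [Worker(next_id + i, next_gen) for i in range(len(old))]
    for w in old:
        w.draining = True  # finish active requests, reject new ones
    return pool + fresh


def reap(pool: list[Worker], timed_out: bool = False) -> list[Worker]:
    """Drop draining workers that are idle, or all draining workers once the timeout hits."""
    return [
        w for w in pool
        if not w.draining or (w.active_requests > 0 and not timed_out)
    ]


pool = [Worker(i, 0) for i in range(4)]
pool[0].active_requests = 1  # one in-flight request on Worker-0

pool = start_reload(pool)
accepting = [w.worker_id for w in pool if not w.draining]

pool = reap(pool)  # idle old workers exit, Worker-0 keeps draining
still_draining = [w.worker_id for w in pool if w.draining]

pool = reap(pool, timed_out=True)  # reload_timeout reached
final = [(w.worker_id, w.generation) for w in pool]
print(accepting, still_draining, final)
```

In this model, only generation 1 accepts new work immediately after the reload starts, and the busy old worker survives until it is idle or the timeout expires.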

If the reimport fails, Pounce logs the error and keeps serving the old code instead of swapping to the failed generation.

HTTP/3 uses a separate UDP/QUIC listener. Treat H3 reload/drain as limited until the protocol proof ledger records parity for that path.

Current subprocess proof covers SIGTERM clean exit and SIGHUP recovery to serving traffic. It does not yet prove mixed active-request drain behavior under load, so avoid describing reload as lossless across all modes and protocols.

Configuration

config = ServerConfig(
    reload_timeout=60.0,  # Max drain time (default: 30s)
    workers=4,
)

systemd

[Service]
Type=notify
ExecStart=/usr/bin/pounce serve myapp:app --workers=4
ExecReload=/bin/kill -HUP $MAINPID

File Watching (Development)

For development, enable auto-reload on file changes:

config = ServerConfig(
    reload=True,
    reload_include=(".html", ".css"),  # Extra extensions
    reload_dirs=("templates",),        # Extra directories
)

Graceful Shutdown (SIGTERM)

On SIGTERM or SIGINT, Pounce drains connections, then exits:

  1. Stops accepting new connections immediately
  2. Finishes active requests (up to shutdown_timeout)
  3. Force-terminates workers that exceed the timeout
  4. Exits with status 0

config = ServerConfig(
    shutdown_timeout=30.0,  # Per-worker drain time (default: 10s)
)
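
The drain-then-force-terminate pattern in the steps above can be sketched with asyncio. This is a generic illustration, not Pounce's internals: the `drain` helper is hypothetical, and the sleeping tasks stand in for in-flight requests. Work that finishes within the timeout completes normally, and anything still running afterwards is cancelled.

```python
import asyncio


async def drain(active: set[asyncio.Task], timeout: float) -> tuple[int, int]:
    """Wait up to `timeout` seconds for active tasks, then cancel the rest.

    Returns a (finished, cancelled) pair of counts.
    """
    if not active:
        return 0, 0
    done, pending = await asyncio.wait(active, timeout=timeout)
    for task in pending:
        task.cancel()
    # Wait for the cancellations to be processed before returning.
    await asyncio.gather(*pending, return_exceptions=True)
    return len(done), len(pending)


async def demo() -> tuple[int, int]:
    fast = asyncio.create_task(asyncio.sleep(0.01))  # finishes inside the window
    slow = asyncio.create_task(asyncio.sleep(10))    # exceeds the window
    return await drain({fast, slow}, timeout=0.2)


result = asyncio.run(demo())
print(result)
```

The timeout in this sketch plays the role that shutdown_timeout plays in the configuration above: a larger value gives slow requests more time, at the cost of a slower exit.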

Kubernetes

spec:
  containers:
  - name: app
    lifecycle:
      preStop:
        exec:
          command: ["sh", "-c", "sleep 5"]  # LB de-registration delay
    readinessProbe:
      httpGet:
        path: /health
        port: 8000
  terminationGracePeriodSeconds: 40  # > shutdown_timeout + preStop

Key: terminationGracePeriodSeconds must exceed shutdown_timeout + preStop delay, or Kubernetes sends SIGKILL before drain completes.
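
To make the budget concrete, here is a small arithmetic sketch. The helper name and the 5-second safety margin are assumptions chosen for illustration. The inputs match the example values on this page (shutdown_timeout=30.0 and a 5-second preStop sleep).

```python
import math


def min_termination_grace_period(
    shutdown_timeout: float,
    pre_stop_delay: float,
    margin: float = 5.0,  # assumed headroom, not a Pounce or Kubernetes default
) -> int:
    """Smallest whole-second grace period covering preStop, drain, and a margin."""
    return math.ceil(shutdown_timeout + pre_stop_delay + margin)


grace = min_termination_grace_period(shutdown_timeout=30.0, pre_stop_delay=5.0)
print(grace)  # 40
```

With these inputs the result is 40 seconds, which matches the terminationGracePeriodSeconds value in the manifest above.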

Docker

Use exec form so signals reach pounce directly:

CMD ["pounce", "serve", "myapp:app", "--host", "0.0.0.0"]

systemd

[Service]
Type=notify
KillSignal=SIGTERM
KillMode=mixed
TimeoutStopSec=40s

Thread Mode vs Process Mode

| | Thread Mode (3.14t) | Process Mode (GIL) |
|---|---|---|
| Reload | Rolling generation swap with old + new overlap | Stop/start fallback; may have a brief gap |
| Shutdown | Drain per-thread | Drain per-process |
| Recommendation | Production | Acceptable for dev |

Thread mode requires Python 3.14t (free-threading). Process mode falls back to stop-all-then-start. Subinterpreter reload is explicit and beta-scoped; validate dependency compatibility before relying on it for production deploys.

Troubleshooting

Workers not draining: Increase reload_timeout or shutdown_timeout. Check for long-lived connections (WebSocket, streaming). Set request_timeout to cap individual requests.

SIGKILL before drain complete (Kubernetes): Increase terminationGracePeriodSeconds to exceed shutdown_timeout + preStop delay.

Module reload failures: Pounce logs the import error and continues with the previous version.