Pounce is built around a simple operational promise: on free-threaded Python, one process can run many worker threads against one in-memory app.
That promise matters because ASGI servers usually make you choose a process model up front. Pounce keeps the command the same and lets the runtime decide the worker strategy.
On GIL Python, threads take turns. On Python 3.14t, they run in parallel. One command, one config, two different operating realities depending on your interpreter. Pounce detects the runtime and picks the right worker model automatically.
Series context
Part 5 of 6 — Free-Threading in the Bengal Ecosystem. Pounce is the ASGI server — it runs Chirp apps in production, serving pages built with Kida, Patitas, and Rosettes.
Run it
uv python install 3.14t
uv run --python=3.14t pounce myapp:app --workers 4
On Python 3.14t, that means four threads, shared memory, and one app load. On standard Python, it means four processes. Same command either way.
Threads vs processes — automatic
import sys
from typing import Literal

WorkerMode = Literal["thread", "process"]  # illustrative alias

def is_gil_enabled() -> bool:
    # sys._is_gil_enabled() exists on 3.13+; older interpreters always have a GIL
    return getattr(sys, "_is_gil_enabled", lambda: True)()

def detect_worker_mode() -> WorkerMode:
    return "process" if is_gil_enabled() else "thread"
The important part is that the request path does not split into two codebases: same Worker class, same ServerConfig, same request flow. Only the spawning mechanism differs.
Thread mode (free-threaded Python):

- One process, shared memory
- One copy of the app loaded
- Lower RSS (~60–80 MB for 4 workers)
- No IPC needed for shared state
- Graceful rolling restart available

Process mode (standard Python):

- N processes, isolated memory
- App loaded N times
- Higher RSS (~100–150 MB for 4 workers)
- IPC needed for any shared state
- Brief-downtime restart only
Shared immutable config
Workers need config: host, port, timeouts, limits, compression settings. Mutating a shared dict from multiple threads is a race, so Pounce makes config immutable instead:
from dataclasses import dataclass

@dataclass(frozen=True, slots=True)
class ServerConfig:
    """Immutable server configuration.

    Created once at startup, shared across all worker threads.
    """

    host: str = "127.0.0.1"
    port: int = 8000
    workers: int = 1
    keep_alive_timeout: float = 5.0
    request_timeout: float = 30.0
    compression: bool = True
    # ... 30+ fields, all immutable
Created once at startup, passed to every worker, and never mutated. That removes an entire category of lock and coordination problems.
Per-request compressors
Compression formats such as gzip and zstd require stateful compressor objects. Sharing one across concurrent requests would be a race, so Pounce creates a fresh compressor per request:
import zlib

class GzipCompressor:
    def __init__(self, *, level: int = 6) -> None:
        # wbits=31 selects a gzip container with the largest window size
        self._compressor = zlib.compressobj(level, zlib.DEFLATED, 31)
The cost of creating a compressor is small compared to the lifetime of a request. The alternative, locking around a shared compressor, would serialize compression across workers.
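A sketch of the per-request pattern in use, assuming a `compress`/`flush` shape for the class above (the method names are illustrative, not Pounce's exact API):

```python
import zlib

class GzipCompressor:
    """One instance per request; never shared across threads."""

    def __init__(self, *, level: int = 6) -> None:
        # wbits=31: gzip container, maximum window size
        self._compressor = zlib.compressobj(level, zlib.DEFLATED, 31)

    def compress(self, chunk: bytes) -> bytes:
        return self._compressor.compress(chunk)

    def flush(self) -> bytes:
        return self._compressor.flush()

def compress_body(chunks: list[bytes]) -> bytes:
    # Fresh compressor per call = per request; no lock needed anywhere.
    comp = GzipCompressor()
    return b"".join(comp.compress(c) for c in chunks) + comp.flush()

body = compress_body([b"hello, ", b"world"])
# Round-trip through gzip decompression to confirm correctness.
print(zlib.decompress(body, 31))  # → b'hello, world'
```

Each request owns its compressor's internal state outright, so streaming response bodies compress correctly without any cross-thread coordination.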
The Brotli exclusion
Warning
Pounce supports zstd (stdlib, PEP 784) and gzip (stdlib zlib). Brotli is intentionally excluded — the brotli C extension re-enables the GIL on Python 3.14t. Using it in a free-threaded server would serialize all worker threads whenever any thread compresses a response. Clients that send only Accept-Encoding: br receive uncompressed responses.
This is the free-threading ecosystem in miniature: "has wheels" and "works correctly under contention" are different bars. Audit your C extensions. Prefer stdlib or verified free-threading-safe libraries.
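One way to audit an extension empirically: on a free-threaded build, importing a C extension that does not declare free-threading support re-enables the GIL for the whole process, so a before/after check catches it. A sketch (on a standard GIL build the check trivially reports nothing, since the GIL was already on):

```python
import importlib
import sys

def gil_enabled() -> bool:
    # sys._is_gil_enabled() exists on 3.13+; older builds always have a GIL.
    return getattr(sys, "_is_gil_enabled", lambda: True)()

def import_reenables_gil(module_name: str) -> bool:
    """True if importing module_name switched the GIL from off to on."""
    before = gil_enabled()
    importlib.import_module(module_name)
    return gil_enabled() and not before

# stdlib json is free-threading safe: importing it never re-enables the GIL.
print(import_reenables_gil("json"))  # → False
```

Note the check is one-way per process: once an import has re-enabled the GIL, it stays on, so run the audit in a fresh interpreter per suspect module.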
Graceful reload
In thread mode, Pounce supports zero-downtime rolling restart:
- Spawn new workers
- Mark old workers for draining (finish existing connections, reject new)
- Wait for old workers to become idle
- Shut down old workers
This works because threads share memory, so the supervisor can signal workers directly. In process mode, workers run in separate address spaces, so Pounce falls back to brief-downtime restart.
That makes graceful reload a concrete operational benefit of the thread-based model, not just an architectural nicety.
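The draining step above can be sketched with a per-worker flag and a connection counter. This is a simplified model under assumed names (`Worker`, `accept`, `finish`), not Pounce's actual supervisor:

```python
import threading

class Worker:
    """Minimal drain-aware worker: rejects new work once marked draining."""

    def __init__(self) -> None:
        self.draining = threading.Event()
        self._active = 0
        self._lock = threading.Lock()

    def accept(self) -> bool:
        # New connections are rejected once draining starts.
        if self.draining.is_set():
            return False
        with self._lock:
            self._active += 1
        return True

    def finish(self) -> None:
        with self._lock:
            self._active -= 1

    @property
    def idle(self) -> bool:
        with self._lock:
            return self._active == 0

# Rolling restart: mark the old worker draining, wait until idle, retire it.
old = Worker()
old.accept()          # one in-flight connection
old.draining.set()    # step 2: stop accepting new work
print(old.accept())   # → False (new connections rejected)
old.finish()          # in-flight connection completes
print(old.idle)       # → True (safe to shut down)
```

The supervisor can do this only because it shares an address space with the workers; in process mode the equivalent signal would need IPC, which is why Pounce falls back to brief-downtime restart there.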
What this means in practice
On free-threaded Python 3.14t, pounce myapp:app --workers 4 runs four threads sharing one interpreter. One app load. Shared immutable config. No fork, no IPC. Compression uses stdlib only.
On standard Python, the same command runs four processes. Same behavior, higher memory, and no rolling restart. Upgrade to free-threaded Python and you get the thread-mode benefits without changing your deployment command.
Further reading
- Pounce documentation — full reference including lifecycle hooks and observability events
- Pounce source
- Next in series: Chirp — A Web Framework Built for Free-Threaded Python
Related
- The Python Free-Threading Ecosystem in 2026 — who's ready for NoGIL
- Chirp vs Flask vs FastAPI — when free-threading matters for web frameworks