Pounce is built for Python 3.14t and the free-threading model. If you are evaluating Python ASGI servers, Pounce's main distinction is its worker model: worker threads on 3.14t, with automatic fallback to worker processes on GIL builds.
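The choice between threads and processes depends on whether the interpreter is running with the GIL disabled. The following is a minimal sketch of how such a check could be written with the standard library alone. `gil_is_disabled` and `pick_worker_model` are hypothetical helpers for illustration, not Pounce's actual selection logic.

```python
import sys
import sysconfig


def gil_is_disabled() -> bool:
    """Return True on a free-threaded build that is running with the GIL off."""
    # Py_GIL_DISABLED is 1 on free-threaded builds (e.g. 3.14t), else 0 or None.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() exists on 3.13+. The GIL can be re-enabled at
    # runtime even on a free-threaded build, so both checks are needed.
    is_gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)
    return free_threaded_build and not is_gil_enabled()


def pick_worker_model() -> str:
    """Hypothetical helper: threads on free-threaded builds, else processes."""
    return "thread" if gil_is_disabled() else "process"


print(pick_worker_model())
```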
Pounce's Model
- Thread-based parallelism — N worker threads share one interpreter, one copy of your app
- Shared memory — Lower memory footprint than process-based workers
- Streaming-first — Body chunks sent immediately to socket
- Fast-path parsing — Built-in HTTP/1.1 parser on the sync worker hot path
- Thread-worker reload — Rolling restart with generational worker swap on supported 3.14t thread-worker deployments
- Pure Python — One dependency (h11). Debuggable, hackable, readable
- Optional extras — HTTP/2, WebSocket, TLS, HTTP/3 via bengal-pounce[h2], bengal-pounce[ws], bengal-pounce[tls], bengal-pounce[h3]
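To make the "one copy of your app" point concrete, here is a small sketch using only the standard library. The `App` class is a stand-in invented for this example and has nothing to do with Pounce's internals. It shows that every worker thread sees the same object, and that shared mutable state therefore needs a lock.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class App:
    """Stand-in for an application object holding some shared state."""

    def __init__(self) -> None:
        self.hits = 0
        self._lock = threading.Lock()

    def handle(self) -> int:
        # Shared mutable state needs a lock once threads run in parallel.
        with self._lock:
            self.hits += 1
        return id(self)


app = App()  # created once, visible to every worker thread

with ThreadPoolExecutor(max_workers=4) as pool:
    seen_ids = set(pool.map(lambda _: app.handle(), range(100)))

print(len(seen_ids), app.hits)  # prints: 1 100
```

With process-based workers, each process would instead hold its own copy of `app`, and the counter would not be shared.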
When Pounce Fits
- You're on Python 3.14t and want thread-based parallelism
- You want shared memory across workers (lower memory footprint)
- You need streaming responses with minimal latency
- You want stdlib compression (zstd) without external dependencies
- You prefer pure Python for debuggability and extensibility
- You want a Uvicorn-like CLI with a different concurrency model
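For the streaming case, the sketch below shows a minimal ASGI application that sends its body in several chunks. It relies only on the standard ASGI message format, so it is not specific to Pounce. The chunk contents and the `asyncio.sleep(0)` placeholder are illustrative assumptions.

```python
import asyncio


async def app(scope, receive, send):
    """Minimal ASGI app that streams the response body in three chunks."""
    assert scope["type"] == "http"
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    for i in range(3):
        # more_body=True tells the server that further chunks will follow,
        # so it may write this one without waiting for the full body.
        await send({
            "type": "http.response.body",
            "body": f"chunk {i}\n".encode(),
            "more_body": True,
        })
        await asyncio.sleep(0)  # placeholder for real work between chunks
    # A final message with more_body omitted (default False) ends the response.
    await send({"type": "http.response.body", "body": b""})
```

Any ASGI server can serve this callable. How quickly each chunk reaches the client depends on how the server buffers `http.response.body` messages.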
When to Consider Alternatives
- Uvicorn — Mature ecosystem, optional C-accelerated HTTP parsing (httptools) and event loop (uvloop), broad production history. Better if you need Python < 3.14, prefer battle-tested stability, or depend on uvloop for I/O performance.
- Granian — Rust-based I/O via Hyper/Tokio with higher raw throughput on simple endpoints (~3x Uvicorn on empty responses). It has also supported free-threaded Python since v2.0. Better if you need maximum requests-per-second and don't need HTTP/3, built-in compression, or middleware.
- Hypercorn — Supports HTTP/2 without TLS (h2c) and trio/asyncio backends. Better if you need non-asyncio event loops or cleartext HTTP/2.
- Existing deployments — If your current setup works and you're not on 3.14t, there's no urgent reason to switch.
Competitive Comparison
| Capability | Pounce | Uvicorn | Hypercorn | Granian |
|---|---|---|---|---|
| Free-threading | Native threads on 3.14t | Processes only | Processes only | Rust + processes |
| HTTP/1.1 parser | Fast built-in parser + h11 | h11 or httptools (C) | h11 | Rust (hyper) |
| Config thread-safety | Frozen dataclass | Mutable | Mutable | N/A (Rust) |
| Thread-worker reload | Rolling restart on supported 3.14t deployments | Full restart | Full restart | N/A |
| Thundering herd fix | AcceptDistributor | N/A | N/A | N/A |
| Built-in metrics | Prometheus /metrics | No | No | No |
| Lifecycle events API | Typed, public | Logging only | Logging only | N/A |
| Rate limiting | Built-in (per-IP) | No | No | No |
| Request queueing | Built-in (load shedding) | No | No | No |
| Pure Python | Yes | Partial (uvloop/httptools) | Yes | No (Rust core) |
Pounce's competitive moat: it treats free-threaded Python as a first-class runtime, not an afterthought. Among the servers compared above, the frozen config model, rolling reload, and AcceptDistributor are unique to Pounce.
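As an illustration of why a frozen config helps under threads, here is a minimal sketch using the standard `dataclasses` module. `ServerConfig` and its fields are hypothetical and are not Pounce's real configuration class.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class ServerConfig:
    """Hypothetical config; the field names are illustrative only."""

    host: str = "127.0.0.1"
    port: int = 8000
    workers: int = 4


config = ServerConfig()

try:
    config.port = 9000  # attribute assignment is rejected on a frozen dataclass
except FrozenInstanceError:
    print("config is read-only")

# Changes produce a new object; threads holding the old one are unaffected.
updated = replace(config, port=9000)
print(config.port, updated.port)  # prints: 8000 9000
```

Because no thread can mutate the object in place, worker threads can read it without locking.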
See Also
- Performance — Benchmarks and design
- Thread Safety — The shared memory model