I built Bengal because I wanted a static site generator that could actually use all my cores.
That is the practical promise of Bengal: on free-threaded Python, a bigger site does not have to mean a slower editing loop.
Python SSGs have a reputation: fast enough for small sites, but once you scale past a few hundred pages, build times crawl. The usual culprit is the GIL. On traditional Python builds, threads do not give you real parallelism for CPU-bound work like rendering Markdown or compiling templates.
Bengal takes a different path. It is designed for free-threaded Python 3.14t, and it sits at the top of a stack of six pure-Python libraries that all target nogil.
Every library in this stack declares _Py_mod_gil = 0. This blog is built and served by this stack.
Series context
Part 1 of 6 — Free-Threading in the Bengal Ecosystem. Each post covers one library and the threading patterns it uses.
- Part 1: Bengal — Parallel rendering, immutable snapshots (you are here)
- Part 2: Kida — Copy-on-write, immutable AST, ContextVar
- Part 3: Patitas — O(n) lexer, parallel parsing
- Part 4: Rosettes — Local-only state, frozen lookup tables
- Part 5: Pounce — Thread-based workers, shared immutable config
- Part 6: Chirp — Double-check freeze, ContextVar request isolation
Run it
uv python install 3.14t
uv run --python=3.14t bengal build
Bengal detects free-threading at runtime and uses a ThreadPoolExecutor when it is available. The important part: there is no separate "parallel mode" API. The same commands work either way.
Performance
On free-threaded Python, Bengal uses ThreadPoolExecutor for parallel page rendering. Build time scales with worker count — more cores mean faster full builds.
The bigger day-to-day win is incremental builds: a single-page change rebuilds in sub-second time. That is the difference between "wait for the build" and "barely notice it happened."
Detecting free-threading at runtime
Bengal doesn't assume free-threading — it checks:
import sys

def is_free_threaded() -> bool:
    # Preferred: ask the runtime directly (CPython 3.13+).
    if hasattr(sys, "_is_gil_enabled"):
        try:
            return not sys._is_gil_enabled()
        except (AttributeError, TypeError):
            pass
    # Fallback: check whether the interpreter was built with the GIL disabled.
    try:
        import sysconfig
        return sysconfig.get_config_var("Py_GIL_DISABLED") == 1
    except (ImportError, AttributeError):
        pass
    return False
When this returns True, Bengal spins up a ThreadPoolExecutor for page rendering. In plain English: Bengal asks the runtime what world it is in, then uses the same architecture with more parallelism when that world allows it.
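That decision can be sketched in a few lines. This is an illustration, not Bengal's actual API: `render_all` is a hypothetical driver, and the inlined check is a condensed form of the detection function above.

```python
import os
import sys
from concurrent.futures import ThreadPoolExecutor

def gil_disabled() -> bool:
    # Condensed check: only CPython 3.13+ exposes _is_gil_enabled.
    return hasattr(sys, "_is_gil_enabled") and not sys._is_gil_enabled()

def render_all(pages, render_page):
    # Same architecture either way: a parallel map on free-threaded
    # builds, a plain sequential loop when the GIL is enabled.
    if gil_disabled():
        with ThreadPoolExecutor(max_workers=os.cpu_count()) as pool:
            return list(pool.map(render_page, pages))
    return [render_page(page) for page in pages]
```

On a standard interpreter this takes the sequential branch; on 3.14t it fans out across cores with no code change.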
Immutable snapshots for lock-free rendering
The trick to parallel rendering is not just spawning threads. It is keeping locks out of the hot path.
After content discovery, Bengal freezes the entire site into immutable dataclasses — PageSnapshot, SectionSnapshot, SiteSnapshot. All navigation trees, taxonomy indexes, and page metadata are pre-computed. During rendering, workers only read from snapshots.
@dataclass(frozen=True, slots=True)
class PageSnapshot:
    title: str
    href: str
    source_path: Path
    parsed_html: str
    content_hash: str
    section: SectionSnapshot | None = None
    next_page: PageSnapshot | None = None
    prev_page: PageSnapshot | None = None
This eliminated an entire tier of locks. Previously, NavTreeCache and Renderer._cache_lock were acquired during rendering. Now, SiteSnapshot.nav_trees is pre-computed.
The result is simple to describe even if the implementation is not: by the time rendering starts, workers are reading frozen data instead of negotiating over shared mutable structures.
Warning
frozen=True and slots=True both matter. frozen=True blocks attribute assignment; slots=True removes the per-instance __dict__, so each snapshot is smaller and there is no dictionary for code to sneak new attributes into.
Context propagation into worker threads
ThreadPoolExecutor.submit() does not inherit the calling thread's ContextVar values. Bengal uses contextvars.copy_context().run:
ctx = contextvars.copy_context()
future_to_page = {
    executor.submit(ctx.run, process_page_with_pipeline, page): page
    for page in batch
}
executor.submit(ctx.run, fn, arg) runs fn(arg) inside a copy of the parent's context. All ContextVar values from the moment copy_context() was called are available in the worker.
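A self-contained illustration of the pattern, with a hypothetical `current_site` variable standing in for Bengal's per-build state. Note that this sketch copies the context once per task, since a single Context object can only be entered by one thread at a time:

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor

# Hypothetical ContextVar standing in for Bengal's per-build state.
current_site = contextvars.ContextVar("current_site", default="(unset)")

def render(page: str) -> str:
    # The worker reads the value that was set in the coordinating thread.
    return f"{current_site.get()}/{page}.html"

current_site.set("example.com")
with ThreadPoolExecutor(max_workers=2) as pool:
    # One copy per task: each copy snapshots the parent's ContextVar values.
    futures = [
        pool.submit(contextvars.copy_context().run, render, page)
        for page in ["index", "about"]
    ]
    results = [f.result() for f in futures]
print(results)  # ['example.com/index.html', 'example.com/about.html']
```

Without `copy_context().run`, each worker would see the default `"(unset)"` instead of the value set before submission.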
Provenance over manual dependencies
Incremental builds in many SSGs become a patchwork of detectors. Which pages depend on this template? Which taxonomy keys does this page invalidate? Which data file affects this section?
Bengal uses content-addressed provenance instead. Each rendered output is hashed with everything that influenced it: source files, templates, cascade data, taxonomy keys. When a file changes, Bengal recomputes provenance and rebuilds only outputs whose provenance changed.
No manual dependency graph. The previous system used ~13 separate dependency detectors. Provenance collapsed them into one model.
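The idea can be sketched as a content hash over every input. The function and parameter names here are illustrative, not Bengal's internals:

```python
import hashlib

def provenance_key(source: str, template: str, cascade: dict[str, str]) -> str:
    # Hash everything that influenced the output; any changed byte in any
    # input yields a new key, which marks the output for rebuild.
    h = hashlib.sha256()
    for part in (source, template, *(f"{k}={cascade[k]}" for k in sorted(cascade))):
        h.update(part.encode())
        h.update(b"\x00")  # separator so ("ab", "c") hashes differently from ("a", "bc")
    return h.hexdigest()

before = provenance_key("# Hello", "page.html", {"lang": "en"})
after = provenance_key("# Hello, world", "page.html", {"lang": "en"})
print(before != after)  # True: the changed source invalidates the output
```

One function replaces a pile of per-input detectors: there is no question of "which detector covers this case," only whether the hash changed.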
What this means in practice
On free-threaded Python 3.14t, Bengal renders hundreds of pages in parallel without GIL contention. On standard Python, the same architecture runs — sequential rendering until you switch interpreters.
The bigger win is probably incremental builds. Change one file. Rebuild only what's affected. 35–80 ms for a single-page change. That's fast enough that you stop noticing the build.
Further reading
- Bengal documentation — full reference including the snapshot model and cache architecture
- Bengal source
- Next in series: Kida — A Template Engine Built for Free-Threaded Python
Related
- Best Python Static Site Generators for 2026 — how Bengal compares to Pelican, MkDocs, and others
- Bengal vs Pelican vs MkDocs — head-to-head comparison
- How to Build a Static Site with Free-Threaded Python — step-by-step tutorial
- How to Migrate a Pelican or MkDocs Site to Bengal — migration guide