Bengal SSG — Built for Python's Free-Threaded Future

How Bengal achieves parallel page rendering and lock-free incremental builds on Python 3.14t — architecture patterns and the design decisions behind them.

I built Bengal because I wanted a static site generator that could actually use all my cores.

That is the practical promise of Bengal: on free-threaded Python, a bigger site does not have to mean a slower editing loop.

Python SSGs have a reputation: fast enough for small sites, but once you scale past a few hundred pages, build times crawl. The usual culprit is the GIL. On traditional Python builds, threads do not give you real parallelism for CPU-bound work like rendering Markdown or compiling templates.

Bengal takes a different path. It is designed for free-threaded Python 3.14t, and it sits at the top of a stack of six pure-Python libraries that all target nogil.

flowchart TB
    Bengal["Bengal — Static Site Generator"]
    Chirp["Chirp — Web Framework"]
    Pounce["Pounce — ASGI Server"]
    Kida["Kida — Template Engine"]
    Patitas["Patitas — Markdown Parser"]
    Rosettes["Rosettes — Syntax Highlighter"]
    Bengal --> Kida
    Bengal --> Patitas
    Chirp --> Pounce
    Chirp --> Kida
    Patitas --> Rosettes

Every library in this diagram declares _Py_mod_gil = 0. This blog is built and served by this stack.


Series context

Part 1 of 6 in Free-Threading in the Bengal Ecosystem. Each post covers one library and the threading patterns it uses.

  • Part 1: Bengal — Parallel rendering, immutable snapshots (you are here)
  • Part 2: Kida — Copy-on-write, immutable AST, ContextVar
  • Part 3: Patitas — O(n) lexer, parallel parsing
  • Part 4: Rosettes — Local-only state, frozen lookup tables
  • Part 5: Pounce — Thread-based workers, shared immutable config
  • Part 6: Chirp — Double-check freeze, ContextVar request isolation

Run it

uv python install 3.14t
uv run --python=3.14t bengal build

Bengal detects free-threading at runtime and uses a ThreadPoolExecutor when available. The important part: there is no separate "parallel mode" API. The same commands work either way.
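
For comparison, the same build under a standard GIL interpreter (the version pin here is illustrative):

uv run --python=3.13 bengal build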


Performance

On free-threaded Python, Bengal uses ThreadPoolExecutor for parallel page rendering. Build time scales with worker count — more cores mean faster full builds.

The bigger day-to-day win is incremental builds: a single-page change rebuilds in sub-second time. That is the difference between "wait for the build" and "barely notice it happened."


Detecting free-threading at runtime

Bengal doesn't assume free-threading — it checks:

import sys


def is_free_threaded() -> bool:
    # Runtime check: 3.13+ interpreters expose sys._is_gil_enabled().
    if hasattr(sys, "_is_gil_enabled"):
        try:
            return not sys._is_gil_enabled()
        except (AttributeError, TypeError):
            pass
    # Build-time fallback: Py_GIL_DISABLED is 1 on free-threaded builds.
    try:
        import sysconfig
        return sysconfig.get_config_var("Py_GIL_DISABLED") == 1
    except (ImportError, AttributeError):
        pass
    return False

When this returns True, Bengal spins up a ThreadPoolExecutor for page rendering. In plain English: Bengal asks the runtime what world it is in, then uses the same architecture with more parallelism when that world allows it.
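
A minimal sketch of that dispatch, reusing is_free_threaded() above (render_all, process_page, and pages are illustrative names, not Bengal's internals):

from concurrent.futures import ThreadPoolExecutor


def render_all(pages, process_page, max_workers=None):
    # Same architecture either way; the executor only pays off when
    # the runtime delivers real parallelism for CPU-bound rendering.
    if is_free_threaded():
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            return list(pool.map(process_page, pages))
    return [process_page(page) for page in pages]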


Immutable snapshots for lock-free rendering

The trick to parallel rendering is not just spawning threads. It is keeping locks out of the hot path.

After content discovery, Bengal freezes the entire site into immutable dataclasses — PageSnapshot, SectionSnapshot, SiteSnapshot. All navigation trees, taxonomy indexes, and page metadata are pre-computed. During rendering, workers only read from snapshots.

from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True, slots=True)
class PageSnapshot:
    title: str
    href: str
    source_path: Path
    parsed_html: str
    content_hash: str
    section: SectionSnapshot | None = None
    next_page: PageSnapshot | None = None
    prev_page: PageSnapshot | None = None
flowchart LR
    subgraph Discovery["Content Discovery"]
        D[Discover Pages] --> B[Build Indexes]
    end
    subgraph Freeze["Snapshot Builder"]
        B --> PS[PageSnapshot]
        B --> SS[SectionSnapshot]
        B --> SI[SiteSnapshot]
    end
    subgraph Render["Parallel Rendering — Lock Free"]
        PS --> W1["Worker 1"]
        PS --> W2["Worker 2"]
        PS --> WN["Worker N"]
    end
    W1 --> O["public/"]
    W2 --> O
    WN --> O

This eliminated an entire tier of locks. Previously, NavTreeCache and Renderer._cache_lock were acquired during rendering. Now, SiteSnapshot.nav_trees is pre-computed.

The result is simple to describe even if the implementation is not: by the time rendering starts, workers are reading frozen data instead of negotiating over shared mutable structures.
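
A condensed sketch of that flow, reusing PageSnapshot and the Path import from above (freeze_site, render, and the raw_pages shape are illustrative, not Bengal's pipeline):

from concurrent.futures import ThreadPoolExecutor

raw_pages = [
    {"title": "Intro", "href": "/intro/", "source": "content/intro.md",
     "html": "<p>Hello</p>", "hash": "abc123"},
    {"title": "Guide", "href": "/guide/", "source": "content/guide.md",
     "html": "<p>Guide</p>", "hash": "def456"},
]


def freeze_site(pages):
    # One-time freeze before any worker starts; after this, nothing mutates.
    return tuple(
        PageSnapshot(
            title=p["title"],
            href=p["href"],
            source_path=Path(p["source"]),
            parsed_html=p["html"],
            content_hash=p["hash"],
        )
        for p in pages
    )


def render(snap):
    # Pure read: no locks, no shared mutable state.
    return f"<article><h1>{snap.title}</h1>{snap.parsed_html}</article>"


snapshots = freeze_site(raw_pages)
with ThreadPoolExecutor() as pool:
    rendered = list(pool.map(render, snapshots))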

Warning

frozen=True and slots=True do different jobs, and both matter. frozen=True rejects attribute assignment after construction; slots=True removes the per-instance __dict__ that Python would otherwise allocate, which adds up across thousands of snapshot instances.
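
A small demonstration of what each flag buys, using standard-library dataclasses only:

from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class NoSlots:
    x: int


@dataclass(frozen=True, slots=True)
class WithSlots:
    x: int


a, b = NoSlots(1), WithSlots(1)
print(hasattr(a, "__dict__"))  # True: per-instance dict still allocated
print(hasattr(b, "__dict__"))  # False: slots remove it

try:
    b.x = 2  # frozen=True rejects assignment in both classes
except FrozenInstanceError:
    print("immutable")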


Context propagation into worker threads

ThreadPoolExecutor.submit() does not capture the calling thread's ContextVar values per task; workers run in whatever context their thread already has. Bengal wraps each task with contextvars.copy_context().run:

import contextvars

# A Context can only be entered by one thread at a time (concurrent entry
# raises RuntimeError), so each submitted page gets its own copy.
future_to_page = {
    executor.submit(contextvars.copy_context().run, process_page_with_pipeline, page): page
    for page in batch
}

contextvars.copy_context() snapshots every ContextVar value at the moment of the call, and .run(fn, arg) executes fn(arg) inside that snapshot in the worker thread. The copy is taken per task because a single Context object cannot be entered by two threads at once.
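
A self-contained example of the propagation (build_id is an illustrative variable, not Bengal's):

import contextvars
from concurrent.futures import ThreadPoolExecutor

build_id: contextvars.ContextVar[str] = contextvars.ContextVar("build_id", default="none")


def worker() -> str:
    return build_id.get()


with ThreadPoolExecutor(max_workers=1) as pool:
    pool.submit(lambda: None).result()  # warm the worker before set()
    build_id.set("build-42")
    print(pool.submit(worker).result())  # 'none': no per-task propagation
    print(pool.submit(contextvars.copy_context().run, worker).result())  # 'build-42'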


Provenance over manual dependencies

Incremental builds in many SSGs become a patchwork of detectors. Which pages depend on this template? Which taxonomy keys does this page invalidate? Which data file affects this section?

Bengal uses content-addressed provenance instead. Each rendered output is hashed with everything that influenced it: source files, templates, cascade data, taxonomy keys. When a file changes, Bengal recomputes provenance and rebuilds only outputs whose provenance changed.

flowchart LR
    S["Source Hash"] --> P["Provenance Record"]
    T["Template Hash"] --> P
    C["Cascade Hash"] --> P
    D["Data Hash"] --> P
    P --> Check{Changed?}
    Check -->|Yes| Rebuild["Rebuild Page"]
    Check -->|No| Skip["Use Cache"]

No manual dependency graph. The previous system used ~13 separate dependency detectors. Provenance collapsed them into one model.
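
A toy version of the idea (the hashing scheme, provenance(), and needs_rebuild() are illustrative, not Bengal's format):

import hashlib


def provenance(*inputs: bytes) -> str:
    # Fold every input that influenced the output into one digest:
    # source, template, cascade data, taxonomy keys, and so on.
    h = hashlib.sha256()
    for part in inputs:
        h.update(hashlib.sha256(part).digest())
    return h.hexdigest()


cache: dict[str, str] = {}


def needs_rebuild(page: str, digest: str) -> bool:
    # Rebuild only when this output's provenance changed.
    if cache.get(page) == digest:
        return False
    cache[page] = digest
    return True


d = provenance(b"# intro", b"{{ page.title }}", b"cascade: {}", b"tags:guides")
print(needs_rebuild("/intro/", d))  # True: first build
print(needs_rebuild("/intro/", d))  # False: unchanged, use cache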


What this means in practice

On free-threaded Python 3.14t, Bengal renders hundreds of pages in parallel without GIL contention. On standard Python, the same architecture runs — sequential rendering until you switch interpreters.

The bigger win is probably incremental builds. Change one file. Rebuild only what's affected. 35–80 ms for a single-page change. That's fast enough that you stop noticing the build.


Further reading