Build Pipeline

This guide explains what happens when you run `bengal build`.

Bengal's build system is orchestrated by the `BuildOrchestrator`, which executes the build in distinct phases.

The Phases

1. Initialization

Config Loading: Bengal loads `bengal.toml` (or the `config/` directory) and establishes the environment context.
Cache Loading: If incremental builds are enabled, the previous build cache (`.bengal/cache.json`) is loaded.

2. Content Discovery (`ContentOrchestrator`)

Scanning: The `content/` directory is scanned recursively.
Page Creation: `Page` objects are created for each Markdown file.
Schema Validation: If `collections.py` exists, frontmatter is validated against the defined schemas (see Content Collections).
Lazy Loading: In incremental builds, unchanged pages are loaded as `PageProxy` objects (lightweight metadata only), saving parsing time.
Section Registry: A path-based registry is built for O(1) section lookups.
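The sketch below shows the idea behind scanning and the path-based section registry: a recursive `rglob` scan that groups Markdown files by folder into a dictionary keyed by relative path, so a section lookup is a single (average O(1)) dict access. `Section` and `discover` are hypothetical names; the real `ContentOrchestrator` also creates `Page` objects, validates schemas, and handles `PageProxy` loading, all omitted here.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Section:
    """Hypothetical, simplified section: a folder plus the Markdown files in it."""
    path: Path
    pages: list[Path] = field(default_factory=list)


def discover(content_dir: Path) -> dict[Path, Section]:
    """Recursively scan content_dir and build a registry keyed by relative folder."""
    registry: dict[Path, Section] = {}
    for md_file in sorted(content_dir.rglob("*.md")):
        folder = md_file.parent.relative_to(content_dir)
        if folder not in registry:
            registry[folder] = Section(folder)
        registry[folder].pages.append(md_file.relative_to(content_dir))
    return registry


# Usage: a section lookup is a single dict access.
# registry = discover(Path("content"))
# guides = registry.get(Path("docs/guides"))
```

A folder that contains only subfolders (such as `docs/` when the only file is `docs/guides/setup.md`) gets no entry in this sketch; filling such gaps is the job of Section Finalization's virtual sections.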

3. Structure & Metadata

Section Finalization: Ensures every folder has a corresponding `Section` object (creating a virtual section if `_index.md` is missing).
Cascading: Metadata from section `_index.md` files is applied to all descendant pages (e.g., `cascade: type: doc`).
URL Generation: Output paths and URLs are computed for all pages.
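A minimal sketch of cascading, assuming each section's `cascade` block is available as a plain dict keyed by section path. The precedence rules shown (nearer sections override outer ones, and a page's own frontmatter always wins) are assumptions for illustration; `apply_cascade` is a hypothetical helper, not Bengal's API.

```python
from pathlib import PurePosixPath


def apply_cascade(cascades: dict[str, dict], pages: dict[str, dict]) -> dict[str, dict]:
    """Return each page's metadata with ancestor cascade values filled in.

    cascades maps a section path (e.g. "docs") to the `cascade:` block of
    that section's _index.md; pages maps a page path to its frontmatter.
    """
    result: dict[str, dict] = {}
    for page_path, frontmatter in pages.items():
        merged: dict = {}
        # .parents is nearest-first; reverse it so outer sections apply first
        for ancestor in reversed(PurePosixPath(page_path).parents):
            merged.update(cascades.get(str(ancestor), {}))
        merged.update(frontmatter)  # the page's own frontmatter wins
        result[page_path] = merged
    return result
```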

4. Taxonomy & Menus

Taxonomy Collection: Tags, categories, and other terms are collected from all pages.
Menu Generation: Navigation menus are built from config and page frontmatter.
Incremental Optimization: Only changed pages are re-scanned for taxonomy updates.
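Taxonomy collection amounts to building an inverted index from terms to pages. This sketch uses hypothetical `collect_taxonomies` and `update_taxonomies` helpers, with pages represented as plain frontmatter dicts keyed by URL. It also shows one way the incremental optimization could work: only the changed pages' frontmatter is re-read, their stale entries are removed, and their current terms are re-added.

```python
from collections import defaultdict

TAXONOMIES = ("tags", "categories")


def collect_taxonomies(pages: dict[str, dict]) -> dict[str, dict[str, list[str]]]:
    """Full scan: map each taxonomy term to the URLs of pages that use it."""
    index = {name: defaultdict(list) for name in TAXONOMIES}
    for url, frontmatter in sorted(pages.items()):
        for name in TAXONOMIES:
            for term in frontmatter.get(name, []):
                index[name][term].append(url)
    return {name: dict(terms) for name, terms in index.items()}


def update_taxonomies(index: dict, changed: dict[str, dict]) -> dict:
    """Incremental update: re-read only the pages in `changed`."""
    for name in TAXONOMIES:
        terms = index.setdefault(name, {})
        for urls in terms.values():
            urls[:] = [u for u in urls if u not in changed]  # drop stale entries
        for url, frontmatter in changed.items():
            for term in frontmatter.get(name, []):
                terms.setdefault(term, []).append(url)
        for term in [t for t, urls in terms.items() if not urls]:
            del terms[term]  # remove terms no page uses anymore
    return index
```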

5. Asset Processing (`AssetOrchestrator`)

Discovery: Assets are found in `assets/` and theme directories.
Processing: SCSS is compiled, JS is minified, and images are optimized (if pipelines are enabled).
Fingerprinting: Hashes are generated for cache busting (e.g., `style.a1b2c3.css`).
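Fingerprinting can be sketched as hashing an asset's bytes and inserting a short digest before the file extension, so any content change produces a new URL and browsers cannot keep serving a stale cached copy. The use of SHA-256 and a 6-character digest here is an assumption; `fingerprint_name` is a hypothetical helper.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprint_name(filename: str, content: bytes, length: int = 6) -> str:
    """Insert a short content hash before the extension: style.css -> style.<hash>.css."""
    digest = hashlib.sha256(content).hexdigest()[:length]
    path = PurePosixPath(filename)  # POSIX separators, since the result is used in URLs
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))
```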

6. Rendering (`RenderOrchestrator`)

This is the heavy-lifting phase.

Parallel Execution: Pages are rendered in parallel using `ThreadPoolExecutor`.
- Bengal supports free-threaded Python (3.13t+), allowing true parallelism without the GIL.
Jinja2 Context: Each page is rendered with the `site` and `page` context.
Markdown Parsing: Markdown content is converted to HTML (cached by file hash).

7. Post-Processing

Sitemap: `sitemap.xml` is generated.
RSS: RSS feeds are built.
Validation: Internal links are checked (if `--strict` is enabled).
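Internal link validation can be sketched with the standard library's `html.parser`: collect every `<a href>` from the rendered pages and report root-relative targets that do not match any known page URL. Treating only links that start with `/` as internal, and ignoring `#fragment` parts, are simplifying assumptions; `find_broken_links` is a hypothetical helper.

```python
from html.parser import HTMLParser


class _LinkCollector(HTMLParser):
    """Collects the href of every <a> tag fed to it."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_broken_links(rendered: dict[str, str], strict: bool) -> list[tuple[str, str]]:
    """Return (page_url, href) pairs whose internal target does not exist."""
    if not strict:
        return []  # validation only runs in strict mode
    known = set(rendered)
    broken = []
    for page_url, html in rendered.items():
        collector = _LinkCollector()
        collector.feed(html)
        collector.close()
        for href in collector.hrefs:
            target = href.split("#", 1)[0]  # ignore fragments
            if target.startswith("/") and target not in known:
                broken.append((page_url, href))
    return broken
```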

Incremental Builds

Bengal's incremental build system relies on Change Detection and Dependency Tracking.

How it Works

Change Detection: Files are hashed. If `content/post.md` hasn't changed, its hash matches the one stored in the cache.
Smart Filtering: The orchestrator calculates a `pages_to_build` list.
Dependency Graph:
- Direct Change: The file itself changed.
- Navigation Dependency: If Page A links to Page B and Page B's title changes, Page A must rebuild (to update the link text).
- Template Change: If `templates/page.html` changes, all pages using that template rebuild.
- Config Change: If `bengal.toml` changes, a full rebuild is triggered.
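The filtering rules above can be sketched as a set computation over file hashes. This is a simplified model with hypothetical names (`pages_to_build`, `links`, `template_of`): it rebuilds a linking page whenever its target changes at all, which is more conservative than the title-only rule, and it treats any change to `bengal.toml` as a full rebuild.

```python
import hashlib


def file_hash(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


def pages_to_build(
    current: dict[str, bytes],      # path -> file bytes in this build
    cached: dict[str, str],         # path -> hash from the previous build
    links: dict[str, set[str]],     # page -> pages it links to
    template_of: dict[str, str],    # page -> template path (every page listed)
    config_path: str = "bengal.toml",
) -> set[str]:
    changed = {p for p, data in current.items() if cached.get(p) != file_hash(data)}
    pages = set(template_of)
    if config_path in changed:
        return pages  # config change: full rebuild
    to_build = pages & changed  # direct changes
    # navigation dependency: pages that link to a changed page
    to_build |= {p for p in pages if links.get(p, set()) & changed}
    # template change: every page rendered with a changed template
    to_build |= {p for p in pages if template_of[p] in changed}
    return to_build
```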

Performance

Cached Pages: ~0ms (Metadata loaded from JSON).
Rendered Pages: ~10-50ms per page (depending on complexity).
Parallelism: Scales linearly with CPU cores on Python 3.13t+.

Memory Optimization

For massive sites (>10,000 pages), Bengal offers a `--memory-optimized` flag. This uses a Streaming Orchestrator to process pages in batches, keeping memory usage roughly constant rather than growing linearly with site size.
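Batch streaming can be sketched with a generator: pages are loaded, rendered, and written one batch at a time, so at most `batch_size` loaded pages are alive at once. The `chunks` and `stream_build` helpers and their callback parameters are hypothetical; the sketch illustrates only the batching idea, not the Streaming Orchestrator's actual interface.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def chunks(items: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield lists of up to `size` items without materializing the whole input."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def stream_build(
    paths: Iterable[str],
    load: Callable[[str], dict],
    render: Callable[[dict], str],
    write: Callable[[str, str], None],
    batch_size: int = 100,
) -> int:
    written = 0
    for batch in chunks(paths, batch_size):
        pages = [load(p) for p in batch]  # only this batch is held in memory
        for page in pages:
            write(page["path"], render(page))
            written += 1
        # Rebinding `pages` on the next iteration releases the previous batch.
    return written
```

For memory to stay bounded, `paths` should itself be lazy (for example, a `Path.rglob` generator) rather than a pre-built list.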

Reactive Dataflow Pipeline

Bengal also provides a Reactive Dataflow Pipeline for declarative, stream-based builds:

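```python
from bengal.pipeline import Pipeline

# discover_files, parse_markdown, and write_output are placeholders for your own functions
pipeline = (
    Pipeline("build")
    .source("files", discover_files)
    .map("parse", parse_markdown)
    .parallel(workers=4)
    .for_each("write", write_output)
)

result = pipeline.run()
```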

Key benefits:

Declarative: Define what, not how
Automatic Caching: Version-based cache invalidation
Watch Mode: Built-in file watching with debouncing
Composable: Chain operations fluently
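Debouncing, as used by watch mode, collapses a burst of file-change events into a single rebuild that fires once the files have been quiet for a set interval. This pure-function sketch (a hypothetical `debounce` helper, with timestamps in milliseconds) computes when rebuilds would fire; a real watcher would restart a timer on each event instead of working from a list.

```python
def debounce(event_times_ms: list[int], quiet_ms: int) -> list[int]:
    """Return the times at which a debounced rebuild would fire."""
    fires: list[int] = []
    last: int | None = None
    for t in sorted(event_times_ms):
        # A gap of at least quiet_ms means the previous burst's timer already fired.
        if last is not None and t - last >= quiet_ms:
            fires.append(last + quiet_ms)
        last = t
    if last is not None:
        fires.append(last + quiet_ms)  # the final burst fires once it goes quiet
    return fires
```

For example, saves at 0, 100, and 200 ms followed by one at 1000 ms with a 300 ms quiet period trigger two rebuilds, at 500 ms and 1300 ms, instead of four.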

See Reactive Pipeline Architecture for details.