Build Pipeline

How Bengal orchestrates builds, processes content, and performs incremental builds


This guide explains what happens when you run bengal build.

Bengal's build system is orchestrated by the BuildOrchestrator and executes in distinct phases.

The Phases

1. Initialization

  • Config Loading: Bengal loads bengal.toml (or the config/ directory) and establishes the environment context.
  • Cache Loading: If incremental build is enabled, the previous build cache (.bengal/cache.json) is loaded.
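A minimal sketch of the cache-loading step, assuming the cache is a single JSON object stored at .bengal/cache.json. The function name load_build_cache and the choice to treat a missing or corrupt cache as "no cache" (forcing a full build) are illustrative assumptions, not Bengal's actual implementation.

```python
import json
from pathlib import Path

# Hypothetical helper: the path matches the text above, but the JSON layout
# (one flat object) is an assumption, not Bengal's real cache schema.
CACHE_PATH = Path(".bengal/cache.json")


def load_build_cache(path: Path = CACHE_PATH, incremental: bool = True) -> dict:
    """Return the previous build cache, or an empty dict for a full build."""
    if not incremental or not path.exists():
        return {}
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except (json.JSONDecodeError, OSError):
        # A corrupt or unreadable cache is treated like no cache at all.
        return {}
```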

2. Content Discovery (ContentOrchestrator)

  • Scanning: The content/ directory is scanned recursively.
  • Page Creation: Page objects are created for each Markdown file.
  • Schema Validation: If collections.py exists, frontmatter is validated against defined schemas (see Content Collections).
  • Lazy Loading: In incremental builds, unchanged pages are loaded as PageProxy objects (lightweight metadata only), saving parsing time.
  • Section Registry: A path-based registry is built for O(1) section lookups.
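To illustrate lazy loading and the section registry together, here is a simplified discovery pass. Page, PageProxy, and discover are hypothetical stand-ins, not Bengal's classes: unchanged files become metadata-only proxies built from previously cached metadata, changed files are read in full, and a dict keyed by section directory gives constant-time section lookups.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Page:
    # Hypothetical stand-in for a fully loaded page.
    source: Path
    text: str


@dataclass
class PageProxy:
    # Hypothetical lightweight stand-in: cached metadata only, no file read.
    source: Path
    metadata: dict = field(default_factory=dict)


def discover(content_dir: Path, unchanged: set[Path], cached_meta: dict):
    pages = []
    registry = {}  # section directory -> pages in it (dict lookup is O(1))
    for path in sorted(content_dir.rglob("*.md")):
        rel = path.relative_to(content_dir)
        if rel in unchanged:
            page = PageProxy(rel, cached_meta.get(str(rel), {}))
        else:
            page = Page(rel, path.read_text(encoding="utf-8"))
        pages.append(page)
        registry.setdefault(rel.parent, []).append(page)
    return pages, registry
```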

3. Structure & Metadata

  • Section Finalization: Ensures every folder has a corresponding Section object (creating virtual sections if _index.md is missing).
  • Cascading: Metadata from section _index.md files is applied to all descendant pages (e.g., cascade: type: doc).
  • URL Generation: Output paths and URLs are computed for all pages.
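The cascading rule can be sketched as a merge from shallow to deep sections. This assumes, for illustration only, that a deeper section's cascade overrides a shallower one and that a page's own frontmatter wins over anything cascaded; apply_cascade and its dict-based inputs are hypothetical, and Bengal's actual precedence rules may differ.

```python
def apply_cascade(section_cascades: dict[str, dict], pages: dict[str, dict]) -> dict[str, dict]:
    """Merge cascade metadata from ancestor sections into each page.

    section_cascades maps a section path such as "docs" or "docs/api" to the
    cascade block of its _index.md; pages maps a page path to its frontmatter.
    """
    result = {}
    for page_path, front in pages.items():
        merged: dict = {}
        parts = page_path.split("/")[:-1]       # drop the filename
        for depth in range(1, len(parts) + 1):  # shallowest section first
            merged.update(section_cascades.get("/".join(parts[:depth]), {}))
        merged.update(front)                    # page frontmatter wins
        result[page_path] = merged
    return result
```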

4. Taxonomy & Menus

  • Taxonomy Collection: Tags, categories, and other terms are collected from all pages.
  • Menu Generation: Navigation menus are built from config and page frontmatter.
  • Incremental Optimization: Only changed pages are re-scanned for taxonomy updates.
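Taxonomy collection amounts to building an inverted index from terms to pages. The sketch below, using hypothetical function names and plain dicts for frontmatter, shows a full collection pass and an incremental update that reads only the changed pages' frontmatter: it drops their old term entries, then re-adds their current terms.

```python
from collections import defaultdict


def collect_taxonomies(pages: dict[str, dict], taxonomies=("tags", "categories")):
    """Build {taxonomy: {term: [page paths]}} from every page's frontmatter."""
    index = {name: defaultdict(list) for name in taxonomies}
    for path, front in pages.items():
        for name in taxonomies:
            for term in front.get(name, []):
                index[name][term].append(path)
    return {name: dict(terms) for name, terms in index.items()}


def update_taxonomies(index, changed: dict[str, dict], taxonomies=("tags", "categories")):
    """Re-scan only changed pages: remove their old terms, then add new ones."""
    for name in taxonomies:
        terms = index.setdefault(name, {})
        for term in list(terms):
            terms[term] = [p for p in terms[term] if p not in changed]
            if not terms[term]:
                del terms[term]  # drop terms no page uses any more
        for path, front in changed.items():
            for term in front.get(name, []):
                terms.setdefault(term, []).append(path)
    return index
```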

5. Asset Processing (AssetOrchestrator)

  • Discovery: Assets are found in assets/ and theme directories.
  • Processing: SCSS is compiled, JS is minified, and images are optimized (if pipelines are enabled).
  • Fingerprinting: Hashes are generated for cache busting (e.g., style.a1b2c3.css).
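Fingerprinting embeds a content hash in the filename so that any change to the file produces a new URL. This sketch assumes a truncated SHA-256 of the file bytes; the helper name fingerprint, the hash algorithm, and the six-character digest length are assumptions, and Bengal may use different ones.

```python
import hashlib
from pathlib import Path


def fingerprint(path: Path, length: int = 6) -> Path:
    """Return a content-hashed name, e.g. style.css -> style.<hash>.css."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()[:length]
    return path.with_name(f"{path.stem}.{digest}{path.suffix}")
```

Because the name depends only on the bytes, unchanged assets keep the same URL across builds and stay cached in browsers.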

6. Rendering (RenderOrchestrator)

This is the heavy-lifting phase.

  • Parallel Execution: Pages are rendered in parallel using ThreadPoolExecutor.
    • Bengal supports Free-Threaded Python (3.13t+), allowing true parallelism without the GIL.
  • Jinja2 Context: Each page is rendered with the site and page context.
  • Markdown Parsing: Markdown content is converted to HTML (cached by file hash).

7. Post-Processing

  • Sitemap: sitemap.xml is generated.
  • RSS: RSS feeds are built.
  • Validation: Internal links are checked (if --strict is enabled).
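Two of the post-processing steps can be sketched with the standard library. build_sitemap emits a urlset in the sitemaps.org namespace, and find_broken_links reports root-relative href targets that no generated page provides. Both function names are hypothetical, and the regex-based link scan is a simplification; a real checker would parse the HTML.

```python
import re
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(base_url: str, paths: list[str]) -> str:
    """Return sitemap XML with one <url><loc> entry per page path."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for path in paths:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = base_url.rstrip("/") + "/" + path.lstrip("/")
    return ET.tostring(urlset, encoding="unicode")


def find_broken_links(pages: dict[str, str]) -> list[tuple[str, str]]:
    """pages maps URL -> HTML; return (page, href) pairs with missing targets."""
    broken = []
    for url, html in pages.items():
        # Root-relative links only; any #fragment is ignored.
        for href in re.findall(r'href="(/[^"#]*)', html):
            if href not in pages:
                broken.append((url, href))
    return broken
```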

Incremental Builds

Bengal's incremental build system relies on Change Detection and Dependency Tracking.

How it Works

  1. Change Detection: Files are hashed. If content/post.md hasn't changed, its hash matches the cached hash.
  2. Smart Filtering: The orchestrator calculates a pages_to_build list.
  3. Dependency Graph:
    • Direct Change: The file itself changed.
    • Navigation Dependency: If Page A links to Page B, and Page B changes title, Page A must rebuild (to update the link text).
    • Template Change: If templates/page.html changes, all pages using that template rebuild.
    • Config Change: If bengal.toml changes, a Full Rebuild is triggered.
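The three steps above can be condensed into one function. This is a simplified sketch with hypothetical inputs (in-memory file bytes, a path-to-hash cache, and explicit link and template maps), not Bengal's orchestrator. One simplification to note: it rebuilds a linking page whenever the linked page changes at all, which is a conservative stand-in for "Page B changed its title".

```python
import hashlib


def file_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def pages_to_build(
    sources: dict[str, bytes],       # path -> current file bytes
    cache: dict[str, str],           # path -> hash from the previous build
    links_to: dict[str, set[str]],   # page -> pages it links to
    template_of: dict[str, str],     # page -> template path it uses
    config_path: str = "bengal.toml",
) -> set[str]:
    # 1. Change detection: any file whose hash differs from the cache.
    changed = {p for p, data in sources.items() if cache.get(p) != file_hash(data)}
    pages = set(template_of)
    if config_path in changed:
        return pages                 # config change: full rebuild
    rebuild = changed & pages        # direct changes
    for page in pages:
        if links_to.get(page, set()) & changed:
            rebuild.add(page)        # navigation dependency
        if template_of[page] in changed:
            rebuild.add(page)        # template change
    return rebuild
```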

Performance

  • Cached Pages: ~0 ms (metadata is loaded from JSON).
  • Rendered Pages: ~10-50 ms per page (depending on complexity).
  • Parallelism: Scales linearly with CPU cores on Python 3.13t+.
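A back-of-envelope estimate using the figures above: cached pages cost roughly nothing, and rendered pages cost 10-50 ms each, divided across workers under the linear-scaling assumption. The helper is hypothetical and ignores discovery, asset processing, and I/O, so treat its output as a lower bound on render time only.

```python
def estimate_render_seconds(
    changed_pages: int,
    workers: int,
    ms_per_page: tuple[float, float] = (10.0, 50.0),
) -> tuple[float, float]:
    """Rough (low, high) render time for the pages that actually rebuild."""
    low, high = ms_per_page
    # Linear scaling assumption: total work split evenly across workers.
    return (
        changed_pages * low / workers / 1000,
        changed_pages * high / workers / 1000,
    )


# 200 changed pages on 8 workers -> (0.25, 1.25) seconds
print(estimate_render_seconds(200, 8))
```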

Memory Optimization

For massive sites (>10,000 pages), Bengal offers a --memory-optimized flag. This uses a Streaming Orchestrator to process pages in batches, keeping memory usage constant instead of growing linearly with site size.
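The batching idea behind streaming can be sketched with a generator. stream_build, batched, and the load/render/write callables are hypothetical; the point is that only one batch of loaded content is referenced at a time, so memory depends on batch_size rather than on the total number of pages. (Python 3.12 also ships itertools.batched, which yields tuples; a small local version is used here for older interpreters.)

```python
from itertools import islice
from typing import Callable, Iterable, Iterator


def batched(items: Iterable, size: int) -> Iterator[list]:
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def stream_build(
    paths: Iterable[str],
    load: Callable[[str], str],
    render: Callable[[str], str],
    write: Callable[[str, str], None],
    batch_size: int = 100,
) -> int:
    written = 0
    for batch in batched(paths, batch_size):
        # Only this batch's content is loaded; the previous list is released
        # when `loaded` is rebound on the next iteration.
        loaded = [(p, load(p)) for p in batch]
        for path, content in loaded:
            write(path, render(content))
            written += 1
    return written
```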

Reactive Dataflow Pipeline

Bengal also provides a Reactive Dataflow Pipeline for declarative, stream-based builds:

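from bengal.pipeline import Pipeline

# discover_files, parse_markdown, and write_output stand in for callables
# you provide; they are not defined in this snippet.
pipeline = (
    Pipeline("build")
    .source("files", discover_files)
    .map("parse", parse_markdown)
    .parallel(workers=4)
    .for_each("write", write_output)
)

result = pipeline.run()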

Key benefits:

  • Declarative: Define what, not how
  • Automatic Caching: Version-based cache invalidation
  • Watch Mode: Built-in file watching with debouncing
  • Composable: Chain operations fluently
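To make "version-based cache invalidation" concrete, here is a small, self-contained sketch of the general technique. VersionedCache is hypothetical and is not the bengal.pipeline API: entries are keyed by a hash of the stage name and input, and a cached result is reused only if it was produced by the same stage version, so bumping a stage's version forces recomputation.

```python
import hashlib
import json


class VersionedCache:
    """Hypothetical sketch: results are reused only for the same stage version."""

    def __init__(self) -> None:
        self._entries: dict[str, tuple[str, object]] = {}

    @staticmethod
    def key(stage: str, value: object) -> str:
        payload = json.dumps([stage, value], sort_keys=True, default=str)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_compute(self, stage: str, version: str, value, fn):
        k = self.key(stage, value)
        hit = self._entries.get(k)
        if hit is not None and hit[0] == version:
            return hit[1]                   # same input, same version: reuse
        result = fn(value)                  # miss or stale version: recompute
        self._entries[k] = (version, result)
        return result
```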

See Reactive Pipeline Architecture for details.