Performance

Benchmarks, optimization strategies, and performance characteristics

4 min read 827 words

Measured Performance (2025-10-12)

Python 3.14 Build Rates (recommended):

Pages Full Build Pages/sec Python Incremental Speedup
1,000 3.90s 256 pps 3.14 ~0.5s ~6x
1,000 4.86s 206 pps 3.12 ~0.5s ~10x

Python 3.14t Free-Threading (optional, maximum performance):

Pages Full Build Pages/sec Python Incremental Speedup
1,000 2.68s 373 pps 3.14t ~0.5s ~5x

Legacy Python Build Rates:

Pages Full Build Pages/sec Incremental Speedup
394 3.3s 119 pps 0.18s 18x
1,000 ~10s 100 pps ~0.5s ~20x
10,000 ~100s 100 pps ~2s ~50x

Python 3.14 Performance Impact:

  • 24% speedup over Python 3.12 (256 pps vs 206 pps)
  • Better JIT compilation and memory management
  • Production-ready with full ecosystem support

Python 3.14t Free-Threading (optional):

  • 81% speedup over Python 3.12 (373 pps vs 206 pps)
  • True parallel rendering without GIL bottlenecks
  • Requires separate build, some dependencies may not work

Comparison with Other SSGs:

  • Hugo (Go): ~1000 pps — 4x faster (compiled language)
  • Eleventy (Node.js): ~200 pps — Bengal 3.14 is 28% faster
  • Bengal (Python 3.14): ~256 pps — Fastest Python SSG
  • Bengal (Python 3.14t): ~373 pps — With free-threading
  • Jekyll (Ruby): ~50 pps — 5x slower (single-threaded)

Reality Check:

  • Fast enough for 1K-10K page documentation sites
  • Incremental builds are genuinely 15-50x faster
  • Python 3.14 makes Bengal competitive with Node.js SSGs
  • Validated at 1K-10K pages
  • Production-ready with all dependencies working

Current Optimizations

  1. Parallel Processing

    • Pages, assets, and post-processing tasks run concurrently
    • Configurable viabuild.parallelsetting
    • Impact: 2-4x speedup on multi-core systems
  2. Incremental Builds

    • Only rebuild changed files
    • Dependency tracking detects affected pages
    • Impact: 15-50x speedup for single-file changes (validated at 1K-10K pages)
  3. Page Subset Caching (Added 2025-10-12, Completed 2025-10-18)

    • Site.regular_pages- cached content pages
    • Site.generated_pages- cached generated pages
    • Impact: 75% reduction in equality checks (446K → 112K at 400 pages)
    • Status: ✅ All code paths now use cached properties
  4. Smart Thresholds

    • Automatic detection of when parallelism is beneficial
    • Impact: Avoids overhead for small sites
  5. Efficient File I/O

    • Thread-safe concurrent file operations
    • Impact: Minimal wait time for I/O
  6. Build Cache

    • Persists file hashes and dependencies between builds
    • Parsed Markdown AST cached
    • Impact: Enables fast incremental builds
  7. Zstandard Cache Compression (Added 2025-12)

    • All cache files compressed with Zstd (PEP 784)
    • 92-93% size reduction (1.6MB → 100KB)
    • 12-14x compression ratio
    • Impact: 10x faster cache I/O, 16x smaller CI/CD cache transfers
  8. Template Caching (Enhanced 2025-11-01)

    • LRU cache for rendered autodoc templates with intelligent eviction
    • Configurable cache size (default: 1000 entries)
    • Automatic cache statistics and hit rate tracking
    • Impact: Reduces template rendering overhead for repeated documentation builds
  9. Minimal Dependencies

    • Only necessary libraries included
    • Impact: Fast pip install, small footprint

Known Limitations

  1. Python Overhead: Even with optimizations, Python is still 4x slower than compiled Go/Rust
  2. Memory Usage: Loading 10K pages = ~500MB-1GB RAM (Python object overhead)
  3. Parsing Speed: Markdown parsing is 40-50% of build time (already using fastest pure-Python parser)
  4. Python 3.14 Requirement: Requires Python 3.14+ (released October 2024)
  5. Recommended Limit: 10K pages max (validated at 1K-10K)

Future: Free-Threading

Python 3.14t (free-threaded build) can achieve 373 pages/sec (+46% faster), but:

  • Requires separate Python build
  • Some C extensions don't support it yet (e.g., lightningcss)
  • Expected to become default in Python 3.16-3.18 (2027-2029)

When free-threading becomes the default Python build, Bengal will automatically benefit without any code changes.

Potential Future Optimizations

  1. Content Caching: ✅ Already implemented (parsed AST cached)
  2. Batch File I/O: ✅ Already implemented
    • Page rendering: Parallel (ThreadPoolExecutor)
    • Asset processing: Unified Parallel (ThreadPoolExecutor) for CSS & static assets
    • Content discovery: Parallel (ThreadPoolExecutor, 8 workers)
    • Post-processing: Parallel (ThreadPoolExecutor)
  3. Memory-Mapped Reads: For large files (>100KB) - Low priority, marginal gains
  4. Build Profiling: ✅ Already implemented (tests/performance/)
  5. Asset Deduplication: Share common assets across pages (if needed)

Performance Audit (2025-10-18)

Comprehensive code audit revealed:

  • ✅ No O(n²) patterns in codebase
  • ✅ All file I/O already parallelized
  • ✅ Proper use of sets for O(1) membership checks
  • ✅ Dict-based indexes for O(1) lookups
  • ✅ Page caching complete across all code paths

Current bottlenecks are CPU-bound, not I/O-bound:

  1. Markdown parsing (40-50% of build time) - already using fastest pure-Python parser
  2. Template rendering (30-40% of build time) - already parallel + cached
  3. No remaining algorithmic inefficiencies found

The codebase demonstrates excellent performance engineering with no obvious optimization opportunities remaining.