# Large Site Optimization

URL: /bengal/docs/building/performance/large-sites/
Section: performance
Description: Build and render 5K-100K+ pages efficiently with streaming, parallel processing, and query indexes

---

Bengal is designed for sites with thousands of pages. This guide covers strategies for sites beyond 5,000 pages.

## Quick Start

For sites with 5K+ pages:

```bash
# Memory-optimized build
bengal build --memory-optimized --fast

# Full incremental + parallel + fast
bengal build --incremental --fast
```

---

## Strategy Overview

| Site Size | Recommended Strategy | Typical Build Time |
|-----------|---------------------|--------------------|
| <500 pages | Default (no changes needed) | 1-3s |
| 500-5K pages | Default (parallel + incremental enabled) | 3-15s |
| 5K-20K pages | `--memory-optimized` | 15-60s |
| 20K+ pages | Full optimization stack | 1-5min |

---

## 1. Memory-Optimized Builds (Streaming Mode)

For sites with 5K+ pages, enable streaming mode:

```bash
bengal build --memory-optimized
```

### How It Works

1. **Builds a knowledge graph** to understand page connectivity
2. **Renders hubs first** (highly connected pages) and keeps them in memory
3. **Streams leaves** in batches and releases their memory immediately

**Result:** 80-90% memory reduction.

### When to Use

- Sites with 5K+ pages
- CI runners with limited memory
- Docker containers with memory limits
- Local machines with limited RAM

:::{warning}
`--memory-optimized` and `--perf-profile` cannot be used together (the profiler does not work with batched rendering).
:::

---

## 2. Query Indexes (O(1) Lookups)

Replace O(n) page filtering with O(1) index lookups in templates.

### The Problem

```kida
{# O(n) - scans ALL pages every time this template renders #}
{% let blog_posts = site.pages | where('section', 'blog') %}
```

On a 10K-page site, this filter performs 10,000 comparisons each time it runs.
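To make the difference concrete, here is a plain-Python sketch of both approaches. It illustrates the idea under assumptions and is not Bengal's implementation: `Page`, `build_section_index`, and `lookup_section` are hypothetical names. The design of storing page paths in the index and resolving them afterwards (loosely like `resolve_pages`) is also assumed. Building the index costs one O(n) pass. After that, each lookup is a single dictionary access plus resolving the k matching pages, rather than a comparison against every page.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    # Hypothetical stand-in for a Bengal page object
    path: str
    section: str


def filter_by_section(pages, section):
    """O(n): compares every page's section on each call."""
    return [p for p in pages if p.section == section]


def build_section_index(pages):
    """One O(n) pass up front: section name -> list of page paths."""
    index = defaultdict(list)
    for p in pages:
        index[p.section].append(p.path)
    return dict(index)


def lookup_section(index, pages_by_path, section):
    """O(1) key lookup, then O(k) to resolve the k matching paths to pages."""
    return [pages_by_path[path] for path in index.get(section, [])]


# 10,000 pages, every tenth one in the "blog" section
pages = [
    Page(f"blog/post-{i}.md", "blog") if i % 10 == 0 else Page(f"docs/page-{i}.md", "docs")
    for i in range(10_000)
]
pages_by_path = {p.path: p for p in pages}
index = build_section_index(pages)

# Both approaches return the same blog pages, in the same order
blog_scan = filter_by_section(pages, "blog")
blog_indexed = lookup_section(index, pages_by_path, "blog")
```

A template may run its filter many times per build (for example, once per page in a shared sidebar). On large sites, the one-time indexing pass therefore pays for itself quickly.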
### The Solution

```kida
{# O(1) - constant-time hash lookup #}
{% let blog_posts = site.indexes.section.get('blog') | resolve_pages %}
```

### Built-in Indexes

| Index | Key Type | Example |
|-------|----------|---------|
| `section` | Section name | `site.indexes.section.get('blog')` |
| `author` | Author name | `site.indexes.author.get('Jane')` |
| `category` | Category | `site.indexes.category.get('tutorial')` |
| `date_range` | Year (`2024`) or year-month (`2024-01`) | `site.indexes.date_range.get('2024')` |

### Usage Examples

**Section-based listing:**

```kida
{% let blog_posts = site.indexes.section.get('blog') | resolve_pages %}
{% for post in blog_posts | sort_by('date', reverse=true) %}

{% end %}
```

**Author archive:**

```kida
{% let author_posts = site.indexes.author.get('Jane Smith') | resolve_pages %}

{{ author_posts | length }} posts by Jane

```

**Monthly archives:**

```kida
{% let jan_posts = site.indexes.date_range.get('2024-01') | resolve_pages %}
{% for post in jan_posts %}
  {{ post.title }}
{% end %}
```

### Performance Impact

| Pages | O(n) Filter | Query Index |
|-------|-------------|-------------|
| 1K | 2ms | <0.1ms |
| 10K | 20ms | <0.1ms |
| 100K | 200ms | <0.1ms |

---

## 3. Parallel Processing

Parallel processing is **auto-detected** based on page count and workload. Adjust the worker count if needed:

```toml
# bengal.toml
[build]
max_workers = 8  # Optional: adjust based on CPU cores (auto-detected if omitted)
```

To force sequential processing (useful for debugging):

```bash
bengal build --no-parallel
```

### Free-Threaded Python

Bengal automatically detects Python 3.14t+ (free-threaded):

```bash
# 1.5-2x faster rendering
# Install free-threaded Python:
pyenv install 3.14t
python3.14t -m pip install bengal
```

See [[docs/about/free-threading|Free-Threading]] for architecture details and code references.

When running on free-threaded Python:

- `ThreadPoolExecutor` gets true parallelism (no GIL contention)
- ~1.78x faster rendering on multi-core machines
- No code changes needed

---

## 4. Incremental Builds

Incremental builds are **automatic**; no configuration is needed. The first build is a full build, and subsequent builds rebuild only changed content.
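At its core, "rebuild only changed content" means comparing a fingerprint of each source file with the one recorded by the previous build. The sketch below shows that idea using SHA-256 hashes kept in a plain dict. It is an illustration under assumptions, not Bengal's cache logic: `content_hash` and `plan_incremental_build` are hypothetical. Bengal's real cache is stored on disk in `.bengal/` and also tracks template, asset, and index dependencies, which this sketch ignores.

```python
import hashlib
import tempfile
from pathlib import Path


def content_hash(path: Path) -> str:
    """Fingerprint a source file by the SHA-256 of its bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def plan_incremental_build(paths, previous_hashes):
    """Return (changed_paths, new_hashes).

    `previous_hashes` maps str(path) -> digest from the last build. On the
    first build it is empty, so every file counts as changed (a full build).
    """
    new_hashes = {}
    changed = []
    for path in paths:
        digest = content_hash(path)
        new_hashes[str(path)] = digest
        if previous_hashes.get(str(path)) != digest:
            changed.append(path)
    return changed, new_hashes


with tempfile.TemporaryDirectory() as tmp:
    a, b = Path(tmp) / "a.md", Path(tmp) / "b.md"
    a.write_text("# A", encoding="utf-8")
    b.write_text("# B", encoding="utf-8")

    # First build: empty cache, so both files are rebuilt
    first, cache = plan_incremental_build([a, b], {})

    # Edit one file: only that file is rebuilt on the next build
    b.write_text("# B, edited", encoding="utf-8")
    second, cache = plan_incremental_build([a, b], cache)
    second_names = [p.name for p in second]
```

This sketch does not detect deleted files. It also does not handle edits that affect many pages at once, such as a change to a shared template. Those cases require tracking dependencies between files, not just per-file hashes.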
Force a full rebuild if needed:

```bash
# Force full rebuild (skip cache)
bengal build --no-incremental
```

### What Gets Cached

- **Content parsing**: Markdown AST cached per file
- **Template rendering**: output cached by content hash
- **Asset hashing**: fingerprints cached
- **Query indexes**: updated incrementally
- **Autodoc AST parsing**: Python modules cached to skip AST parsing (30-40% speedup for autodoc-heavy sites)
- **Asset dependencies**: tracked during rendering (no HTML parsing needed)

### Cache Location

```tree
.bengal/
├── cache.json.zst           # Main build cache (compressed)
├── page_metadata.json.zst   # Page discovery cache
├── taxonomy_index.json.zst  # Taxonomy index
├── indexes/                 # Query indexes (section, author, etc.)
├── templates/               # Template bytecode cache
└── logs/                    # Build logs
```

### Clear Cache

```bash
# Clear all caches (forces cold rebuild)
bengal clean --cache

# Clear output and cache
bengal clean --all
```

---

## 5. Fast Mode

Fast mode cuts console output and HTML post-processing for maximum speed:

```bash
bengal build --fast
```

`--fast`:

- Enables quiet output (minimal console I/O)
- Suppresses verbose logging
- Leaves parallelism auto-detection unchanged
- **Skips HTML formatting** (raw HTML output, ~10-15% faster)

:::{note}
Fast mode skips HTML pretty-printing and minification. The output is still valid HTML but is not formatted. Use it for development and CI builds where formatting doesn't matter.
:::

---

## 6. Build Profiling

Identify bottlenecks:

```bash
# Generate performance profile
bengal build --perf-profile

# View results
python -m pstats .bengal/profiles/profile.stats
```

### Template Profiling

Find slow templates:

```bash
bengal build --profile-templates
```

**Example output:**

```text
Template Rendering Times:
  layouts/blog.html: 1.2s (340 pages, 3.5ms avg)
  layouts/docs.html: 0.8s (890 pages, 0.9ms avg)
  partials/nav.html: 0.3s (included 1230 times)
```

---

## 7. Content Organization

### Split Large Sections

If one section has 5K+ pages, consider splitting it:

```tree
content/
├── blog/
│   ├── 2024/       # 500 pages
│   ├── 2023/       # 800 pages
│   └── archive/    # 3000+ pages (separate pagination)
```

### Use Pagination

Don't render 1,000 items on one page:

```yaml
# Paginate blog listing
pagination:
  per_page: 20
```

### Lazy-Load Heavy Content

Move rarely accessed content to separate pages:

```kida
{# Don't: render full changelog inline #}
{{ include('changelog.html') }}

{# Do: link to separate page #}
View full changelog
```

---

## 8. CI/CD Optimization

### GitHub Actions Example

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Cache Bengal
        uses: actions/cache@v4
        with:
          path: .bengal
          key: bengal-${{ hashFiles('content/**/*.md') }}
          # When content has changed (no exact key match), restore the most
          # recent cache so the build can still run incrementally
          restore-keys: |
            bengal-

      - name: Build
        run: bengal build --fast --environment production
```

### Docker Memory Limits

```dockerfile
# Use memory-optimized mode for container builds
CMD ["bengal", "build", "--memory-optimized", "--fast"]
```

---

## 9. Monitoring Build Health

Track build performance over time:

```bash
# Detailed build stats
bengal build --verbose
```

**Example output:**

```text
Build Summary:
  Total Pages:    15,432
  Rendered:       342 (incremental)
  Skipped:        15,090 (cached)
  Duration:       12.3s
  Memory Peak:    245MB
  Pages/sec:      1,254
```

---

## Quick Reference

```bash
# Memory-efficient large site build
bengal build --memory-optimized --fast

# Profile to find bottlenecks
bengal build --perf-profile --profile-templates

# Force full rebuild
bengal build --no-incremental

# Clear all caches
bengal clean --cache

# Clear output and cache
bengal clean --all
```

---

## Troubleshooting

### Build runs out of memory

1. Enable streaming: `--memory-optimized`
2. Use `bengal build --dev --verbose` to see memory usage
3. Increase swap space

### Build is slow despite caching

1. Check what's invalidating the cache: `bengal build --verbose`
2. Profile templates: `--profile-templates`
3. Check for O(n) filters in templates (use query indexes)

### Incremental builds not working

1. Ensure `.bengal/` is not gitignored for local dev
2. Run `bengal clean --cache` to reset
3. Check for template changes that invalidate all pages

---

:::{seealso}
- [[docs/building/performance|Performance Overview]]
- [[docs/reference/cheatsheet|CLI Cheatsheet]]
:::
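For the troubleshooting step "Check for O(n) filters in templates", a quick text search can surface likely candidates before you reach for the profiler. This is a minimal regex-based sketch with several assumptions. It expects templates to be `.html` files under a single directory, and it looks for the slow pattern in the form shown in the query-index section, `site.pages | where(...)`. It will miss filters written in other forms. It is not a Bengal command, and `find_linear_filters` and `ON_FILTER` are hypothetical names.

```python
import re
import tempfile
from pathlib import Path

# Heuristic: matches `site.pages | where(` with optional spaces around the pipe.
# Adjust the pattern for other O(n) filters your templates use.
ON_FILTER = re.compile(r"site\.pages\s*\|\s*where\(")


def find_linear_filters(template_dir):
    """Yield (path, line_number, line) for each suspected O(n) page filter."""
    for path in sorted(Path(template_dir).rglob("*.html")):
        text = path.read_text(encoding="utf-8")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if ON_FILTER.search(line):
                yield path, lineno, line.strip()


# Demo on a throwaway template directory
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "blog.html").write_text(
        "{% let slow = site.pages | where('section', 'blog') %}\n"
        "{% let fast = site.indexes.section.get('blog') | resolve_pages %}\n",
        encoding="utf-8",
    )
    hits = [(p.name, n) for p, n, _ in find_linear_filters(tmp)]
```

Each hit is a candidate for replacement with a `site.indexes` lookup, if a matching built-in index exists.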