Stop writing API docs by hand. Generate documentation from Python docstrings, CLI help text, and OpenAPI specs—kept in sync automatically.

Tip

Duration: ~60 min | Prerequisite: Python codebase or API spec to document

1

Content Sources

Fetch content from external sources

Remote Content Sources

Fetch content from GitHub, Notion, REST APIs, and more.

Do I Need This?

No. By default, Bengal reads content from local files. That works for most sites.

Use remote sources when:

  • Your docs live in multiple GitHub repos
  • Content lives in a CMS (Notion, Contentful, etc.)
  • You want to pull API docs from a separate service
  • You need to aggregate content from different teams

Quick Start

Install the loader you need:

pip install bengal[github]   # GitHub repositories
pip install bengal[notion]   # Notion databases
pip install bengal[rest]     # REST APIs
pip install bengal[all-sources]  # Everything

Update your collections.py:

from bengal.collections import define_collection, DocPage
from bengal.content.sources import github_loader

collections = {
    # Local content (default)
    "docs": define_collection(
        schema=DocPage,
        directory="content/docs",
    ),

    # Remote content from GitHub
    "api-docs": define_collection(
        schema=DocPage,
        loader=github_loader(
            repo="myorg/api-docs",
            path="docs/",
        ),
    ),
}

Build as normal. Remote content is fetched, cached, and validated like local content.

Available Loaders

GitHub

Fetch markdown from any GitHub repository:

from bengal.content.sources import github_loader

loader = github_loader(
    repo="owner/repo",       # Required: "owner/repo" format
    branch="main",           # Default: "main"
    path="docs/",            # Default: "" (root)
    token=None,              # Default: uses GITHUB_TOKEN env var
    glob="*.md",             # Default: "*.md" (file pattern to match)
)

For private repos, set the GITHUB_TOKEN environment variable or pass token directly.
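
For example, to read the token explicitly rather than relying on the environment default (the repo name here is hypothetical):

import os
from bengal.content.sources import github_loader

# Same effect as leaving token=None with GITHUB_TOKEN set in the environment.
loader = github_loader(
    repo="myorg/private-docs",  # hypothetical private repo
    path="docs/",
    token=os.environ["GITHUB_TOKEN"],
)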

Notion

Fetch pages from a Notion database:

from bengal.content.sources import notion_loader

loader = notion_loader(
    database_id="abc123...",  # Required: database ID from URL
    token=None,               # Default: uses NOTION_TOKEN env var
    property_mapping={        # Map Notion properties to frontmatter
        "title": "Name",
        "date": "Published",
        "tags": "Tags",
    },
)

Setup:

  1. Create an integration at notion.so/my-integrations
  2. Share your database with the integration
  3. Set the NOTION_TOKEN environment variable

REST API

Fetch from any JSON API:

from bengal.content.sources import rest_loader

loader = rest_loader(
    url="https://api.example.com/posts",
    headers={"Authorization": "Bearer ${API_TOKEN}"},  # Env vars expanded
    content_field="body",           # JSON path to content
    id_field="id",                  # JSON path to ID
    frontmatter_fields={            # Map API fields to frontmatter
        "title": "title",
        "date": "published_at",
        "tags": "categories",
    },
)
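
As an illustration, an API item shaped like the following (hypothetical payload) would map cleanly through the loader above: body becomes the page content, while title, published_at, and categories become the title, date, and tags frontmatter.

# Hypothetical item returned by https://api.example.com/posts
item = {
    "id": "42",
    "title": "Hello World",
    "published_at": "2025-01-01",
    "categories": ["news"],
    "body": "# Hello\n\nPost content in markdown.",
}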

Local (Explicit)

For consistency, you can also use an explicit local loader:

from bengal.content.sources import local_loader

loader = local_loader(
    directory="content/docs",
    glob="**/*.md",
    exclude=["_drafts/*"],
)

Caching

Remote content is cached locally to avoid repeated API calls:

# Check cache status
bengal sources status

# Force refresh from remote
bengal sources fetch --force

# Clear all cached content
bengal sources clear

Cache behavior:

  • Default TTL: 1 hour
  • Cache directory: .bengal/content_cache/
  • Automatic invalidation when config changes
  • Falls back to cache if remote unavailable

CLI Commands

# List configured content sources
bengal sources list

# Show cache status (age, size, validity)
bengal sources status

# Fetch/refresh from remote sources
bengal sources fetch
bengal sources fetch --source api-docs  # Specific source
bengal sources fetch --force            # Ignore cache

# Clear cached content
bengal sources clear
bengal sources clear --source api-docs

Environment Variables

Variable Used By Description
GITHUB_TOKEN GitHub loader Personal access token for private repos
NOTION_TOKEN Notion loader Integration token
Custom REST loader Any ${VAR} in headers is expanded

Multi-Repo Documentation

A common pattern for large organizations:

from bengal.collections import define_collection, DocPage
from bengal.content.sources import github_loader, local_loader

collections = {
    # Main docs (local)
    "docs": define_collection(
        schema=DocPage,
        directory="content/docs",
    ),

    # API reference (from API team's repo)
    "api": define_collection(
        schema=DocPage,
        loader=github_loader(repo="myorg/api-service", path="docs/"),
    ),

    # SDK docs (from SDK repo)
    "sdk": define_collection(
        schema=DocPage,
        loader=github_loader(repo="myorg/sdk", path="docs/"),
    ),
}

Custom Loaders

Implement ContentSource for any content origin:

from collections.abc import AsyncIterator
from bengal.content.sources import ContentSource, ContentEntry

class MyCustomSource(ContentSource):
    source_type = "my-api"

    async def fetch_all(self) -> AsyncIterator[ContentEntry]:
        items = await self._get_items()
        for item in items:
            yield ContentEntry(
                id=item["id"],
                slug=item["slug"],
                content=item["body"],
                frontmatter={"title": item["title"]},
                source_type=self.source_type,
                source_name=self.name,
            )

    async def fetch_one(self, id: str) -> ContentEntry | None:
        item = await self._get_item(id)
        if not item:
            return None
        return ContentEntry(
            id=item["id"],
            slug=item["slug"],
            content=item["body"],
            frontmatter={"title": item["title"]},
            source_type=self.source_type,
            source_name=self.name,
        )

Zero-Cost Design

If you don't use remote sources:

  • No extra dependencies installed
  • No network calls
  • No import overhead
  • No configuration needed

Remote loaders are lazy-loaded only when you import them.

Other Content Sources

  • Autodoc — Generate API docs from Python, CLI commands, and OpenAPI specs


2

Autodoc

Generate API docs from source code

Generate API documentation automatically from source code during site builds.

Do I Need This?

Note

Skip this if: You write all documentation manually.
Read this if: You want API docs from Python docstrings, CLI help from Click/Typer commands, or API specs from OpenAPI.

How It Works

Autodoc generates virtual pages during your site build. No intermediate markdown files are created. Configure sources in your bengal.toml and documentation appears in your built site.

flowchart LR
    subgraph Sources
        A[Python Modules]
        B[CLI Commands]
        C[OpenAPI Specs]
    end
    D[Autodoc Engine]
    subgraph Output
        E[Virtual Pages]
    end
    A --> D
    B --> D
    C --> D
    D --> E

Configuration

Configure autodoc in your bengal.toml:

For Python modules:

# bengal.toml
[autodoc.python]
enabled = true
source_dirs = ["mypackage"]
include_private = false
include_special = false
docstring_style = "auto"  # auto, google, numpy, sphinx

Extracts:

  • Module and class docstrings
  • Function signatures and type hints
  • Examples from docstrings

For CLI commands:

# bengal.toml
[autodoc.cli]
enabled = true
app_module = "mypackage.cli:main"  # Click/Typer app entry point
framework = "click"  # click, argparse, or typer
include_hidden = false

Extracts:

  • Command descriptions
  • Argument documentation
  • Option flags and defaults

For OpenAPI specs:

# bengal.toml
[autodoc.openapi]
enabled = true
spec_file = "api/openapi.yaml"

Extracts:

  • Endpoint documentation
  • Request/response schemas
  • Authentication requirements

Python Configuration Options

[autodoc.python]
enabled = true

# Source directories to scan
source_dirs = ["mypackage"]

# Patterns to exclude
exclude = [
    "*/tests/*",
    "*/__pycache__/*",
    "*/.venv/*",
]

# Docstring parsing style: auto, google, numpy, sphinx
docstring_style = "auto"

# Include private members (_prefixed)
include_private = false

# Include dunder methods (__init__, etc.)
include_special = false

# Include inherited members
include_inherited = false

# Prefix to strip from module paths
strip_prefix = "mypackage"
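
For context, a Google-style docstring like the one below (an illustrative module, not part of Bengal) is the kind of input the extractor turns into signatures, descriptions, and examples:

def greet(name: str, excited: bool = False) -> str:
    """Return a greeting for the given name.

    Args:
        name: Who to greet.
        excited: Append an exclamation mark when True.

    Returns:
        The formatted greeting.

    Example:
        >>> greet("Ada", excited=True)
        'Hello, Ada!'
    """
    suffix = "!" if excited else "."
    return f"Hello, {name}{suffix}"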

Building with Autodoc

Once configured, autodoc runs automatically during builds:

bengal build

The generated API documentation appears in your output directory alongside your regular content.

Performance Optimizations

Bengal automatically optimizes autodoc builds:

  • AST Caching: Parsed Python modules are cached between builds. Unchanged source files skip AST parsing entirely, providing 30-40% speedup for sites with many autodoc pages.
  • Selective Rebuilds: Only autodoc pages affected by changed source files are rebuilt during incremental builds.
  • Parallel Extraction: Python modules are extracted in parallel when multiple files are present.

These optimizations are automatic and require no configuration.

Navigation (topbar)

If you do not define menu.main, Bengal generates a topbar menu automatically.

  • Manual menu overrides auto menu: If menu.main is present and non-empty, Bengal uses it and does not auto-discover topbar items.
  • Dev dropdown: In auto mode, Bengal may bundle autodoc outputs under a Dev dropdown when multiple “dev” links exist. If there is only one dev link (for example, API-only or CLI-only), it appears as a normal top-level menu entry.

If you want full control of where autodoc appears in the topbar, define menu.main.
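
As a sketch only (the menu schema below is an assumption; check the navigation docs for the exact keys), pinning autodoc pages in the topbar might look like:

# bengal.toml (hypothetical menu.main shape)
[[menu.main]]
name = "API"
url = "/api/"

[[menu.main]]
name = "CLI"
url = "/cli/"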

Strict Mode

Enable strict mode to fail builds on extraction or rendering errors:

[autodoc]
strict = true

Tip

Best practice: Enable strict mode in CI pipelines to catch documentation issues early.


3

Custom Content Sources

Fetch content from APIs, databases, or remote services

Content sources let Bengal fetch content from anywhere—local files, GitHub repositories, REST APIs, Notion databases, or custom backends. You can create custom sources by implementing the ContentSource abstract class.

Built-in Sources

Bengal includes four content source types:

  • LocalSource (type IDs: local, filesystem): local markdown files (default)
  • GitHubSource (github): GitHub repository content
  • RESTSource (rest, api): REST API endpoints
  • NotionSource (notion): Notion database pages

Using Built-in Sources

Local Source (Default)

The default source for local markdown files:

# collections.py
from bengal.collections import define_collection
from bengal.content.sources import local_loader

collections = {
    "docs": define_collection(
        schema=Doc,
        loader=local_loader("content/docs", exclude=["_drafts/*"]),
    ),
}

GitHub Source

Fetch content from a GitHub repository:

import os
from bengal.content.sources import github_loader

collections = {
    "api-docs": define_collection(
        schema=APIDoc,
        loader=github_loader(
            repo="myorg/api-docs",
            branch="main",
            path="docs/",
            token=os.environ.get("GITHUB_TOKEN"),
        ),
    ),
}

Requires: pip install bengal[github]

REST Source

Fetch content from a REST API:

from bengal.content.sources import rest_loader

collections = {
    "posts": define_collection(
        schema=BlogPost,
        loader=rest_loader(
            url="https://api.example.com/posts",
            headers={"Authorization": "Bearer ${API_TOKEN}"},
            content_field="body",
            frontmatter_fields={"title": "title", "date": "published_at"},
        ),
    ),
}

Requires: pip install bengal[rest]

Notion Source

Fetch pages from a Notion database:

import os
from bengal.content.sources import notion_loader

collections = {
    "wiki": define_collection(
        schema=WikiPage,
        loader=notion_loader(
            database_id="abc123...",
            token=os.environ.get("NOTION_TOKEN"),
        ),
    ),
}

Requires: pip install bengal[notion]

Creating a Custom Source

Implement the ContentSource abstract class:

from bengal.content.sources.source import ContentSource
from bengal.content.sources.entry import ContentEntry

class MyAPISource(ContentSource):
    """Fetch content from a custom API."""

    @property
    def source_type(self) -> str:
        return "my-api"

    async def fetch_all(self):
        """Fetch all content entries."""
        # Get items from your data source
        items = await self._fetch_items()

        for item in items:
            yield ContentEntry(
                id=item["id"],
                slug=item["slug"],
                content=item["body"],
                frontmatter={
                    "title": item["title"],
                    "date": item["created_at"],
                },
                source_type=self.source_type,
                source_name=self.name,
            )

    async def fetch_one(self, id: str):
        """Fetch a single entry by ID."""
        item = await self._fetch_item(id)
        if not item:
            return None

        return ContentEntry(
            id=item["id"],
            slug=item["slug"],
            content=item["body"],
            frontmatter={
                "title": item["title"],
                "date": item["created_at"],
            },
            source_type=self.source_type,
            source_name=self.name,
        )

    async def _fetch_items(self):
        """Your API call implementation."""
        import aiohttp
        async with aiohttp.ClientSession() as session:
            async with session.get(self.config["api_url"]) as resp:
                return await resp.json()

    async def _fetch_item(self, id: str):
        """Fetch single item."""
        import aiohttp
        async with aiohttp.ClientSession() as session:
            url = f"{self.config['api_url']}/{id}"
            async with session.get(url) as resp:
                if resp.status == 404:
                    return None
                return await resp.json()

ContentEntry Structure

Each source yields ContentEntry objects:

@dataclass
class ContentEntry:
    id: str                        # Unique identifier within source
    slug: str                      # URL-friendly slug for routing
    content: str                   # Raw markdown content
    frontmatter: dict[str, Any]    # Parsed metadata dictionary
    source_type: str               # Source type (e.g., "github", "notion")
    source_name: str               # Source instance name
    source_url: str | None         # Original URL for attribution
    last_modified: datetime | None # Last modification time
    checksum: str | None           # Content hash for caching
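
The last three fields are optional metadata; assuming they default to None, a minimal entry can be constructed from the first six:

from bengal.content.sources.entry import ContentEntry

# Minimal sketch; source_url, last_modified, and checksum are omitted
# on the assumption that they default to None.
entry = ContentEntry(
    id="42",
    slug="hello-world",
    content="# Hello\n\nBody text.",
    frontmatter={"title": "Hello World"},
    source_type="my-api",
    source_name="my-content",
)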

Registering Custom Sources

Option 1: Direct Registration

Register your source instance directly:

from bengal.content.sources import ContentLayerManager

manager = ContentLayerManager()
manager.register_custom_source("my-content", MyAPISource(
    name="my-content",
    config={"api_url": "https://api.example.com/content"},
))

Option 2: With Collections

Use your source as a collection loader:

# collections.py
from bengal.collections import define_collection

my_source = MyAPISource(
    name="my-content",
    config={"api_url": "https://api.example.com/content"},
)

collections = {
    "external": define_collection(
        schema=ExternalContent,
        loader=my_source,
    ),
}

Caching

Content sources support caching to avoid redundant fetches:

class MyAPISource(ContentSource):
    # ...

    def get_cache_key(self) -> str:
        """Generate cache key for this source configuration."""
        # Default implementation hashes config
        # Override for custom cache key logic
        return super().get_cache_key()

    async def is_changed(self, cached_checksum: str | None) -> bool:
        """Check if source content has changed."""
        # Return True to force refetch
        # Return False if content is unchanged
        current = await self._get_current_checksum()
        return current != cached_checksum

    async def get_last_modified(self):
        """Return last modification time for cache invalidation."""
        # Return datetime or None
        return None

Sync Wrappers

For convenience, ContentSource provides sync wrappers:

# Async (preferred for performance)
async for entry in source.fetch_all():
    process(entry)

# Sync (convenience wrapper)
for entry in source.fetch_all_sync():
    process(entry)

# Single entry
entry = source.fetch_one_sync("my-id")

Error Handling

Handle errors gracefully in your source:

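# Assumes a module-level logger, e.g. logger = logging.getLogger(__name__).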
async def fetch_all(self):
    try:
        items = await self._fetch_items()
    except aiohttp.ClientError as e:
        logger.error(f"Failed to fetch from {self.config['api_url']}: {e}")
        return  # Yield nothing on error

    for item in items:
        try:
            yield self._to_entry(item)
        except KeyError as e:
            logger.warning(f"Skipping malformed item {item.get('id')}: {e}")
            continue

Testing Custom Sources

import pytest
from unittest.mock import AsyncMock, patch

@pytest.mark.asyncio
async def test_my_api_source():
    source = MyAPISource(
        name="test",
        config={"api_url": "https://api.example.com"},
    )

    with patch.object(source, "_fetch_items", new_callable=AsyncMock) as mock:
        mock.return_value = [
            {"id": "1", "slug": "test", "title": "Test", "body": "Content", "created_at": "2025-01-01"},
        ]

        entries = [entry async for entry in source.fetch_all()]

        assert len(entries) == 1
        assert entries[0].frontmatter["title"] == "Test"
4

Content Collections

Validate frontmatter with typed schemas

Define typed schemas for your content to ensure consistency and catch errors early.

Do I Need This?

No. Collections are optional. Your site works fine without them.

Use collections when:

  • You want typos caught at build time, not in production
  • Multiple people edit content and need guardrails
  • You want consistent frontmatter across content types

Quick Setup

bengal collections init

This creates collections.py at your project root. Edit it to uncomment what you need:

from bengal.collections import define_collection, BlogPost, DocPage

collections = {
    "blog": define_collection(schema=BlogPost, directory="blog"),
    "docs": define_collection(schema=DocPage, directory="docs"),
}

Done. Build as normal—validation happens automatically.

Built-in Schemas

Bengal provides schemas for common content types:

  • BlogPost (alias: Post): requires title, date; optional author, tags, draft, description, image, excerpt
  • DocPage (alias: Doc): requires title; optional weight, category, tags, toc, deprecated, description, since
  • APIReference (alias: API): requires title, endpoint; optional method, version, auth_required, rate_limit, deprecated, description
  • Tutorial: requires title; optional difficulty, duration, prerequisites, series, tags, order
  • Changelog: requires title, date; optional version, breaking, summary, draft

Import any of these:

from bengal.collections import BlogPost, DocPage, APIReference, Tutorial, Changelog
# Or use short aliases:
from bengal.collections import Post, Doc, API
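
For example, frontmatter that satisfies BlogPost (title and date are required; the values below are illustrative):

---
title: "Shipping Bengal 1.0"
date: 2025-01-15
author: "Jane Doe"
tags: ["release"]
---

Post body goes here.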

Custom Schemas

Define your own using Python dataclasses:

from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class ProjectPage:
    title: str
    status: str  # "active", "completed", "archived"
    started: datetime
    tech_stack: list[str] = field(default_factory=list)
    github_url: str | None = None

collections = {
    "projects": define_collection(
        schema=ProjectPage,
        directory="projects",
    ),
}

Validation Modes

By default, validation warns but doesn't fail builds:

content/blog/my-post.md
  └─ date: Required field 'date' is missing

Strict Mode

To fail builds on validation errors, add to bengal.toml:

[build]
strict_collections = true

Lenient Mode (Extra Fields)

To allow frontmatter fields not defined in your schema:

define_collection(
    schema=BlogPost,
    directory="blog",
    strict=False,       # Don't reject unknown fields
    allow_extra=True,   # Store extra fields in _extra dict
)

With strict=False, unknown fields are silently ignored. Add allow_extra=True to preserve them in an _extra attribute on the validated instance.
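
A sketch of what that preserves, assuming you are iterating over validated instances (the accessor below is hypothetical; the _extra dict is the documented part):

# post.md has a legacy_id field that BlogPost does not define.
post = blog_entries[0]              # hypothetical: however you obtain validated entries
print(post.title)                   # schema-defined field
print(post._extra["legacy_id"])     # extra field preserved by allow_extra=True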

CLI Commands

# List defined collections and their schemas
bengal collections list

# Validate content without building
bengal collections validate

# Validate specific collection
bengal collections validate --collection blog

Advanced Options

Custom File Pattern

By default, collections match all markdown files (**/*.md). To match specific files:

define_collection(
    schema=BlogPost,
    directory="blog",
    glob="*.md",  # Only top-level, not subdirectories
)

Migration Tips

Existing site with inconsistent frontmatter?

  1. Start with strict=False to allow extra fields
  2. Run bengal collections validate to find issues
  3. Fix content or adjust schema
  4. Enable strict=True when ready

Transform legacy field names:

def migrate_legacy(data: dict) -> dict:
    if "post_title" in data:
        data["title"] = data.pop("post_title")
    return data

collections = {
    "blog": define_collection(
        schema=BlogPost,
        directory="blog",
        transform=migrate_legacy,
    ),
}

Remote Content

Collections work with remote content too. Use a loader instead of a directory:

from bengal.collections import define_collection, DocPage
from bengal.content.sources import github_loader

collections = {
    "api-docs": define_collection(
        schema=DocPage,
        loader=github_loader(repo="myorg/api-docs", path="docs/"),
    ),
}

See Content Sources for GitHub, Notion, REST API loaders.


5

Validation

Content validation and health checks

Content Validation

Ensure content quality with health checks and automatic fixes.

Do I Need This?

Note

Skip this if: You manually check all links and content.
Read this if: You want automated quality assurance and CI/CD integration.

Validation Flow

flowchart LR
    A[Content] --> B[Validators]
    B --> C{Issues?}
    C -->|Yes| D[Report]
    C -->|No| E[Pass]
    D --> F{Auto-fixable?}
    F -->|Yes| G[Auto-fix]
    F -->|No| H[Manual fix needed]

Quick Start

# Run all checks
bengal validate

# Validate specific files
bengal validate --file content/page.md

# Only validate changed files (incremental)
bengal validate --changed

# Verbose output (show all checks)
bengal validate --verbose

# Show quality suggestions
bengal validate --suggestions

# Watch mode (validate on file changes)
bengal validate --watch

Auto-Fix

# Preview fixes
bengal fix --dry-run

# Apply safe fixes
bengal fix

# Apply all fixes including confirmations
bengal fix --all

# Fix specific validator only
bengal fix --validator Directives

Fixes common issues:

  • Unclosed directive fences
  • Invalid directive options
  • YAML syntax errors

CI Integration

# Fail build on issues
bengal build --strict

# Validate and exit with error code
bengal validate

The --strict flag turns warnings into errors.

Built-in Checks

Check What it validates
links Internal and external links work
assets Asset references exist
config Configuration is valid
navigation Menu structure is correct
rendering Templates render without errors
cross_ref Cross-references are valid
taxonomy Tags and categories are consistent
directives MyST directive syntax is correct
anchors Heading IDs are unique and valid

Custom Validators

Create project-specific rules by extending BaseValidator:

# validators/custom.py
from bengal.health.base import BaseValidator
from bengal.health.report import CheckResult

class RequireAuthorValidator(BaseValidator):
    """Validator that checks for author field in frontmatter."""

    name = "Author Required"
    description = "Ensures all pages have an author field"

    def validate(self, site, build_context=None):
        results = []
        for page in site.pages:
            if not page.metadata.get("author"):
                results.append(CheckResult.error(
                    f"Missing author in {page.source_path}",
                    recommendation="Add 'author: Your Name' to frontmatter",
                    details=[str(page.source_path)],
                ))
        return results

Tip

CI integration: Add bengal validate to your CI pipeline to catch issues before deployment. Use --verbose to see all checks, or --suggestions for quality recommendations.
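
For example, as a GitHub Actions step (any CI system works the same way; the step name is arbitrary):

      - name: Validate content
        run: bengal validate --verbose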

6

Deployment

Deploy your Bengal site to production

Deploy Your Site

Bengal generates static HTML, CSS, and JavaScript files. This means you can host your site anywhere that serves static files (e.g., GitHub Pages, Netlify, Vercel, AWS S3, Nginx).

The Production Build

When you are ready to ship, run the build command:

bengal build --environment production

This command:

  • Loads configuration from config/environments/production.yaml (if it exists)
  • Minifies HTML output (enabled by default)
  • Generates the public/ directory with your complete site

Common Build Flags

Flag Description Use Case
--environment production Loads production config overrides. Always use for shipping.
--strict Fails the build on template errors. Highly Recommended for CI/CD.
--clean-output Cleans the public/ directory before building. Recommended to avoid stale files.
--fast Maximum performance (quiet output, full parallelism). Fast CI builds.
--verbose Shows detailed build output (phase timing, stats). Useful for debugging CI failures.

Example full command for CI:

bengal build --environment production --strict --clean-output

GitHub Pages

Deploy using GitHub Actions. Create.github/workflows/deploy.yml:

name: Deploy to GitHub Pages

on:
  push:
    branches: [main]

permissions:
  contents: read
  pages: write
  id-token: write

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.14'

      - name: Install Bengal
        run: pip install bengal

      - name: Build Site
        run: bengal build --environment production --strict --clean-output

      - name: Upload artifact
        uses: actions/upload-pages-artifact@v3
        with:
          path: './public'

  deploy:
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    runs-on: ubuntu-latest
    needs: build
    steps:
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v4

Netlify

Create a netlify.toml in your repository root:

[build]
  publish = "public"
  command = "bengal build --environment production"

[build.environment]
  PYTHON_VERSION = "3.14"

Vercel

Configure your project:

  1. Build Command: bengal build --environment production
  2. Output Directory: public
  3. Ensure your requirements.txt includes bengal.

Automatic Platform Detection

Bengal auto-detects your deployment platform and configures baseurl automatically:

Platform Detection Baseurl Source
GitHub Pages GITHUB_ACTIONS=true Inferred from GITHUB_REPOSITORY
Netlify NETLIFY=true URL or DEPLOY_PRIME_URL
Vercel VERCEL=true VERCEL_URL

You can override auto-detection with the BENGAL_BASEURL environment variable:

BENGAL_BASEURL="https://custom-domain.com" bengal build --environment production

Pre-Deployment Checklist

Before you merge to main or deploy:

  1. Run bengal config doctor: Checks for common configuration issues.
  2. Run bengal build --strict locally: Ensures no template errors.
  3. Run bengal validate: Runs health checks on your site content.
  4. Check config/environments/production.yaml: Ensure your baseurl is set to your production domain, as in the example below.

# config/environments/production.yaml
site:
  baseurl: "https://example.com"
