API Documentation Specialist
Auto-generate API documentation from source code and specs.
Stop writing API docs by hand. Generate documentation from Python docstrings, CLI help text, and OpenAPI specs—kept in sync automatically.
Tip
Duration: ~60 min | Prerequisite: Python codebase or API spec to document
Remote Content Sources
Fetch content from GitHub, Notion, REST APIs, and more.
Do I Need This?
No. By default, Bengal reads content from local files. That works for most sites.
Use remote sources when:
- Your docs live in multiple GitHub repos
- Content lives in a CMS (Notion, Contentful, etc.)
- You want to pull API docs from a separate service
- You need to aggregate content from different teams
Quick Start
Install the loader you need:
pip install bengal[github] # GitHub repositories
pip install bengal[notion] # Notion databases
pip install bengal[rest] # REST APIs
pip install bengal[all-sources] # Everything
Update your collections.py:
from bengal.collections import define_collection, DocPage
from bengal.content.sources import github_loader
collections = {
    # Local content (default)
    "docs": define_collection(
        schema=DocPage,
        directory="content/docs",
    ),
    # Remote content from GitHub
    "api-docs": define_collection(
        schema=DocPage,
        loader=github_loader(
            repo="myorg/api-docs",
            path="docs/",
        ),
    ),
}
Build as normal. Remote content is fetched, cached, and validated like local content.
Available Loaders
GitHub
Fetch markdown from any GitHub repository:
from bengal.content.sources import github_loader
loader = github_loader(
    repo="owner/repo",   # Required: "owner/repo" format
    branch="main",       # Default: "main"
    path="docs/",        # Default: "" (root)
    token=None,          # Default: uses GITHUB_TOKEN env var
    glob="*.md",         # Default: "*.md" (file pattern to match)
)
For private repos, set the `GITHUB_TOKEN` environment variable or pass `token` directly.
Notion
Fetch pages from a Notion database:
from bengal.content.sources import notion_loader
loader = notion_loader(
    database_id="abc123...",  # Required: database ID from URL
    token=None,               # Default: uses NOTION_TOKEN env var
    property_mapping={        # Map Notion properties to frontmatter
        "title": "Name",
        "date": "Published",
        "tags": "Tags",
    },
)
Setup:
- Create an integration at notion.so/my-integrations
- Share your database with the integration
- Set the `NOTION_TOKEN` environment variable
REST API
Fetch from any JSON API:
from bengal.content.sources import rest_loader
loader = rest_loader(
    url="https://api.example.com/posts",
    headers={"Authorization": "Bearer ${API_TOKEN}"},  # Env vars expanded
    content_field="body",      # JSON path to content
    id_field="id",             # JSON path to ID
    frontmatter_fields={       # Map API fields to frontmatter
        "title": "title",
        "date": "published_at",
        "tags": "categories",
    },
)
Local (Explicit)
For consistency, you can also use an explicit local loader:
from bengal.content.sources import local_loader
loader = local_loader(
    directory="content/docs",
    glob="**/*.md",
    exclude=["_drafts/*"],
)
Caching
Remote content is cached locally to avoid repeated API calls:
# Check cache status
bengal sources status
# Force refresh from remote
bengal sources fetch --force
# Clear all cached content
bengal sources clear
Cache behavior:
- Default TTL: 1 hour
- Cache directory: `.bengal/content_cache/`
- Automatic invalidation when config changes
- Falls back to cache if the remote is unavailable
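One way to wire this into CI, using only the commands above, is to warm the cache in a separate step so the build itself needs no network access:

# Warm the remote-content cache, then build from it
bengal sources fetch
bengal build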
CLI Commands
# List configured content sources
bengal sources list
# Show cache status (age, size, validity)
bengal sources status
# Fetch/refresh from remote sources
bengal sources fetch
bengal sources fetch --source api-docs # Specific source
bengal sources fetch --force # Ignore cache
# Clear cached content
bengal sources clear
bengal sources clear --source api-docs
Environment Variables
| Variable | Used By | Description |
|---|---|---|
| `GITHUB_TOKEN` | GitHub loader | Personal access token for private repos |
| `NOTION_TOKEN` | Notion loader | Integration token |
| Custom | REST loader | Any `${VAR}` in headers is expanded |
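For local builds, export the tokens before running Bengal; the values shown are placeholders:

# shell session (token values are illustrative)
export GITHUB_TOKEN="ghp_xxxx"
export NOTION_TOKEN="secret_xxxx"
bengal build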
Multi-Repo Documentation
A common pattern for large organizations:
from bengal.collections import define_collection, DocPage
from bengal.content.sources import github_loader, local_loader
collections = {
    # Main docs (local)
    "docs": define_collection(
        schema=DocPage,
        directory="content/docs",
    ),
    # API reference (from API team's repo)
    "api": define_collection(
        schema=DocPage,
        loader=github_loader(repo="myorg/api-service", path="docs/"),
    ),
    # SDK docs (from SDK repo)
    "sdk": define_collection(
        schema=DocPage,
        loader=github_loader(repo="myorg/sdk", path="docs/"),
    ),
}
Custom Loaders
Implement `ContentSource` for any content origin:
from collections.abc import AsyncIterator
from bengal.content.sources import ContentSource, ContentEntry
class MyCustomSource(ContentSource):
    source_type = "my-api"

    async def fetch_all(self) -> AsyncIterator[ContentEntry]:
        items = await self._get_items()
        for item in items:
            yield ContentEntry(
                id=item["id"],
                slug=item["slug"],
                content=item["body"],
                frontmatter={"title": item["title"]},
                source_type=self.source_type,
                source_name=self.name,
            )

    async def fetch_one(self, id: str) -> ContentEntry | None:
        item = await self._get_item(id)
        if not item:
            return None
        return ContentEntry(
            id=item["id"],
            slug=item["slug"],
            content=item["body"],
            frontmatter={"title": item["title"]},
            source_type=self.source_type,
            source_name=self.name,
        )
Zero-Cost Design
If you don't use remote sources:
- No extra dependencies installed
- No network calls
- No import overhead
- No configuration needed
Remote loaders are lazy-loaded only when you import them.
Other Content Sources
- Autodoc — Generate API docs from Python, CLI commands, and OpenAPI specs
See also
- Content Collections — Schema validation for any source
Autodoc
Generate API documentation automatically from source code during site builds.
Do I Need This?
Note
Skip this if: You write all documentation manually.
Read this if: You want API docs from Python docstrings, CLI help from Click/Typer commands, or API specs from OpenAPI.
How It Works
Autodoc generates virtual pages during your site build; no intermediate markdown files are created. Configure sources in your `bengal.toml`, and the documentation appears in your built site.
Configuration
Configure autodoc in your `bengal.toml`:
# bengal.toml
[autodoc.python]
enabled = true
source_dirs = ["mypackage"]
include_private = false
include_special = false
docstring_style = "auto" # auto, google, numpy, sphinx
Extracts:
- Module and class docstrings
- Function signatures and type hints
- Examples from docstrings
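For example, a Google-style docstring like this one (the module and function are hypothetical) carries everything the extractor reads: the summary line, the typed signature, and a usage example:

# mypackage/example.py (illustrative)
def fetch(url: str, timeout: float = 5.0) -> bytes:
    """Download a resource over HTTP.

    Args:
        url: Target URL.
        timeout: Seconds to wait before giving up.

    Returns:
        The raw response body.

    Example:
        >>> fetch("https://example.com")
        b'<!doctype html>...'
    """
    ...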
# bengal.toml
[autodoc.cli]
enabled = true
app_module = "mypackage.cli:main" # Click/Typer app entry point
framework = "click" # click, argparse, or typer
include_hidden = false
Extracts:
- Command descriptions
- Argument documentation
- Option flags and defaults
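As a sketch, a minimal Click entry point matching the app_module setting above could look like this (module path, command, and options are hypothetical; requires Click):

# mypackage/cli.py (illustrative)
import click

@click.command()
@click.argument("name")
@click.option("--shout", is_flag=True, help="Uppercase the greeting.")
def main(name: str, shout: bool) -> None:
    """Greet NAME on stdout."""
    greeting = f"Hello, {name}!"
    click.echo(greeting.upper() if shout else greeting)

Autodoc would pick up the command description from the docstring, the name argument, and the --shout flag with its help text and default.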
# bengal.toml
[autodoc.openapi]
enabled = true
spec_file = "api/openapi.yaml"
Extracts:
- Endpoint documentation
- Request/response schemas
- Authentication requirements
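A minimal spec fragment showing the pieces the extractor reads (the endpoint itself is illustrative):

# api/openapi.yaml (illustrative fragment)
openapi: 3.0.3
info:
  title: Example API
  version: "1.0"
paths:
  /posts:
    get:
      summary: List posts
      responses:
        "200":
          description: A JSON array of posts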
Python Configuration Options
[autodoc.python]
enabled = true
# Source directories to scan
source_dirs = ["mypackage"]
# Patterns to exclude
exclude = [
"*/tests/*",
"*/__pycache__/*",
"*/.venv/*",
]
# Docstring parsing style: auto, google, numpy, sphinx
docstring_style = "auto"
# Include private members (_prefixed)
include_private = false
# Include dunder methods (__init__, etc.)
include_special = false
# Include inherited members
include_inherited = false
# Prefix to strip from module paths
strip_prefix = "mypackage"
Building with Autodoc
Once configured, autodoc runs automatically during builds:
bengal build
The generated API documentation appears in your output directory alongside your regular content.
Performance Optimizations
Bengal automatically optimizes autodoc builds:
- AST Caching: Parsed Python modules are cached between builds. Unchanged source files skip AST parsing entirely, providing 30-40% speedup for sites with many autodoc pages.
- Selective Rebuilds: Only autodoc pages affected by changed source files are rebuilt during incremental builds.
- Parallel Extraction: Python modules are extracted in parallel when multiple files are present.
These optimizations are automatic and require no configuration.
Navigation (topbar)
If you do not define `menu.main`, Bengal generates a topbar menu automatically.
- Manual menu overrides auto menu: if `menu.main` is present and non-empty, Bengal uses it and does not auto-discover topbar items.
- Dev dropdown: in auto mode, Bengal may bundle autodoc outputs under a Dev dropdown when multiple "dev" links exist. If there is only one dev link (for example, API-only or CLI-only), it appears as a normal top-level menu entry.

If you want full control of where autodoc appears in the topbar, define `menu.main`.
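As a sketch, a manual definition might look like the following; the exact menu schema is an assumption here, so check the configuration reference for your Bengal version:

# bengal.toml (menu entry shape is illustrative)
[[menu.main]]
name = "Docs"
url = "/docs/"

[[menu.main]]
name = "API"
url = "/api/"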
Strict Mode
Enable strict mode to fail builds on extraction or rendering errors:
[autodoc]
strict = true
Tip
Best practice: Enable strict mode in CI pipelines to catch documentation issues early.
See also
- Architecture Reference — Technical details and API usage
Custom Content Sources
Fetch content from APIs, databases, or remote services
Content sources let Bengal fetch content from anywhere: local files, GitHub repositories, REST APIs, Notion databases, or custom backends. You can create custom sources by implementing the `ContentSource` abstract class.
Built-in Sources
Bengal includes four content source types:
| Source | Type ID | Use Case |
|---|---|---|
| LocalSource | `local`, `filesystem` | Local markdown files (default) |
| GitHubSource | `github` | GitHub repository content |
| RESTSource | `rest`, `api` | REST API endpoints |
| NotionSource | `notion` | Notion database pages |
Using Built-in Sources
Local Source (Default)
The default source for local markdown files:
# collections.py
from bengal.collections import Doc, define_collection
from bengal.content.sources import local_loader

collections = {
    "docs": define_collection(
        schema=Doc,
        loader=local_loader("content/docs", exclude=["_drafts/*"]),
    ),
}
GitHub Source
Fetch content from a GitHub repository:
import os

from bengal.collections import define_collection
from bengal.content.sources import github_loader

collections = {
    "api-docs": define_collection(
        schema=APIDoc,  # your schema (see Content Collections)
        loader=github_loader(
            repo="myorg/api-docs",
            branch="main",
            path="docs/",
            token=os.environ.get("GITHUB_TOKEN"),
        ),
    ),
}
Requires: pip install bengal[github]
REST Source
Fetch content from a REST API:
from bengal.collections import BlogPost, define_collection
from bengal.content.sources import rest_loader

collections = {
    "posts": define_collection(
        schema=BlogPost,
        loader=rest_loader(
            url="https://api.example.com/posts",
            headers={"Authorization": "Bearer ${API_TOKEN}"},
            content_field="body",
            frontmatter_fields={"title": "title", "date": "published_at"},
        ),
    ),
}
Requires: pip install bengal[rest]
Notion Source
Fetch pages from a Notion database:
import os

from bengal.collections import define_collection
from bengal.content.sources import notion_loader

collections = {
    "wiki": define_collection(
        schema=WikiPage,  # your schema
        loader=notion_loader(
            database_id="abc123...",
            token=os.environ.get("NOTION_TOKEN"),
        ),
    ),
}
Requires: pip install bengal[notion]
Creating a Custom Source
Implement the `ContentSource` abstract class:
from bengal.content.sources.source import ContentSource
from bengal.content.sources.entry import ContentEntry

class MyAPISource(ContentSource):
    """Fetch content from a custom API."""

    @property
    def source_type(self) -> str:
        return "my-api"

    async def fetch_all(self):
        """Fetch all content entries."""
        # Get items from your data source
        items = await self._fetch_items()
        for item in items:
            yield ContentEntry(
                id=item["id"],
                slug=item["slug"],
                content=item["body"],
                frontmatter={
                    "title": item["title"],
                    "date": item["created_at"],
                },
                source_type=self.source_type,
                source_name=self.name,
            )

    async def fetch_one(self, id: str):
        """Fetch a single entry by ID."""
        item = await self._fetch_item(id)
        if not item:
            return None
        return ContentEntry(
            id=item["id"],
            slug=item["slug"],
            content=item["body"],
            frontmatter={
                "title": item["title"],
                "date": item["created_at"],
            },
            source_type=self.source_type,
            source_name=self.name,
        )

    async def _fetch_items(self):
        """Your API call implementation."""
        import aiohttp

        async with aiohttp.ClientSession() as session:
            async with session.get(self.config["api_url"]) as resp:
                return await resp.json()

    async def _fetch_item(self, id: str):
        """Fetch single item."""
        import aiohttp

        async with aiohttp.ClientSession() as session:
            url = f"{self.config['api_url']}/{id}"
            async with session.get(url) as resp:
                if resp.status == 404:
                    return None
                return await resp.json()
ContentEntry Structure
Each source yields `ContentEntry` objects:
from dataclasses import dataclass
from datetime import datetime
from typing import Any

@dataclass
class ContentEntry:
    id: str                          # Unique identifier within source
    slug: str                        # URL-friendly slug for routing
    content: str                     # Raw markdown content
    frontmatter: dict[str, Any]      # Parsed metadata dictionary
    source_type: str                 # Source type (e.g., "github", "notion")
    source_name: str                 # Source instance name
    source_url: str | None           # Original URL for attribution
    last_modified: datetime | None   # Last modification time
    checksum: str | None             # Content hash for caching
Registering Custom Sources
Option 1: Direct Registration
Register your source instance directly:
from bengal.content.sources import ContentLayerManager

manager = ContentLayerManager()
manager.register_custom_source("my-content", MyAPISource(
    name="my-content",
    config={"api_url": "https://api.example.com/content"},
))
Option 2: With Collections
Use your source as a collection loader:
# collections.py
from bengal.collections import define_collection

my_source = MyAPISource(
    name="my-content",
    config={"api_url": "https://api.example.com/content"},
)

collections = {
    "external": define_collection(
        schema=ExternalContent,
        loader=my_source,
    ),
}
Caching
Content sources support caching to avoid redundant fetches:
class MyAPISource(ContentSource):
    # ...

    def get_cache_key(self) -> str:
        """Generate cache key for this source configuration."""
        # Default implementation hashes config
        # Override for custom cache key logic
        return super().get_cache_key()

    async def is_changed(self, cached_checksum: str | None) -> bool:
        """Check if source content has changed."""
        # Return True to force refetch
        # Return False if content is unchanged
        current = await self._get_current_checksum()
        return current != cached_checksum

    async def get_last_modified(self):
        """Return last modification time for cache invalidation."""
        # Return datetime or None
        return None
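The _get_current_checksum helper referenced above is not part of the base class. A minimal sketch, assuming the _fetch_items method from the earlier example, hashes the fetched payload:

import hashlib

async def _get_current_checksum(self) -> str:
    """Hypothetical helper: hash remote content to detect changes."""
    items = await self._fetch_items()
    digest = hashlib.sha256()
    for item in sorted(items, key=lambda i: i["id"]):
        digest.update(item["body"].encode("utf-8"))
    return digest.hexdigest()

In practice you would query something cheaper than the full payload (an ETag, a last-modified header, or a revision counter) so the change check does not cost as much as a refetch.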
Sync Wrappers
For convenience, `ContentSource` provides sync wrappers:
# Async (preferred for performance)
async for entry in source.fetch_all():
    process(entry)

# Sync (convenience wrapper)
for entry in source.fetch_all_sync():
    process(entry)

# Single entry
entry = source.fetch_one_sync("my-id")
Error Handling
Handle errors gracefully in your source:
async def fetch_all(self):
    try:
        items = await self._fetch_items()
    except aiohttp.ClientError as e:
        logger.error(f"Failed to fetch from {self.config['api_url']}: {e}")
        return  # Yield nothing on error
    for item in items:
        try:
            yield self._to_entry(item)
        except KeyError as e:
            logger.warning(f"Skipping malformed item {item.get('id')}: {e}")
            continue
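The _to_entry helper used above is not shown; a minimal sketch consistent with the earlier examples:

def _to_entry(self, item: dict) -> ContentEntry:
    """Hypothetical helper: map one API item to a ContentEntry.

    Raises KeyError for malformed items, which fetch_all() catches and logs.
    """
    return ContentEntry(
        id=item["id"],
        slug=item["slug"],
        content=item["body"],
        frontmatter={"title": item["title"]},
        source_type=self.source_type,
        source_name=self.name,
    )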
Testing Custom Sources
import pytest
from unittest.mock import AsyncMock, patch

@pytest.mark.asyncio
async def test_my_api_source():
    source = MyAPISource(
        name="test",
        config={"api_url": "https://api.example.com"},
    )
    with patch.object(source, "_fetch_items", new_callable=AsyncMock) as mock:
        mock.return_value = [
            {"id": "1", "slug": "test", "title": "Test", "body": "Content", "created_at": "2025-01-01"},
        ]
        entries = [entry async for entry in source.fetch_all()]
    assert len(entries) == 1
    assert entries[0].frontmatter["title"] == "Test"
Related
- Content Collections for schema validation
- Build Pipeline for understanding discovery phase
Content Collections
Define typed schemas for your content to ensure consistency and catch errors early.
Do I Need This?
No. Collections are optional. Your site works fine without them.
Use collections when:
- You want typos caught at build time, not in production
- Multiple people edit content and need guardrails
- You want consistent frontmatter across content types
Quick Setup
bengal collections init
This creates `collections.py` at your project root. Edit it to uncomment what you need:
from bengal.collections import define_collection, BlogPost, DocPage

collections = {
    "blog": define_collection(schema=BlogPost, directory="blog"),
    "docs": define_collection(schema=DocPage, directory="docs"),
}
Done. Build as normal—validation happens automatically.
Built-in Schemas
Bengal provides schemas for common content types:
| Schema | Alias | Required Fields | Optional Fields |
|---|---|---|---|
| `BlogPost` | `Post` | title, date | author, tags, draft, description, image, excerpt |
| `DocPage` | `Doc` | title | weight, category, tags, toc, deprecated, description, since |
| `APIReference` | `API` | title, endpoint | method, version, auth_required, rate_limit, deprecated, description |
| `Tutorial` | — | title | difficulty, duration, prerequisites, series, tags, order |
| `Changelog` | — | title, date | version, breaking, summary, draft |
Import any of these:
from bengal.collections import BlogPost, DocPage, APIReference, Tutorial, Changelog
# Or use short aliases:
from bengal.collections import Post, Doc, API
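As a quick reference for what validates, a post satisfying BlogPost needs only title and date; the file and values here are illustrative:

---
title: "Hello, Bengal"
date: 2025-01-15
tags: [announcements]
draft: false
---

Post body goes here.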
Custom Schemas
Define your own using Python dataclasses:
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class ProjectPage:
    title: str
    status: str  # "active", "completed", "archived"
    started: datetime
    tech_stack: list[str] = field(default_factory=list)
    github_url: str | None = None

collections = {
    "projects": define_collection(
        schema=ProjectPage,
        directory="projects",
    ),
}
Validation Modes
By default, validation warns but doesn't fail builds:
⚠ content/blog/my-post.md
└─ date: Required field 'date' is missing
Strict Mode
To fail builds on validation errors, add to `bengal.toml`:
[build]
strict_collections = true
Lenient Mode (Extra Fields)
To allow frontmatter fields not defined in your schema:
define_collection(
    schema=BlogPost,
    directory="blog",
    strict=False,      # Don't reject unknown fields
    allow_extra=True,  # Store extra fields in _extra dict
)
With `strict=False`, unknown fields are silently ignored. Add `allow_extra=True` to preserve them in an `_extra` attribute on the validated instance.
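For example, a page carrying a field outside the schema (legacy_id is illustrative) still validates, and the unknown field survives on the instance:

---
title: "Legacy post"
date: 2020-06-01
legacy_id: 4271    # not a BlogPost field; kept in _extra when allow_extra=True
---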
CLI Commands
# List defined collections and their schemas
bengal collections list
# Validate content without building
bengal collections validate
# Validate specific collection
bengal collections validate --collection blog
Advanced Options
Custom File Pattern
By default, collections match all markdown files (`**/*.md`). To match specific files:
define_collection(
    schema=BlogPost,
    directory="blog",
    glob="*.md",  # Only top-level, not subdirectories
)
Migration Tips
Existing site with inconsistent frontmatter?
- Start with `strict=False` to allow extra fields
- Run `bengal collections validate` to find issues
- Fix content or adjust schema
- Enable `strict=True` when ready
Transform legacy field names:
def migrate_legacy(data: dict) -> dict:
    if "post_title" in data:
        data["title"] = data.pop("post_title")
    return data

collections = {
    "blog": define_collection(
        schema=BlogPost,
        directory="blog",
        transform=migrate_legacy,
    ),
}
Remote Content
Collections work with remote content too. Use a loader instead of a directory:
from bengal.collections import define_collection, DocPage
from bengal.content.sources import github_loader

collections = {
    "api-docs": define_collection(
        schema=DocPage,
        loader=github_loader(repo="myorg/api-docs", path="docs/"),
    ),
}
See Content Sources for GitHub, Notion, REST API loaders.
See also
- Content Sources — GitHub, Notion, REST API loaders
Content Validation
Ensure content quality with health checks and automatic fixes.
Do I Need This?
Note
Skip this if: You manually check all links and content.
Read this if: You want automated quality assurance and CI/CD integration.
Quick Start
# Run all checks
bengal validate
# Validate specific files
bengal validate --file content/page.md
# Only validate changed files (incremental)
bengal validate --changed
# Verbose output (show all checks)
bengal validate --verbose
# Show quality suggestions
bengal validate --suggestions
# Watch mode (validate on file changes)
bengal validate --watch
# Preview fixes
bengal fix --dry-run
# Apply safe fixes
bengal fix
# Apply all fixes including confirmations
bengal fix --all
# Fix specific validator only
bengal fix --validator Directives
Fixes common issues:
- Unclosed directive fences
- Invalid directive options
- YAML syntax errors
# Fail build on issues
bengal build --strict
# Validate and exit with error code
bengal validate
The `--strict` flag turns warnings into errors.
Built-in Checks
| Check | What it validates |
|---|---|
| `links` | Internal and external links work |
| `assets` | Asset references exist |
| `config` | Configuration is valid |
| `navigation` | Menu structure is correct |
| `rendering` | Templates render without errors |
| `cross_ref` | Cross-references are valid |
| `taxonomy` | Tags and categories are consistent |
| `directives` | MyST directive syntax is correct |
| `anchors` | Heading IDs are unique and valid |
Custom Validators
Create project-specific rules by extending `BaseValidator`:
# validators/custom.py
from bengal.health.base import BaseValidator
from bengal.health.report import CheckResult

class RequireAuthorValidator(BaseValidator):
    """Validator that checks for author field in frontmatter."""

    name = "Author Required"
    description = "Ensures all pages have an author field"

    def validate(self, site, build_context=None):
        results = []
        for page in site.pages:
            if not page.metadata.get("author"):
                results.append(CheckResult.error(
                    f"Missing author in {page.source_path}",
                    recommendation="Add 'author: Your Name' to frontmatter",
                    details=[str(page.source_path)],
                ))
        return results
Tip
CI integration: Add `bengal validate` to your CI pipeline to catch issues before deployment. Use `--verbose` to see all checks, or `--suggestions` for quality recommendations.
Deploy Your Site
Bengal generates static HTML, CSS, and JavaScript files. This means you can host your site anywhere that serves static files (e.g., GitHub Pages, Netlify, Vercel, AWS S3, Nginx).
The Production Build
When you are ready to ship, run the build command:
bengal build --environment production
This command:
- Loads configuration from `config/environments/production.yaml` (if it exists)
- Minifies HTML output (enabled by default)
- Generates the `public/` directory with your complete site
Common Build Flags
| Flag | Description | Use Case |
|---|---|---|
| `--environment production` | Loads production config overrides. | Always use for shipping. |
| `--strict` | Fails the build on template errors. | Highly recommended for CI/CD. |
| `--clean-output` | Cleans the `public/` directory before building. | Recommended to avoid stale files. |
| `--fast` | Maximum performance (quiet output, full parallelism). | Fast CI builds. |
| `--verbose` | Shows detailed build output (phase timing, stats). | Useful for debugging CI failures. |
Example full command for CI:
bengal build --environment production --strict --clean-output
GitHub Pages
Deploy using GitHub Actions. Create `.github/workflows/deploy.yml`:
name: Deploy to GitHub Pages

on:
  push:
    branches: [main]

permissions:
  contents: read
  pages: write
  id-token: write

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.14'
      - name: Install Bengal
        run: pip install bengal
      - name: Build Site
        run: bengal build --environment production --strict --clean-output
      - name: Upload artifact
        uses: actions/upload-pages-artifact@v3
        with:
          path: './public'

  deploy:
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    runs-on: ubuntu-latest
    needs: build
    steps:
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v4
Netlify
Create a `netlify.toml` in your repository root:
[build]
publish = "public"
command = "bengal build --environment production"
[build.environment]
PYTHON_VERSION = "3.14"
Vercel
Configure your project:
- Build Command: `bengal build --environment production`
- Output Directory: `public`
- Ensure your `requirements.txt` includes `bengal` (a minimal example follows).
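A minimal requirements.txt for this setup; pinning an exact version is optional but makes CI builds reproducible:

# requirements.txt
bengal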
Automatic Platform Detection
Bengal auto-detects your deployment platform and configures `baseurl` automatically:
| Platform | Detection | Baseurl Source |
|---|---|---|
| GitHub Pages | `GITHUB_ACTIONS=true` | Inferred from `GITHUB_REPOSITORY` |
| Netlify | `NETLIFY=true` | `URL` or `DEPLOY_PRIME_URL` |
| Vercel | `VERCEL=true` | `VERCEL_URL` |
You can override auto-detection with the `BENGAL_BASEURL` environment variable:
BENGAL_BASEURL="https://custom-domain.com" bengal build --environment production
Pre-Deployment Checklist
Before you merge to main or deploy:
- Run `bengal config doctor`: Checks for common configuration issues.
- Run `bengal build --strict` locally: Ensures no template errors.
- Run `bengal validate`: Runs health checks on your site content.
- Check `config/environments/production.yaml`: Ensure your `baseurl` is set to your production domain.
# config/environments/production.yaml
site:
  baseurl: "https://example.com"
See also
- Configuration — Environment-specific settings
- Performance — Optimize build times