Fetch content from GitHub, Notion, REST APIs, and more.
## Do I Need This?
No. By default, Bengal reads content from local files. That works for most sites.
Use remote sources when:
- Your docs live in multiple GitHub repos
- Content lives in a CMS (Notion, Contentful, etc.)
- You want to pull API docs from a separate service
- You need to aggregate content from different teams
## Quick Start
Install the loader you need:

```bash
pip install bengal[github]        # GitHub repositories
pip install bengal[notion]        # Notion databases
pip install bengal[rest]          # REST APIs
pip install bengal[all-sources]   # Everything
```
Update your `collections.py`:

```python
from bengal.collections import define_collection, DocPage
from bengal.content.sources import github_loader

collections = {
    # Local content (default)
    "docs": define_collection(
        schema=DocPage,
        directory="content/docs",
    ),
    # Remote content from GitHub
    "api-docs": define_collection(
        schema=DocPage,
        loader=github_loader(
            repo="myorg/api-docs",
            path="docs/",
        ),
    ),
}
```
Build as normal. Remote content is fetched, cached, and validated like local content.
## Available Loaders
### GitHub
Fetch markdown from any GitHub repository:
```python
from bengal.content.sources import github_loader

loader = github_loader(
    repo="owner/repo",   # Required: "owner/repo" format
    branch="main",       # Default: "main"
    path="docs/",        # Default: "" (root)
    token=None,          # Default: uses GITHUB_TOKEN env var
    glob="*.md",         # Default: "*.md" (file pattern to match)
)
```
For private repos, set the `GITHUB_TOKEN` environment variable or pass `token` directly.
### Notion
Fetch pages from a Notion database:
```python
from bengal.content.sources import notion_loader

loader = notion_loader(
    database_id="abc123...",   # Required: database ID from URL
    token=None,                # Default: uses NOTION_TOKEN env var
    property_mapping={         # Map Notion properties to frontmatter
        "title": "Name",
        "date": "Published",
        "tags": "Tags",
    },
)
```
Setup:

- Create an integration at notion.so/my-integrations
- Share your database with the integration
- Set the `NOTION_TOKEN` environment variable
### REST API
Fetch from any JSON API:
```python
from bengal.content.sources import rest_loader

loader = rest_loader(
    url="https://api.example.com/posts",
    headers={"Authorization": "Bearer ${API_TOKEN}"},  # Env vars expanded
    content_field="body",        # JSON path to content
    id_field="id",               # JSON path to ID
    frontmatter_fields={         # Map API fields to frontmatter
        "title": "title",
        "date": "published_at",
        "tags": "categories",
    },
)
```
### Local (Explicit)
For consistency, you can also use an explicit local loader:
```python
from bengal.content.sources import local_loader

loader = local_loader(
    directory="content/docs",
    glob="**/*.md",
    exclude=["_drafts/*"],
)
```
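A sketch of what those glob/exclude settings select, using `fnmatch` to stand in for the loader's matching (illustrative, not Bengal's actual matching code):

```python
from fnmatch import fnmatch

paths = ["guide.md", "intro.md", "_drafts/wip.md", "img/logo.png"]
exclude = ["_drafts/*"]

# Keep markdown files, drop anything matching an exclude pattern.
selected = [
    p for p in paths
    if p.endswith(".md") and not any(fnmatch(p, pat) for pat in exclude)
]
print(selected)  # → ['guide.md', 'intro.md']
```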
## Caching
Remote content is cached locally to avoid repeated API calls:
```bash
# Check cache status
bengal sources status

# Force refresh from remote
bengal sources fetch --force

# Clear all cached content
bengal sources clear
```
Cache behavior:

- Default TTL: 1 hour
- Cache directory: `.bengal/content_cache/`
- Automatic invalidation when config changes
- Falls back to cache if remote unavailable
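The TTL check amounts to comparing a cached file's age against the limit. A rough sketch of that idea (not Bengal's actual logic):

```python
import time
from pathlib import Path

CACHE_TTL = 3600  # seconds; matches the 1-hour default above

def cache_is_fresh(path: Path, ttl: int = CACHE_TTL) -> bool:
    # A cache entry is usable while its mtime is within the TTL window.
    return path.exists() and (time.time() - path.stat().st_mtime) < ttl
```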
## CLI Commands
```bash
# List configured content sources
bengal sources list

# Show cache status (age, size, validity)
bengal sources status

# Fetch/refresh from remote sources
bengal sources fetch
bengal sources fetch --source api-docs   # Specific source
bengal sources fetch --force             # Ignore cache

# Clear cached content
bengal sources clear
bengal sources clear --source api-docs
```
## Environment Variables
| Variable | Used By | Description |
|---|---|---|
| `GITHUB_TOKEN` | GitHub loader | Personal access token for private repos |
| `NOTION_TOKEN` | Notion loader | Integration token |
| Custom | REST loader | Any `${VAR}` in headers is expanded |
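The `${VAR}` syntax in header values behaves like standard shell-style substitution; `os.path.expandvars` reproduces the effect (illustrative of the behavior, not necessarily Bengal's exact implementation):

```python
import os

os.environ["API_TOKEN"] = "secret123"
headers = {"Authorization": "Bearer ${API_TOKEN}"}

# Expand ${VAR} references in every header value.
expanded = {k: os.path.expandvars(v) for k, v in headers.items()}
print(expanded)  # → {'Authorization': 'Bearer secret123'}
```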
## Multi-Repo Documentation
A common pattern for large organizations:
```python
from bengal.collections import define_collection, DocPage
from bengal.content.sources import github_loader, local_loader

collections = {
    # Main docs (local)
    "docs": define_collection(
        schema=DocPage,
        directory="content/docs",
    ),
    # API reference (from API team's repo)
    "api": define_collection(
        schema=DocPage,
        loader=github_loader(repo="myorg/api-service", path="docs/"),
    ),
    # SDK docs (from SDK repo)
    "sdk": define_collection(
        schema=DocPage,
        loader=github_loader(repo="myorg/sdk", path="docs/"),
    ),
}
```
## Custom Loaders
Implement `ContentSource` for any content origin:
```python
from collections.abc import AsyncIterator

from bengal.content.sources import ContentSource, ContentEntry


class MyCustomSource(ContentSource):
    source_type = "my-api"

    async def fetch_all(self) -> AsyncIterator[ContentEntry]:
        items = await self._get_items()
        for item in items:
            yield ContentEntry(
                id=item["id"],
                slug=item["slug"],
                content=item["body"],
                frontmatter={"title": item["title"]},
                source_type=self.source_type,
                source_name=self.name,
            )

    async def fetch_one(self, id: str) -> ContentEntry | None:
        item = await self._get_item(id)
        if not item:
            return None
        return ContentEntry(
            id=item["id"],
            slug=item["slug"],
            content=item["body"],
            frontmatter={"title": item["title"]},
            source_type=self.source_type,
            source_name=self.name,
        )
```
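To see the streaming contract in isolation, here is a runnable mock with stand-in classes (the real `ContentSource`/`ContentEntry` come from `bengal.content.sources`; everything below is a placeholder):

```python
import asyncio
from dataclasses import dataclass

@dataclass
class ContentEntry:  # stand-in for bengal.content.sources.ContentEntry
    id: str
    slug: str
    content: str
    frontmatter: dict
    source_type: str
    source_name: str

class FakeSource:
    source_type = "my-api"
    name = "demo"

    async def _get_items(self):
        # Pretend network call returning raw records.
        return [{"id": "1", "slug": "hello", "title": "Hello", "body": "Hi"}]

    async def fetch_all(self):
        for item in await self._get_items():
            yield ContentEntry(
                id=item["id"], slug=item["slug"], content=item["body"],
                frontmatter={"title": item["title"]},
                source_type=self.source_type, source_name=self.name,
            )

async def collect():
    return [entry async for entry in FakeSource().fetch_all()]

entries = asyncio.run(collect())
print([e.slug for e in entries])  # → ['hello']
```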
## Zero-Cost Design
If you don't use remote sources:
- No extra dependencies installed
- No network calls
- No import overhead
- No configuration needed
Remote loaders are lazy-loaded only when you import them.
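The lazy-loading idea is simply deferring the heavy import to first use. A tiny sketch using a stdlib module (`colorsys`) as a stand-in for an optional third-party client:

```python
import sys

def make_loader():
    # The optional dependency is imported only when the factory is called,
    # not when this module is imported.
    import colorsys  # stand-in for e.g. a GitHub API client
    return colorsys

print("colorsys" in sys.modules)  # typically False before first use
loader = make_loader()
print("colorsys" in sys.modules)  # → True
```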
## Other Content Sources
- Autodoc — Generate API docs from Python, CLI commands, and OpenAPI specs
## See Also
- Content Collections — Schema validation for any source