Define typed schemas for your content to ensure consistency and catch errors early.
Do I Need This?
No. Collections are optional. Your site works fine without them.
Use collections when:
- You want typos caught at build time, not in production
- Multiple people edit content and need guardrails
- You want consistent frontmatter across content types
Quick Setup
```bash
bengal collections init
```
This creates collections.py at your project root. Edit it and uncomment the collections you need:
```python
from bengal.collections import define_collection, BlogPost, DocPage

collections = {
    "blog": define_collection(schema=BlogPost, directory="blog"),
    "docs": define_collection(schema=DocPage, directory="docs"),
}
```
Done. Build as normal; validation runs automatically.
Built-in Schemas
Bengal provides schemas for common content types:
| Schema | Alias | Required Fields | Optional Fields |
|---|---|---|---|
| BlogPost | Post | title, date | author, tags, draft, description, image, excerpt |
| DocPage | Doc | title | weight, category, tags, toc, deprecated, description, since |
| APIReference | API | title, endpoint | method, version, auth_required, rate_limit, deprecated, description |
| Tutorial | none | title | difficulty, duration, prerequisites, series, tags, order |
| Changelog | none | title, date | version, breaking, summary, draft |
Import any of these:
```python
from bengal.collections import BlogPost, DocPage, APIReference, Tutorial, Changelog

# Or use short aliases:
from bengal.collections import Post, Doc, API
```
Custom Schemas
Define your own using Python dataclasses:
```python
from dataclasses import dataclass, field
from datetime import datetime

from bengal.collections import define_collection


@dataclass
class ProjectPage:
    title: str
    status: str  # "active", "completed", "archived"
    started: datetime
    tech_stack: list[str] = field(default_factory=list)
    github_url: str | None = None


collections = {
    "projects": define_collection(
        schema=ProjectPage,
        directory="projects",
    ),
}
```
Validation Modes
By default, validation warns but doesn't fail builds:
```text
⚠ content/blog/my-post.md
  └─ date: Required field 'date' is missing
```
Strict Mode
To fail builds on validation errors, add this to bengal.toml:

```toml
[build]
strict_collections = true
```
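To make the difference between the two modes concrete, here is a hypothetical sketch of how a build step could treat the same validation errors in each mode: print them in the warning format shown above, and return a non-zero exit code only when strict mode is on. The function name report, the error dictionary shape, and the exit-code convention are assumptions for illustration, not Bengal internals.

```python
def report(errors: dict[str, list[str]], *, strict: bool = False) -> int:
    """Hypothetical: print validation errors and return a build exit code.

    errors maps a content file path to its list of validation messages.
    """
    for path, messages in errors.items():
        print(f"⚠ {path}")
        for message in messages:
            print(f"  └─ {message}")
    # Default mode only warns; strict mode turns any error into a failed build.
    return 1 if (strict and errors) else 0


errors = {"content/blog/my-post.md": ["date: Required field 'date' is missing"]}
print(report(errors))               # prints the warning, then 0
print(report(errors, strict=True))  # prints the warning, then 1
```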
Lenient Mode (Extra Fields)
To allow frontmatter fields not defined in your schema:
```python
define_collection(
    schema=BlogPost,
    directory="blog",
    strict=False,      # Don't reject unknown fields
    allow_extra=True,  # Store extra fields in _extra dict
)
```
With strict=False, unknown fields are silently ignored. Add allow_extra=True to preserve them in an _extra attribute on the validated instance.
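The interplay of strict and allow_extra can be modeled in a few lines. This is a minimal sketch of the behavior described above, assuming strict defaults to True; the validate function and the stand-in Post schema are hypothetical and not Bengal's implementation.

```python
from dataclasses import dataclass, fields
from typing import Any


@dataclass
class Post:
    """Stand-in schema for illustration (not Bengal's BlogPost)."""

    title: str
    draft: bool = False


def validate(
    schema: type,
    data: dict[str, Any],
    *,
    strict: bool = True,
    allow_extra: bool = False,
) -> Any:
    """Hypothetical validator mimicking the strict / allow_extra options."""
    known_names = {f.name for f in fields(schema)}
    known = {k: v for k, v in data.items() if k in known_names}
    unknown = {k: v for k, v in data.items() if k not in known_names}

    if unknown and strict:
        # Strict schemas reject any field the schema does not declare.
        raise ValueError(f"Unknown fields: {', '.join(sorted(unknown))}")

    instance = schema(**known)
    if allow_extra:
        # Regular (non-slots) dataclass instances accept new attributes.
        instance._extra = unknown
    return instance


frontmatter = {"title": "Hello", "legacy_id": 42}
print(validate(Post, frontmatter, strict=False))  # Post(title='Hello', draft=False)
print(validate(Post, frontmatter, strict=False, allow_extra=True)._extra)  # {'legacy_id': 42}
```

With strict=False alone, legacy_id is dropped; adding allow_extra=True keeps it available without adding it to the schema.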
CLI Commands
```bash
# List defined collections and their schemas
bengal collections list

# Validate content without building
bengal collections validate

# Validate a specific collection
bengal collections validate --collection blog
```
Advanced Options
Custom File Pattern
By default, collections match all markdown files (**/*.md). To match specific files:
```python
define_collection(
    schema=BlogPost,
    directory="blog",
    glob="*.md",  # Only top-level, not subdirectories
)
```
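The difference between the default **/*.md and *.md patterns is easiest to see on a real directory tree. The sketch below uses Python's pathlib, assuming Bengal's glob option follows the same matching rules as Path.glob; the blog/2024/recap.md layout is a made-up example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: one top-level post and one post in a subdirectory.
    blog = Path(tmp) / "blog"
    (blog / "2024").mkdir(parents=True)
    (blog / "hello.md").write_text("# Hello")
    (blog / "2024" / "recap.md").write_text("# Recap")

    # "**" matches zero or more directories, so top-level files are included too.
    recursive = sorted(p.relative_to(blog).as_posix() for p in blog.glob("**/*.md"))
    # "*.md" matches only files directly inside blog/.
    top_level = sorted(p.relative_to(blog).as_posix() for p in blog.glob("*.md"))

print(recursive)  # ['2024/recap.md', 'hello.md']
print(top_level)  # ['hello.md']
```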
Migration Tips
Existing site with inconsistent frontmatter?
- Start with strict=False to allow extra fields
- Run bengal collections validate to find issues
- Fix the content or adjust the schema
- Enable strict=True when ready
Transform legacy field names:
```python
def migrate_legacy(data: dict) -> dict:
    # Rename the old "post_title" key to the schema's "title" field
    if "post_title" in data:
        data["title"] = data.pop("post_title")
    return data


collections = {
    "blog": define_collection(
        schema=BlogPost,
        directory="blog",
        transform=migrate_legacy,
    ),
}
```
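When several legacy names need renaming, a mapping keeps the transform in one place. This variant of migrate_legacy is a sketch: the LEGACY_NAMES keys are hypothetical, and it returns a copy rather than mutating its input, since the docs don't say whether Bengal passes the original frontmatter dict. If a file already has the new key, the new value wins.

```python
# Hypothetical legacy-to-schema key mapping; replace with your own field names.
LEGACY_NAMES = {"post_title": "title", "published": "date"}


def migrate_legacy(data: dict) -> dict:
    """Return a copy of data with legacy keys renamed; existing new keys win."""
    migrated = dict(data)
    for old, new in LEGACY_NAMES.items():
        if old in migrated:
            value = migrated.pop(old)
            # Only fill the new key if the frontmatter doesn't already set it.
            migrated.setdefault(new, value)
    return migrated


print(migrate_legacy({"post_title": "Hi", "tags": ["a"]}))  # {'tags': ['a'], 'title': 'Hi'}
```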
Remote Content
Collections work with remote content too. Use a loader instead of a directory:
```python
from bengal.collections import define_collection, DocPage
from bengal.content.sources import github_loader

collections = {
    "api-docs": define_collection(
        schema=DocPage,
        loader=github_loader(repo="myorg/api-docs", path="docs/"),
    ),
}
```
See Content Sources for GitHub, Notion, and REST API loaders.
See also
- Content Sources: GitHub, Notion, and REST API loaders