Content Collections

Validate frontmatter with typed schemas

3 min read 557 words

Define typed schemas for your content to ensure consistency and catch errors early.

Do I Need This?

No. Collections are optional. Your site works fine without them.

Use collections when:

  • You want typos caught at build time, not in production
  • Multiple people edit content and need guardrails
  • You want consistent frontmatter across content types

Quick Setup

bengal collections init

This creates collections.pyat your project root. Edit it to uncomment what you need:

from bengal.collections import define_collection, BlogPost, DocPage

collections = {
    "blog": define_collection(schema=BlogPost, directory="blog"),
    "docs": define_collection(schema=DocPage, directory="docs"),
}

Done. Build as normal—validation happens automatically.

Built-in Schemas

Bengal provides schemas for common content types:

Schema Alias Required Fields Optional Fields
BlogPost Post title, date author, tags, draft, description, image, excerpt
DocPage Doc title weight, category, tags, toc, deprecated, description, since
APIReference API title, endpoint method, version, auth_required, rate_limit, deprecated, description
Tutorial title difficulty, duration, prerequisites, series, tags, order
Changelog title, date version, breaking, summary, draft

Import any of these:

from bengal.collections import BlogPost, DocPage, APIReference, Tutorial, Changelog
# Or use short aliases:
from bengal.collections import Post, Doc, API

Custom Schemas

Define your own using Python dataclasses:

from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class ProjectPage:
    title: str
    status: str  # "active", "completed", "archived"
    started: datetime
    tech_stack: list[str] = field(default_factory=list)
    github_url: str | None = None

collections = {
    "projects": define_collection(
        schema=ProjectPage,
        directory="projects",
    ),
}

Validation Modes

By default, validation warns but doesn't fail builds:

content/blog/my-post.md
  └─ date: Required field 'date' is missing

Strict Mode

To fail builds on validation errors, add tobengal.toml:

[build]
strict_collections = true

Lenient Mode (Extra Fields)

To allow frontmatter fields not defined in your schema:

define_collection(
    schema=BlogPost,
    directory="blog",
    strict=False,       # Don't reject unknown fields
    allow_extra=True,   # Store extra fields in _extra dict
)

With strict=False, unknown fields are silently ignored. Add allow_extra=True to preserve them in a _extraattribute on the validated instance.

CLI Commands

# List defined collections and their schemas
bengal collections list

# Validate content without building
bengal collections validate

# Validate specific collection
bengal collections validate --collection blog

Advanced Options

Custom File Pattern

By default, collections match all markdown files (**/*.md). To match specific files:

define_collection(
    schema=BlogPost,
    directory="blog",
    glob="*.md",  # Only top-level, not subdirectories
)

Migration Tips

Existing site with inconsistent frontmatter?

  1. Start withstrict=Falseto allow extra fields
  2. Runbengal collections validateto find issues
  3. Fix content or adjust schema
  4. Enablestrict=Truewhen ready

Transform legacy field names:

def migrate_legacy(data: dict) -> dict:
    if "post_title" in data:
        data["title"] = data.pop("post_title")
    return data

collections = {
    "blog": define_collection(
        schema=BlogPost,
        directory="blog",
        transform=migrate_legacy,
    ),
}

Remote Content

Collections work with remote content too. Use a loader instead of a directory:

from bengal.collections import define_collection, DocPage
from bengal.content.sources import github_loader

collections = {
    "api-docs": define_collection(
        schema=DocPage,
        loader=github_loader(repo="myorg/api-docs", path="docs/"),
    ),
}

See Content Sources for GitHub, Notion, REST API loaders.

Seealso