Content Collections

Validate frontmatter with typed schemas

3 min read 555 words
Edit this page

Was this page helpful?

Define typed schemas for your content to ensure consistency and catch errors early.

Do I Need This?

No. Collections are optional. Your site works fine without them.

Use collections when:

  • You want typos caught at build time, not in production
  • Multiple people edit content and need guardrails
  • You want consistent frontmatter across content types

Quick Setup

Create acollections.pyfile at your project root. Edit it to uncomment what you need:

PYTHON
from bengal.collections import define_collection, BlogPost, DocPage

collections = {
    "blog": define_collection(schema=BlogPost, directory="blog"),
    "docs": define_collection(schema=DocPage, directory="docs"),
}

Done. Build as normal—validation happens automatically.

Built-in Schemas

Bengal provides schemas for common content types:

Schema Alias Required Fields Optional Fields
BlogPost Post title, date author, tags, draft, description, image, excerpt
DocPage Doc title weight, category, tags, toc, deprecated, description, since
APIReference API title, endpoint method, version, auth_required, rate_limit, deprecated, description
Tutorial title difficulty, duration, prerequisites, series, tags, order
Changelog title, date version, breaking, summary, draft

Import any of these:

PYTHON
from bengal.collections import BlogPost, DocPage, APIReference, Tutorial, Changelog
# Or use short aliases:
from bengal.collections import Post, Doc, API

Custom Schemas

Define your own using Python dataclasses:

PYTHON
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class ProjectPage:
    title: str
    status: str  # "active", "completed", "archived"
    started: datetime
    tech_stack: list[str] = field(default_factory=list)
    github_url: str | None = None

collections = {
    "projects": define_collection(
        schema=ProjectPage,
        directory="projects",
    ),
}

Validation Modes

By default, validation warns but doesn't fail builds:

TREE-SITTER-QUERY
content/blog/my-post.md
  └─ date: Required field 'date' is missing

Strict Mode

To fail builds on validation errors, add tobengal.toml:

TOML
[build]
strict_collections = true

Lenient Mode (Extra Fields)

To allow frontmatter fields not defined in your schema:

PYTHON
define_collection(
    schema=BlogPost,
    directory="blog",
    strict=False,       # Don't reject unknown fields
    allow_extra=True,   # Store extra fields in _extra dict
)

With strict=False, unknown fields are silently ignored. Add allow_extra=True to preserve them in a _extraattribute on the validated instance.

CLI Commands

BASH
# List defined collections and their schemas
bengal content collections

# Validate content against schemas without building
bengal content schemas

# Validate specific collection
bengal content schemas --collection blog

Advanced Options

Custom File Pattern

By default, collections match all markdown files (**/*.md). To match specific files:

PYTHON
define_collection(
    schema=BlogPost,
    directory="blog",
    glob="*.md",  # Only top-level, not subdirectories
)

Migration Tips

Existing site with inconsistent frontmatter?

  1. Start withstrict=Falseto allow extra fields
  2. Runbengal content schemasto find issues
  3. Fix content or adjust schema
  4. Enablestrict=Truewhen ready

Transform legacy field names:

PYTHON
def migrate_legacy(data: dict) -> dict:
    if "post_title" in data:
        data["title"] = data.pop("post_title")
    return data

collections = {
    "blog": define_collection(
        schema=BlogPost,
        directory="blog",
        transform=migrate_legacy,
    ),
}

Remote Content

Collections work with remote content too. Use a loader instead of a directory:

PYTHON
from bengal.collections import define_collection, DocPage
from bengal.content.sources import github_loader

collections = {
    "api-docs": define_collection(
        schema=DocPage,
        loader=github_loader(repo="myorg/api-docs", path="docs/"),
    ),
}

See Content Sources for GitHub, Notion, REST API loaders.

Seealso