Content Collections

Define typed schemas for your content to ensure consistency and catch errors early.

Do I Need This?

No. Collections are optional. Your site works fine without them.

Use collections when:

You want typos caught at build time, not in production
Multiple people edit content and need guardrails
You want consistent frontmatter across content types

Quick Setup

Create a collections.py file at your project root. Edit it to uncomment what you need:

PYTHON

from bengal.collections import define_collection, BlogPost, DocPage

collections = {
    "blog": define_collection(schema=BlogPost, directory="blog"),
    "docs": define_collection(schema=DocPage, directory="docs"),
}

Done. Build as normal—validation happens automatically.

Built-in Schemas

Bengal provides schemas for common content types:

Schema	Alias	Required Fields	Optional Fields
`BlogPost`	`Post`	title, date	author, tags, draft, description, image, excerpt
`DocPage`	`Doc`	title	weight, category, tags, toc, deprecated, description, since
`APIReference`	`API`	title, endpoint	method, version, auth_required, rate_limit, deprecated, description
`Tutorial`	—	title	difficulty, duration, prerequisites, series, tags, order
`Changelog`	—	title, date	version, breaking, summary, draft

Import any of these:

PYTHON

from bengal.collections import BlogPost, DocPage, APIReference, Tutorial, Changelog
# Or use short aliases:
from bengal.collections import Post, Doc, API

Custom Schemas

Define your own using Python dataclasses:

PYTHON

from dataclasses import dataclass, field
from datetime import datetime

from bengal.collections import define_collection

@dataclass
class ProjectPage:
    title: str
    status: str  # "active", "completed", "archived"
    started: datetime
    tech_stack: list[str] = field(default_factory=list)
    github_url: str | None = None

collections = {
    "projects": define_collection(
        schema=ProjectPage,
        directory="projects",
    ),
}

Validation Modes

By default, validation warns but doesn't fail builds:

TEXT

⚠ content/blog/my-post.md
  └─ date: Required field 'date' is missing

Strict Mode

To fail builds on validation errors, add to bengal.toml:

TOML

[build]
strict_collections = true
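
The difference between the default mode and strict mode can be sketched in a few lines of plain Python. This is an illustration of the behavior described above, not Bengal's implementation: `report` and `CollectionValidationError` are hypothetical names invented for this example.

```python
import warnings


class CollectionValidationError(Exception):
    """Hypothetical exception type used only in this sketch."""


def report(errors: list[str], strict_collections: bool = False) -> None:
    """Warn for each error by default; raise when strict_collections is on."""
    if not errors:
        return
    if strict_collections:
        raise CollectionValidationError("; ".join(errors))
    for message in errors:
        warnings.warn(message)


errors = ["content/blog/my-post.md: Required field 'date' is missing"]

report(errors)  # default mode: emits a warning, execution continues

try:
    report(errors, strict_collections=True)  # strict mode: the build stops
except CollectionValidationError as exc:
    print(f"build failed: {exc}")
```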

Lenient Mode (Extra Fields)

To allow frontmatter fields not defined in your schema:

PYTHON

define_collection(
    schema=BlogPost,
    directory="blog",
    strict=False,       # Don't reject unknown fields
    allow_extra=True,   # Store extra fields in _extra dict
)

With strict=False, unknown fields are silently ignored. Add allow_extra=True to preserve them in an _extra attribute on the validated instance.
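
The interaction between the two flags can be sketched as follows. This is a standalone illustration under stated assumptions: `split_fields` and `MiniPost` are hypothetical, the sketch assumes strict defaults to True, and it returns the extras as a second value rather than attaching them to an instance as Bengal is described to do.

```python
from dataclasses import dataclass, fields


@dataclass
class MiniPost:
    title: str


def split_fields(schema, frontmatter, strict=True, allow_extra=False):
    """Hypothetical helper: separate schema fields from unknown ones."""
    known_names = {f.name for f in fields(schema)}
    known = {k: v for k, v in frontmatter.items() if k in known_names}
    unknown = {k: v for k, v in frontmatter.items() if k not in known_names}
    if unknown and strict:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    # With strict=False, unknown fields are dropped unless allow_extra keeps them.
    extra = unknown if allow_extra else {}
    return known, extra


frontmatter = {"title": "Hello", "mood": "curious"}
print(split_fields(MiniPost, frontmatter, strict=False))
# ({'title': 'Hello'}, {})
print(split_fields(MiniPost, frontmatter, strict=False, allow_extra=True))
# ({'title': 'Hello'}, {'mood': 'curious'})
```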

CLI Commands

BASH

# List defined collections and their schemas
bengal content collections

# Validate content against schemas without building
bengal content schemas

# Validate specific collection
bengal content schemas --collection blog

Advanced Options

Custom File Pattern

By default, collections match all Markdown files, including those in subdirectories (**/*.md). To match only specific files:

PYTHON

define_collection(
    schema=BlogPost,
    directory="blog",
    glob="*.md",  # Only top-level, not subdirectories
)
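
To see what the two patterns select, the sketch below builds a throwaway directory and runs both through Python's pathlib. It assumes Bengal's glob option follows the same semantics as `Path.glob`, which the source does not confirm; the file names are made up for the example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    blog = Path(tmp) / "blog"
    (blog / "2024").mkdir(parents=True)
    (blog / "intro.md").write_text("# Intro")
    (blog / "2024" / "recap.md").write_text("# Recap")

    # "**/*.md" walks the directory and every subdirectory.
    recursive = sorted(p.relative_to(blog).as_posix() for p in blog.glob("**/*.md"))
    # "*.md" only looks at files directly inside the directory.
    top_level = sorted(p.relative_to(blog).as_posix() for p in blog.glob("*.md"))

print(recursive)  # ['2024/recap.md', 'intro.md']
print(top_level)  # ['intro.md']
```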

Migration Tips

Existing site with inconsistent frontmatter?

Start with strict=False to allow extra fields
Run bengal content schemas to find issues
Fix content or adjust schema
Enable strict=True when ready

Transform legacy field names:

PYTHON

def migrate_legacy(data: dict) -> dict:
    if "post_title" in data:
        data["title"] = data.pop("post_title")
    return data

collections = {
    "blog": define_collection(
        schema=BlogPost,
        directory="blog",
        transform=migrate_legacy,
    ),
}
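
When several legacy names need renaming, a table-driven variant of the same idea may be easier to maintain. The sketch below is plain Python with no Bengal dependency; the `pubdate` mapping is a hypothetical example, and it assumes the transform receives the raw frontmatter dict before validation.

```python
# Hypothetical legacy-to-current field names.
RENAMES = {"post_title": "title", "pubdate": "date"}


def migrate_legacy(data: dict) -> dict:
    migrated = dict(data)  # copy so the caller's dict is not mutated
    for old, new in RENAMES.items():
        # Only rename when the new name is not already set.
        if old in migrated and new not in migrated:
            migrated[new] = migrated.pop(old)
    return migrated


legacy = {"post_title": "Hello", "pubdate": "2024-01-15", "tags": ["a"]}
print(migrate_legacy(legacy))
# {'tags': ['a'], 'title': 'Hello', 'date': '2024-01-15'}
```

Leaving an existing `title` untouched when both names are present is a design choice made for this sketch; a stricter migration could raise instead.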

Remote Content

Collections work with remote content too. Use a loader instead of a directory:

PYTHON

from bengal.collections import define_collection, DocPage
from bengal.content.sources import github_loader

collections = {
    "api-docs": define_collection(
        schema=DocPage,
        loader=github_loader(repo="myorg/api-docs", path="docs/"),
    ),
}

See Content Sources for the GitHub, Notion, and REST API loaders.

See also

Content Sources — GitHub, Notion, REST API loaders