Content Collections

Validate frontmatter with typed schemas

Define typed schemas for your content to ensure consistency and catch errors early.

Do I Need This?

No. Collections are optional. Your site works fine without them.

Use collections when:

  • You want typos caught at build time, not in production
  • Multiple people edit content and need guardrails
  • You want consistent frontmatter across content types

Quick Setup

bengal collections init

This creates collections.py at your project root. Edit it to uncomment what you need:

from bengal.collections import define_collection, BlogPost, DocPage

collections = {
    "blog": define_collection(schema=BlogPost, directory="blog"),
    "docs": define_collection(schema=DocPage, directory="docs"),
}

Done. Build as normal—validation happens automatically.

Built-in Schemas

Bengal provides schemas for common content types:

Schema        Required Fields   Optional Fields
BlogPost      title, date       author, tags, draft, description, image
DocPage       title             weight, category, tags, toc, deprecated
APIReference  title, endpoint   method, version, auth_required, rate_limit
Tutorial      title             difficulty, duration, prerequisites, series
Changelog     title, date       version, breaking, summary

Import any of these:

from bengal.collections import BlogPost, DocPage, APIReference

Custom Schemas

Define your own using Python dataclasses:

from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

from bengal.collections import define_collection

@dataclass
class ProjectPage:
    title: str
    status: str  # "active", "completed", "archived"
    started: datetime
    tech_stack: list[str] = field(default_factory=list)
    github_url: Optional[str] = None

collections = {
    "projects": define_collection(
        schema=ProjectPage,
        directory="projects",
    ),
}
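
With a dataclass schema, a field is required when it has no default and optional when it has one. The following is a minimal standard-library sketch of that rule, not Bengal's actual implementation; the helper names `required_fields` and `missing_fields` are hypothetical and exist only for this illustration.

```python
from dataclasses import MISSING, dataclass, field, fields
from datetime import datetime
from typing import Optional


@dataclass
class ProjectPage:
    title: str
    status: str  # "active", "completed", "archived"
    started: datetime
    tech_stack: list[str] = field(default_factory=list)
    github_url: Optional[str] = None


def required_fields(schema: type) -> list[str]:
    """Names of fields with neither a default nor a default_factory."""
    return [
        f.name
        for f in fields(schema)
        if f.default is MISSING and f.default_factory is MISSING
    ]


def missing_fields(schema: type, frontmatter: dict) -> list[str]:
    """Required fields that the frontmatter dict does not supply."""
    return [name for name in required_fields(schema) if name not in frontmatter]


print(required_fields(ProjectPage))
# ['title', 'status', 'started']
print(missing_fields(ProjectPage, {"title": "Bengal", "status": "active"}))
# ['started']
```

In this sketch, a page in projects/ that omits `started` would be flagged, while one that omits `tech_stack` or `github_url` would not.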

Validation Modes

By default, validation warns but doesn't fail builds:

⚠ content/blog/my-post.md
  └─ date: Required field 'date' is missing

Strict Mode

To fail the build on validation errors, add this to bengal.toml:

[build]
strict_collections = true
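
The difference between the default behaviour and strict mode can be pictured as a single switch: the same list of issues is either reported as warnings or turned into a build failure. The sketch below illustrates that idea with the standard library only. `CollectionValidationError` and `report` are hypothetical names, not part of Bengal's documented API.

```python
import warnings


class CollectionValidationError(Exception):
    """Hypothetical error type used only in this sketch."""


def report(issues: list[str], strict_collections: bool = False) -> None:
    """Warn about each issue, or raise once if strict_collections is set."""
    if not issues:
        return
    if strict_collections:
        # Strict: a single failure that lists every issue.
        raise CollectionValidationError("; ".join(issues))
    for issue in issues:
        # Default: the build continues, each issue is surfaced as a warning.
        warnings.warn(issue, stacklevel=2)


issues = ["date: Required field 'date' is missing"]
report(issues)  # emits a warning and returns
try:
    report(issues, strict_collections=True)
except CollectionValidationError as exc:
    print(f"build failed: {exc}")
```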

Lenient Mode (Extra Fields)

To allow fields not in your schema:

define_collection(
    schema=BlogPost,
    directory="blog",
    strict=False,  # Allow extra frontmatter fields
)
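
To make the effect of `strict=False` concrete, here is a minimal sketch of an extra-field check that can be switched off. It uses only the standard library and assumes nothing about how Bengal implements the option. `Post`, `extra_fields`, and `check_extras` are hypothetical names.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Post:  # hypothetical stand-in for a schema such as BlogPost
    title: str
    author: Optional[str] = None


def extra_fields(schema: type, frontmatter: dict) -> list[str]:
    """Frontmatter keys that the schema does not declare, sorted by name."""
    declared = {f.name for f in fields(schema)}
    return sorted(set(frontmatter) - declared)


def check_extras(schema: type, frontmatter: dict, strict: bool = True) -> list[str]:
    """Return one message per unknown key when strict, otherwise nothing."""
    if not strict:
        return []
    return [f"Unknown field '{name}'" for name in extra_fields(schema, frontmatter)]


page = {"title": "Hello", "mood": "sunny"}
print(check_extras(Post, page, strict=True))   # ["Unknown field 'mood'"]
print(check_extras(Post, page, strict=False))  # []
```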

CLI Commands

# List defined collections and their schemas
bengal collections list

# Validate content without building
bengal collections validate

# Validate specific collection
bengal collections validate --collection blog

Migration Tips

Existing site with inconsistent frontmatter?

  1. Start with strict=False to allow extra fields
  2. Run bengal collections validate to find issues
  3. Fix content or adjust schema
  4. Switch to strict=True when ready

Transform legacy field names:

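from bengal.collections import BlogPost, define_collection

def migrate_legacy(data: dict) -> dict:
    # Rename the legacy "post_title" key to the schema's "title".
    if "post_title" in data:
        data["title"] = data.pop("post_title")
    return data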

collections = {
    "blog": define_collection(
        schema=BlogPost,
        directory="blog",
        transform=migrate_legacy,
    ),
}
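
When several legacy names need renaming, a lookup table keeps the transform short. The sketch below is a variant of `migrate_legacy` that returns a new dict instead of modifying its input. The `LEGACY_NAMES` mapping and the `published` key are hypothetical examples, and whether Bengal requires the transform to copy or mutate is not something this sketch assumes.

```python
# Hypothetical mapping of legacy frontmatter keys to current schema names.
LEGACY_NAMES = {"post_title": "title", "published": "date"}


def migrate_legacy(data: dict) -> dict:
    """Return a copy of data with legacy keys renamed; current keys win."""
    # Keep everything that is not a legacy key.
    migrated = {k: v for k, v in data.items() if k not in LEGACY_NAMES}
    # Copy each legacy value under its new name unless that name is already set.
    for old, new in LEGACY_NAMES.items():
        if old in data and new not in migrated:
            migrated[new] = data[old]
    return migrated


legacy = {"post_title": "Hello", "published": "2024-01-15", "tags": ["intro"]}
print(migrate_legacy(legacy))
# {'tags': ['intro'], 'title': 'Hello', 'date': '2024-01-15'}
```

If a page already has both `post_title` and `title`, this version keeps `title` and drops the legacy key.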

Remote Content

Collections work with remote content too. Use a loader instead of a directory:

from bengal.collections import define_collection, DocPage
from bengal.content_layer import github_loader

collections = {
    "api-docs": define_collection(
        schema=DocPage,
        loader=github_loader(repo="myorg/api-docs", path="docs/"),
    ),
}

See Content Sources for the GitHub, Notion, and REST API loaders.
