Module

collections

Content Collections - Type-safe content schemas for Bengal.

This module provides content collections with schema validation, enabling type-safe frontmatter and early error detection during content discovery.

Usage (Local Content):

# collections.py (project root)
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional
from bengal.collections import define_collection

@dataclass
class BlogPost:
    title: str
    date: datetime
    author: str = "Anonymous"
    tags: list[str] = field(default_factory=list)
    draft: bool = False

collections = {
    "blog": define_collection(
        schema=BlogPost,
        directory="content/blog",
    ),
}

Usage (Remote Content - Content Layer):

from bengal.collections import define_collection
from bengal.content_layer import github_loader, notion_loader

# Doc, APIDoc, and BlogPost are schema classes assumed to be defined elsewhere
collections = {
    # Local content (default)
    "docs": define_collection(schema=Doc, directory="content/docs"),

    # Remote content from GitHub
    "api": define_collection(
        schema=APIDoc,
        loader=github_loader(repo="myorg/api-docs", path="docs/"),
    ),

    # Remote content from Notion
    "blog": define_collection(
        schema=BlogPost,
        loader=notion_loader(database_id="abc123"),
    ),
}

Architecture:

  • Collections are opt-in (backward compatible)
  • Schemas use Python dataclasses or Pydantic models
  • Validation happens during discovery phase (fail fast)
  • Supports both strict and lenient modes
  • Remote sources via Content Layer (zero-cost if unused)
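The fail-fast validation and the strict/lenient distinction listed above can be illustrated with a small standalone sketch. It does not use Bengal's internals: `validate_frontmatter` is a hypothetical helper written for this example, and the assumption that lenient mode simply ignores unknown keys is ours, not something this page confirms.

```python
from dataclasses import MISSING, dataclass, field, fields
from typing import Any


@dataclass
class BlogPost:
    title: str
    author: str = "Anonymous"
    tags: list[str] = field(default_factory=list)


def validate_frontmatter(schema: type, data: dict[str, Any], strict: bool = True):
    """Hypothetical helper: build a schema instance or raise ValueError."""
    known = {f.name: f for f in fields(schema)}

    # A field is required when it has neither a default nor a default_factory.
    missing = [
        name
        for name, f in known.items()
        if f.default is MISSING and f.default_factory is MISSING and name not in data
    ]
    if missing:
        raise ValueError(f"missing required fields: {missing}")

    # Strict mode rejects keys that the schema does not declare.
    unknown = sorted(set(data) - set(known))
    if unknown and strict:
        raise ValueError(f"unknown fields: {unknown}")

    # Lenient mode (assumed behavior): unknown keys are dropped.
    return schema(**{k: v for k, v in data.items() if k in known})
```

Calling `validate_frontmatter(BlogPost, {"title": "Hi", "foo": 1})` raises in strict mode, while passing `strict=False` returns a `BlogPost` with the extra key ignored.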

Related:

  • bengal/discovery/content_discovery.py: Integration point
  • bengal/content_layer/: Remote content sources
  • bengal/core/page/metadata.py: Frontmatter access

Classes

CollectionConfig dataclass

Configuration for a content collection.

Inherits from Generic[T]

Attributes

  • schema (type[T]): Dataclass or Pydantic model defining frontmatter structure
  • directory (Path | None): Directory containing collection content (relative to content root). Required for local content, optional when using a remote loader.
  • glob (str): Glob pattern for matching files (local content only)
  • strict (bool): If True, reject unknown frontmatter fields
  • allow_extra (bool): If True, store extra fields in _extra attribute
  • transform (Callable[[dict[str, Any]], dict[str, Any]] | None): Optional function to transform frontmatter before validation
  • loader (ContentSource | None): Optional ContentSource for remote content (Content Layer). When provided, content is fetched from the remote source instead of the local directory. Install extras: pip install bengal[github]

Methods

is_remote property
def is_remote(self) -> bool

Check if this collection uses a remote loader.

Returns

bool

source_type property
def source_type(self) -> str

Get the source type for this collection.

Returns

str

Internal Methods

__post_init__
def __post_init__(self) -> None

Validate configuration and normalize directory.
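
As a rough illustration of what "validate configuration and normalize directory" and the `is_remote` check could look like, here is a minimal sketch. `SketchConfig` is a hypothetical stand-in, not Bengal's `CollectionConfig`; the rule that a collection needs either a directory or a loader is an assumption drawn from the attribute descriptions. The values returned by `source_type` are not documented on this page, so the sketch leaves that property out.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Optional, Union


@dataclass
class SketchConfig:
    """Hypothetical stand-in for CollectionConfig; not Bengal's code."""

    schema: type
    directory: Optional[Union[str, Path]] = None
    loader: Optional[Any] = None

    def __post_init__(self) -> None:
        # Assumed rule: a collection needs a local directory or a remote loader.
        if self.directory is None and self.loader is None:
            raise ValueError("collection needs a directory or a loader")
        # Normalize string paths to Path objects.
        if isinstance(self.directory, str):
            self.directory = Path(self.directory)

    @property
    def is_remote(self) -> bool:
        # A collection counts as remote when a loader is configured.
        return self.loader is not None
```

With this sketch, `SketchConfig(schema=dict, directory="content/blog")` ends up with a `Path` directory and `is_remote` equal to `False`.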

Functions

define_collection
def define_collection(schema: type[T], directory: str | Path | None = None) -> CollectionConfig[T]

Define a content collection with typed schema.

Collections provide type-safe frontmatter validation during content discovery, catching errors early and enabling IDE autocompletion.

Supports both local content (via directory) and remote content (via loader). Remote loaders are part of the Content Layer. Install extras as needed:

    pip install bengal[github]  # GitHub loader
    pip install bengal[notion]  # Notion loader

Parameters

Name Type Default Description
schema type[T]

Dataclass or Pydantic model defining frontmatter structure. Required fields must not have defaults. Optional fields should have defaults or use Optional[T] type hints.

directory str | Path | None None

Directory containing collection content (relative to content root). Required for local content, optional when using a remote loader.

Returns

CollectionConfig[T]

CollectionConfig instance for use in collections dict.

Example (Local Content):

>>> from dataclasses import dataclass, field
>>> from datetime import datetime
>>> from typing import Optional
>>>
>>> @dataclass
... class BlogPost:
...     title: str
...     date: datetime
...     author: str = "Anonymous"
...     tags: list[str] = field(default_factory=list)
...     draft: bool = False
...
>>> blog = define_collection(
...     schema=BlogPost,
...     directory="content/blog",
... )

Example (Remote Content - GitHub):

>>> from bengal.content_layer import github_loader
>>>
>>> api_docs = define_collection(
...     schema=APIDoc,
...     loader=github_loader(repo="myorg/api-docs", path="docs/"),
... )

Example (Remote Content - Notion):

>>> from bengal.content_layer import notion_loader
>>>
>>> blog = define_collection(
...     schema=BlogPost,
...     loader=notion_loader(database_id="abc123..."),
... )

Example with transform:

>>> def normalize_legacy(data: dict) -> dict:
...     if 'post_title' in data:
...         data['title'] = data.pop('post_title')
...     return data
...
>>> blog = define_collection(
...     schema=BlogPost,
...     directory="content/blog",
...     transform=normalize_legacy,
... )
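
To see what the transform does to a single frontmatter dict before validation, the function can be exercised on its own, without Bengal. The `legacy` dict below is made-up sample data; passing a copy is a choice made for this sketch so the input stays unchanged, since `normalize_legacy` mutates its argument.

```python
def normalize_legacy(data: dict) -> dict:
    # Rename the legacy 'post_title' key to the 'title' key the schema expects.
    if "post_title" in data:
        data["title"] = data.pop("post_title")
    return data


# Made-up legacy frontmatter for illustration.
legacy = {"post_title": "Hello", "draft": True}

# Pass a copy so the original dict is left untouched.
normalized = normalize_legacy(dict(legacy))
print(normalized)  # {'draft': True, 'title': 'Hello'}
```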