Module

collections

Content Collections - Type-safe content schemas for Bengal.

Provides content collections with schema validation, enabling type-safe frontmatter and early error detection during content discovery. Collections are opt-in and backward compatible with existing Bengal sites.

Quick Start:

Create a collections.py file in your project root:

>>> from dataclasses import dataclass, field
>>> from datetime import datetime
>>> from bengal.collections import define_collection
>>>
>>> @dataclass
... class BlogPost:
...     title: str
...     date: datetime
...     author: str = "Anonymous"
...     tags: list[str] = field(default_factory=list)
...
>>> collections = {
...     "blog": define_collection(schema=BlogPost, directory="content/blog"),
... }

Key Features:

  • Type-safe frontmatter: Validate content against dataclass or Pydantic schemas
  • Early error detection: Catch schema violations during discovery, not rendering
  • IDE support: Get autocompletion for frontmatter fields
  • Flexible validation: Strict mode rejects unknown fields; lenient mode allows them
  • Remote content: Fetch content from GitHub, Notion, or custom sources

Public API:

  • define_collection(): Create a collection configuration
  • CollectionConfig: Collection configuration dataclass
  • SchemaValidator: Validate data against schemas
  • ValidationResult: Result of schema validation
  • ContentValidationError: Raised when content fails validation
  • ValidationError: Single field validation error

Standard Schemas:

Ready-to-use schemas for common content types:

  • BlogPost: Blog posts with title, date, author, tags
  • DocPage: Documentation pages with weight, category, toc
  • APIReference: API endpoint documentation
  • Tutorial: Tutorial/guide pages with difficulty, duration
  • Changelog: Release changelog entries

Architecture:

Collections integrate with Bengal's discovery phase. When content is discovered, its frontmatter is validated against the collection's schema. Invalid content raises ContentValidationError with details of each failing field.

Validation supports:

  • Python dataclasses (recommended)
  • Pydantic models (auto-detected)
  • Type coercion for datetime, date, lists
  • Nested dataclass validation
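The validation behaviour listed above (required fields, coercion of date strings, nested dataclasses) can be illustrated with the standard library alone. The following is a minimal sketch, not Bengal's implementation: the helpers `coerce` and `build` and the `Author` schema are hypothetical names introduced here, and the coercion rules are assumptions chosen for illustration.

```python
from dataclasses import MISSING, dataclass, field, fields, is_dataclass
from datetime import datetime
from typing import Any, get_type_hints


@dataclass
class Author:  # hypothetical nested schema
    name: str
    email: str = ""


@dataclass
class BlogPost:
    title: str
    date: datetime
    author: Author
    tags: list[str] = field(default_factory=list)


def coerce(value: Any, target: Any) -> Any:
    """Coerce a raw frontmatter value to its annotated type (illustrative rules only)."""
    if target is datetime and isinstance(value, str):
        return datetime.fromisoformat(value)  # raises ValueError on bad input
    if is_dataclass(target) and isinstance(value, dict):
        return build(target, value)  # recurse into nested dataclasses
    return value


def build(schema: type, data: dict[str, Any]) -> Any:
    """Instantiate `schema` from `data`, collecting every field error before raising."""
    hints = get_type_hints(schema)
    kwargs: dict[str, Any] = {}
    errors: list[str] = []
    for f in fields(schema):
        if f.name in data:
            try:
                kwargs[f.name] = coerce(data[f.name], hints[f.name])
            except ValueError as exc:
                errors.append(f"{f.name}: {exc}")
        elif f.default is MISSING and f.default_factory is MISSING:
            errors.append(f"{f.name}: required field is missing")
    if errors:
        raise ValueError("; ".join(errors))
    return schema(**kwargs)


post = build(BlogPost, {"title": "Hello", "date": "2024-01-15", "author": {"name": "Ada"}})
print(post.date.year, post.author.name, post.tags)  # 2024 Ada []
```

Collecting all errors before raising mirrors the idea of reporting every schema violation for a file at once during discovery, rather than stopping at the first one.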

Related Modules:

  • bengal.content.discovery.content_discovery: Collection integration point
  • bengal.content.sources: Remote content sources (GitHub, Notion)
  • bengal.core.page.metadata: Page frontmatter access

Classes

CollectionConfig

Configuration for a content collection.

Defines how content in a directory (or remote source) maps to a typed schema. Created via define_collection() rather than direct instantiation.

Type Parameters: T: The schema type (dataclass or Pydantic model)

Attributes

Name Type Description
schema type[T]

Dataclass or Pydantic model class defining the frontmatter structure. Required fields in the schema become required frontmatter.

directory Path | None

Directory containing collection content, relative to the content root. Required for local content; optional when using loader.

glob str

Glob pattern for matching content files within the directory. Defaults to **/*.md (all markdown files, recursively).

strict bool

If True (default), reject content with unknown frontmatter fields. Set to False to allow extra fields.

allow_extra bool

If True, store unrecognized fields in an _extra dict attribute on the validated instance. Only applies when strict=False.

loader ContentSource | None

Optional ContentSource for fetching remote content. When provided, content is fetched from the remote source instead of the local filesystem. Requires extras: pip install bengal[github]
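The interaction between strict and allow_extra described above can be sketched as follows. This is an illustration under stated assumptions, not Bengal's code: `load_frontmatter` and the `DocPage` fields shown here are hypothetical, and a plain ValueError stands in for ContentValidationError.

```python
from dataclasses import dataclass, fields
from typing import Any


@dataclass
class DocPage:  # hypothetical schema for the example
    title: str
    weight: int = 0


def load_frontmatter(schema: type, data: dict[str, Any], *,
                     strict: bool = True, allow_extra: bool = False) -> Any:
    """Build `schema` from `data`, handling unknown keys per the documented flags."""
    known_names = {f.name for f in fields(schema)}
    known = {k: v for k, v in data.items() if k in known_names}
    unknown = {k: v for k, v in data.items() if k not in known_names}

    if strict and unknown:
        # Strict mode: unknown fields are an error.
        raise ValueError(f"unknown frontmatter fields: {sorted(unknown)}")

    instance = schema(**known)
    if allow_extra and not strict:
        # Lenient mode with allow_extra: keep the unknown fields on the instance.
        instance._extra = unknown
    return instance


frontmatter = {"title": "Guide", "weight": 2, "banner": "hero.png"}
page = load_frontmatter(DocPage, frontmatter, strict=False, allow_extra=True)
print(page._extra)  # {'banner': 'hero.png'}
```

With strict=False and allow_extra=False, the unknown key is simply dropped in this sketch; with the default strict=True, the same frontmatter is rejected.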

Methods

is_remote bool
Whether this collection fetches content from a remote source.
property
def is_remote(self) -> bool
Returns
bool: True if a loader is configured; False for local content.
source_type str
The content source type identifier.
property
def source_type(self) -> str
Returns
str: Source type string: 'local' for filesystem content, or the loader's source_type (e.g., 'github', 'notion').
Internal Methods
__post_init__
Validate configuration and normalize the directory path.
def __post_init__(self) -> None
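To make the relationship between loader, is_remote, and source_type concrete, here is a small stand-in class. It is a sketch only: `SketchConfig` and `FakeLoader` are hypothetical names, and the exact checks performed in `__post_init__` (rejecting a config with neither directory nor loader, converting strings to Path) are assumptions inferred from the attribute descriptions above.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Optional, Union


@dataclass
class FakeLoader:
    """Hypothetical stand-in for a ContentSource; only source_type is modelled."""
    source_type: str = "github"


@dataclass
class SketchConfig:
    schema: type
    directory: Optional[Union[str, Path]] = None
    loader: Optional[Any] = None

    def __post_init__(self) -> None:
        # Assumed validation: local collections need a directory.
        if self.directory is None and self.loader is None:
            raise ValueError("a collection needs either a directory or a loader")
        # Assumed normalization: accept str, store Path.
        if isinstance(self.directory, str):
            self.directory = Path(self.directory)

    @property
    def is_remote(self) -> bool:
        return self.loader is not None

    @property
    def source_type(self) -> str:
        return "local" if self.loader is None else self.loader.source_type


local = SketchConfig(schema=dict, directory="content/blog")
remote = SketchConfig(schema=dict, loader=FakeLoader())
print(local.is_remote, local.source_type)    # False local
print(remote.is_remote, remote.source_type)  # True github
```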

Functions

define_collection
def define_collection(schema: type[T], directory: str | Path | None = None, *, glob: str = '**/*.md', strict: bool = True, allow_extra: bool = False, loader: ContentSource | None = None) -> CollectionConfig[T]

Define a content collection with a typed schema.

Collections provide type-safe frontmatter validation during content discovery. Errors are caught early, and IDEs provide autocompletion for frontmatter fields.

Parameters
Name Type Description
schema type[T]

Dataclass or Pydantic model class defining the frontmatter structure. Fields without defaults are required; fields with defaults (or Optional[T] type hints) are optional.

directory str | Path | None

Directory containing collection content, relative to the content root. Required for local content; omit when using loader.

Default: None
glob str

Glob pattern for matching content files. Defaults to **/*.md (all markdown files, recursively). Only used for local content.

Default: '**/*.md'
strict bool

If True (default), reject content with unknown frontmatter fields not defined in the schema.

Default: True
allow_extra bool

If True, store unrecognized fields in an _extra dict on the validated instance. Only effective when strict=False.

Default: False
loader ContentSource | None

Optional ContentSource for fetching remote content. Requires extras: pip install bengal[github] or bengal[notion].

Default: None
Returns
CollectionConfig[T]
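The rule given for the schema parameter (fields without defaults are required; fields with defaults or Optional[T] hints are optional) can be checked against any dataclass with the standard library. The sketch below is illustrative only: `required_fields` and `is_optional_hint` are hypothetical helpers, and the `subtitle` field is added to the schema purely to show the Optional case. How Bengal itself inspects schemas is not shown in this reference.

```python
import types
from dataclasses import MISSING, dataclass, field, fields
from datetime import datetime
from typing import Any, Optional, Union, get_args, get_origin, get_type_hints


@dataclass
class BlogPost:
    title: str
    date: datetime
    subtitle: Optional[str]  # optional by type hint, even without a default
    author: str = "Anonymous"
    tags: list[str] = field(default_factory=list)


def is_optional_hint(hint: Any) -> bool:
    """True for Optional[T] / Union[T, None] / T | None annotations."""
    union_kinds = (Union, getattr(types, "UnionType", Union))
    return get_origin(hint) in union_kinds and type(None) in get_args(hint)


def required_fields(schema: type) -> list[str]:
    """Names of fields that frontmatter must supply under the documented rule."""
    hints = get_type_hints(schema)
    return [
        f.name
        for f in fields(schema)
        if f.default is MISSING
        and f.default_factory is MISSING
        and not is_optional_hint(hints[f.name])
    ]


print(required_fields(BlogPost))  # ['title', 'date']
```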