# Content Collections URL: /docs/extending/collections/ Section: extending -------------------------------------------------------------------------------- Content collections let you define typed schemas for your content's frontmatter. Bengal validates content against these schemas during discovery, catching errors early and providing IDE autocompletion. Quick Start Create a collections.py file in your project root: from dataclasses import dataclass, field from datetime import datetime from bengal.collections import define_collection @dataclass class BlogPost: title: str date: datetime author: str = "Anonymous" tags: list[str] = field(default_factory=list) draft: bool = False collections = { "blog": define_collection( schema=BlogPost, directory="content/blog", ), } Now any file in content/blog/ must have valid frontmatter: --- title: My First Post date: 2025-01-15 author: Jane Doe tags: [python, tutorial] --- Schema Definition Using Dataclasses Define your schema as a Python dataclass: from dataclasses import dataclass, field from datetime import datetime @dataclass class DocPage: title: str # Required weight: int = 0 # Optional with default description: str | None = None # Optional, nullable tags: list[str] = field(default_factory=list) # Mutable default Field Types Bengal automatically coerces frontmatter values to these types: Type YAML Example Notes str title: "Hello" Basic string int weight: 10 Integer float rating: 4.5 Float bool draft: true Boolean datetime date: 2025-01-15 ISO date string date published: 2025-01-15 Date only list[str] tags: [a, b, c] List of strings T | None author: null Optional/nullable Nested Schemas Schemas can contain nested dataclasses: @dataclass class Author: name: str email: str | None = None @dataclass class BlogPost: title: str author: Author # Nested schema Frontmatter: --- title: My Post author: name: Jane Doe email: jane@example.com --- Collection Configuration define_collection Options define_collection( schema=BlogPost, # Required: dataclass or Pydantic model directory="content/blog", # Directory containing content glob="**/*.md", # File matching pattern (default: all .md) strict=True, # Reject unknown fields (default: True) allow_extra=False, # Store extra fields in _extra (default: False) loader=None, # Custom content source ) Strict vs Lenient Mode Strict mode (default) rejects content with unknown frontmatter fields: # Strict: unknown fields cause validation errors collections = { "docs": define_collection(schema=DocPage, directory="content/docs", strict=True), } Lenient mode allows extra fields: # Lenient: extra fields are ignored or stored collections = { "docs": define_collection( schema=DocPage, directory="content/docs", strict=False, allow_extra=True, # Store extras in _extra dict ), } Built-in Schemas Bengal provides ready-to-use schemas for common content types: from bengal.collections.schemas import ( BlogPost, # title, date, author, tags, draft, description, image, excerpt DocPage, # title, weight, category, tags, toc, description, deprecated, since APIReference, # title, endpoint, method, version, auth_required, rate_limit Tutorial, # title, difficulty, duration, prerequisites, series, order Changelog, # title, date, version, breaking, draft, summary ) collections = { "blog": define_collection(schema=BlogPost, directory="content/blog"), "docs": define_collection(schema=DocPage, directory="content/docs"), } Extending Built-in Schemas Add custom fields by subclassing: from dataclasses import dataclass from bengal.collections.schemas import BlogPost @dataclass class MyBlogPost(BlogPost): """Extended blog post with custom fields.""" series: str | None = None reading_time: int | None = None featured: bool = False Validation Errors When content fails validation, Bengal reports detailed errors: Content validation failed: content/blog/my-post.md (collection: blog) └─ title: Required field 'title' is missing └─ date: Cannot parse 'January 15' as datetime └─ author.email: Invalid value for type 'str' Validation Result The validator returns a ValidationResult: from bengal.collections import SchemaValidator, ValidationResult validator = SchemaValidator(BlogPost) result: ValidationResult = validator.validate(frontmatter_dict) if result.valid: post: BlogPost = result.data # Typed instance else: for error in result.errors: print(f"{error.field}: {error.message}") Using with Remote Sources Collections work with remote content sources: from bengal.content_layer import github_loader, notion_loader collections = { # Local content "docs": define_collection( schema=DocPage, directory="content/docs", ), # GitHub repository "api-docs": define_collection( schema=APIReference, loader=github_loader(repo="myorg/api-docs", path="docs/"), ), # Notion database "wiki": define_collection( schema=WikiPage, loader=notion_loader(database_id="abc123"), ), } IDE Support With typed collections, your IDE provides: Autocompletion for frontmatter fields Type checking for field values Go to definition for schema classes Inline documentation from docstrings Pydantic Support Bengal also supports Pydantic models for advanced validation: from pydantic import BaseModel, EmailStr, HttpUrl class Author(BaseModel): name: str email: EmailStr website: HttpUrl | None = None class BlogPost(BaseModel): title: str author: Author class Config: extra = "forbid" # Strict mode Best Practices 1. Start with Built-in Schemas Use Bengal's built-in schemas and extend as needed: from dataclasses import dataclass from bengal.collections.schemas import DocPage @dataclass class MyDocPage(DocPage): custom_field: str | None = None 2. Use Strict Mode in Production Catch frontmatter errors early: define_collection(schema=DocPage, directory="content/docs", strict=True) 3. Document Your Schemas Add docstrings for IDE support: @dataclass class BlogPost: """ Blog post content schema. Attributes: title: Post title displayed in listings and page header date: Publication date (ISO format: YYYY-MM-DD) author: Author name for byline tags: List of topic tags for categorization """ title: str date: datetime author: str = "Anonymous" tags: list[str] = field(default_factory=list) Related Custom Content Sources for remote content Cheatsheet for frontmatter field quick reference -------------------------------------------------------------------------------- Metadata: - Author: lbliii - Word Count: 764 - Reading Time: 4 minutes