# Content Collections

URL: /bengal/docs/build-sites/structure/collections/
Section: collections
Description: Validate frontmatter with typed schemas

---

> For a complete page index, fetch /bengal/llms.txt.

Define typed schemas for your content to ensure consistency and catch errors early.

## Do I Need This?

No. Collections are optional. Your site works fine without them.

Use collections when:

- You want typos caught at build time, not in production

- Multiple people edit content and need guardrails

- You want consistent frontmatter across content types

## Quick Setup

Create a `collections.py` file at your project root. Edit it to uncomment what you need:

```python
from bengal.collections import define_collection, BlogPost, DocPage

collections = {
    "blog": define_collection(schema=BlogPost, directory="blog"),
    "docs": define_collection(schema=DocPage, directory="docs"),
}
```

Done. Build as normal; validation happens automatically.

## Built-in Schemas

Bengal provides schemas for common content types:

| Schema | Alias | Required Fields | Optional Fields |
|--------|-------|-----------------|-----------------|
| `BlogPost` | `Post` | title, date | author, tags, draft, description, image, excerpt |
| `DocPage` | `Doc` | title | weight, category, tags, toc, deprecated, description, since |
| `APIReference` | `API` | title, endpoint | method, version, auth_required, rate_limit, deprecated, description |
| `Tutorial` | (none) | title | difficulty, duration, prerequisites, series, tags, order |
| `Changelog` | (none) | title, date | version, breaking, summary, draft |

Import any of these:

```python
from bengal.collections import BlogPost, DocPage, APIReference, Tutorial, Changelog
# Or use short aliases:
from bengal.collections import Post, Doc, API
```
## Custom Schemas

Define your own using Python dataclasses:

```python
from dataclasses import dataclass, field
from datetime import datetime

from bengal.collections import define_collection

@dataclass
class ProjectPage:
    title: str
    status: str  # "active", "completed", "archived"
    started: datetime
    tech_stack: list[str] = field(default_factory=list)
    github_url: str | None = None

collections = {
    "projects": define_collection(
        schema=ProjectPage,
        directory="projects",
    ),
}
```
## Validation Modes

By default, validation warns but doesn't fail builds:

```text
⚠ content/blog/my-post.md
  └─ date: Required field 'date' is missing
```
### Strict Mode

To fail builds on validation errors, add to `bengal.toml`:

```toml
[build]
strict_collections = true
```
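
The two modes differ only in what happens to the collected errors once validation finishes. A minimal sketch of that control flow follows. The `report` function and `ValidationFailed` exception are hypothetical names chosen for illustration; they are not Bengal identifiers, and the real build pipeline is assumed rather than reproduced.

```python
import sys


class ValidationFailed(Exception):
    """Hypothetical error raised when strict mode sees validation problems."""


def report(errors: list[str], strict: bool) -> None:
    """Print each error as a warning, or raise once if strict mode is on."""
    if not errors:
        return
    if strict:
        raise ValidationFailed("; ".join(errors))
    for message in errors:
        print(f"warning: {message}", file=sys.stderr)


errors = ["content/blog/my-post.md: Required field 'date' is missing"]

# Default behavior: the problem is reported and the build would continue
report(errors, strict=False)

# Strict behavior: the same problem stops the build
try:
    report(errors, strict=True)
except ValidationFailed as exc:
    print(f"build failed: {exc}")
```
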
### Lenient Mode (Extra Fields)

To allow frontmatter fields not defined in your schema:

```python
define_collection(
    schema=BlogPost,
    directory="blog",
    strict=False,       # Don't reject unknown fields
    allow_extra=True,   # Store extra fields in _extra dict
)
```

With `strict=False`, unknown fields are silently ignored. Add `allow_extra=True` to preserve them in an `_extra` attribute on the validated instance.
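
The difference between ignoring and preserving unknown fields can be sketched with a plain dataclass. This illustrates the idea only: `NotePage` is a hypothetical stand-in schema, `split_frontmatter` is a hypothetical helper, and how Bengal actually populates `_extra` is assumed, not reproduced.

```python
from dataclasses import dataclass, fields


@dataclass
class NotePage:
    # Hypothetical stand-in schema, not one of Bengal's built-ins
    title: str
    draft: bool = False


def split_frontmatter(schema: type, frontmatter: dict) -> tuple[dict, dict]:
    """Separate keys the schema declares from keys it does not."""
    declared = {f.name for f in fields(schema)}
    known = {k: v for k, v in frontmatter.items() if k in declared}
    extra = {k: v for k, v in frontmatter.items() if k not in declared}
    return known, extra


known, extra = split_frontmatter(NotePage, {"title": "Hello", "mood": "sunny"})
page = NotePage(**known)  # unknown keys never reach the constructor
print(page)               # NotePage(title='Hello', draft=False)
print(extra)              # {'mood': 'sunny'}
```

Dropping `extra` corresponds to ignoring unknown fields; keeping it alongside the instance corresponds to preserving them.
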

## CLI Commands

```bash
# List defined collections and their schemas
bengal content collections

# Validate content against schemas without building
bengal content schemas

# Validate specific collection
bengal content schemas --collection blog
```
## Advanced Options

### Custom File Pattern

By default, collections match all markdown files (`**/*.md`). To match specific files:

```python
define_collection(
    schema=BlogPost,
    directory="blog",
    glob="*.md",  # Only top-level, not subdirectories
)
```
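
To see how the two patterns differ, the sketch below runs both against a temporary directory with `pathlib`. It assumes Bengal's `glob` option follows the same semantics as `pathlib.Path.glob`, which matches the description above but is not confirmed here; the file names are made up.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    blog = Path(tmp) / "blog"
    (blog / "2024").mkdir(parents=True)
    (blog / "intro.md").write_text("# Intro\n")
    (blog / "2024" / "recap.md").write_text("# Recap\n")

    # Recursive pattern: top-level files plus everything in subdirectories
    recursive = sorted(p.relative_to(blog).as_posix() for p in blog.glob("**/*.md"))
    # Flat pattern: top-level files only
    top_level = sorted(p.relative_to(blog).as_posix() for p in blog.glob("*.md"))

print(recursive)  # ['2024/recap.md', 'intro.md']
print(top_level)  # ['intro.md']
```
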
## Migration Tips

Existing site with inconsistent frontmatter?

1. Start with `strict=False` to allow extra fields
2. Run `bengal content schemas` to find issues
3. Fix content or adjust schema
4. Enable `strict=True` when ready

Transform legacy field names:

```python
from bengal.collections import BlogPost, define_collection


def migrate_legacy(data: dict) -> dict:
    if "post_title" in data:
        data["title"] = data.pop("post_title")
    return data

collections = {
    "blog": define_collection(
        schema=BlogPost,
        directory="blog",
        transform=migrate_legacy,
    ),
}
```
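
Because a transform is an ordinary function, you can check it in isolation before wiring it into a collection. The sample frontmatter below is made up for illustration. This variant copies its input instead of mutating it, which is a design choice of the sketch, not something Bengal is known to require.

```python
def migrate_legacy(data: dict) -> dict:
    """Rename the legacy 'post_title' key to 'title', leaving other keys alone."""
    migrated = dict(data)  # shallow copy so the caller's dict is not modified
    if "post_title" in migrated:
        migrated["title"] = migrated.pop("post_title")
    return migrated


legacy = {"post_title": "Hello", "date": "2024-01-01"}
current = {"title": "Already migrated"}

print(migrate_legacy(legacy))   # {'date': '2024-01-01', 'title': 'Hello'}
print(migrate_legacy(current))  # {'title': 'Already migrated'}
```

Content that already uses `title` passes through unchanged, so the transform is safe to leave in place during a gradual migration.
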
## Remote Content

Collections work with remote content too. Use a loader instead of a directory:

```python
from bengal.collections import define_collection, DocPage
from bengal.content.sources import github_loader

collections = {
    "api-docs": define_collection(
        schema=DocPage,
        loader=github_loader(repo="myorg/api-docs", path="docs/"),
    ),
}
```

See Content Sources (/bengal/docs/build-sites/extend/sources/) for GitHub, Notion, and REST API loaders.


