Classes
ContentDiscovery
Discovers and organizes content files into pages and sections.
Notes:
- YAML errors in front matter are downgraded to debug-level log messages; discovery falls back to the file content and synthesizes minimal metadata so the build keeps progressing.
- A UTF-8 BOM is stripped at read time by bengal.utils.file_io.read_text_file so that it does not confuse the YAML front matter parser.
- The i18n directory-prefix strategy is supported (e.g., content/en/...).
- Hidden files and directories are skipped, except _index.md.
- Parsing uses a thread pool for concurrency; in lazy modes, unchanged pages can be represented as PageProxy objects.
- Symlink loops are detected via inode tracking to prevent infinite recursion.
- Content collections: when collections.py is present at the project root, front matter is validated against the collection schemas during discovery (fail fast).
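The BOM note can be illustrated with the standard library alone. This is a minimal sketch, not bengal.utils.file_io.read_text_file itself, and the helper name read_text_strip_bom is hypothetical. It relies on Python's utf-8-sig codec, which decodes like utf-8 but drops a leading byte order mark when one is present.

```python
from pathlib import Path


def read_text_strip_bom(path: Path) -> str:
    """Read a UTF-8 file, dropping a leading byte order mark if one exists.

    Hypothetical helper: 'utf-8-sig' removes a leading U+FEFF, so a '---'
    front matter delimiter stays at position 0 where the parser expects it.
    Files without a BOM decode exactly as with plain 'utf-8'.
    """
    return path.read_text(encoding="utf-8-sig")
```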
Methods
discover
def discover(self, use_cache: bool = False, cache: Any | None = None) -> tuple[list[Section], list[Page]]
Discover all content in the content directory.
Supports optional lazy loading with PageProxy for incremental builds.
Parameters
- use_cache (bool): Whether to use PageDiscoveryCache for lazy loading.
- cache (Any | None): PageDiscoveryCache instance (if use_cache=True).
Returns
- tuple[list[Section], list[Page]]: Tuple of (sections, pages).
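One plausible way for discover() to route between the two internal strategies documented below is sketched here. This is an assumption, not bengal's code: DiscoverySketch is a hypothetical stand-in, its internal methods return placeholder values, and the choice to fall back to full discovery when use_cache=True but no cache is supplied is an illustrative guess.

```python
from __future__ import annotations

from typing import Any


class DiscoverySketch:
    """Hypothetical stand-in showing how discover() might dispatch."""

    def discover(self, use_cache: bool = False, cache: Any | None = None):
        # Lazy path only when caching is requested AND a cache is supplied
        # (assumption: otherwise the sketch silently does full discovery).
        if use_cache and cache is not None:
            return self._discover_with_cache(cache)
        return self._discover_full()

    def _discover_full(self):
        # Placeholder result standing in for fully parsed pages.
        return ([], ["full"])

    def _discover_with_cache(self, cache: Any):
        # Placeholder result standing in for a Page/PageProxy mix.
        return ([], ["cached"])
```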
Internal Methods
__init__
def __init__(self, content_dir: Path, site: Any | None = None) -> None
Initialize content discovery.
Parameters
- content_dir (Path): Root content directory.
- site (Any | None): Optional Site reference for configuration access.
_discover_full
def _discover_full(self) -> tuple[list[Section], list[Page]]
Full discovery (the default, non-cached path): discover and fully parse every page.
Returns
- tuple[list[Section], list[Page]]: Tuple of (sections, pages).
_discover_with_cache
def _discover_with_cache(self, cache: Any) -> tuple[list[Section], list[Page]]
Discover content with lazy loading from cache.
Uses PageProxy for unchanged pages (metadata only) and parses changed pages.
Parameters
- cache (Any): PageDiscoveryCache instance.
Returns
- tuple[list[Section], list[Page]]: Tuple of (sections, pages), where pages mixes Page and PageProxy objects.
_cache_is_valid
def _cache_is_valid(self, page: Page, cached_metadata: Any) -> bool
Check if cached metadata is still valid for a page.
Parameters
- page (Page): Discovered page.
- cached_metadata (Any): Cached metadata from PageDiscoveryCache.
Returns
- bool: True if the cache is valid and can be used (the page is unchanged).
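The validity check decides whether a page can be served as a lightweight proxy or must be re-parsed. The criterion bengal uses is not documented here, so this sketch assumes the cache stores a SHA-256 hash of the source file; CachedMetadata, file_sha256, and cache_is_valid are hypothetical names, and the sketch takes a path rather than a Page.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass
from pathlib import Path


@dataclass
class CachedMetadata:
    """Hypothetical cache record; the real PageDiscoveryCache format may differ."""
    source_path: str
    file_hash: str


def file_sha256(path: Path) -> str:
    # Hash the raw bytes so any edit, including whitespace, invalidates the entry.
    return hashlib.sha256(path.read_bytes()).hexdigest()


def cache_is_valid(source_path: Path, cached: CachedMetadata | None) -> bool:
    # No entry: the page is new to the cache and must be parsed.
    if cached is None:
        return False
    # Entry recorded for a different file: treat it as stale.
    if cached.source_path != str(source_path):
        return False
    # Unchanged bytes mean the cached metadata can back a lazy proxy.
    return cached.file_hash == file_sha256(source_path)
```

Hashing costs a full read per page; comparing modification times would be cheaper but can miss edits that preserve timestamps.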
_walk_directory
def _walk_directory(self, directory: Path, parent_section: Section, current_lang: str | None = None) -> None
Recursively walk a directory to discover content.
Uses inode tracking to detect and skip symlink loops.
Parameters
- directory (Path): Directory to walk.
- parent_section (Section): Parent section to add content to.
- current_lang (str | None)
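Inode tracking can be sketched as follows: each visited directory is identified by its (st_dev, st_ino) pair, and a directory whose pair has already been seen is skipped, which stops a symlink that points back up the tree. This is a simplified sketch, not bengal's _walk_directory: walk_content is a hypothetical name, it collects file paths instead of building sections, and it assumes "hidden" means a name starting with "." or "_" (with _index.md exempt, per the note above).

```python
from __future__ import annotations

from pathlib import Path


def walk_content(directory: Path, seen: set[tuple[int, int]] | None = None) -> list[Path]:
    """Collect files under directory, never entering the same directory twice."""
    if seen is None:
        seen = set()
    # stat() follows symlinks, so a linked directory reports its target's identity.
    st = directory.stat()
    key = (st.st_dev, st.st_ino)
    if key in seen:
        return []  # already visited: a symlink loop or a second link to the same dir
    seen.add(key)

    found: list[Path] = []
    for entry in sorted(directory.iterdir()):
        name = entry.name
        # Assumption: a leading "." or "_" marks hidden entries; _index.md is kept.
        if name.startswith((".", "_")) and name != "_index.md":
            continue
        if entry.is_dir():
            found.extend(walk_content(entry, seen))
        elif entry.is_file():
            found.append(entry)
    return found
```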
_is_content_file
def _is_content_file(self, file_path: Path) -> bool
Check if a file is a content file.
Parameters
- file_path (Path): Path to check.
Returns
- bool: True if the file is a content file.
_validate_against_collection
def _validate_against_collection(self, file_path: Path, metadata: dict[str, Any]) -> dict[str, Any]
Validate front matter against the collection schema, if the file belongs to a collection.
Parameters
- file_path (Path): Path to content file.
- metadata (dict[str, Any]): Parsed front matter metadata.
Returns
- dict[str, Any]: Validated metadata (possibly with schema-enforced defaults).
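Schema validation with defaults can be sketched with a dataclass standing in for a collection schema. This is an illustration under assumptions, not bengal's collection API: BlogPost and validate_metadata are hypothetical, unknown keys are rejected here by choice, and dataclasses do not check field types, so only missing required fields and defaults are enforced.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass, field, fields
from typing import Any


@dataclass
class BlogPost:
    """Hypothetical collection schema; real schemas come from collections.py."""
    title: str
    draft: bool = False
    tags: list[str] = field(default_factory=list)


def validate_metadata(schema: type, metadata: dict[str, Any]) -> dict[str, Any]:
    known = {f.name for f in fields(schema)}
    unknown = set(metadata) - known
    if unknown:
        # Fail fast on keys the schema does not declare (a choice of this sketch).
        raise ValueError(f"unknown front matter keys: {sorted(unknown)}")
    # A missing required field raises TypeError from the generated __init__.
    instance = schema(**metadata)
    # asdict() returns the metadata with schema defaults filled in.
    return asdict(instance)
```

For example, validate_metadata(BlogPost, {"title": "Hi"}) returns {"title": "Hi", "draft": False, "tags": []}.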
_get_collection_for_file
def _get_collection_for_file(self, file_path: Path) -> tuple[str | None, CollectionConfig[Any] | None]
Find which collection a file belongs to based on its path.
Parameters
- file_path (Path): Path to content file.
Returns
- tuple[str | None, CollectionConfig[Any] | None]: Tuple of (collection_name, CollectionConfig), or (None, None) if no collection matches.
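Path-based collection lookup can be sketched as a prefix match of the file path against each collection's directory. The names here are hypothetical: CollectionSketch models only a directory and is not the real CollectionConfig, and preferring the deepest matching directory (so a nested collection wins over its parent) is an assumption about how overlaps are resolved.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class CollectionSketch:
    """Hypothetical stand-in for CollectionConfig: only a directory is modeled."""
    directory: Path


def collection_for_file(
    file_path: Path, collections: dict[str, CollectionSketch]
) -> tuple[str | None, CollectionSketch | None]:
    best: tuple[str | None, CollectionSketch | None] = (None, None)
    best_depth = -1
    for name, config in collections.items():
        # Path.is_relative_to (Python 3.9+) checks that config.directory is a prefix.
        if file_path.is_relative_to(config.directory):
            depth = len(config.directory.parts)
            # Assumption: the deepest matching directory wins.
            if depth > best_depth:
                best, best_depth = (name, config), depth
    return best
```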
_create_page
def _create_page(self, file_path: Path, current_lang: str | None = None, section: Section | None = None) -> Page
Create a Page object from a file with robust error handling.
Handles:
- Valid frontmatter
- Invalid YAML in frontmatter
- Missing frontmatter
- File encoding issues
- IO errors
- Collection schema validation (when collections defined)
Parameters
- file_path (Path): Path to content file.
- current_lang (str | None)
- section (Section | None)
Returns
- Page: Page object (always succeeds, using fallback metadata when needed).
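The "always succeeds" contract can be sketched as a loader that catches read, decode, and parse failures and synthesizes minimal metadata instead of raising. This is a sketch under assumptions: load_with_fallback is hypothetical, the parser is injected and assumed to raise ValueError on bad front matter, and the "title" and "parse_error" keys are illustrative rather than bengal's actual fallback fields. For simplicity it keeps the raw text on a parse failure, whereas the real code skips the broken block (see _extract_content_skip_frontmatter).

```python
from __future__ import annotations

from pathlib import Path
from typing import Any, Callable

# Hypothetical parser type: returns (content, metadata), raises ValueError on bad input.
Parser = Callable[[str], tuple[str, dict[str, Any]]]


def load_with_fallback(file_path: Path, parse: Parser) -> tuple[str, dict[str, Any]]:
    """Return (content, metadata) without raising for bad input files."""
    try:
        raw = file_path.read_text(encoding="utf-8-sig")
    except (OSError, UnicodeDecodeError) as exc:
        # IO or encoding failure: empty body plus minimal synthesized metadata.
        return "", {"title": file_path.stem, "parse_error": str(exc)}
    try:
        return parse(raw)
    except ValueError as exc:
        # Broken front matter: keep the text so the build keeps progressing.
        return raw, {"title": file_path.stem, "parse_error": str(exc)}
```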
_parse_content_file
def _parse_content_file(self, file_path: Path) -> tuple[str, dict[str, Any]]
Parse content file with robust error handling.
Caches raw content in BuildContext for later use by validators, eliminating redundant disk I/O during health checks.
Parameters
- file_path (Path): Path to content file.
Returns
- tuple[str, dict[str, Any]]: Tuple of (content, metadata).
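The read-once idea behind caching raw content can be sketched as a small memoizing reader: the first request for a path hits the disk, later requests (for example from health-check validators) reuse the stored text. RawContentCache is a hypothetical stand-in for the BuildContext cache, and the disk_reads counter exists only to make the saving visible.

```python
from __future__ import annotations

from pathlib import Path


class RawContentCache:
    """Hypothetical stand-in for BuildContext's raw-content cache."""

    def __init__(self) -> None:
        self._texts: dict[Path, str] = {}
        self.disk_reads = 0

    def read(self, path: Path) -> str:
        # Resolve so different spellings of the same file share one entry.
        key = path.resolve()
        if key not in self._texts:
            self.disk_reads += 1
            self._texts[key] = path.read_text(encoding="utf-8")
        return self._texts[key]
```

Because discovery parses in a thread pool, two threads could race to read the same file here; that only costs a duplicate read, so the sketch does not lock.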
_extract_content_skip_frontmatter
def _extract_content_skip_frontmatter(self, file_content: str) -> str
Extract content, skipping a broken front matter section.
Front matter sits between --- delimiters at the start of the file. If parsing it failed, the whole section is skipped.
Parameters
- file_content (str): Full file content.
Returns
- str: Content without the front matter section.
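Skipping a broken front matter block needs only line scanning, no YAML parsing: find the opening --- on the first line and return everything after the next --- line. This is a sketch, not bengal's implementation; treating an unclosed block as ordinary content is an assumption.

```python
def extract_content_skip_frontmatter(file_content: str) -> str:
    """Drop a leading '---' ... '---' block even if its YAML is invalid."""
    lines = file_content.splitlines(keepends=True)
    # No opening delimiter on the first line: nothing to skip.
    if not lines or lines[0].strip() != "---":
        return file_content
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            # Everything after the closing delimiter is the body.
            return "".join(lines[i + 1:])
    # Unclosed block (assumption): treat the whole file as content.
    return file_content
```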
_sort_all_sections
def _sort_all_sections(self) -> None
Sort all sections and their children by weight.
This recursively sorts:
- Pages within each section
- Subsections within each section
Called after content discovery is complete.
_sort_section_recursive
def _sort_section_recursive(self, section: Section) -> None
Recursively sort a section and all its subsections.
Parameters
- section (Section): Section to sort.
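The recursive weight sort can be sketched with plain dataclasses standing in for Page and Section. PageStub, SectionStub, and sort_section_recursive are hypothetical names; sorting unweighted items last (weight = infinity) and breaking ties by title are assumptions chosen so the order is deterministic.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PageStub:
    title: str
    weight: float = float("inf")  # assumption: unweighted pages sort last


@dataclass
class SectionStub:
    title: str
    weight: float = float("inf")
    pages: list[PageStub] = field(default_factory=list)
    subsections: list[SectionStub] = field(default_factory=list)


def sort_section_recursive(section: SectionStub) -> None:
    # Weight first, title as tie-breaker, applied to pages and subsections alike.
    section.pages.sort(key=lambda p: (p.weight, p.title))
    section.subsections.sort(key=lambda s: (s.weight, s.title))
    for child in section.subsections:
        sort_section_recursive(child)
```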