content_discovery

Classes

ContentDiscovery

Discovers and organizes content files into pages and sections.

Notes:

- YAML errors in front matter are downgraded to debug; we fall back to using the content and synthesize minimal metadata to keep the build progressing.
- UTF-8 BOM is stripped at read time by `bengal.utils.file_io.read_text_file` to avoid confusing the YAML/front matter parser.
- I18n dir-prefix strategy is supported (e.g., `content/en/...`); hidden files/dirs are skipped except `_index.md`.
- Parsing uses a thread pool for concurrency; unchanged pages can be represented as `PageProxy` in lazy modes.
- Symlink loops are detected via inode tracking to prevent infinite recursion.
- Content collections: When `collections.py` is present at project root, frontmatter is validated against schemas during discovery (fail fast).
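
Two of the notes above (BOM stripping at read time and thread-pool parsing) can be illustrated with a minimal standard-library sketch. This is not the code behind `bengal.utils.file_io.read_text_file`; the helper names `read_text_without_bom` and `read_all` and the `max_workers` default are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def read_text_without_bom(path: Path) -> str:
    """Read a UTF-8 file, dropping a leading byte order mark if present."""
    # The "utf-8-sig" codec decodes UTF-8 and skips a leading BOM,
    # so a front matter parser sees "---" as the very first characters.
    return path.read_text(encoding="utf-8-sig")


def read_all(paths: list[Path], max_workers: int = 4) -> dict[Path, str]:
    """Read many files concurrently (hypothetical helper, not Bengal's API)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so zip pairs
        # each path with the text that was read from it.
        return dict(zip(paths, pool.map(read_text_without_bom, paths)))
```

File reads are I/O bound, which is the usual reason a thread pool helps here despite the interpreter lock.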

Methods 1

discover

def discover(self, use_cache: bool = False, cache: Any | None = None) -> tuple[list[Section], list[Page]]

Discover all content in the content directory.

Supports optional lazy loading with PageProxy for incremental builds.

Parameters 2

`use_cache`	`bool`	Whether to use PageDiscoveryCache for lazy loading
`cache`	`Any \| None`	PageDiscoveryCache instance (if use_cache=True)

Returns

`tuple[list[Section], list[Page]]`	Tuple of (sections, pages)

Internal Methods 13

__init__

def __init__(self, content_dir: Path, site: Any | None = None) -> None

Initialize content discovery.

Parameters 2

`content_dir`	`Path`	Root content directory
`site`	`Any \| None`	Optional Site reference for configuration access

_discover_full

def _discover_full(self) -> tuple[list[Section], list[Page]]

Full discovery (the current behavior): discovers every page and parses it completely.

Returns

`tuple[list[Section], list[Page]]`	Tuple of (sections, pages)

_discover_with_cache

def _discover_with_cache(self, cache: Any) -> tuple[list[Section], list[Page]]

Discover content with lazy loading from cache.

Uses PageProxy for unchanged pages (metadata only) and parses changed pages.

Parameters 1

`cache`	`Any`	PageDiscoveryCache instance

Returns

`tuple[list[Section], list[Page]]`	Tuple of (sections, pages) with mixed Page and PageProxy objects
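
The idea of serving unchanged pages from cached metadata and parsing them only on demand can be sketched as follows. `FullPage` and `LazyPageProxy` are hypothetical stand-ins, not Bengal's `Page` or `PageProxy`, and the `loader` callback is an assumption about how a full parse might be deferred.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class FullPage:
    """Stand-in for a fully parsed page (hypothetical)."""
    source_path: str
    metadata: dict[str, Any]
    content: str


class LazyPageProxy:
    """Hypothetical proxy: exposes cached metadata, parses only when needed."""

    def __init__(
        self,
        source_path: str,
        cached_metadata: dict[str, Any],
        loader: Callable[[str], FullPage],
    ) -> None:
        self.source_path = source_path
        self.metadata = cached_metadata  # available without touching disk
        self._loader = loader
        self._page: FullPage | None = None  # filled in on first content access

    @property
    def content(self) -> str:
        # Load the full page once, then reuse it on later accesses.
        if self._page is None:
            self._page = self._loader(self.source_path)
        return self._page.content
```

With this shape, code that only reads `metadata` (for example, to build navigation) never triggers a parse, while code that reads `content` pays the cost once.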

_cache_is_valid

def _cache_is_valid(self, page: Page, cached_metadata: Any) -> bool

Check if cached metadata is still valid for a page.

Parameters 2

`page`	`Page`	Discovered page
`cached_metadata`	`Any`	Cached metadata from PageDiscoveryCache

Returns

`bool`	True if cache is valid and can be used (unchanged page)

_walk_directory

def _walk_directory(self, directory: Path, parent_section: Section, current_lang: str | None = None) -> None

Recursively walk a directory to discover content.

Uses inode tracking to detect and skip symlink loops.

Parameters 3

`directory`	`Path`	Directory to walk
`parent_section`	`Section`	Parent section to add content to
`current_lang`	`str \| None`
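
Inode tracking for symlink loops can be sketched with the standard library alone. This is an illustrative walk, not the body of `_walk_directory`; the function name `walk_without_loops` and the choice to return a flat list of files are assumptions.

```python
import os
from pathlib import Path


def walk_without_loops(root: Path) -> list[Path]:
    """Collect files under root, following symlinks but never re-walking a directory."""
    visited: set[tuple[int, int]] = set()  # (device, inode) of directories already walked
    files: list[Path] = []

    def walk(directory: Path) -> None:
        try:
            info = directory.stat()  # follows symlinks, so a link resolves to its target
            entries = sorted(directory.iterdir())
        except OSError:
            return  # broken link or unreadable directory: skip it
        key = (info.st_dev, info.st_ino)
        if key in visited:
            return  # same physical directory seen before: a loop or a duplicate link
        visited.add(key)
        for entry in entries:
            if entry.is_dir():
                walk(entry)
            elif entry.is_file():
                files.append(entry)

    walk(root)
    return files
```

Keying on the `(st_dev, st_ino)` pair rather than on the path string is what makes the check robust: a symlink that points back to an ancestor has a different path but resolves to the same device and inode.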

_is_content_file

def _is_content_file(self, file_path: Path) -> bool

Check if a file is a content file.

Parameters 1

`file_path`	`Path`	Path to check

Returns

`bool`	True if it's a content file
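
A check of this kind might combine an extension test with the hidden-file rule from the class notes. The sketch below is a guess at the shape, not the actual implementation: the extension set, the treatment of names starting with `.` or `_` as hidden, and both function names are assumptions.

```python
from pathlib import Path

# Assumed for illustration; this reference does not list the real extensions.
CONTENT_EXTENSIONS = {".md", ".markdown"}


def is_hidden(path: Path) -> bool:
    """Treat names starting with '.' or '_' as hidden, except _index.md (assumed rule)."""
    if path.name == "_index.md":
        return False
    return path.name.startswith((".", "_"))


def is_content_file(path: Path) -> bool:
    """Return True for non-hidden files with a recognized content extension."""
    return not is_hidden(path) and path.suffix.lower() in CONTENT_EXTENSIONS
```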

_validate_against_collection

def _validate_against_collection(self, file_path: Path, metadata: dict[str, Any]) -> dict[str, Any]

Validate frontmatter against collection schema if applicable.

Parameters 2

`file_path`	`Path`	Path to content file
`metadata`	`dict[str, Any]`	Parsed frontmatter metadata

Returns

`dict[str, Any]`	Validated metadata (possibly with schema-enforced defaults)
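
Fail-fast validation with schema-enforced defaults can be sketched as below. The schema representation here (`FieldSpec`, a plain dict of field names) and the `SchemaError` exception are hypothetical; the real collection schemas defined in `collections.py` may look quite different.

```python
from dataclasses import dataclass
from typing import Any


class SchemaError(ValueError):
    """Raised when front matter does not satisfy the schema (hypothetical)."""


@dataclass
class FieldSpec:
    """Hypothetical description of one front matter field."""
    type_: type
    required: bool = True
    default: Any = None


def validate_metadata(
    metadata: dict[str, Any], schema: dict[str, FieldSpec]
) -> dict[str, Any]:
    """Return a validated copy of metadata, filling defaults for optional fields."""
    validated = dict(metadata)  # copy so the caller's dict is not mutated
    for name, spec in schema.items():
        if name not in validated:
            if spec.required:
                # Fail fast: stop at the first problem instead of collecting errors.
                raise SchemaError(f"missing required field: {name}")
            validated[name] = spec.default
        elif not isinstance(validated[name], spec.type_):
            raise SchemaError(
                f"field {name!r} should be {spec.type_.__name__}, "
                f"got {type(validated[name]).__name__}"
            )
    return validated
```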

_get_collection_for_file

def _get_collection_for_file(self, file_path: Path) -> tuple[str | None, CollectionConfig[Any] | None]

Find which collection a file belongs to based on its path.

Parameters 1

`file_path`	`Path`	Path to content file

Returns

`tuple[str | None, CollectionConfig[Any] | None]`	Tuple of (collection_name, CollectionConfig) or (None, None)
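
Path-based lookup of this kind can be sketched by comparing path components. In the sketch, each collection is represented only by a directory relative to the content root, which is an assumption; the real `CollectionConfig` carries more than a path, and the "most specific directory wins" rule is likewise a guess.

```python
from __future__ import annotations

from pathlib import Path


def collection_for_file(
    file_path: Path, content_dir: Path, collections: dict[str, Path]
) -> tuple[str | None, Path | None]:
    """Return (name, directory) of the most specific matching collection, or (None, None)."""
    try:
        relative = file_path.relative_to(content_dir)
    except ValueError:
        return None, None  # the file is outside the content directory

    best: tuple[str, Path] | None = None
    for name, directory in collections.items():
        parts = directory.parts
        # A collection matches when its directory is a leading run of path components.
        if relative.parts[: len(parts)] == parts:
            if best is None or len(parts) > len(best[1].parts):
                best = (name, directory)  # prefer the deepest (most specific) match
    return best if best is not None else (None, None)
```

Comparing `parts` tuples avoids the false positives of string prefix matching, where `docs-old/page.md` would otherwise appear to sit under `docs`.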

_create_page

def _create_page(self, file_path: Path, current_lang: str | None = None, section: Section | None = None) -> Page

Create a Page object from a file with robust error handling.

Handles:

- Valid frontmatter
- Invalid YAML in frontmatter
- Missing frontmatter
- File encoding issues
- IO errors
- Collection schema validation (when collections defined)

Parameters 3

`file_path`	`Path`	Path to content file
`current_lang`	`str \| None`
`section`	`Section \| None`

Returns

`Page`	Page object (always succeeds with fallback metadata)

_parse_content_file

def _parse_content_file(self, file_path: Path) -> tuple[str, dict[str, Any]]

Parse content file with robust error handling.

Caches raw content in BuildContext for later use by validators, eliminating redundant disk I/O during health checks.

Parameters 1

`file_path`	`Path`	Path to content file

Returns

`tuple[str, dict[str, Any]]`	Tuple of (content, metadata)

_extract_content_skip_frontmatter

def _extract_content_skip_frontmatter(self, file_content: str) -> str

Extract content, skipping broken frontmatter section.

Frontmatter sits between `---` delimiters at the start of the file. If parsing it failed, the whole section is skipped.

Parameters 1

`file_content`	`str`	Full file content

Returns

`str`	Content without frontmatter section
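
Skipping a delimited block at the start of a file can be sketched as below. This is an illustration rather than the method's actual body: the name `skip_frontmatter`, the trimming of blank lines after the closing delimiter, and the choice to return the text unchanged when the block is never closed are all assumptions.

```python
def skip_frontmatter(file_content: str) -> str:
    """Return the text after a leading '---' ... '---' block, if one exists."""
    lines = file_content.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return file_content  # no front matter block at the start of the file

    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            # Keep everything after the closing delimiter, minus leading blank lines.
            return "".join(lines[index + 1:]).lstrip("\n")

    # Assumed behavior: an opening delimiter without a close leaves the text as-is.
    return file_content
```

Because the block is located by its delimiters only, this works even when the YAML between them is too broken to parse.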

_sort_all_sections

def _sort_all_sections(self) -> None

Sort all sections and their children by weight.

This recursively sorts:

- Pages within each section
- Subsections within each section

Called after content discovery is complete.

_sort_section_recursive

def _sort_section_recursive(self, section: Section) -> None

Recursively sort a section and all its subsections.

Parameters 1

`section`	`Section`	Section to sort

discovery.content_discovery