Module

core.page

Page representation for content pages in Bengal SSG.

This module provides the main Page class, which combines multiple mixins to provide a complete page interface while maintaining separation of concerns. Pages represent markdown content files and provide metadata, navigation, content processing, and template rendering capabilities.

Key Concepts:

  • Mixin architecture: Concerns are separated into mixins (metadata, content, navigation)
  • Hashability: Pages are hashable by source_path, enabling set operations
  • AST-based content: Content is represented as an AST for efficient processing
  • Cacheable metadata: PageCore provides cacheable page metadata

Related Modules:

  • bengal.core.page.page_core: Cacheable page metadata
  • bengal.core.page.proxy: Lazy-loaded page placeholder
  • bengal.rendering.renderer: Page rendering logic
  • bengal.orchestration.content: Content discovery and page creation

See Also:

  • bengal/core/page/__init__.py: Page class for page representation
  • plan/active/rfc-content-ast-architecture.md: AST architecture RFC

Classes

Page dataclass

Represents a single content page.

HASHABILITY:

Pages are hashable based on their source_path, allowing them to be stored in sets and used as dictionary keys. This enables:

  • Fast membership tests (O(1) instead of O(n))
  • Automatic deduplication with sets
  • Set operations for page analysis
  • Direct use as dictionary keys

Two pages with the same source_path are considered equal, even if their content differs. The hash is stable throughout the page lifecycle because source_path is immutable. Mutable fields (content, rendered_html, etc.) do not affect the hash or equality.
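
The identity rule above can be illustrated with a minimal sketch. `PageSketch` is a hypothetical stand-in, not Bengal's Page class; it only assumes that `__hash__` and `__eq__` look at `source_path` alone, as described.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any


@dataclass(eq=False)  # eq=False: keep our own __eq__/__hash__, not field-wise ones
class PageSketch:
    """Hypothetical stand-in for Page: identity comes from source_path only."""

    source_path: Path
    content: str = ""
    rendered_html: str = ""

    def __hash__(self) -> int:
        # Hash only the immutable identity field; mutable fields are ignored.
        return hash(self.source_path)

    def __eq__(self, other: Any) -> bool:
        # Non-pages are never equal to a page.
        if not isinstance(other, PageSketch):
            return False
        return self.source_path == other.source_path


draft = PageSketch(Path("content/post.md"), content="draft")
final = PageSketch(Path("content/post.md"), content="final")
unique_pages = {draft, final}  # deduplicated: both share one source_path
```

Because the hash ignores `content` and `rendered_html`, a page can be mutated during the build while it sits in a set or serves as a dict key without being "lost".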

VIRTUAL PAGES:

Virtual pages represent dynamically generated content (e.g., API docs) that doesn't have a corresponding file on disk. Virtual pages:

  • Have _virtual=True and a synthetic source_path
  • Are created via Page.create_virtual() factory
  • Don't read from disk (content provided directly)
  • Integrate with site's page collection and navigation

BUILD LIFECYCLE:

Pages progress through distinct build phases. Properties have different availability depending on the current phase:

  1. Discovery (content_discovery.py)
     ✅ Available: source_path, content, metadata, title, slug, date
     ❌ Not available: toc, parsed_ast, toc_items, rendered_html

  2. Parsing (pipeline.py)
     ✅ Available: everything from phase 1, plus toc and parsed_ast
     ✅ toc_items can be accessed (items are extracted from toc)

  3. Rendering (pipeline.py)
     ✅ Available: everything from previous phases, plus rendered_html and output_path
     ✅ All properties are fully populated

Note: Some properties like toc_items can be accessed early (returning []) but won't cache empty results, allowing proper extraction after parsing.
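
The "don't cache empty results" behavior can be sketched as follows. This is a hypothetical illustration, not Bengal's implementation: `TocSketch` and `extract_toc_items` are invented names, and the regex-based extractor is only a placeholder for whatever TOC parsing the real code does.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Any

# Hypothetical extractor: one item per "#anchor" link in the TOC HTML.
_LINK = re.compile(r'<a href="#([^"]+)">([^<]*)</a>')


def extract_toc_items(toc_html: str) -> list[dict[str, Any]]:
    return [{"id": m.group(1), "title": m.group(2)} for m in _LINK.finditer(toc_html)]


@dataclass
class TocSketch:
    toc: str | None = None
    _toc_items_cache: list[dict[str, Any]] | None = field(default=None, repr=False)

    @property
    def toc_items(self) -> list[dict[str, Any]]:
        if self._toc_items_cache is not None:
            return self._toc_items_cache
        if not self.toc:
            # Discovery phase: return [] but leave the cache unset so a
            # later access can still extract items once toc is populated.
            return []
        items = extract_toc_items(self.toc)
        if items:  # only cache non-empty results
            self._toc_items_cache = items
        return items
```

Accessing `toc_items` before parsing therefore costs nothing permanent: the first non-empty extraction after parsing is the one that gets cached.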

Inherits from PageMetadataMixin, PageNavigationMixin, PageComputedMixin, PageRelationshipsMixin, PageOperationsMixin, PageContentMixin

Attributes

  • _global_missing_section_warnings (ClassVar[dict[str, int]])
  • _MAX_WARNING_KEYS (ClassVar[int])
  • source_path (Path): Path to the source content file (synthetic for virtual pages)
  • core (PageCore | None)
  • content (str): Raw content (Markdown, etc.)
  • metadata (dict[str, Any]): Frontmatter metadata (title, date, tags, etc.)
  • parsed_ast (Any | None): Abstract syntax tree from parsed content
  • rendered_html (str): Rendered HTML output
  • output_path (Path | None): Path where the rendered page will be written
  • links (list[str]): List of links found in the page
  • tags (list[str]): Tags associated with the page
  • version (str | None): Version information for versioned content
  • toc (str | None): Table of contents HTML (auto-generated from headings)
  • related_posts (list[Page]): Related pages (pre-computed during build based on tag overlap)
  • lang (str | None)
  • translation_key (str | None)
  • aliases (list[str])
  • _site (Any | None)
  • _section_path (Path | None): Path of the owning section (regular sections)
  • _section_url (str | None): relative_url of the owning section (virtual sections)
  • _toc_items_cache (list[dict[str, Any]] | None)
  • _ast_cache (list[dict[str, Any]] | None)
  • _html_cache (str | None)
  • _plain_text_cache (str | None)
  • _virtual (bool): True if this is a virtual page (not backed by a disk file)
  • _prerendered_html (str | None)
  • _template_name (str | None)
  • toc_items: Structured TOC data for custom rendering

Methods

is_virtual property
def is_virtual(self) -> bool

Check if this is a virtual page (not backed by a disk file).

Virtual pages are used for:

  • API documentation generated from Python source code
  • Dynamically generated content from external sources
  • Content that doesn't have a corresponding content/ file

Returns

bool

True if this page is virtual (not backed by a disk file)

template_name property
def template_name(self) -> str | None

Get custom template name for this page.

Virtual pages may specify a custom template for rendering. Returns None to use the default template selection logic.

Returns

str | None

prerendered_html property
def prerendered_html(self) -> str | None

Get pre-rendered HTML for virtual pages.

Virtual pages with pre-rendered HTML bypass markdown parsing and use this HTML directly in the template.

Returns

str | None
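
How prerendered_html and template_name might steer rendering can be sketched like this. Everything here is hypothetical: `RenderInput`, `select_body_and_template`, and the `DEFAULT_TEMPLATE` constant are invented for illustration, and the constant merely stands in for Bengal's actual default template selection logic.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

DEFAULT_TEMPLATE = "page.html"  # placeholder for the real default selection logic


@dataclass
class RenderInput:
    content: str = ""
    prerendered_html: str | None = None
    template_name: str | None = None


def select_body_and_template(
    page: RenderInput, parse_markdown: Callable[[str], str]
) -> tuple[str, str]:
    # Pre-rendered HTML bypasses markdown parsing entirely.
    if page.prerendered_html is not None:
        body = page.prerendered_html
    else:
        body = parse_markdown(page.content)
    # template_name of None means "use the default selection logic".
    template = page.template_name or DEFAULT_TEMPLATE
    return body, template
```

Passing the parser in as a callable keeps the sketch self-contained and makes it easy to verify that the parser is never called for pre-rendered pages.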

relative_path property
def relative_path(self) -> str

Get relative path string (alias for source_path as string).

Used by templates and filtering where a string path is expected. This provides backward compatibility and convenience.

Returns

str

normalize_core_paths
def normalize_core_paths(self) -> None

Normalize PageCore paths to be relative (for cache consistency).

This should be called before caching to ensure all paths are relative to the site root, preventing absolute path leakage into cache.

Note: This mutates self.core.source_path in place, which works because PageCore is a regular (non-frozen) dataclass.
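
A minimal sketch of the normalization step, assuming paths are made relative with `pathlib.PurePath.relative_to`. `CoreStub`, `to_site_relative`, and this free-function `normalize_core_paths` are hypothetical; leaving paths outside the site root unchanged is likewise an assumption, not documented behavior.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class CoreStub:
    """Hypothetical stand-in for PageCore (a regular, non-frozen dataclass)."""

    source_path: Path


def to_site_relative(path: Path, site_root: Path) -> Path:
    """Return path relative to site_root if it lies under it, else unchanged."""
    if not path.is_absolute():
        return path  # already relative: nothing to do
    try:
        return path.relative_to(site_root)
    except ValueError:
        return path  # outside the site root: left alone (assumption)


def normalize_core_paths(core: CoreStub, site_root: Path) -> None:
    # In-place mutation is allowed because CoreStub is not frozen.
    core.source_path = to_site_relative(core.source_path, site_root)
```

Because relative inputs pass through untouched, calling the function twice is harmless, which matters if normalization runs before every cache write.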

create_virtual classmethod
def create_virtual(cls, source_id: str, title: str, content: str = '', metadata: dict[str, Any] | None = None, rendered_html: str | None = None, template_name: str | None = None, output_path: Path | None = None, section_path: Path | None = None) -> Page

Create a virtual page for dynamically generated content.

Virtual pages are not backed by a disk file but integrate with the site's page collection, navigation, and rendering pipeline.

Parameters
source_id str

Unique identifier for this page (used as source_path)

title str

Page title

content str

Raw content (markdown) - optional if rendered_html provided

metadata dict[str, Any] | None

Page metadata/frontmatter

rendered_html str | None

Pre-rendered HTML (bypasses markdown parsing)

template_name str | None

Custom template name (optional)

output_path Path | None

Explicit output path (optional)

section_path Path | None

Section this page belongs to (optional)

Returns

Page

A new virtual Page instance
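
The factory pattern behind create_virtual can be sketched with a simplified class. `VirtualPageSketch` is hypothetical and omits section_path and most Page fields. Storing `title` in the metadata (overriding any title already there) is an assumption about how the real factory uses that argument.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import Path
from typing import Any


@dataclass
class VirtualPageSketch:
    source_path: Path
    content: str = ""
    metadata: dict[str, Any] = field(default_factory=dict)
    output_path: Path | None = None
    _virtual: bool = False
    _prerendered_html: str | None = None
    _template_name: str | None = None

    @property
    def is_virtual(self) -> bool:
        return self._virtual

    @classmethod
    def create_virtual(
        cls,
        source_id: str,
        title: str,
        content: str = "",
        metadata: dict[str, Any] | None = None,
        rendered_html: str | None = None,
        template_name: str | None = None,
        output_path: Path | None = None,
    ) -> VirtualPageSketch:
        meta = dict(metadata or {})  # copy so the caller's dict is not mutated
        meta["title"] = title  # assumption: the explicit title argument wins
        return cls(
            source_path=Path(source_id),  # synthetic path; never read from disk
            content=content,
            metadata=meta,
            output_path=output_path,
            _virtual=True,
            _prerendered_html=rendered_html,
            _template_name=template_name,
        )
```

Usage, with a made-up source_id for an API page: `VirtualPageSketch.create_virtual("api/bengal/core.md", "bengal.core", rendered_html="<h1>core</h1>")`.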

Internal Methods
_section property
def _section(self) -> Any | None

Get the section this page belongs to (lazy lookup via path or URL).

This property performs a path-based or URL-based lookup in the site's section registry, enabling stable section references across rebuilds when Section objects are recreated.

Virtual sections (path=None) use URL-based lookups via _section_url. Regular sections use path-based lookups via _section_path.

Returns

Any | None

Section object if found, None if page has no section or section not found

Implementation Note:

Uses counter-gated warnings to prevent log spam when sections are missing (warns the first 3 times, shows a summary, then stays silent).
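
A counter-gated warning can be sketched with a class-level counter dictionary. `SectionWarnings`, the logger name, and the message text are hypothetical. The "summary" is modeled as a one-time suppression notice, and the bound on distinct keys assumes `_MAX_WARNING_KEYS` plays that role; its value here (100) is made up.

```python
from __future__ import annotations

import logging
from typing import ClassVar

logger = logging.getLogger("page_sketch")


class SectionWarnings:
    _counts: ClassVar[dict[str, int]] = {}  # shared across all instances
    _MAX_KEYS: ClassVar[int] = 100  # hypothetical bound on tracked keys
    WARN_LIMIT: ClassVar[int] = 3

    @classmethod
    def missing_section(cls, key: str) -> str | None:
        """Record a miss for key; return the logged message, or None if silent."""
        if key not in cls._counts and len(cls._counts) >= cls._MAX_KEYS:
            return None  # stop tracking new keys once the bound is reached
        count = cls._counts.get(key, 0) + 1
        cls._counts[key] = count
        if count <= cls.WARN_LIMIT:
            msg = f"section not found: {key}"
        elif count == cls.WARN_LIMIT + 1:
            msg = f"section not found: {key} (further warnings suppressed)"
        else:
            return None  # silent from here on
        logger.warning(msg)
        return msg
```

Keeping the counters in a ClassVar means the gate applies across every page in the build, not per page instance.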

__post_init__
def __post_init__(self) -> None

Initialize computed fields and PageCore.

_init_core_from_fields
def _init_core_from_fields(self) -> None

Initialize PageCore from Page fields (backward compatibility helper).

This allows existing code that creates Page objects without passing core to continue working. Once all instantiation is updated, this can be removed and core made required.

Note: Initially creates PageCore with absolute paths, but normalize_core_paths() should be called before caching to convert to relative paths.
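
The backward-compatibility path can be sketched with a dataclass `__post_init__` that builds the core only when none was passed. `CoreSketch` and `PageWithCore` are hypothetical; the fields copied into the core (source_path, title, tags) are assumptions, since the real PageCore field list is not shown here.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import Path
from typing import Any


@dataclass
class CoreSketch:
    source_path: Path
    title: str = ""
    tags: list[str] = field(default_factory=list)


@dataclass
class PageWithCore:
    source_path: Path
    metadata: dict[str, Any] = field(default_factory=dict)
    tags: list[str] = field(default_factory=list)
    core: CoreSketch | None = None

    def __post_init__(self) -> None:
        # Legacy callers omit core; build it from the page's own fields.
        if self.core is None:
            self._init_core_from_fields()

    def _init_core_from_fields(self) -> None:
        self.core = CoreSketch(
            source_path=self.source_path,  # may still be absolute at this point
            title=str(self.metadata.get("title", "")),
            tags=list(self.tags),  # copy so later page edits don't leak in
        )
```

Once every caller passes `core` explicitly, the `None` default and this helper could be dropped and `core` made a required field.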

__hash__
def __hash__(self) -> int

Hash based on source_path for stable identity.

The hash is computed from the page's source_path, which is immutable throughout the page lifecycle. This allows pages to be stored in sets and used as dictionary keys.

Returns

int

Integer hash of the source path

__eq__
def __eq__(self, other: Any) -> bool

Pages are equal if they have the same source path.

Equality is based on source_path only, not on content or other mutable fields. This means two Page objects representing the same source file are considered equal, even if their processed content differs.

Parameters
other Any

Object to compare with

Returns

bool

True if other is a Page with the same source_path

__repr__
def __repr__(self) -> str
Returns

str

_format_path_for_log
def _format_path_for_log(self, path: Path | str | None) -> str | None

Format a path as relative to site root for logging.

Makes paths relative to the site root directory to avoid showing user-specific absolute paths in logs and warnings.

Parameters
path Path | str | None

Path to format (can be Path, str, or None)

Returns

str | None

Relative path string, or None if path was None
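
A standalone sketch of the logging helper's contract. `format_path_for_log` here is a hypothetical free function that takes the site root explicitly; falling back to the full path string when the path is outside the root (or no root is known) is an assumption.

```python
from __future__ import annotations

from pathlib import Path


def format_path_for_log(path: Path | str | None, site_root: Path | None) -> str | None:
    if path is None:
        return None  # documented: None in, None out
    p = Path(path)  # accept both str and Path
    if site_root is not None:
        try:
            return str(p.relative_to(site_root))
        except ValueError:
            pass  # not under the site root
    return str(p)  # fallback (assumption): unchanged path as a string
```

Accepting `str` as well as `Path` lets callers pass raw config values or exception attributes straight into log messages.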

_section setter
def _section(self, value: Any) -> None

Set the section this page belongs to (stores path or URL, not object).

This setter extracts the path (or URL for virtual sections) from the Section object and stores it, enabling stable references when Section objects are recreated during incremental rebuilds.

For virtual sections (path=None), stores relative_url in _section_url. For regular sections, stores path in _section_path.

Parameters
value Any

Section object or None
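
The getter/setter pair for _section can be sketched as a property that stores a lookup key instead of the Section object. `SectionSketch`, `SiteSketch`, and `PageSectionRef` are hypothetical, and the two registry dicts stand in for whatever section registry the real site exposes.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SectionSketch:
    path: Path | None  # None for virtual sections
    relative_url: str


@dataclass
class SiteSketch:
    sections_by_path: dict[Path, SectionSketch] = field(default_factory=dict)
    sections_by_url: dict[str, SectionSketch] = field(default_factory=dict)


class PageSectionRef:
    def __init__(self, site: SiteSketch) -> None:
        self._site = site
        self._section_path: Path | None = None
        self._section_url: str | None = None

    @property
    def _section(self) -> SectionSketch | None:
        # Look the section up on every access, so a rebuilt Section is found.
        if self._section_path is not None:
            return self._site.sections_by_path.get(self._section_path)
        if self._section_url is not None:
            return self._site.sections_by_url.get(self._section_url)
        return None

    @_section.setter
    def _section(self, value: SectionSketch | None) -> None:
        # Store a key, never the object itself.
        self._section_path = None
        self._section_url = None
        if value is None:
            return
        if value.path is None:  # virtual section: key by URL
            self._section_url = value.relative_url
        else:  # regular section: key by path
            self._section_path = value.path
```

If an incremental rebuild replaces the Section object registered under the same path, the page's next `_section` access returns the new object, with no stale reference to update.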