Module

cache.page_discovery_cache

Page discovery cache for incremental builds with lazy loading.

This module provides caching of page metadata (title, date, tags, section, slug) to enable skipping full content parsing for unchanged pages. Metadata is loaded from cache, with full content loaded lazily via PageProxy when accessed.

Key Types:

PageMetadata: Type alias for PageCore - the cacheable page metadata. Contains all fields needed for navigation, filtering, and display without loading full page content.

PageDiscoveryCacheEntry: Cache entry wrapper with validity tracking. Includes metadata, cache timestamp, and validity flag.

PageDiscoveryCache: Main cache class for storing/loading page metadata. Handles persistence, validation, and invalidation.

Architecture:

  • Metadata: source_path → PageMetadata (minimal navigation data)
  • Lazy Loading: Full content via PageProxy when needed
  • Storage: .bengal/page_metadata.json (JSON format)
  • Validation: File hash comparison to detect stale entries

Performance Impact:

  • Skip parsing: ~80ms saved per 100 unchanged pages
  • Memory efficient: Only metadata in memory until content accessed
  • Incremental: Only changed pages fully parsed

Caching Flow:

  1. Discovery phase checks cache for existing metadata
  2. If valid (hash matches), use cached PageMetadata
  3. If invalid/missing, parse file and cache new metadata
  4. Templates access metadata directly (fast)
  5. Content accessed lazily via PageProxy (when needed)
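The flow above can be sketched in a few lines. This is a minimal illustration, not Bengal's implementation: `Meta`, `file_hash`, `parse_title`, and `discover` are hypothetical names, the cache is a plain dict instead of PageDiscoveryCache, and SHA-256 is assumed as the hash function.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Meta:
    """Hypothetical stand-in for PageMetadata (only two fields shown)."""
    source_path: str
    title: str


def file_hash(text: str) -> str:
    # Assumption: SHA-256 over the file text; the real hash function may differ.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def parse_title(text: str) -> str:
    # Hypothetical "full parse": treat the first line as the page title.
    return text.splitlines()[0].lstrip("# ").strip()


def discover(files: dict[str, str], cache: dict[str, tuple[str, Meta]]):
    """Return (metadata per page, list of paths that had to be parsed)."""
    pages, parsed = {}, []
    for path, text in files.items():
        digest = file_hash(text)
        hit = cache.get(path)                     # step 1: check the cache
        if hit is not None and hit[0] == digest:
            pages[path] = hit[1]                  # step 2: hash matches, reuse
        else:
            meta = Meta(path, parse_title(text))  # step 3: parse and re-cache
            cache[path] = (digest, meta)
            pages[path] = meta
            parsed.append(path)
    return pages, parsed


cache: dict[str, tuple[str, Meta]] = {}
files = {"content/index.md": "# Home\nWelcome", "content/about.md": "# About\nHi"}
_, first_run = discover(files, cache)      # cold cache: both files parsed
files["content/about.md"] = "# About us\nHi"
pages, second_run = discover(files, cache)  # warm cache: only the edited file
```

On the second run only `content/about.md` is parsed again; `content/index.md` is served from the cached metadata.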

Related:

  • bengal.core.page.page_core: PageCore (= PageMetadata) definition
  • bengal.core.page.proxy: PageProxy for lazy loading
  • bengal.orchestration.incremental: Uses this cache for builds

Classes

PageDiscoveryCacheEntry 5
Cache entry with metadata and validity information.

Attributes

Name Type Description
metadata PageMetadata
cached_at str
is_valid bool

Methods

to_cache_dict 0 dict[str, Any]
Serialize to cache-friendly dictionary (Cacheable protocol).
def to_cache_dict(self) -> dict[str, Any]
Returns
dict[str, Any]
from_cache_dict 1 PageDiscoveryCacheEntry
Deserialize from cache dictionary (Cacheable protocol).
classmethod
def from_cache_dict(cls, data: dict[str, Any]) -> PageDiscoveryCacheEntry
Parameters
Name Type Description
data
Returns
PageDiscoveryCacheEntry
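As an illustration of the round trip these two methods provide, here is a minimal sketch. `EntrySketch` is a hypothetical stand-in, not the real class: its `metadata` is a plain dict rather than a PageMetadata object, and the dictionary keys are assumed to mirror the attribute names.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass
class EntrySketch:
    """Hypothetical stand-in for PageDiscoveryCacheEntry."""
    metadata: dict[str, Any]  # the real attribute is a PageMetadata object
    cached_at: str            # timestamp string, e.g. ISO 8601
    is_valid: bool

    def to_cache_dict(self) -> dict[str, Any]:
        # Only JSON-friendly values, so the result can be written to disk.
        return {
            "metadata": dict(self.metadata),
            "cached_at": self.cached_at,
            "is_valid": self.is_valid,
        }

    @classmethod
    def from_cache_dict(cls, data: dict[str, Any]) -> "EntrySketch":
        return cls(
            metadata=dict(data["metadata"]),
            cached_at=data["cached_at"],
            is_valid=data["is_valid"],
        )


entry = EntrySketch(
    metadata={"source_path": "content/index.md", "title": "Home"},
    cached_at="2025-10-16T12:00:00",
    is_valid=True,
)
# Serialize to JSON text and back, as a save/load cycle would.
restored = EntrySketch.from_cache_dict(json.loads(json.dumps(entry.to_cache_dict())))
```

After the round trip, `restored` compares equal to `entry`.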
PageDiscoveryCache 15
Persistent cache for page metadata enabling lazy page loading.

Purpose:

  • Store page metadata (title, date, tags, section, slug)
  • Enable incremental discovery (only load changed pages)
  • Support lazy loading of full page content on demand
  • Validate cache entries to detect stale data

Cache Format (JSON):

  {
    "version": 1,
    "pages": {
      "content/index.md": {
        "metadata": {
          "source_path": "content/index.md",
          "title": "Home",
          ...
        },
        "cached_at": "2025-10-16T12:00:00",
        "is_valid": true
      }
    }
  }

Note: If the cache format changes, loading the old file fails and the cache is rebuilt automatically.
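A reader for this format might look like the sketch below. It is an assumption-laden illustration rather than the class's actual loading code: `load_pages` and `CACHE_VERSION` are hypothetical names, and the sample metadata carries only the two fields shown above because the remaining fields are elided in the source.

```python
import json

CACHE_VERSION = 1  # matches the "version" field in the documented format

raw = """
{
  "version": 1,
  "pages": {
    "content/index.md": {
      "metadata": {"source_path": "content/index.md", "title": "Home"},
      "cached_at": "2025-10-16T12:00:00",
      "is_valid": true
    }
  }
}
"""


def load_pages(text: str) -> dict:
    """Return the "pages" mapping, or an empty dict if the file is unusable."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return {}  # corrupt file: start from an empty cache
    if not isinstance(data, dict) or data.get("version") != CACHE_VERSION:
        return {}  # unknown format: start from an empty cache
    pages = data.get("pages", {})
    return pages if isinstance(pages, dict) else {}


pages = load_pages(raw)
stale = load_pages('{"version": 2, "pages": {}}')  # treated as a cache miss
```

Returning an empty mapping on any mismatch is one way to get the "rebuilds automatically" behavior: every page then looks uncached and is parsed again.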

Methods

save_to_disk 0
Save cache to disk.
def save_to_disk(self) -> None
has_metadata 1 bool
Check if metadata is cached for a page.
def has_metadata(self, source_path: Path) -> bool
Parameters
Name Type Description
source_path

Path to source file

Returns
bool True if valid metadata exists in cache
get_metadata 1 PageMetadata | None
Get cached metadata for a page.
def get_metadata(self, source_path: Path) -> PageMetadata | None
Parameters
Name Type Description
source_path

Path to source file

Returns
PageMetadata | None PageMetadata if found and valid, None otherwise
add_metadata 1
Add or update metadata in cache.
def add_metadata(self, metadata: PageMetadata) -> None
Parameters
Name Type Description
metadata

PageMetadata to cache

invalidate 1
Mark a cache entry as invalid.
def invalidate(self, source_path: Path) -> None
Parameters
Name Type Description
source_path

Path to source file to invalidate

invalidate_all 0
Invalidate all cache entries.
def invalidate_all(self) -> None
clear 0
Clear all cache entries.
def clear(self) -> None
get_valid_entries 0 dict[str, PageMetadata]
Get all valid cached metadata entries.
def get_valid_entries(self) -> dict[str, PageMetadata]
Returns
dict[str, PageMetadata] Dictionary mapping source_path to PageMetadata for valid entries
get_invalid_entries 0 dict[str, PageMetadata]
Get all invalid cached metadata entries.
def get_invalid_entries(self) -> dict[str, PageMetadata]
Returns
dict[str, PageMetadata] Dictionary mapping source_path to PageMetadata for invalid entries
validate_entry 2 bool
Validate a cache entry against current file hash.
def validate_entry(self, source_path: Path, current_file_hash: str) -> bool
Parameters
Name Type Description
source_path

Path to source file

current_file_hash

Current hash of source file

Returns
bool True if cache entry is valid (hash matches), False otherwise
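The hash comparison can be illustrated as follows. This is a sketch under stated assumptions: `HashCheckSketch`, `remember`, and `hash_file` are hypothetical, SHA-256 is assumed because this page does not name the hash function, and where the real cache stores the previous hash is not shown here.

```python
import hashlib
import tempfile
from pathlib import Path


def hash_file(path: Path) -> str:
    # Assumption: SHA-256 of the raw bytes; the real hash function may differ.
    return hashlib.sha256(path.read_bytes()).hexdigest()


class HashCheckSketch:
    """Hypothetical stand-in showing only the hash comparison."""

    def __init__(self) -> None:
        self._hashes: dict[str, str] = {}

    def remember(self, source_path: Path, file_hash: str) -> None:
        # Record the hash seen when the metadata was cached.
        self._hashes[str(source_path)] = file_hash

    def validate_entry(self, source_path: Path, current_file_hash: str) -> bool:
        stored = self._hashes.get(str(source_path))
        return stored is not None and stored == current_file_hash


with tempfile.TemporaryDirectory() as tmp:
    page = Path(tmp) / "index.md"
    page.write_text("# Home\n", encoding="utf-8")

    cache = HashCheckSketch()
    cache.remember(page, hash_file(page))

    unchanged = cache.validate_entry(page, hash_file(page))   # same content
    page.write_text("# Home (edited)\n", encoding="utf-8")
    changed = cache.validate_entry(page, hash_file(page))     # content differs
    missing = cache.validate_entry(Path(tmp) / "other.md", "abc")  # never cached
```

Comparing content hashes rather than modification times means a file that is touched but not edited still counts as unchanged.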
stats 0 dict[str, int]
Get cache statistics.
def stats(self) -> dict[str, int]
Returns
dict[str, int] Dictionary with cache stats (total, valid, invalid)
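To show how invalidation and the statistics fit together, here is a small in-memory sketch. `CacheSketch` is hypothetical: metadata is a plain dict, paths are strings, and the stats key names (`total`, `valid`, `invalid`) are assumed from the description above and may not match the real return value exactly.

```python
class CacheSketch:
    """Hypothetical in-memory stand-in for PageDiscoveryCache."""

    def __init__(self) -> None:
        self.entries: dict[str, dict] = {}

    def add_metadata(self, metadata: dict) -> None:
        # New or updated metadata always starts out valid.
        self.entries[metadata["source_path"]] = {"metadata": metadata, "is_valid": True}

    def invalidate(self, source_path: str) -> None:
        # The entry is kept, only flagged, so it still counts in stats.
        if source_path in self.entries:
            self.entries[source_path]["is_valid"] = False

    def get_metadata(self, source_path: str):
        entry = self.entries.get(source_path)
        return entry["metadata"] if entry and entry["is_valid"] else None

    def stats(self) -> dict[str, int]:
        valid = sum(1 for e in self.entries.values() if e["is_valid"])
        total = len(self.entries)
        return {"total": total, "valid": valid, "invalid": total - valid}


cache = CacheSketch()
cache.add_metadata({"source_path": "content/index.md", "title": "Home"})
cache.add_metadata({"source_path": "content/about.md", "title": "About"})
cache.invalidate("content/about.md")
summary = cache.stats()
```

In this sketch an invalidated entry is hidden from `get_metadata` but remains in the totals, which is what lets valid and invalid counts be reported separately.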
Internal Methods 4
__init__ 1
Initialize cache.
def __init__(self, cache_path: Path | None = None)
Parameters
Name Type Description
cache_path

Path to cache file (defaults to .bengal/page_metadata.json)

Default: None
_deserialize 1
Deserialize loaded data into cache state.
def _deserialize(self, data: dict[str, Any]) -> None
Parameters
Name Type Description
data
_serialize 0 dict[str, Any]
Serialize cache state for saving.
def _serialize(self) -> dict[str, Any]
Returns
dict[str, Any]
_on_version_mismatch 0
Clear state on version mismatch.
def _on_version_mismatch(self) -> None