Module

cache.build_cache.core

Core BuildCache class for tracking file changes and dependencies.

Main dataclass with fields, save/load, and coordination methods. Uses mixins for specialized functionality (file tracking, validation, taxonomy, caching).

Key Concepts:

  • File fingerprints: mtime + size for fast change detection, hash for verification
  • Dependency tracking: Templates, partials, and data files used by pages
  • Taxonomy indexes: Tag/category mappings for fast reconstruction
  • Config hash: Auto-invalidation when configuration changes
  • Version tolerance: Accepts missing/older cache versions gracefully
  • Zstandard compression: 92-93% size reduction, <1ms overhead

Related Modules:

  • bengal.orchestration.incremental: Incremental build logic using cache
  • bengal.cache.dependency_tracker: Dependency graph construction
  • bengal.cache.taxonomy_index: Taxonomy reconstruction from cache
  • bengal.cache.compression: Zstandard compression utilities

See Also:

  • plan/active/rfc-incremental-builds.md: Incremental build design
  • plan/active/rfc-orchestrator-performance-improvements.md: Performance RFC
  • plan/active/rfc-zstd-cache-compression.md: Compression RFC

Classes

BuildCache dataclass

Tracks file hashes and dependencies between builds.

IMPORTANT PERSISTENCE CONTRACT:

  • This cache must NEVER contain object references (Page, Section, Asset objects)
  • All data must be JSON-serializable (paths as strings, numbers, lists, dicts); sets are stored as lists and converted back to sets on load
  • Object relationships are rebuilt each build from cached paths

NOTE: BuildCache intentionally does NOT implement the Cacheable protocol.

Rationale:

  • Uses pickle for performance (faster than JSON for sets/complex structures)
  • Has tolerant loader with custom version handling logic
  • Contains many specialized fields (dependencies, hashes, etc.)
  • Designed for internal build state, not type-safe caching contracts

For type-safe caching, use types that implement the Cacheable protocol:

  • PageCore (bengal/core/page/page_core.py)
  • TagEntry (bengal/cache/taxonomy_index.py)
  • AssetDependencyEntry (bengal/cache/asset_dependency_map.py)
Inherits from FileTrackingMixin, ValidationCacheMixin, TaxonomyIndexMixin, ParsedContentCacheMixin, RenderedOutputCacheMixin, AutodocTrackingMixin

Attributes

Name Type Description
VERSION int
version int
file_hashes dict[str, str]

Mapping of file paths to their SHA256 hashes

file_fingerprints dict[str, dict[str, Any]]

Mapping of file paths to {mtime, size, hash} dicts

dependencies dict[str, set[str]]

Mapping of pages to their dependencies (templates, partials, etc.)

output_sources dict[str, str]

Mapping of output files to their source files

taxonomy_deps dict[str, set[str]]

Mapping of taxonomy terms to affected pages

page_tags dict[str, set[str]]

Mapping of page paths to their tags (for detecting tag changes)

tag_to_pages dict[str, set[str]]

Inverted index mapping tag slug to page paths (for O(1) reconstruction)

known_tags set[str]

Set of all tag slugs from previous build (for detecting deletions)

parsed_content dict[str, dict[str, Any]]

Cached parsed HTML/TOC (Optimization #2)

rendered_output dict[str, dict[str, Any]]

Cached rendered HTML (Optimization #3)

synthetic_pages dict[str, dict[str, Any]]

Cached synthetic page data (autodoc, etc.)

validation_results dict[str, dict[str, list[dict[str, Any]]]]

Cached validation results per file/validator

autodoc_dependencies dict[str, set[str]]
config_hash str | None

Hash of resolved configuration (for auto-invalidation)

last_build str | None

Timestamp of last successful build
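
To illustrate how `page_tags`, `tag_to_pages`, and `known_tags` relate, here is a small sketch that derives the inverted index from the forward mapping and detects tags that disappeared since the previous build. The function names `build_tag_index` and `deleted_tags` are hypothetical and not part of BuildCache.

```python
from collections import defaultdict


def build_tag_index(page_tags: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert page -> tags into tag -> pages (hypothetical helper)."""
    tag_to_pages: dict[str, set[str]] = defaultdict(set)
    for page, tags in page_tags.items():
        for tag in tags:
            tag_to_pages[tag].add(page)
    return dict(tag_to_pages)


def deleted_tags(known_tags: set[str], tag_to_pages: dict[str, set[str]]) -> set[str]:
    """Tags present in the previous build that no page uses any more."""
    return known_tags - set(tag_to_pages)


page_tags = {
    "content/a.md": {"python", "caching"},
    "content/b.md": {"python"},
}
index = build_tag_index(page_tags)
# Looking up the pages for one tag is a single dict access.
pages_for_python = index["python"]
```

Keeping both directions in the cache trades a little disk space for not having to rescan every page when a single tag page needs rebuilding.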

Methods 9

load classmethod
def load(cls, cache_path: Path, use_lock: bool = True) -> BuildCache

Load build cache from disk with optional file locking.

Loader behavior:

  • Tolerant to malformed JSON: On parse errors or schema mismatches, returns a fresh BuildCache instance and logs a warning.
  • Version mismatches: Logs a warning and best-effort loads known fields.
  • File locking: Acquires shared lock to prevent reading during writes.
Parameters 2
cache_path Path

Path to cache file

use_lock bool

Whether to use file locking (default: True)

Returns

BuildCache

BuildCache instance (empty if file doesn't exist or is invalid)

save
def save(self, cache_path: Path, use_lock: bool = True) -> None

Save build cache to disk with optional file locking.

Persistence semantics:

  • Atomic writes: Uses AtomicFile (temp-write → atomic rename) to prevent partial files on crash/interruption.
  • File locking: Acquires exclusive lock to prevent concurrent writes.
  • Combined safety: Lock + atomic write together prevent both interleaved writers and partially written cache files.
Parameters 2
cache_path Path

Path to cache file

use_lock bool

Whether to use file locking (default: True)
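
The temp-write followed by atomic rename pattern mentioned above can be sketched with the standard library alone. This is not the AtomicFile helper itself, whose interface is not shown here; `atomic_write_json` is a hypothetical function, and locking is omitted.

```python
import contextlib
import json
import os
import tempfile
from pathlib import Path
from typing import Any


def atomic_write_json(path: Path, data: dict[str, Any]) -> None:
    """Write JSON so readers see either the old file or the new one."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file must live in the same directory so the rename
    # stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(data, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # push bytes to disk before renaming
        os.replace(tmp_name, path)  # atomic replacement of the target
    except BaseException:
        # On any failure, remove the temp file and leave the target untouched.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise
```

If the process dies before `os.replace`, the previous cache file is still intact; only a stray temp file may remain.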

clear
def clear(self) -> None

Clear all cache data.

validate_config
def validate_config(self, current_hash: str) -> bool

Check if cache is valid for the current configuration.

Compares the stored config_hash with the current configuration hash. If they differ, the cache is automatically cleared to ensure correctness.

This enables automatic cache invalidation when:

  • Configuration files change (bengal.toml, config/*.yaml)
  • Environment variables change (BENGAL_*)
  • Build profiles change (--profile writer)
Parameters 1
current_hash str

Hash of the current resolved configuration

Returns

bool

True if cache is valid (hashes match), False if cache was cleared

invalidate_file
def invalidate_file(self, file_path: Path) -> None

Remove a file from all caches (useful when file is deleted).

Extends FileTrackingMixin.invalidate_file with additional cache cleanup.

Parameters 1
file_path Path

Path to file

get_stats
def get_stats(self) -> dict[str, int]

Get cache statistics with logging.

Returns

dict[str, int]

Dictionary with cache stats

get_page_cache
def get_page_cache(self, cache_key: str) -> dict[str, Any] | None

Get cached data for a synthetic page.

Parameters 1
cache_key str

Unique cache key for the page

Returns

dict[str, Any] | None

Cached page data or None if not found

set_page_cache
def set_page_cache(self, cache_key: str, page_data: dict[str, Any]) -> None

Cache data for a synthetic page.

Parameters 2
cache_key str

Unique cache key for the page

page_data dict[str, Any]

Page data to cache

invalidate_page_cache
def invalidate_page_cache(self, cache_key: str) -> None

Remove cached data for a synthetic page.

Parameters 1
cache_key str

Cache key to invalidate

Internal Methods 5
__post_init__
def __post_init__(self) -> None

Convert sets from lists after JSON deserialization.
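
Because JSON has no set type, set-valued fields have to be written as lists and restored afterwards. The following sketch shows one way a dataclass can do that in `__post_init__`; `SetBackedCache` and its `to_json` method are hypothetical and cover only two of the fields listed above.

```python
import json
from dataclasses import dataclass, field


@dataclass
class SetBackedCache:
    """Hypothetical stand-in with two set-valued fields."""

    dependencies: dict[str, set[str]] = field(default_factory=dict)
    known_tags: set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        # After json.loads these arrive as lists; normalize them to sets.
        self.dependencies = {k: set(v) for k, v in self.dependencies.items()}
        self.known_tags = set(self.known_tags)

    def to_json(self) -> str:
        # Sets are not JSON-serializable, so emit sorted lists instead.
        payload = {
            "dependencies": {k: sorted(v) for k, v in self.dependencies.items()},
            "known_tags": sorted(self.known_tags),
        }
        return json.dumps(payload)


original = SetBackedCache(
    dependencies={"content/a.md": {"base.html", "partials/nav.html"}},
    known_tags={"python"},
)
restored = SetBackedCache(**json.loads(original.to_json()))
```

Sorting the lists on the way out is optional, but it makes the serialized cache deterministic and therefore easier to diff.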

_load_from_file classmethod
def _load_from_file(cls, cache_path: Path) -> BuildCache

Internal method to load cache from file (assumes lock is held if needed).

Auto-detects format: tries compressed (.json.zst) first, falls back to uncompressed (.json). This enables seamless migration.

Parameters 1
cache_path Path

Path to cache file (base path, without .zst extension)

Returns

BuildCache

BuildCache instance

_load_data_auto classmethod
def _load_data_auto(cls, cache_path: Path) -> dict[str, Any] | None

Load raw data with auto-detection of format.

Tries compressed format first (.json.zst), falls back to uncompressed (.json).

Parameters 1
cache_path Path

Base path to cache file

Returns

dict[str, Any] | None

Parsed data dict, or None if load failed

_save_to_file
def _save_to_file(self, cache_path: Path, compress: bool = True) -> None

Internal method to save cache to file (assumes lock is held if needed).

Uses Zstandard compression by default for 92-93% size reduction.

Parameters 2
cache_path Path

Path to cache file (base path, will save as .json.zst)

compress bool

Whether to use compression (default: True)

__repr__
def __repr__(self) -> str
Returns

str