# Build Cache URL: /bengal/docs/0.5.0/reference/architecture/core/cache/ Section: core Tags: core, caching, incremental-builds, performance, dependency-tracking, cache, compression, compression.zstd (Python 3.14+ stdlib) -------------------------------------------------------------------------------- Cache System Bengal implements an intelligent caching system that enables sub-second incremental rebuilds. How It Works The build cache (.bengal/cache.json.zst) tracks the state of your project to determine exactly what needs to be rebuilt. Cache files are compressed with Zstandard for 92-93% size reduction. flowchart TD Start[Start Build] --> Load[Load Cache] Load --> Coordinator[CacheCoordinator] Coordinator --> Detect[Detect Changes] Detect --> Config{Config Changed?} Config -->|Yes| Full[Full Rebuild] Config -->|No| Hash[Check File Hashes] Hash --> DepGraph[Query Dependency Tracking] DepGraph --> Invalidate[Invalidate Affected Caches] Invalidate --> Filter[Filter Work] Filter --> Render[Render Affected Pages] Render --> Update[Update Cache] Update --> Save[Save to Disk] Caching Strategies File Hashing Dependency Tracking Inverted Index Change Detection We use SHA256 hashing to detect file changes. Content files (.md) Templates (.html, .jinja2) Config files (.toml) Assets (.css, .js) Impact Analysis We track relationships to know what to rebuild. Page → Template: If post.html changes, rebuild all blog posts. Tag → Pages: If python tag changes, rebuild tags/python/ page. Page → Partial: If header.html changes, rebuild everything. Taxonomy Lookup We store an inverted index of tags to avoid parsing all pages. Stored: tag_to_pages['python'] = ['post1.md', 'post2.md'] Benefit: O(1) lookup for taxonomy page generation. Cache Invalidation Architecture Bengal uses a unified CacheCoordinator to manage page-level cache invalidation across all cache layers. This ensures consistent invalidation cascades and provides observability into why pages are rebuilt. Components Component Module Purpose CacheCoordinator bengal/orchestration/build/coordinator.py Central service for coordinated cache invalidation PathRegistry bengal/cache/path_registry.py Canonical path representation for consistent cache keys RebuildManifest bengal/cache/manifest.py Tracks rebuilds with reasons and timing Page Invalidation Reasons When a page's caches are invalidated, the coordinator records why: class PageInvalidationReason(Enum): CONTENT_CHANGED = auto() # Page content modified DATA_FILE_CHANGED = auto() # Data file dependency changed TEMPLATE_CHANGED = auto() # Template dependency changed TAXONOMY_CASCADE = auto() # Tag/category member changed ASSET_CHANGED = auto() # Asset dependency changed CONFIG_CHANGED = auto() # Site config changed MANUAL = auto() # Explicit invalidation FULL_BUILD = auto() # Full rebuild requested OUTPUT_MISSING = auto() # Output file doesn't exist Cache Layers The coordinator invalidates multiple cache layers per page: Fingerprint (file_fingerprints) — File hash for change detection Parsed Content (parsed_content) — Markdown parsing results Rendered Output (rendered_output) — Final HTML output # Example: Data file change triggers cascade events = coordinator.invalidate_for_data_file(Path("data/team.yaml")) # Returns list of InvalidationEvent for each affected page Invalidation Events Each invalidation is recorded as an InvalidationEvent: @dataclass class InvalidationEvent: page_path: Path # Which page was invalidated reason: PageInvalidationReason # Why it was invalidated trigger: str # What triggered it (e.g., "data/team.yaml") caches_cleared: list[str] # Which cache layers were cleared Integration Points The coordinator integrates with existing change detectors: Detector Uses DataChangeDetector coordinator.invalidate_for_data_file() TaxonomyCascadeDetector coordinator.invalidate_taxonomy_cascade() phase_update_pages_list coordinator.invalidate_page() PathRegistry The PathRegistry ensures consistent path representation across cache layers: registry = PathRegistry(site) # Get canonical source path (relative to content dir) canonical = registry.canonical_source(page) # e.g., "blog/post.md" # Get cache key for consistent lookups key = registry.cache_key(page) # Used as dict key in all caches # Check if a path is for generated content is_gen = registry.is_generated(path) # True for taxonomy pages, autodoc, etc. This prevents cache key mismatches between different parts of the build (e.g., absolute vs relative paths, content vs generated pages). Provenance-Based Builds Incremental decisions are backed by provenance tracking in bengal/build/provenance/. Instead of relying on ad-hoc flags, the provenance store tracks content-addressed inputs and outputs so the detection pipeline can quickly prove what is fresh and what must be rebuilt. Stale-Output Guards Warm builds should be conservative when freshness is uncertain. Bengal uses several guards before trusting cached output: Nanosecond fingerprints: file fingerprints include mtime_ns, size, and SHA256 content hash. Older cache entries without nanosecond data fall back to the legacy mtime+size comparison, then hash verification. Build-start provenance time: provenance mtime short-circuit data records the build start timestamp, not the later cache-save timestamp. If a file changes while a build is running, the next warm build verifies content instead of incorrectly trusting the previous record. Missing output repair: page outputs and site-wide postprocess artifacts are checked before an incremental skip is accepted. Atomic cache persistence: cache and provenance indexes use atomic writes so interrupted builds do not leave partially-written JSON behind. If Bengal cannot resolve an input path, read a fingerprint, or prove an output exists, it treats the page or artifact as stale and rebuilds it. Zstandard Compression Bengal uses Zstandard (zstd) compression for all cache files, leveraging Python 3.14's new compression.zstd module (PEP 784). Performance Benefits Metric Before After Improvement Cache size (773 pages) 1.64 MB 99 KB 94% smaller Compression ratio 1x 12-14x 12-14x Cache load time ~5ms ~0.5ms 10x faster Cache save time ~3ms ~1ms 3x faster How It Works flowchart LR Data[Cache Data] --> JSON[JSON Serialize] JSON --> Zstd[Zstd Compress] Zstd --> File[.json.zst File] File2[.json.zst File] --> Decomp[Zstd Decompress] Decomp --> Parse[JSON Parse] Parse --> Data2[Cache Data] File Format Cache files are stored as JSON and transparently compressed/decompressed: .bengal/ ├── cache.json.zst # Main build cache (compressed) ├── page_metadata.json.zst # Page discovery cache (compressed) ├── asset_deps.json.zst # Asset dependency map (compressed) ├── taxonomy_index.json.zst # Tag/category index (compressed) ├── build_history.json # Build history for delta analysis ├── server.pid # Dev server PID ├── asset-manifest.json # Asset manifest ├── indexes/ # Query indexes (section, author, etc.) ├── templates/ # Jinja bytecode cache ├── content_cache/ # Remote content cache ├── logs/ # Build/serve logs ├── metrics/ # Performance metrics ├── profiles/ # Profiling output ├── themes/ # Theme state (swizzle registry) │ └── sources.json ├── js_bundle/ # JS bundle temporary files ├── pipeline_out/ # Asset pipeline temporary output └── generated/ # Generated content (auto-pages, etc.) All cache files (cache.json.zst, page_metadata.json.zst, asset_deps.json.zst, taxonomy_index.json.zst) are compressed with Zstandard for optimal performance. This provides 92-93% size reduction and 10x faster I/O compared to uncompressed JSON. Backward Compatibility Bengal automatically handles migration: Read: Tries .json.zst first, falls back to .json Write: Always writes compressed .json.zst Migration: Old uncompressed caches are read and re-saved as compressed This means existing projects upgrade seamlessly—no manual migration needed. CI/CD Benefits Compressed caches significantly improve CI/CD workflows: # GitHub Actions - cache is 16x smaller to transfer - uses: actions/cache@v4 with: path: .bengal/ key: bengal-${{ hashFiles('content/**') }} Faster cache upload/download (100KB vs 1.6MB) Lower storage costs Faster build times in CI pipelines The "No Object References" Rule Architecture Principle Never persist object references across builds. The cache only stores: File paths (strings) Hashes (strings) Simple metadata (dicts/lists) This ensures cache stability. When a build starts, we load the cache and reconstruct the relationships with fresh live objects. Cacheable Protocol Bengal uses a Cacheable protocol to enforce type-safe cache contracts across all cacheable types. This ensures consistent serialization, prevents cache bugs, and enables compile-time validation. Protocol Definition @runtime_checkable class Cacheable(Protocol): """Protocol for types that can be cached to disk.""" def to_cache_dict(self) -> dict[str, Any]: """Return JSON-serializable data only.""" ... @classmethod def from_cache_dict(cls, data: dict[str, Any]) -> Cacheable: """Reconstruct object from data.""" ... Contract Requirements JSON Primitives Only: to_cache_dict() must return only JSON-serializable types (str, int, float, bool, None, list, dict) Type Conversion: Complex types must be converted: datetime → ISO-8601 string (via datetime.isoformat()) Path → str (via str(path)) set → sorted list (for stability) No Object References: Never serialize live objects (Page, Section, Asset). Use stable identifiers (usually string paths) instead. Round-trip Invariant: T.from_cache_dict(obj.to_cache_dict()) must reconstruct an equivalent object (== by fields) Stable Keys: Field names in to_cache_dict() are the contract. Adding/removing fields requires version bump in cache file. Types Implementing Cacheable Type Location Purpose PageCore bengal/core/page/page_core.py Cacheable page metadata (title, date, tags, etc.) TagEntry bengal/cache/taxonomy_index.py Taxonomy index entries IndexEntry bengal/cache/query_index.py Query index entries AssetDependencyEntry bengal/cache/asset_dependency_map.py Asset dependency tracking Example Implementation The following shows a simplified PageCore (the actual implementation has 14+ fields): @dataclass class PageCore(Cacheable): source_path: str title: str date: datetime | None = None tags: list[str] = field(default_factory=list) def to_cache_dict(self) -> dict[str, Any]: """Serialize PageCore to cache-friendly dictionary.""" return { "source_path": self.source_path, "title": self.title, "date": self.date.isoformat() if self.date else None, "tags": self.tags, } @classmethod def from_cache_dict(cls, data: dict[str, Any]) -> PageCore: """Deserialize PageCore from cache dictionary.""" return cls( source_path=data["source_path"], title=data["title"], date=datetime.fromisoformat(data["date"]) if data.get("date") else None, tags=data.get("tags", []), ) See bengal/core/page/page_core.py for the full implementation. Generic CacheStore Helper Bengal provides a generic CacheStore helper for type-safe cache operations: from bengal.cache.cache_store import CacheStore # Type-safe cache operations store = CacheStore[PageCore](cache_path) store.save([page1.core, page2.core]) # List of Cacheable objects entries = store.load() # Returns list[PageCore] Benefits Type Safety: Static type checkers (ty) validate cache contracts at compile time Consistency: All cache entries follow the same serialization pattern Versioning: Built-in version checking for cache invalidation Safety: Prevents accidental pickling of complex objects that might break across versions Performance: Protocol has zero runtime overhead (structural typing) PageCore Serialization With PageCore, cache serialization is type-safe via the Cacheable protocol: # Before: Manual field mapping (error-prone) cache_data = { "source_path": str(page.source_path), "title": page.title, "date": page.date.isoformat() if page.date else None, # ... 10+ more fields } # After: Use to_cache_dict() for correct serialization cache_data = page.core.to_cache_dict() # datetime → ISO-8601, all types handled # Round-trip example loaded = PageCore.from_cache_dict(cache_data) assert loaded.title == page.core.title Note Note Don't use asdict(page.core) directly—it won't convert datetime to ISO-8601 strings. Always use to_cache_dict() for correct serialization. Runtime Validation The @runtime_checkable decorator allows isinstance() checks: from bengal.protocols import Cacheable if isinstance(obj, Cacheable): data = obj.to_cache_dict() # Safe to serialize However, static type checking via ty is the primary validation method. -------------------------------------------------------------------------------- Metadata: - Author: lbliii - Word Count: 1544 - Reading Time: 8 minutes