Build Cache

How Bengal caches builds for incremental rebuilds


Bengal caches the state of each build so that later builds only redo the work affected by your changes, enabling sub-second incremental rebuilds.

How It Works

The build cache (.bengal/cache.json.zst) tracks the state of your project to determine exactly what needs to be rebuilt. Cache files are compressed with Zstandard for 92-93% size reduction.

flowchart TD
    Start[Start Build] --> Load[Load Cache]
    Load --> Detect[Detect Changes]
    Detect --> Config{Config Changed?}
    Config -->|Yes| Full[Full Rebuild]
    Config -->|No| Hash[Check File Hashes]
    Hash --> DepGraph[Query Dependency Graph]
    DepGraph --> Filter[Filter Work]
    Filter --> Render[Render Affected Pages]
    Render --> Update[Update Cache]
    Update --> Save[Save to Disk]
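The decision part of this flow (config check, hash comparison, dependency lookup, filtering) can be sketched as one function. This is a minimal illustration, not Bengal's implementation: the cache layout (`config_hash`, `file_hashes`, `dependents` keys) and the names `plan_build` and `BuildPlan` are hypothetical, and the "Update Cache" and "Save to Disk" steps are left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class BuildPlan:
    full_rebuild: bool
    pages_to_render: set[str] = field(default_factory=set)


def plan_build(cache: dict, config_hash: str, file_hashes: dict[str, str]) -> BuildPlan:
    """Decide what to rebuild from the previous build's cache.

    Hypothetical cache layout:
      {"config_hash": str, "file_hashes": {path: sha256}, "dependents": {path: [page, ...]}}
    """
    pages = {p for p in file_hashes if p.endswith(".md")}

    # Config changed, or there is no cache yet: fall back to a full rebuild.
    if cache.get("config_hash") != config_hash:
        return BuildPlan(full_rebuild=True, pages_to_render=pages)

    # Check file hashes: a file counts as changed if it is new or its hash differs.
    old_hashes = cache.get("file_hashes", {})
    changed = {p for p, h in file_hashes.items() if old_hashes.get(p) != h}

    # Query the dependency graph, then filter the work down to pages that still exist.
    dependents = cache.get("dependents", {})
    affected = {p for p in changed if p in pages}
    for path in changed:
        affected.update(d for d in dependents.get(path, []) if d in pages)
    return BuildPlan(full_rebuild=False, pages_to_render=affected)
```

For example, if only `post.html` changes and the cache records `{"post.html": ["a.md", "b.md"]}` as its dependents, only `a.md` and `b.md` are scheduled for rendering.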

Caching Strategies

Change Detection

We use SHA-256 hashes to detect changes in:

  • Content files (.md)
  • Templates (.html, .jinja2)
  • Config files (.toml)
  • Assets (.css, .js)
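A minimal sketch of hash-based change detection using Python's standard `hashlib`. The function names `hash_file` and `changed_files`, and the idea of storing hashes keyed by POSIX-style relative path, are assumptions for illustration; Bengal's own code may organize this differently.

```python
import hashlib
from pathlib import Path

# File types from the list above (assumed to be matched by suffix).
TRACKED_SUFFIXES = {".md", ".html", ".jinja2", ".toml", ".css", ".js"}


def hash_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(root: Path, cached: dict[str, str]) -> list[str]:
    """Return tracked files that are new or whose hash differs from the cached one."""
    changed = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in TRACKED_SUFFIXES:
            key = path.relative_to(root).as_posix()
            if cached.get(key) != hash_file(path):
                changed.append(key)
    return changed
```

Hashing content rather than comparing modification times means a file that is touched but not edited is not treated as changed.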

Impact Analysis

We track relationships to know what to rebuild.

  • Page → Template: If post.html changes, rebuild all blog posts.
  • Tag → Pages: If the python tag changes, rebuild the tags/python/ page.
  • Page → Partial: If header.html changes, rebuild everything.
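One way to answer "what must be rebuilt?" for the relationships above is to invert the recorded dependencies and walk them transitively, so a partial used by a base template reaches every page. This is a sketch under assumptions: the `deps` mapping ("X depends on Y") and the helper names are hypothetical, not Bengal's dependency-graph API.

```python
from collections import defaultdict, deque


def build_reverse_graph(deps: dict[str, list[str]]) -> dict[str, set[str]]:
    """Invert 'X depends on Y' edges into 'Y is used by X'."""
    reverse: dict[str, set[str]] = defaultdict(set)
    for dependent, dependencies in deps.items():
        for dep in dependencies:
            reverse[dep].add(dependent)
    return reverse


def affected_pages(changed: str, reverse: dict[str, set[str]]) -> set[str]:
    """Breadth-first walk over reverse edges; return every .md page reached."""
    seen = {changed}
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for user in reverse.get(node, ()):
            if user not in seen:
                seen.add(user)
                queue.append(user)
    return {node for node in seen if node.endswith(".md")}


# Hypothetical project: posts use post.html, about.md uses page.html,
# and both templates extend base.html, which includes header.html.
deps = {
    "blog/post1.md": ["post.html"],
    "blog/post2.md": ["post.html"],
    "about.md": ["page.html"],
    "post.html": ["base.html"],
    "page.html": ["base.html"],
    "base.html": ["header.html"],
}
```

With this graph, a change to `post.html` reaches only the two blog posts, while a change to `header.html` reaches all three pages.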

Taxonomy Lookup

We store an inverted index of tags to avoid parsing all pages.

  • Stored: tag_to_pages['python'] = ['post1.md', 'post2.md']
  • Benefit: O(1) lookup for taxonomy page generation.
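The inverted index can be sketched as a plain dictionary built from each page's tags; looking up a tag is then a single dictionary access. The helper names, and the idea of diffing two indexes to find which tag pages need regenerating, are illustrative assumptions rather than Bengal's `taxonomy_index` API.

```python
from collections import defaultdict


def build_tag_index(page_tags: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert page -> tags into tag -> sorted list of pages."""
    index: dict[str, set[str]] = defaultdict(set)
    for page, tags in page_tags.items():
        for tag in tags:
            index[tag].add(page)
    # Sorted lists keep the stored JSON stable from one build to the next.
    return {tag: sorted(pages) for tag, pages in sorted(index.items())}


def tags_to_regenerate(old: dict[str, list[str]], new: dict[str, list[str]]) -> set[str]:
    """Tags whose page membership changed; only their tag pages need rebuilding."""
    return {tag for tag in old.keys() | new.keys() if old.get(tag) != new.get(tag)}
```

For example, if `post1.md` drops its `web` tag, `tags_to_regenerate` returns `{"web"}` and the `python` tag page is left alone.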

Zstandard Compression

Bengal uses Zstandard (zstd) compression for all cache files, leveraging Python 3.14's new compression.zstd module (PEP 784).

Performance Benefits

Metric                   Before (uncompressed)   After (zstd)   Improvement
Cache size (773 pages)   1.64 MB                 99 KB          94% smaller
Compression ratio        1x                      12-14x         12-14x
Cache load time          ~5 ms                   ~0.5 ms        10x faster
Cache save time          ~3 ms                   ~1 ms          3x faster

How Compression Works

flowchart LR
    Data[Cache Data] --> JSON[JSON Serialize]
    JSON --> Zstd[Zstd Compress]
    Zstd --> File[.json.zst File]
    File2[.json.zst File] --> Decomp[Zstd Decompress]
    Decomp --> Parse[JSON Parse]
    Parse --> Data2[Cache Data]
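Both directions of the pipeline can be sketched with the standard library. On Python 3.14+ this uses `compression.zstd.compress()` and `decompress()` from PEP 784; the `zlib` fallback is an assumption added only so the sketch runs on older interpreters and is not what Bengal does. The function names are hypothetical.

```python
import json

try:
    # Python 3.14+ (PEP 784): Zstandard in the standard library.
    from compression import zstd as _codec
except ImportError:
    # Stand-in ONLY so this sketch runs on older Pythons; Bengal uses zstd.
    import zlib as _codec


def encode_cache(data: dict) -> bytes:
    """Cache data -> JSON serialize -> compress (top row of the diagram)."""
    raw = json.dumps(data, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return _codec.compress(raw)


def decode_cache(blob: bytes) -> dict:
    """Decompress -> JSON parse -> cache data (bottom row of the diagram)."""
    return json.loads(_codec.decompress(blob).decode("utf-8"))
```

Cache JSON is highly repetitive (the same keys, tag names, and path prefixes recur for every page), which is why general-purpose compression shrinks it so much.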

File Format

Cache files use the .json.zst extension:

.bengal/
├── cache.json.zst          # Main build cache (compressed)
├── taxonomy_index.json.zst # Tag/category index (compressed)
├── asset_deps.json.zst     # Asset dependencies (compressed)
└── page_metadata.json.zst  # Page metadata (compressed)

Backward Compatibility

Bengal automatically handles migration:

  1. Read: Tries .json.zst first, falls back to .json
  2. Write: Always writes compressed .json.zst
  3. Migration: Old uncompressed caches are read and re-saved as compressed
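The read/write/migrate rules above can be sketched as two small functions. This is an illustration under assumptions: `load_cache` and `save_cache` are hypothetical names, the `zlib` fallback exists only so the sketch runs before Python 3.14, and whether Bengal deletes the old `.json` file after migrating is not stated, so the sketch leaves it in place.

```python
from __future__ import annotations

import json
from pathlib import Path

try:
    from compression import zstd as _codec  # Python 3.14+ (PEP 784)
except ImportError:
    import zlib as _codec  # stand-in so the sketch runs on older Pythons


def save_cache(base: Path, data: dict) -> None:
    """Write: always write the compressed <base>.json.zst form."""
    target = base.parent / f"{base.name}.json.zst"
    target.write_bytes(_codec.compress(json.dumps(data).encode("utf-8")))


def load_cache(base: Path) -> dict | None:
    """Read: try <base>.json.zst first, then fall back to legacy <base>.json."""
    compressed = base.parent / f"{base.name}.json.zst"
    legacy = base.parent / f"{base.name}.json"
    if compressed.exists():
        return json.loads(_codec.decompress(compressed.read_bytes()))
    if legacy.exists():
        data = json.loads(legacy.read_text(encoding="utf-8"))
        save_cache(base, data)  # Migration: re-save the old cache compressed
        return data
    return None  # no cache yet: the caller does a full build
```

Because the compressed file is checked first, a migrated project never reads the legacy file again.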

This means existing projects upgrade seamlessly; no manual migration is needed.

CI/CD Benefits

Compressed caches significantly improve CI/CD workflows:

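# GitHub Actions - cache is 16x smaller to transfer
- uses: actions/cache@v4
  with:
    path: .bengal/
    key: bengal-${{ hashFiles('content/**') }}
    # When content has changed the exact key misses; fall back to the most
    # recent Bengal cache so the build can still run incrementally
    restore-keys: |
      bengal-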
  • Faster cache upload/download (100KB vs 1.6MB)
  • Lower storage costs
  • Faster build times in CI pipelines

The "No Object References" Rule

Architecture Principle

Never persist object references across builds.

The cache only stores:

  1. File paths (strings)
  2. Hashes (strings)
  3. Simple metadata (dicts/lists)

This keeps the cache stable: it contains only plain data, so nothing in it can point at an object from a previous build. When a build starts, we load the cache and reconstruct the relationships with fresh live objects.
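A minimal sketch of the rule: relationships are persisted as path strings and, on the next build, resolved against that build's freshly created objects. The `Page` class here is a heavily simplified hypothetical stand-in, and `to_cache`/`from_cache` are illustrative names, not Bengal APIs.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass
class Page:
    """A live, per-build object (hypothetical and heavily simplified)."""
    source_path: str
    title: str


def to_cache(related: dict[str, list[Page]]) -> str:
    """Persist relationships as path strings only, never Page objects."""
    return json.dumps({src: [p.source_path for p in targets] for src, targets in related.items()})


def from_cache(blob: str, live_pages: dict[str, Page]) -> dict[str, list[Page]]:
    """Rebuild relationships against this build's fresh Page objects.

    Paths that no longer exist are dropped instead of resurrecting stale objects.
    """
    stored = json.loads(blob)
    return {
        src: [live_pages[path] for path in targets if path in live_pages]
        for src, targets in stored.items()
        if src in live_pages
    }
```

Because only strings cross the build boundary, a page edited between builds is seen with its new title, and a deleted page simply disappears from the rebuilt relationships.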

Cacheable Protocol

Bengal uses a Cacheable protocol to enforce type-safe cache contracts across all cacheable types. This ensures consistent serialization, prevents cache bugs, and lets static type checkers such as mypy validate the contracts before the code runs.

Protocol Definition

from __future__ import annotations  # allows the forward reference to Cacheable

from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class Cacheable(Protocol):
    """Protocol for types that can be cached to disk."""

    def to_cache_dict(self) -> dict[str, Any]:
        """Return JSON-serializable data only."""
        ...

    @classmethod
    def from_cache_dict(cls, data: dict[str, Any]) -> Cacheable:
        """Reconstruct object from data."""
        ...

Contract Requirements

  1. JSON Primitives Only: to_cache_dict() must return only JSON-serializable types (str, int, float, bool, None, list, dict).
  2. Type Conversion: Complex types must be converted:
    • datetime → ISO-8601 string (via datetime.isoformat())
    • Path → str (via str(path))
    • set → sorted list (for stability)
  3. No Object References: Never serialize live objects (Page, Section, Asset). Use stable identifiers (usually string paths) instead.
  4. Round-trip Invariant: T.from_cache_dict(obj.to_cache_dict()) must reconstruct an equivalent object (== by fields).
  5. Stable Keys: The field names in to_cache_dict() are the contract. Adding or removing fields requires a version bump in the cache file.
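The conversion rules and the round-trip invariant can be checked on a small example. `AssetRecord` is a hypothetical type invented for illustration (it is not one of Bengal's cacheable types); it holds one field of each kind that needs converting.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from datetime import datetime
from pathlib import Path
from typing import Any


@dataclass
class AssetRecord:
    """Hypothetical cacheable type illustrating the conversion rules."""
    path: Path
    modified: datetime
    used_by: set[str] = field(default_factory=set)

    def to_cache_dict(self) -> dict[str, Any]:
        return {
            "path": str(self.path),                 # Path -> str
            "modified": self.modified.isoformat(),  # datetime -> ISO-8601 string
            "used_by": sorted(self.used_by),        # set -> sorted list (stable output)
        }

    @classmethod
    def from_cache_dict(cls, data: dict[str, Any]) -> AssetRecord:
        return cls(
            path=Path(data["path"]),
            modified=datetime.fromisoformat(data["modified"]),
            used_by=set(data["used_by"]),
        )


record = AssetRecord(Path("assets/site.css"), datetime(2024, 5, 1, 12, 30), {"b.md", "a.md"})
# Round-trip invariant: serialize, pass through JSON, deserialize, compare by fields.
assert AssetRecord.from_cache_dict(json.loads(json.dumps(record.to_cache_dict()))) == record
```

A round-trip assertion like the last line is a cheap unit test to keep next to every cacheable type.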

Types Implementing Cacheable

Type                   Location                               Purpose
PageCore               bengal/core/page/page_core.py          Cacheable page metadata (title, date, tags, etc.)
TagEntry               bengal/cache/taxonomy_index.py         Taxonomy index entries
IndexEntry             bengal/cache/query_index.py            Query index entries
AssetDependencyEntry   bengal/cache/asset_dependency_map.py   Asset dependency tracking

Example Implementation

from __future__ import annotations  # allows the PageCore return annotation

from dataclasses import dataclass, field
from datetime import datetime
from typing import Any

from bengal.cache.cacheable import Cacheable


@dataclass
class PageCore(Cacheable):
    source_path: str
    title: str
    date: datetime | None = None
    tags: list[str] = field(default_factory=list)

    def to_cache_dict(self) -> dict[str, Any]:
        """Serialize PageCore to cache-friendly dictionary."""
        return {
            "source_path": self.source_path,
            "title": self.title,
            "date": self.date.isoformat() if self.date else None,  # datetime -> ISO-8601
            "tags": list(self.tags),  # copy so the cached dict does not alias the live list
        }

    @classmethod
    def from_cache_dict(cls, data: dict[str, Any]) -> PageCore:
        """Deserialize PageCore from cache dictionary."""
        return cls(
            source_path=data["source_path"],
            title=data["title"],
            date=datetime.fromisoformat(data["date"]) if data.get("date") else None,
            tags=data.get("tags", []),
        )

Generic CacheStore Helper

Bengal provides a generic CacheStore helper for type-safe cache operations:

from bengal.cache.cache_store import CacheStore

# Type-safe cache operations
store = CacheStore[PageCore](cache_path)
store.save([page1.core, page2.core])  # List of Cacheable objects
entries = store.load()  # Returns list[PageCore]

Benefits

  • Type Safety: Static type checkers such as mypy validate cache contracts before the code runs
  • Consistency: All cache entries follow the same serialization pattern
  • Versioning: Built-in version checking for cache invalidation
  • Safety: Prevents accidental pickling of complex objects that might break across versions
  • Performance: Conformance is checked structurally by static type checkers, so the protocol adds no runtime overhead beyond optional isinstance() checks
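To show how a typed store with built-in version checking could work, here is a hypothetical sketch. It is NOT Bengal's `CacheStore`: the class name `VersionedStore`, the explicit `entry_type` and `version` constructor arguments, and the `{"version": ..., "entries": [...]}` file layout are all assumptions, and compression is omitted for brevity.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any, Generic, Protocol, TypeVar


class SupportsCache(Protocol):
    def to_cache_dict(self) -> dict[str, Any]: ...

    @classmethod
    def from_cache_dict(cls, data: dict[str, Any]) -> Any: ...


T = TypeVar("T", bound=SupportsCache)


class VersionedStore(Generic[T]):
    """Hypothetical typed store; not Bengal's CacheStore implementation."""

    def __init__(self, path: Path, entry_type: type[T], version: int) -> None:
        self.path = path
        self.entry_type = entry_type  # needed at runtime to rebuild entries
        self.version = version

    def save(self, entries: list[T]) -> None:
        payload = {"version": self.version, "entries": [e.to_cache_dict() for e in entries]}
        self.path.write_text(json.dumps(payload), encoding="utf-8")

    def load(self) -> list[T]:
        """Return cached entries, or [] if the file is missing or has another version."""
        if not self.path.exists():
            return []
        payload = json.loads(self.path.read_text(encoding="utf-8"))
        if payload.get("version") != self.version:
            return []  # schema changed (see "Stable Keys"): treat as a cache miss
        return [self.entry_type.from_cache_dict(d) for d in payload["entries"]]
```

Passing the entry type explicitly keeps the sketch simple; a type parameter alone (`Store[PageCore]`) is erased at runtime, so the store would otherwise have no class to call `from_cache_dict` on.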

PageCore Serialization

With PageCore, cache serialization is simplified:

# Before: Manual field mapping (error-prone)
cache_data = {
    "source_path": str(page.source_path),
    "title": page.title,
    "date": page.date.isoformat() if page.date else None,
    # ... 10+ more fields
}

# After: Single call using PageCore's Cacheable contract
# (dataclasses.asdict() would leave `date` as a datetime, which is not JSON-serializable)
cache_data = page.core.to_cache_dict()  # All cacheable fields serialized

Runtime Validation

The @runtime_checkable decorator allows isinstance() checks:

from bengal.cache.cacheable import Cacheable

if isinstance(obj, Cacheable):
    data = obj.to_cache_dict()
    # Safe to serialize

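However, static type checking via mypy is the primary validation method, because isinstance() only verifies that the protocol's methods exist, not their signatures or return types.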
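The limits of the runtime check are easy to demonstrate. The sketch below redefines a minimal `Cacheable` locally so it runs on its own (in Bengal you would import it from `bengal.cache.cacheable`); the three test classes are hypothetical.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class Cacheable(Protocol):
    def to_cache_dict(self) -> dict[str, Any]: ...

    @classmethod
    def from_cache_dict(cls, data: dict[str, Any]) -> Any: ...


class Structural:
    """Never mentions Cacheable but has both methods: passes structurally."""

    def to_cache_dict(self) -> dict[str, Any]:
        return {}

    @classmethod
    def from_cache_dict(cls, data: dict[str, Any]) -> "Structural":
        return cls()


class WrongReturn:
    """Right method names, wrong contract: isinstance() still passes."""

    def to_cache_dict(self) -> list[int]:
        return [1, 2, 3]

    def from_cache_dict(self, data: Any) -> None:
        return None


class Missing:
    """Lacks from_cache_dict, so isinstance() fails."""

    def to_cache_dict(self) -> dict[str, Any]:
        return {}
```

`isinstance(WrongReturn(), Cacheable)` is `True` even though `to_cache_dict()` returns a list; only a static checker such as mypy would flag that class.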

See bengal/cache/cacheable.py for the full protocol definition and examples.