Module

cache.taxonomy_index

Taxonomy Index for incremental builds.

Maintains a persistent index of tag-to-pages mappings to enable incremental taxonomy updates. Instead of rebuilding the entire taxonomy structure, an incremental build updates only the affected tags.

Architecture:

  • Mapping: tag_slug → [page_paths] (which pages have which tags)
  • Storage: .bengal/taxonomy_index.json (compact format)
  • Tracking: Built during page discovery, updated on tag changes
  • Incremental: Only update affected tags, reuse unchanged tags
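
The mapping and the incremental step above can be illustrated with a plain dictionary. This is a minimal sketch, not Bengal's implementation: `index` and `affected_tags` are hypothetical names, and the page paths are sample data.

```python
from __future__ import annotations

# Hypothetical stand-in for the index: tag_slug -> page paths.
index: dict[str, list[str]] = {
    "python": ["content/post1.md", "content/post2.md"],
    "rust": ["content/post3.md"],
}


def affected_tags(
    index: dict[str, list[str]], page_path: str, new_tags: set[str]
) -> set[str]:
    """Return the tag slugs whose page list changes when a page is retagged."""
    old_tags = {slug for slug, pages in index.items() if page_path in pages}
    # Tags that were added or removed; tags present in both sets are untouched.
    return old_tags ^ new_tags


# post2 moves from "python" to "rust": both tags need updating.
changed = affected_tags(index, "content/post2.md", {"rust"})
# post1 keeps its only tag: nothing needs updating.
unchanged = affected_tags(index, "content/post1.md", {"python"})
```

Only the slugs in `changed` would need their tag pages regenerated; every other tag can be reused as-is.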

Performance Impact:

  • Taxonomy rebuild skipped for unchanged pages (~60ms saved per 100 pages)
  • Only affected tags regenerated
  • Full taxonomy structure rebuild avoided

Classes

TagEntry dataclass

Entry for a single tag in the index.

Implements the Cacheable protocol for type-safe serialization.

Inherits from Cacheable

Attributes

Name         Type         Description
tag_slug     str
tag_name     str
page_paths   list[str]
updated_at   str
is_valid     bool

Methods 4

to_cache_dict
def to_cache_dict(self) -> dict[str, Any]

Serialize to cache-friendly dictionary (Cacheable protocol).

Returns

dict[str, Any]

from_cache_dict classmethod
def from_cache_dict(cls, data: dict[str, Any]) -> TagEntry

Deserialize from cache dictionary (Cacheable protocol).

Parameters 1
data dict[str, Any]
Returns

TagEntry

to_dict
def to_dict(self) -> dict[str, Any]

Alias for to_cache_dict (test compatibility).

Returns

dict[str, Any]

from_dict classmethod
def from_dict(cls, data: dict[str, Any]) -> TagEntry

Alias for from_cache_dict (test compatibility).

Parameters 1
data dict[str, Any]
Returns

TagEntry
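
The serialization round trip described for TagEntry can be sketched with a stand-in dataclass. `TagEntrySketch` is a hypothetical name; the field names come from the attribute list above, but the method bodies are an assumption about how such a protocol is typically implemented, not Bengal's source.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Any


@dataclass
class TagEntrySketch:
    """Hypothetical stand-in mirroring the documented TagEntry fields."""

    tag_slug: str
    tag_name: str
    page_paths: list[str]
    updated_at: str
    is_valid: bool

    def to_cache_dict(self) -> dict[str, Any]:
        # Only JSON-friendly primitives, so json.dumps works directly.
        return {
            "tag_slug": self.tag_slug,
            "tag_name": self.tag_name,
            "page_paths": list(self.page_paths),
            "updated_at": self.updated_at,
            "is_valid": self.is_valid,
        }

    @classmethod
    def from_cache_dict(cls, data: dict[str, Any]) -> TagEntrySketch:
        return cls(
            tag_slug=data["tag_slug"],
            tag_name=data["tag_name"],
            page_paths=list(data["page_paths"]),
            updated_at=data["updated_at"],
            is_valid=data["is_valid"],
        )

    def to_dict(self) -> dict[str, Any]:
        # Alias, as in the documented API.
        return self.to_cache_dict()

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> TagEntrySketch:
        # Alias, as in the documented API.
        return cls.from_cache_dict(data)


entry = TagEntrySketch(
    "python", "Python", ["content/post1.md"], "2025-10-16T12:00:00", True
)
# Serialize through JSON text and back to show the round trip is lossless.
restored = TagEntrySketch.from_cache_dict(json.loads(json.dumps(entry.to_cache_dict())))
```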

TaxonomyIndex

Persistent index of tag-to-pages mappings for incremental taxonomy updates.

Purpose:

  • Track which pages have which tags
  • Enable incremental tag updates (only changed tags)
  • Avoid full taxonomy rebuild on every page change
  • Support incremental tag page generation

Cache Format (JSON):

  {
    "version": 1,
    "tags": {
      "python": {
        "tag_slug": "python",
        "tag_name": "Python",
        "page_paths": ["content/post1.md", "content/post2.md"],
        "updated_at": "2025-10-16T12:00:00",
        "is_valid": true
      }
    }
  }
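
As an illustration of the cache format, the snippet below parses the documented example with the standard `json` module and extracts the page lists of valid tags. The variable names are hypothetical, and skipping invalid entries here is an assumption about how a reader of the file might treat them.

```python
import json

# The example document from the cache format description above.
raw = """
{
  "version": 1,
  "tags": {
    "python": {
      "tag_slug": "python",
      "tag_name": "Python",
      "page_paths": ["content/post1.md", "content/post2.md"],
      "updated_at": "2025-10-16T12:00:00",
      "is_valid": true
    }
  }
}
"""

data = json.loads(raw)

# Keep only entries flagged as valid, keyed by slug.
pages_by_slug = {
    slug: entry["page_paths"]
    for slug, entry in data["tags"].items()
    if entry["is_valid"]
}
```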

Methods 15

save_to_disk
def save_to_disk(self) -> None

Save taxonomy index to disk.

update_tag
def update_tag(self, tag_slug: str, tag_name: str, page_paths: list[str]) -> None

Update or create a tag entry.

Parameters 3
tag_slug str

Normalized tag identifier

tag_name str

Original tag name for display

page_paths list[str]

List of page paths with this tag

get_tag
def get_tag(self, tag_slug: str) -> TagEntry | None

Get a tag entry by slug.

Parameters 1
tag_slug str

Normalized tag identifier

Returns

TagEntry | None

TagEntry if found and valid, None otherwise

get_pages_for_tag
def get_pages_for_tag(self, tag_slug: str) -> list[str] | None

Get pages with a specific tag.

Parameters 1
tag_slug str

Normalized tag identifier

Returns

list[str] | None

List of page paths or None if tag not found/invalid

has_tag
def has_tag(self, tag_slug: str) -> bool

Check if tag exists and is valid.

Parameters 1
tag_slug str

Normalized tag identifier

Returns

bool

True if tag exists and is valid

get_tags_for_page
def get_tags_for_page(self, page_path: Path) -> set[str]

Get all tags for a specific page (reverse lookup).

Parameters 1
page_path Path

Path to page

Returns

set[str]

Set of tag slugs for this page
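
A reverse lookup over a tag-to-pages mapping amounts to scanning every tag for the page. The sketch below shows the idea with a plain dictionary; `tags_for_page` is a hypothetical helper, and normalizing the `Path` with `as_posix()` is an assumption about how stored paths are compared.

```python
from __future__ import annotations

from pathlib import Path


def tags_for_page(tags: dict[str, list[str]], page_path: Path) -> set[str]:
    """Return every slug whose page list contains page_path."""
    # Stored paths are strings, so compare against a string form of the Path.
    target = page_path.as_posix()
    return {slug for slug, pages in tags.items() if target in pages}


tags = {
    "python": ["content/post1.md", "content/post2.md"],
    "tutorial": ["content/post2.md"],
    "rust": ["content/post3.md"],
}
found = tags_for_page(tags, Path("content/post2.md"))
```

The scan is linear in the number of tags, which is usually acceptable for a lookup that runs once per changed page.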

get_all_tags
def get_all_tags(self) -> dict[str, TagEntry]

Get all valid tags.

Returns

dict[str, TagEntry]

Dictionary mapping tag_slug to TagEntry for valid tags

invalidate_tag
def invalidate_tag(self, tag_slug: str) -> None

Mark a tag as invalid.

Parameters 1
tag_slug str

Normalized tag identifier
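
The interaction between update_tag, invalidate_tag, get_tag, and has_tag can be sketched with a small in-memory class. `IndexSketch` and `Entry` are hypothetical stand-ins; the lookups follow the documented "found and valid" rule, while the assumption that update_tag marks the entry valid again is mine, not confirmed by the reference.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass
class Entry:
    tag_slug: str
    tag_name: str
    page_paths: list[str]
    updated_at: str
    is_valid: bool


class IndexSketch:
    """Hypothetical in-memory stand-in for the tag index."""

    def __init__(self) -> None:
        self.tags: dict[str, Entry] = {}

    def update_tag(self, tag_slug: str, tag_name: str, page_paths: list[str]) -> None:
        # Assumption: creating or updating an entry marks it valid.
        self.tags[tag_slug] = Entry(
            tag_slug, tag_name, list(page_paths), datetime.now().isoformat(), True
        )

    def invalidate_tag(self, tag_slug: str) -> None:
        if tag_slug in self.tags:
            self.tags[tag_slug].is_valid = False

    def get_tag(self, tag_slug: str) -> Entry | None:
        entry = self.tags.get(tag_slug)
        # Invalid entries are hidden from lookups.
        return entry if entry is not None and entry.is_valid else None

    def has_tag(self, tag_slug: str) -> bool:
        return self.get_tag(tag_slug) is not None


index = IndexSketch()
index.update_tag("python", "Python", ["content/post1.md"])
before = index.has_tag("python")
index.invalidate_tag("python")
after = index.has_tag("python")
```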

invalidate_all
def invalidate_all(self) -> None

Invalidate all tag entries.

clear
def clear(self) -> None

Clear all tags.

remove_page_from_all_tags
def remove_page_from_all_tags(self, page_path: Path) -> set[str]

Remove a page from all tags it belongs to.

Parameters 1
page_path Path

Path to page to remove

Returns

set[str]

Set of affected tag slugs
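
Removing a page and reporting which tags were touched might look like the sketch below. `remove_page` is a hypothetical helper operating on a plain dictionary; whether the real method drops tags that become empty is not stated in this reference, so the sketch leaves them in place.

```python
from __future__ import annotations

from pathlib import Path


def remove_page(tags: dict[str, list[str]], page_path: Path) -> set[str]:
    """Remove page_path from every tag and return the slugs that changed."""
    target = page_path.as_posix()
    affected: set[str] = set()
    for slug, pages in tags.items():
        if target in pages:
            pages.remove(target)
            affected.add(slug)
    return affected


tags = {
    "python": ["content/post1.md", "content/post2.md"],
    "tutorial": ["content/post2.md"],
    "rust": ["content/post3.md"],
}
affected = remove_page(tags, Path("content/post2.md"))
```

The returned set tells the caller which tag pages need regenerating after a page is deleted.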

get_valid_entries
def get_valid_entries(self) -> dict[str, TagEntry]

Get all valid tag entries.

Returns

dict[str, TagEntry]

Dictionary mapping tag_slug to TagEntry for valid entries

get_invalid_entries
def get_invalid_entries(self) -> dict[str, TagEntry]

Get all invalid tag entries.

Returns

dict[str, TagEntry]

Dictionary mapping tag_slug to TagEntry for invalid entries
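
The valid/invalid split can be pictured as a partition of the entries by their `is_valid` flag. The snippet below is a sketch on plain dictionaries with sample data, not the library's code.

```python
# Hypothetical entries keyed by slug; only the fields needed here are shown.
entries = {
    "python": {"page_paths": ["content/post1.md"], "is_valid": True},
    "rust": {"page_paths": ["content/post3.md"], "is_valid": False},
}

# Partition by validity: each entry lands in exactly one of the two dicts.
valid = {slug: e for slug, e in entries.items() if e["is_valid"]}
invalid = {slug: e for slug, e in entries.items() if not e["is_valid"]}
```

A build could then reuse the cached output for the slugs in `valid` and regenerate only those in `invalid`.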

pages_changed
def pages_changed(self, tag_slug: str, new_page_paths: list[str]) -> bool

Check if pages for a tag have changed (enabling skipping of unchanged tag regeneration).

This is the key optimization for Phase 2c.2: If a tag's page membership hasn't changed, we can skip regenerating its HTML pages entirely since the output would be identical.

Parameters 2
tag_slug str

Normalized tag identifier

new_page_paths list[str]

New list of page paths for this tag

Returns

bool

True if the tag's pages have changed and need regeneration; False if they are identical to the cached version.
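
The membership check can be sketched as a set comparison. This is an assumption about the comparison rule: the sketch ignores ordering and duplicates, and treats a tag with no cached entry as changed. `pages_changed_sketch` is a hypothetical name.

```python
from __future__ import annotations


def pages_changed_sketch(
    cached_paths: list[str] | None, new_page_paths: list[str]
) -> bool:
    """Return True when the tag's page membership differs from the cache."""
    if cached_paths is None:
        # No usable cached entry: the tag must be generated.
        return True
    # Assumption: membership matters, ordering does not.
    return set(cached_paths) != set(new_page_paths)


cached = ["content/post1.md", "content/post2.md"]
same = pages_changed_sketch(cached, ["content/post2.md", "content/post1.md"])
grown = pages_changed_sketch(cached, cached + ["content/post3.md"])
unknown = pages_changed_sketch(None, ["content/post1.md"])
```

If tag pages list their posts in a meaningful order, an order-sensitive comparison would be the safer choice.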

stats
def stats(self) -> dict[str, Any]

Get taxonomy index statistics.

Returns

dict[str, Any]

Dictionary with index stats

Internal Methods 2
__init__
def __init__(self, cache_path: Path | None = None)

Initialize taxonomy index.

Parameters 1
cache_path Path | None

Path to cache file (defaults to .bengal/taxonomy_index.json)

_load_from_disk
def _load_from_disk(self) -> None

Load taxonomy index from disk if file exists.
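
Persistence in the documented JSON layout could be sketched as a save/load pair. `save_tags` and `load_tags` are hypothetical helpers; returning an empty index when the file is missing or unreadable is an assumption, not documented behavior.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


def save_tags(cache_path: Path, tags: dict[str, dict]) -> None:
    """Write the tags in the documented {"version", "tags"} layout."""
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    payload = {"version": 1, "tags": tags}
    cache_path.write_text(json.dumps(payload, indent=2), encoding="utf-8")


def load_tags(cache_path: Path) -> dict[str, dict]:
    """Load tags if the file exists; otherwise start with an empty index."""
    if not cache_path.exists():
        return {}
    try:
        data = json.loads(cache_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        # Assumption: a corrupt cache is treated like a missing one.
        return {}
    return data.get("tags", {})


with tempfile.TemporaryDirectory() as tmp:
    cache_path = Path(tmp) / ".bengal" / "taxonomy_index.json"
    missing = load_tags(cache_path)
    save_tags(
        cache_path,
        {
            "python": {
                "tag_slug": "python",
                "tag_name": "Python",
                "page_paths": ["content/post1.md"],
                "updated_at": "2025-10-16T12:00:00",
                "is_valid": True,
            }
        },
    )
    reloaded = load_tags(cache_path)
```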