Module

orchestration.taxonomy

Taxonomy orchestration for Bengal SSG.

Handles taxonomy collection (tags, categories) and dynamic page generation (tag pages, archive pages, etc.).

Classes

TaxonomyOrchestrator

Handles taxonomies and dynamic page generation.

Responsibilities:

  • Collect tags, categories, and other taxonomies
  • Generate tag index pages
  • Generate individual tag pages (with pagination)

Note: Section archive pages are now handled by SectionOrchestrator.

Methods 7

collect_and_generate
def collect_and_generate(self, parallel: bool = True) -> None

Collect taxonomies and generate dynamic pages. Main entry point called during build.

Parameters 1
parallel bool

Whether to use parallel processing for tag page generation

collect_and_generate_incremental
def collect_and_generate_incremental(self, changed_pages: list[Page], cache: BuildCache) -> set[str]

Incrementally update taxonomies for changed pages only.

Architecture:

  1. Only rebuild site.taxonomies from current Page objects when tags actually changed
  2. Use cache to determine which tag PAGES need regeneration (fast)
  3. Never reuse taxonomy structure with object references (prevents bugs)

Performance:

  • Change detection: O(changed pages)
  • Taxonomy reconstruction: O(all tags * pages_per_tag) ≈ O(all pages) but ONLY when tags changed
  • Tag page generation: O(affected tags)

Parameters 2
changed_pages list[Page]

List of pages that changed (NOT generated pages)

cache BuildCache

Build cache with tag index

Returns

set[str]

Set of affected tag slugs (for regenerating tag pages)
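
The change-detection step described above can be sketched roughly as follows. This is a minimal illustration, not Bengal's implementation: `affected_tag_slugs` and the two path-to-slugs mappings are hypothetical stand-ins for whatever the build cache records. It only reports tags that were added to or removed from a changed page; a real implementation may also need to regenerate pages for tags a changed page still carries.

```python
def affected_tag_slugs(
    previous: dict[str, set[str]],
    current: dict[str, set[str]],
    changed_paths: list[str],
) -> set[str]:
    """Return tag slugs whose membership differs for the changed pages.

    `previous` and `current` map a page path to its tag slugs. Both are
    hypothetical stand-ins for data a build cache might hold.
    """
    affected: set[str] = set()
    for path in changed_paths:
        old = previous.get(path, set())
        new = current.get(path, set())
        # Symmetric difference: tags added to or removed from this page.
        affected |= old ^ new
    return affected


previous = {"a.md": {"python", "web"}, "b.md": {"rust"}}
current = {"a.md": {"python", "docs"}, "b.md": {"rust"}}
affected = affected_tag_slugs(previous, current, ["a.md"])
```

Only the changed pages are visited, which matches the O(changed pages) cost stated for change detection.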

collect_taxonomies
def collect_taxonomies(self) -> None

Collect taxonomies (tags, categories, etc.) from all pages. Organizes pages by their taxonomic terms.
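
To illustrate what "organizes pages by their taxonomic terms" can look like, here is a small sketch. The `Page` dataclass, the `slugify` rule, and `group_pages_by_tag` are all assumptions made for the example; the name-plus-pages entry shape follows the tag data described for `_create_tag_pages`.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical stand-in for a page; only the fields used here."""
    path: str
    tags: list[str] = field(default_factory=list)


def slugify(tag: str) -> str:
    """Assumed slug rule: trim, lowercase, spaces become hyphens."""
    return tag.strip().lower().replace(" ", "-")


def group_pages_by_tag(pages: list[Page]) -> dict[str, dict]:
    """Map each tag slug to its display name and the pages carrying it."""
    tags: dict[str, dict] = {}
    for page in pages:
        for tag in page.tags:
            # The first spelling seen becomes the display name.
            entry = tags.setdefault(slugify(tag), {"name": tag, "pages": []})
            entry["pages"].append(page)
    return tags


pages = [
    Page("content/a.md", ["Python", "Static Sites"]),
    Page("content/b.md", ["python"]),
]
taxonomy = group_pages_by_tag(pages)
```

Keying on the slug rather than the raw tag means "Python" and "python" land in the same entry.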

generate_dynamic_pages_for_tags
def generate_dynamic_pages_for_tags(self, affected_tags: set[str]) -> None

Generate dynamic pages only for specific affected tags (incremental optimization).

This method supports i18n - it generates per-locale tag pages when i18n is enabled.

Parameters 1
affected_tags set[str]

Set of tag slugs that need page regeneration

generate_dynamic_pages_for_tags_with_cache
def generate_dynamic_pages_for_tags_with_cache(self, affected_tags: set[str], taxonomy_index: TaxonomyIndex | None = None) -> None

Generate dynamic pages only for specific affected tags with TaxonomyIndex optimization (Phase 2c.2).

This enhanced version uses TaxonomyIndex to skip tags whose page membership has not changed, saving roughly 160 ms per incremental build on typical sites.

Parameters 2
affected_tags set[str]

Set of tag slugs that need page regeneration

taxonomy_index TaxonomyIndex | None

Optional TaxonomyIndex for skipping unchanged tags
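
The skip rule can be pictured as a membership comparison. The sketch below does not use the real TaxonomyIndex API; `tags_needing_regeneration` and the two slug-to-paths mappings are hypothetical and only show the idea of dropping tags whose set of member pages is unchanged.

```python
def tags_needing_regeneration(
    affected_tags: set[str],
    cached_members: dict[str, set[str]],
    current_members: dict[str, set[str]],
) -> set[str]:
    """Keep only affected tags whose member pages differ from the cached set."""
    return {
        slug
        for slug in affected_tags
        if cached_members.get(slug) != current_members.get(slug)
    }


cached = {"python": {"a.md", "b.md"}, "rust": {"c.md"}}
current = {"python": {"a.md", "b.md"}, "rust": {"c.md", "d.md"}}
to_regenerate = tags_needing_regeneration({"python", "rust"}, cached, current)
```

A tag that is new (absent from the cached mapping) or deleted (absent from the current mapping) compares unequal and is therefore kept.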

generate_dynamic_pages
def generate_dynamic_pages(self, parallel: bool = True) -> None

Generate dynamic taxonomy pages (tag pages, etc.) that don't have source files.

Note: Section archive pages are now generated by SectionOrchestrator.

Parameters 1
parallel bool

Whether to use parallel processing for tag pages (default: True)

generate_tag_pages
def generate_tag_pages(self, tags: list[str], selective: bool = False, context: BuildContext | None = None) -> list[Page]
Parameters 3
tags list[str]
selective bool
context BuildContext | None
Returns

list[Page]

Internal Methods 7
__init__
def __init__(self, site: Site, threshold: int = 20, parallel: bool = True)

Initialize taxonomy orchestrator.

Parameters 3
site Site

Site instance containing pages and sections

threshold int
parallel bool
_is_eligible_for_taxonomy
def _is_eligible_for_taxonomy(self, page: Page) -> bool

Check if a page is eligible for taxonomy collection.

Excludes:

  • Generated pages (tag pages, archive pages, etc.)
  • Pages from autodoc output directories (content/api, content/cli)

Note: Autodoc pages typically don't have tags, but this prevents them from being included if someone manually adds tags.

Parameters 1
page Page

Page to check

Returns

bool

True if page should be included in taxonomies
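
A predicate with these two exclusions might look like the sketch below. The `Page` dataclass, the `_generated` metadata flag, and the `is_eligible` helper are assumptions for illustration; only the two autodoc directories come from the description above.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class Page:
    """Hypothetical stand-in for a page; only the fields used here."""
    source_path: str
    metadata: dict = field(default_factory=dict)


# Autodoc output directories named in the description above.
AUTODOC_DIRS = (PurePosixPath("content/api"), PurePosixPath("content/cli"))


def is_eligible(page: Page) -> bool:
    """Return True if the page should contribute to taxonomies."""
    # Assumption: generated pages carry a truthy "_generated" flag.
    if page.metadata.get("_generated"):
        return False
    path = PurePosixPath(page.source_path)
    # Exclude anything located under an autodoc output directory.
    return not any(path.is_relative_to(d) for d in AUTODOC_DIRS)


post = Page("content/blog/post.md")
api_doc = Page("content/api/site.md")
tag_page = Page("tags/python/index.md", {"_generated": True})
```

`PurePath.is_relative_to` requires Python 3.9 or later; it compares path components, so a directory such as `content/apix` is not mistaken for `content/api`.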

_rebuild_taxonomy_structure_from_cache
def _rebuild_taxonomy_structure_from_cache(self, cache: BuildCache) -> None

Rebuild site.taxonomies from cache using CURRENT Page objects.

This is the key to avoiding stale references:

  1. Cache tells us which pages have which tags (paths only)
  2. We map paths to current Page objects (from site.pages)
  3. We reconstruct taxonomy dict with current objects

Performance: O(tags * pages_per_tag) which is O(all pages) worst case, but in practice very fast because it's just dict lookups and list appends.

CRITICAL: This always uses current Page objects, never cached references.

Parameters 1
cache BuildCache
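
The three steps above can be sketched as follows. This is an illustration under stated assumptions: `Page` is a stand-in dataclass, and `tag_to_paths` is a hypothetical view of what the cache stores (tag slug to page paths). The real BuildCache interface may differ.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical stand-in for a page; only the fields used here."""
    path: str
    title: str


def rebuild_tags(
    tag_to_paths: dict[str, list[str]],
    current_pages: list[Page],
) -> dict[str, list[Page]]:
    """Rebuild tag membership from cached paths using current Page objects."""
    # Step 2: index the current objects by path for O(1) lookups.
    by_path = {page.path: page for page in current_pages}
    rebuilt: dict[str, list[Page]] = {}
    for slug, paths in tag_to_paths.items():
        # Step 3: resolve each cached path; paths of deleted pages are dropped.
        members = [by_path[p] for p in paths if p in by_path]
        if members:
            rebuilt[slug] = members
    return rebuilt


current_pages = [Page("a.md", "Intro"), Page("b.md", "Setup")]
cached = {"python": ["a.md", "gone.md"], "rust": ["gone.md"]}
rebuilt = rebuild_tags(cached, current_pages)
```

Because the cache contributes only paths, every object in the result comes from `current_pages`, which is what keeps stale references out of the rebuilt structure.
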
_generate_tag_pages_sequential
def _generate_tag_pages_sequential(self, locale_tags: dict[str, Any], lang: str) -> int

Generate tag pages sequentially (original implementation).

Parameters 2
locale_tags dict[str, Any]

Dictionary of tag slugs to tag data

lang str

Language code

Returns

int

Number of pages generated

_generate_tag_pages_parallel
def _generate_tag_pages_parallel(self, locale_tags: dict[str, Any], lang: str) -> int

Generate tag pages in parallel using ThreadPoolExecutor.

Each tag's pages can be generated independently, making this perfectly parallelizable. On Python 3.14t (free-threaded), this achieves true parallelism without GIL contention.

Performance:

  • Python 3.13 (GIL): 2-3x faster
  • Python 3.14t (no GIL): 6-8x faster

Parameters 2
locale_tags dict[str, Any]

Dictionary of tag slugs to tag data

lang str

Language code

Returns

int

Number of pages generated
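
The pattern described here, one independent task per tag with the results counted at the end, can be sketched with the standard library. `build_pages_for_tag` is a hypothetical worker that merely returns output paths, and `max_workers=4` is an arbitrary choice; neither reflects Bengal's actual code.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any


def build_pages_for_tag(slug: str, tag_data: dict[str, Any]) -> list[str]:
    """Hypothetical worker: return the output paths produced for one tag."""
    return [f"tags/{slug}/index.html"]


def generate_parallel(locale_tags: dict[str, Any], max_workers: int = 4) -> int:
    """Run one task per tag and return the total number of pages generated."""
    count = 0
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(build_pages_for_tag, slug, data)
            for slug, data in locale_tags.items()
        ]
        for future in as_completed(futures):
            # result() re-raises any exception raised inside the worker.
            count += len(future.result())
    return count


locale_tags = {"python": {}, "rust": {}, "go": {}}
generated = generate_parallel(locale_tags)
```

Each task touches only its own tag, so no locking is needed in this sketch; the count is accumulated in the calling thread.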

_create_tag_index_page
def _create_tag_index_page(self) -> Page

Create the main tags index page.

Returns

Page

Generated tag index page

_create_tag_pages
def _create_tag_pages(self, tag_slug: str, tag_data: dict[str, Any]) -> list[Page]

Create pages for an individual tag (with pagination if needed).

Parameters 2
tag_slug str

URL-safe tag slug

tag_data dict[str, Any]

Dictionary containing tag name and pages

Returns

list[Page]

List of generated tag pages
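
As a rough picture of "with pagination if needed", the sketch below splits a tag's pages into fixed-size chunks, one chunk per generated tag page. The `paginate` helper and the `per_page` value are assumptions for illustration; how Bengal configures the page size is not stated here.

```python
def paginate(items: list[str], per_page: int) -> list[list[str]]:
    """Split items into consecutive chunks of at most per_page entries."""
    if per_page < 1:
        raise ValueError("per_page must be at least 1")
    return [items[i:i + per_page] for i in range(0, len(items), per_page)]


tagged = ["a.md", "b.md", "c.md", "d.md", "e.md"]
chunks = paginate(tagged, per_page=2)
```

With five tagged pages and two per page, the tag would need three generated pages, the last one holding a single entry.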