Module

rendering.pipeline.unified_transform

Unified HTML Transform - Optimized content transformation for rendering.

This module runs several HTML transformation passes as one sequence, with a quick rejection check before each regex pass so that pages needing no changes skip the regex work.

Performance:

Benchmarked at ~27% faster than separate transform calls. See: scripts/benchmark_transforms.py

Architecture:

  • Step 1: Jinja escaping via str.replace() (C-optimized, very fast)
  • Step 2: .md link normalization (single regex pass with quick rejection)
  • Step 3: Internal link baseurl prefixing (single regex pass with quick rejection)

The key optimizations are:

  1. Quick rejection checks before regex operations
  2. Single transformer instance reused across pages
  3. Compiled regex patterns
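
The first and third optimizations can be illustrated with a small helper. This is a minimal sketch, not the module's code: `sub_if_present` and `MD_HREF` are hypothetical names, and the pattern is an assumed stand-in for whatever the module compiles.

```python
import re
from typing import Callable

# Compiled once at import time, so repeated calls do not pay for recompilation.
# Hypothetical pattern: an href value ending in ".md" with no anchor.
MD_HREF = re.compile(r'href="([^"#]+)\.md"')


def sub_if_present(
    html: str,
    needle: str,
    pattern: re.Pattern[str],
    repl: Callable[[re.Match[str]], str],
) -> str:
    """Run pattern.sub only when a cheap substring check says a match is possible."""
    # Quick rejection: a substring test costs far less than a regex scan,
    # so pages without the needle are returned untouched.
    if needle not in html:
        return html
    return pattern.sub(repl, html)
```

The needle has to be a substring of every possible match, otherwise the check would wrongly skip pages that do need rewriting.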

Related Modules:

  • bengal.rendering.pipeline.transforms: Original separate transforms
  • bengal.rendering.pipeline.core: Uses this transformer
  • bengal.rendering.link_transformer: Link transformation patterns

RFC Reference:

plan/drafted/rfc-rendering-package-optimizations.md

Classes

HybridHTMLTransformer

Optimized HTML transformer combining multiple transformation passes.

This transformer applies Jinja escaping and link transformations in an optimized sequence with quick rejection checks to skip unnecessary work.

Creation:

    transformer = HybridHTMLTransformer(baseurl="/bengal")
    result = transformer.transform(html)

Thread Safety: Thread-safe. Transformer instances are stateless after initialization and can be safely shared across threads.
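
As an illustration of sharing one instance across threads, the sketch below uses a stand-in class rather than the real transformer. `StatelessTransformer` and `render_all` are hypothetical names, and the stand-in's `transform` body is a deliberately simplified placeholder.

```python
from concurrent.futures import ThreadPoolExecutor


class StatelessTransformer:
    """Stand-in for HybridHTMLTransformer: configured once, never mutated afterwards."""

    def __init__(self, baseurl: str = "") -> None:
        self._baseurl = baseurl

    def transform(self, html: str) -> str:
        # Reads only self._baseurl and local variables, so concurrent
        # calls cannot interfere with each other.
        return html.replace('href="/', f'href="{self._baseurl}/')


def render_all(pages: list[str], baseurl: str) -> list[str]:
    transformer = StatelessTransformer(baseurl)  # one instance for every page
    with ThreadPoolExecutor(max_workers=4) as pool:
        # pool.map returns results in the same order as the input pages.
        return list(pool.map(transformer.transform, pages))
```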

Performance: Approximately 27% faster than calling separate transform functions. Improvement is most significant for pages with transformable content.

Methods

transform
def transform(self, html: str) -> str

Transform HTML content with optimized multi-pass approach.

Applies transformations in sequence:

  1. Jinja block escaping ({%, %})
  2. Markdown link normalization (.md -> /)
  3. Internal link baseurl prefixing (/ -> /baseurl/)

Each step includes quick rejection to skip unnecessary regex work.
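
The first step can be sketched as follows. This assumes the braces are replaced with numeric HTML entities; the actual replacement strings are not given in this reference, and `escape_jinja_blocks` is a hypothetical name.

```python
def escape_jinja_blocks(html: str) -> str:
    """Neutralize Jinja block delimiters so a later template pass ignores them."""
    # Quick rejection: most pages contain no block delimiters at all.
    if "{%" not in html and "%}" not in html:
        return html
    # Assumption: "{" and "}" become &#123; and &#125;. Browsers display the
    # same characters, but Jinja no longer sees a block tag.
    return html.replace("{%", "&#123;%").replace("%}", "%&#125;")
```

Running this step before the link passes is harmless in this sketch, because the entity replacements neither add nor remove `href` attributes.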

Parameters
Name Type Description
html str

HTML content to transform

Returns
str Transformed HTML content
Internal Methods
__init__
Initialize the transformer.
def __init__(self, baseurl: str = '') -> None
Parameters
Name Type Description
baseurl str

Base URL prefix for internal links (e.g., "/bengal"). If empty, internal link transformation is skipped.

Default: ''
_md_replacer
def _md_replacer(self, match: re.Match[str]) -> str

Transform .md link to clean URL, preserving anchors.

Handles special cases:

  • ./page.md -> ./page/
  • ./page.md#section -> ./page/#section
  • ./_index.md -> ./
  • ../other.md -> ../other/
  • path/page.md -> path/page/
Parameters
Name Type Description
match re.Match[str]
Returns
str
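
The special cases listed above could be handled by a replacement callback along these lines. This is a sketch under assumptions: `MD_HREF` is a hypothetical pattern, the function is written as a plain function rather than a method, and it makes no attempt to tell internal links from external ones.

```python
import re

# Hypothetical pattern: group 1 is the path without ".md",
# group 2 is an optional "#anchor".
MD_HREF = re.compile(r'href="([^"#]+?)\.md(#[^"]*)?"')


def md_replacer(match: re.Match[str]) -> str:
    path = match.group(1)
    anchor = match.group(2) or ""
    if path == "_index" or path.endswith("/_index"):
        # Index pages map to their directory: "./_index" -> "./"
        clean = path[: -len("_index")] or "./"
    else:
        # Regular pages get a trailing slash: "./page" -> "./page/"
        clean = path + "/"
    return f'href="{clean}{anchor}"'
```

It would be applied with `MD_HREF.sub(md_replacer, html)`, ideally behind a `".md" in html` check.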
_internal_replacer
def _internal_replacer(self, match: re.Match[str]) -> str

Transform internal link with baseurl prefix.

Prepends baseurl to internal links starting with /. Skips links that already have the baseurl prefix.

Parameters
Name Type Description
match re.Match[str]
Returns
str
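
A possible shape for this callback is sketched below. `INTERNAL_HREF` and `make_internal_replacer` are hypothetical; the closure stands in for a method that reads the baseurl from the instance, and leaving protocol-relative URLs (`//host/...`) alone is an assumption of this sketch, not documented behavior.

```python
import re
from typing import Callable

# Hypothetical pattern: an href whose value starts with exactly one "/".
# The lookahead skips protocol-relative URLs such as "//cdn.example.com".
INTERNAL_HREF = re.compile(r'href="(/(?!/)[^"]*)"')


def make_internal_replacer(baseurl: str) -> Callable[[re.Match[str]], str]:
    base = baseurl.rstrip("/")

    def replacer(match: re.Match[str]) -> str:
        path = match.group(1)
        # Skip links that already carry the baseurl prefix.
        if path == base or path.startswith(base + "/"):
            return match.group(0)
        return f'href="{base}{path}"'

    return replacer
```

Comparing against `base + "/"` rather than `base` alone keeps a path such as `/bengal-docs/` from being mistaken for an already prefixed link.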

Functions

create_transformer
def create_transformer(config: dict[str, Any]) -> HybridHTMLTransformer

Create a transformer instance from site config.

Factory function that extracts baseurl from config and creates an appropriately configured transformer.

Parameters
Name Type Description
config dict[str, Any]

Site configuration dictionary

Returns
HybridHTMLTransformer
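
A factory of this kind might look like the sketch below. The location of baseurl in the config (a top-level `"baseurl"` key) and the trailing-slash normalization are assumptions, and the class here is a stand-in that only mirrors the documented constructor signature.

```python
from typing import Any


class HybridHTMLTransformer:
    """Stand-in with only the constructor shape documented above."""

    def __init__(self, baseurl: str = "") -> None:
        self.baseurl = baseurl


def create_transformer(config: dict[str, Any]) -> HybridHTMLTransformer:
    # Assumption: baseurl is a top-level key. A missing or None value falls
    # back to "", which means internal link prefixing is skipped.
    baseurl = config.get("baseurl") or ""
    # Assumption: a trailing slash is dropped so "/bengal/" and "/bengal"
    # configure the transformer identically.
    return HybridHTMLTransformer(baseurl=str(baseurl).rstrip("/"))
```

Creating the transformer once per build and passing it to every page render matches the "single transformer instance reused across pages" optimization.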