Module

rendering.pygments_cache

Pygments lexer caching to dramatically improve syntax highlighting performance.

Problem: pygments.lexers.guess_lexer() triggers expensive plugin discovery via importlib.metadata on every code block, adding more than 60 seconds of overhead on large sites with many code blocks.

Solution: Cache lexers by language name to avoid repeated plugin discovery.

Performance Impact (measured on 826-page site):

  • Before: 86s (73% in Pygments plugin discovery)
  • After: ~29s (3× faster)

Functions

_normalize_language
def _normalize_language(language: str) -> str

Normalize a requested language to a Pygments-friendly name.

Applies alias mapping and lowercases the language name. Strips the file path if the language identifier includes a colon (e.g., 'jinja2:path/to/file.html' -> 'jinja2').
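The normalization described above could look roughly like the following sketch. The alias table and the name normalize_language_sketch are hypothetical, chosen only for illustration; the module's real alias mapping is not shown on this page.

```python
# Hypothetical alias table (an assumption; the module's real mapping may differ).
_ALIASES_SKETCH = {"js": "javascript", "yml": "yaml", "sh": "bash"}


def normalize_language_sketch(language: str) -> str:
    """Illustrative stand-in for _normalize_language, not the actual code."""
    # Keep only the part before the first colon, dropping any file path suffix.
    name = language.split(":", 1)[0]
    # Lowercase and trim so lookups are case-insensitive.
    name = name.strip().lower()
    # Map known aliases; unknown names pass through unchanged.
    return _ALIASES_SKETCH.get(name, name)


print(normalize_language_sketch("jinja2:path/to/file.html"))  # jinja2
print(normalize_language_sketch("JS"))  # javascript
```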

Parameters 1

Name Type Default Description
language str

Returns

str

get_lexer_cached
def get_lexer_cached(language: str | None = None, code: str = '') -> Any

Get a Pygments lexer with aggressive caching.

Strategy:

  1. If language specified: cache by language name (fast path)
  2. If no language: hash code sample and cache guess result
  3. Fallback: return text lexer if all else fails
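The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the module's implementation: the class name LexerCacheSketch, the injected by_name and guess callables, and the 200-character sample size are all hypothetical. The lookups are injected so the sketch runs without Pygments installed.

```python
from __future__ import annotations

import hashlib
from typing import Any, Callable


class LexerCacheSketch:
    """Illustrative three-step lexer lookup with caching (hypothetical)."""

    def __init__(
        self,
        by_name: Callable[[str], Any],
        guess: Callable[[str], Any],
        fallback: Any,
    ) -> None:
        self._by_name = by_name
        self._guess = guess
        self._fallback = fallback
        self._named: dict[str, Any] = {}
        self._guessed: dict[str, Any] = {}

    def get(self, language: str | None = None, code: str = "") -> Any:
        # Step 1 (fast path): cache by language name.
        if language:
            key = language.lower()
            if key not in self._named:
                try:
                    self._named[key] = self._by_name(key)
                except Exception:
                    # Step 3: unknown language, fall back to plain text.
                    self._named[key] = self._fallback
            return self._named[key]

        # Step 2: hash a bounded code sample and cache the guess.
        # The 200-character sample size is an assumption.
        digest = hashlib.sha256(code[:200].encode("utf-8")).hexdigest()
        if digest not in self._guessed:
            try:
                self._guessed[digest] = self._guess(code)
            except Exception:
                # Step 3: guessing failed, fall back to plain text.
                self._guessed[digest] = self._fallback
        return self._guessed[digest]
```

With Pygments available, the callables would plausibly be pygments.lexers.get_lexer_by_name and pygments.lexers.guess_lexer, with a pygments.lexers.TextLexer instance as the fallback.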

Parameters 2

Name Type Default Description
language str | None None Optional language name (e.g., 'python', 'javascript')
code str '' Code content (used for guessing if language not specified)

Returns

Any

Pygments lexer instance

Performance:

  • Cached lookup: ~0.001ms
  • Uncached lookup: ~30ms (plugin discovery)
  • Cache hit rate: >95% after first few pages

clear_cache
def clear_cache() -> None

Clear the lexer cache. Useful for testing or memory management.

get_cache_stats
def get_cache_stats() -> dict[str, int | float]

Get cache statistics for monitoring.

Returns

dict[str, int | float]

Dict with hits, misses, guess_calls, hit_rate

log_cache_stats
def log_cache_stats() -> None

Log cache statistics. Call at end of build for visibility.
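One way an end-of-build log line could be produced from the statistics dict is sketched below. The logger name, the message format, and the helper names are hypothetical, and hit_rate is assumed to be a fraction between 0 and 1.

```python
import logging

# Hypothetical logger name; the module's real logger is not shown on this page.
logger = logging.getLogger("pygments_cache_sketch")


def format_cache_stats(stats: dict) -> str:
    """Render the stats dict as a single human-readable line."""
    return (
        f"lexer cache: {stats['hits']} hits, {stats['misses']} misses, "
        f"{stats['guess_calls']} guess calls, hit rate {stats['hit_rate']:.1%}"
    )


def log_cache_stats_sketch(stats: dict) -> None:
    """Emit the summary once, e.g. after the last page has been rendered."""
    logger.info(format_cache_stats(stats))


example = {"hits": 95, "misses": 5, "guess_calls": 2, "hit_rate": 0.95}
print(format_cache_stats(example))
# lexer cache: 95 hits, 5 misses, 2 guess calls, hit rate 95.0%
```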