Module

rendering.parsers.mistune

Mistune parser implementation: fast, with full documentation features.

Classes

MistuneParser

Parser built on the mistune library, designed for speed while keeping full documentation features.

Supported features:

  • Tables (GFM)
  • Fenced code blocks
  • Strikethrough
  • Task lists
  • Autolinks
  • TOC generation (custom implementation)
  • Admonitions (custom plugin)
  • Footnotes (custom plugin)
  • Definition lists (custom plugin)
  • Variable substitution (custom plugin)

Inherits from BaseMarkdownParser

Methods 9

supports_ast property
def supports_ast(self) -> bool

Check if this parser supports true AST output.

Mistune natively supports AST output via renderer=None.

Returns

bool

Always True: Mistune supports AST output

parse
def parse(self, content: str, metadata: dict[str, Any]) -> str

Parse Markdown content into HTML.

Parameters 2
content str

Markdown content to parse

metadata dict[str, Any]

Page metadata (includes source path for validation warnings)

Returns

str

Rendered HTML string

parse_with_toc
def parse_with_toc(self, content: str, metadata: dict[str, Any]) -> tuple[str, str]

Parse Markdown content and extract table of contents.

Three-stage process:

  1. Parse markdown to HTML
  2. Inject heading anchors (IDs and headerlinks)
  3. Extract TOC from anchored headings

Parameters 2
content str

Markdown content to parse

metadata dict[str, Any]

Page metadata (includes source path for validation warnings)

Returns

tuple[str, str]

Tuple of (HTML with anchored headings, TOC HTML)

parse_with_context
def parse_with_context(self, content: str, metadata: dict[str, Any], context: dict[str, Any]) -> str

Parse Markdown with variable substitution support.

Variable Substitution:

Enables {{ page.title }}, {{ site.baseurl }}, etc. in markdown content.
Uses a separate mistune instance (_md_with_vars) with preprocessing.
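Here is a minimal sketch of dotted-path substitution. It assumes that references which cannot be resolved are left in place unchanged; this page does not say what VariableSubstitutionPlugin does with unknown names. The function names `substitute_variables` and `resolve` are hypothetical.

```python
import re
from typing import Any

# Matches {{ dotted.path }} with optional whitespace inside the braces.
_VAR_PATTERN = re.compile(r"\{\{\s*([A-Za-z_][\w.]*)\s*\}\}")


def resolve(path: str, context: dict[str, Any]) -> Any:
    """Walk a dotted path through nested dicts or object attributes."""
    value: Any = context
    for part in path.split("."):
        if isinstance(value, dict):
            if part not in value:
                raise KeyError(path)
            value = value[part]
        elif hasattr(value, part):
            value = getattr(value, part)
        else:
            raise KeyError(path)
    return value


def substitute_variables(text: str, context: dict[str, Any]) -> str:
    """Replace {{ var }} references; unknown ones are kept verbatim (assumption)."""

    def replace(match: re.Match) -> str:
        try:
            return str(resolve(match.group(1), context))
        except KeyError:
            return match.group(0)  # keep the literal {{ ... }} text

    return _VAR_PATTERN.sub(replace, text)
```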

Lazy Initialization:

_md_with_vars is created on first use and cached thereafter.
This happens once per parser instance (i.e., once per thread).

Important: In parallel builds with max_workers=N:
  • N parser instances created (main: self.md)
  • N variable parser instances created (vars: self._md_with_vars)
  • Total: 2N mistune instances, but only 1 of each per thread
  • This is optimal: each thread reuses its own cached instances

Parser Reuse:

The parser with VariableSubstitutionPlugin is cached and reused.
Only the context is updated per page (fast operation).
This avoids expensive parser re-initialization (~10ms) for every page.
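The caching pattern described above can be sketched with `threading.local`. `LazyVarsParser`, `get_thread_parser`, and the `build` factory are hypothetical stand-ins. In the real parser, the expensive object is a mistune instance configured with VariableSubstitutionPlugin.

```python
import threading
from typing import Any, Callable


class LazyVarsParser:
    """Hypothetical stand-in for a parser with a lazily built variable-aware instance."""

    def __init__(self, build: Callable[[], Any]) -> None:
        self._build = build  # factory for the expensive variable-aware parser
        self._md_with_vars: Any = None
        self._context: dict[str, Any] = {}

    def parse_with_context(self, content: str, context: dict[str, Any]) -> str:
        if self._md_with_vars is None:  # first call on this instance pays the setup cost
            self._md_with_vars = self._build()
        self._context = context  # cheap per-page update; the parser itself is reused
        return self._md_with_vars(content, self._context)


_local = threading.local()


def get_thread_parser(build: Callable[[], Any]) -> LazyVarsParser:
    """Return this thread's parser, creating it on first access in the thread."""
    parser = getattr(_local, "parser", None)
    if parser is None:
        parser = LazyVarsParser(build)
        _local.parser = parser
    return parser
```

Each worker thread gets its own instance, so the per-page context update needs no locking. The cost is one set of parser objects per thread.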

Parameters 3
content str

Markdown content to parse

metadata dict[str, Any]

Page metadata

context dict[str, Any]

Variable context (page, site, config)

Returns

str

Rendered HTML with variables substituted

Performance:

  • First call (per thread): Creates _md_with_vars (~10ms)
  • Subsequent calls: Reuses cached parser (~0ms overhead)
  • Variable preprocessing: ~0.5ms per page
  • Markdown parsing: ~1-5ms per page

parse_with_toc_and_context
def parse_with_toc_and_context(self, content: str, metadata: dict[str, Any], context: dict[str, Any]) -> tuple[str, str]

Parse Markdown with variable substitution and extract TOC.

Single-pass parsing with VariableSubstitutionPlugin for {{ vars }}.

ARCHITECTURE DECISION: Separation of Concerns

SUPPORTED in markdown content:

  • {{ page.metadata.xxx }} - Variable substitution
  • {{ site.config.xxx }} - Site configuration access
  • Code blocks naturally stay literal (AST-level protection)
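The AST-level protection can be illustrated with a small token walk. This is a sketch, not the plugin's code. The token shapes (`type`, `raw`, `children`) and the type names `block_code` and `codespan` are modeled loosely on Mistune's AST and should be treated as assumptions. Only `text` tokens are rewritten, so code is never touched by substitution, and `{% ... %}` is not recognized at all.

```python
import re
from typing import Any, Optional

# Only {{ dotted.path }} is recognized; {% ... %} block tags never match.
_VAR = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")

# Token types treated as code and therefore never rewritten (assumed names).
_CODE_TYPES = {"block_code", "codespan"}


def _lookup(path: str, context: dict[str, Any]) -> Optional[str]:
    value: Any = context
    for part in path.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return str(value)


def substitute_in_tokens(tokens: list[dict[str, Any]], context: dict[str, Any]) -> None:
    """Rewrite {{ var }} inside 'text' tokens only, recursing into children."""

    def replace(match: re.Match) -> str:
        value = _lookup(match.group(1), context)
        return match.group(0) if value is None else value

    for token in tokens:
        if token["type"] in _CODE_TYPES:
            continue  # code stays literal because it is never visited
        if token["type"] == "text":
            token["raw"] = _VAR.sub(replace, token["raw"])
        elif "children" in token:
            substitute_in_tokens(token["children"], context)
```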

NOT SUPPORTED in markdown content:

  • {% if %} - Conditional blocks
  • {% for %} - Loop constructs
  • Complex Jinja2 logic

WHY: These belong in TEMPLATES, not markdown content.

Use conditionals and loops in your page templates:

<!-- templates/page.html -->
<article>
  {% if page.metadata.enterprise %}
  <div class="enterprise-badge">Enterprise</div>
  {% endif %}

  {{ content }}  <!-- Markdown renders here -->
</article>

This design:

  • Keeps parsing simple and fast (single pass)
  • Separates content parsing from template logic
  • Maintains performance (no preprocessing overhead)
  • Makes code blocks work naturally

Parameters 3
content str

Markdown content to parse

metadata dict[str, Any]

Page metadata

context dict[str, Any]

Variable context (page, site, config)

Returns

tuple[str, str]

Tuple of (HTML with anchored headings, TOC HTML)

enable_cross_references
def enable_cross_references(self, xref_index: dict[str, Any]) -> None

Enable cross-reference support with [[link]] syntax.

Should be called after content discovery when xref_index is built. Creates CrossReferencePlugin for post-processing HTML output.

Also stores xref_index on the renderer for directive access (e.g., cards :pull:).

Performance: O(1); it only stores a reference to the index.

Thread safety: each thread-local parser instance needs this called once.

Parameters 1
xref_index dict[str, Any]

Pre-built cross-reference index from site discovery
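enable_cross_references itself only stores the index. The `[[link]]` rewriting happens later, when HTML output is post-processed. Here is a minimal sketch of such a pass. It assumes, hypothetically, an index keyed by target path whose entries carry `url` and `title` fields, plus an optional `[[target|label]]` form. Neither the index shape nor the pipe syntax is confirmed by this page.

```python
import html
import re
from typing import Any

# [[target]] or [[target|label]]; the pipe form is an assumption for illustration.
_XREF = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def resolve_xrefs(html_text: str, xref_index: dict[str, Any]) -> str:
    """Replace [[...]] references with links; unknown targets are left as written."""

    def replace(match: re.Match) -> str:
        target = match.group(1).strip()
        entry = xref_index.get(target)
        if entry is None:
            return match.group(0)
        # An explicit label is already HTML output; the index title gets escaped.
        if match.group(2):
            label = match.group(2).strip()
        else:
            label = html.escape(entry.get("title", target))
        return f'<a href="{html.escape(entry["url"])}">{label}</a>'

    return _XREF.sub(replace, html_text)
```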

parse_to_ast
def parse_to_ast(self, content: str, metadata: dict[str, Any]) -> list[dict[str, Any]]

Parse Markdown content to AST tokens.

Uses Mistune's built-in AST support by parsing with renderer=None. The AST is a list of token dictionaries representing the document structure.

Performance:

  • Parsing cost is similar to parse() (same tokenization)
  • AST is more memory-efficient than HTML for caching
  • Multiple outputs can be generated from a single AST

Parameters 2
content str

Raw Markdown content

metadata dict[str, Any]

Page metadata (unused, for interface compatibility)

Returns

list[dict[str, Any]]

List of AST token dictionaries

render_ast
def render_ast(self, ast: list[dict[str, Any]]) -> str

Render AST tokens to HTML.

Uses Mistune's renderer to convert AST tokens back to HTML. This enables parse-once, render-many patterns.

Parameters 1
ast list[dict[str, Any]]

List of AST token dictionaries from parse_to_ast()

Returns

str

Rendered HTML string
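To illustrate parse-once, render-many without depending on Mistune's renderer classes, here is a toy sketch. It walks a token list shaped loosely like Mistune's AST and produces two different outputs from the same tokens. The `type`, `raw`, `attrs`, and `children` keys are assumptions. It handles only headings, paragraphs, and text.

```python
import html
from typing import Any

Token = dict[str, Any]


def render_html(tokens: list[Token]) -> str:
    """Render a small subset of token types to HTML."""
    out = []
    for tok in tokens:
        kind = tok["type"]
        if kind == "text":
            out.append(html.escape(tok["raw"]))
        elif kind == "heading":
            level = tok["attrs"]["level"]
            out.append(f"<h{level}>{render_html(tok['children'])}</h{level}>\n")
        elif kind == "paragraph":
            out.append(f"<p>{render_html(tok['children'])}</p>\n")
    return "".join(out)


def render_text(tokens: list[Token]) -> str:
    """Render the same tokens as plain text, one block per line."""
    out = []
    for tok in tokens:
        if tok["type"] == "text":
            out.append(tok["raw"])
        elif "children" in tok:
            out.append(render_text(tok["children"]) + "\n")
    return "".join(out)
```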

parse_with_ast
def parse_with_ast(self, content: str, metadata: dict[str, Any]) -> tuple[list[dict[str, Any]], str, str]

Parse content and return AST, HTML, and TOC together.

Single-pass parsing that returns all outputs efficiently. Use this when you need both AST (for caching) and HTML (for display).

Parameters 2
content str

Raw Markdown content

metadata dict[str, Any]

Page metadata

Returns

tuple[list[dict[str, Any]], str, str]

Tuple of (AST tokens, HTML content, TOC HTML)

Performance:

  • Single parse pass for AST
  • Single render pass for HTML
  • TOC extracted from HTML (fast regex)
  • ~30% overhead vs parse() alone, but saves re-parsing
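One way to reuse the combined result is a content-addressed cache. `ParseCache` is hypothetical. It only assumes a callable with the `parse_with_ast(content, metadata)` signature documented here. Keying on content alone is a simplification: if metadata can change the output, it would need to be part of the key.

```python
import hashlib
from typing import Any, Callable

ParseResult = tuple[list[dict[str, Any]], str, str]  # (AST tokens, HTML, TOC HTML)


class ParseCache:
    """Hypothetical cache keyed by a content hash, storing (AST, HTML, TOC)."""

    def __init__(self, parse_with_ast: Callable[[str, dict[str, Any]], ParseResult]) -> None:
        self._parse = parse_with_ast
        self._entries: dict[str, ParseResult] = {}

    def get(self, content: str, metadata: dict[str, Any]) -> ParseResult:
        key = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if key not in self._entries:  # miss: one parse yields all three outputs
            self._entries[key] = self._parse(content, metadata)
        return self._entries[key]
```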

Internal Methods 6
__init__
def __init__(self, enable_highlighting: bool = True) -> None

Initialize the mistune parser with plugins.

Parameters 1
enable_highlighting bool

Enable Pygments syntax highlighting for code blocks (defaults to True for backward compatibility)

Parser Instances:

This parser is typically created via thread-local caching. With parallel builds (max_workers=N), you'll see N instances created, one per worker thread. This is expected and optimal, not a bug.

Internal Structure:

  • self.md: Main mistune instance for standard parsing
  • self._md_with_vars: Created lazily for pages with {{ var }} syntax

Both instances share plugins (cross-references, etc.) but differ in preprocessing (variable substitution).

_create_syntax_highlighting_plugin
def _create_syntax_highlighting_plugin(self) -> Callable[[Any], None]

Create a Mistune plugin that adds Pygments syntax highlighting to code blocks.

Returns

Callable[[Any], None]

Plugin function that modifies the renderer to add syntax highlighting

_escape_jinja_blocks
def _escape_jinja_blocks(self, html: str) -> str

Escape raw Jinja2 block delimiters in HTML content.

This converts "{%"/"%}" into HTML entities so any documentation examples do not appear as unrendered template syntax in the final HTML.

Parameters 1
html str

HTML content to escape

Returns

str

HTML with "{%" and "%}" replaced by HTML entities
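Here is a one-line sketch of the escaping. It assumes the braces are replaced with numeric character references (`&#123;` for `{` and `&#125;` for `}`); the exact entities the real method emits are not documented here. `{{ ... }}` expressions are deliberately left alone.

```python
def escape_jinja_blocks(html_text: str) -> str:
    """Replace Jinja2 block delimiters with HTML character references."""
    # Browsers display &#123; / &#125; as literal braces, but a template
    # engine no longer sees the "{%" / "%}" delimiters.
    return html_text.replace("{%", "&#123;%").replace("%}", "%&#125;")
```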

_inject_heading_anchors
def _inject_heading_anchors(self, html: str) -> str

Inject IDs into heading tags using fast regex (5-10x faster than BS4).

Excludes headings inside blockquotes from getting IDs (so they don't appear in TOC).

Single-pass regex replacement handles:

  • h2, h3, h4 headings (matching python-markdown's toc_depth)
  • Existing IDs (preserves them)
  • Heading content with nested HTML
  • Generates clean slugs from heading text
  • Skips headings inside <blockquote> tags

Parameters 1
html str

HTML content from markdown parser

Returns

str

HTML with heading IDs added (except those in blockquotes)
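Here is a simplified sketch of the regex approach, covering h2 to h4, blockquote skipping, and preservation of existing IDs. Unlike a true single pass, it first collects blockquote spans and then runs one substitution pass. It does not handle nested blockquotes or deduplicate repeated slugs. `_simple_slug` is a hypothetical stand-in for `_slugify`, not its actual implementation.

```python
import html
import re

_BLOCKQUOTE = re.compile(r"<blockquote\b.*?</blockquote>", re.DOTALL | re.IGNORECASE)
_HEADING = re.compile(r"<h([234])(\s[^>]*)?>(.*?)</h\1>", re.DOTALL | re.IGNORECASE)
_TAG = re.compile(r"<[^>]+>")


def _simple_slug(inner_html: str) -> str:
    """Simplified slug: strip tags, unescape entities, keep word chars, join with '-'."""
    text = html.unescape(_TAG.sub("", inner_html)).lower()
    text = re.sub(r"[^\w\s-]", "", text).strip()
    return re.sub(r"[-\s]+", "-", text)


def inject_heading_anchors(html_text: str) -> str:
    # Spans covered by (non-nested) blockquotes; headings starting inside are skipped.
    quoted = [m.span() for m in _BLOCKQUOTE.finditer(html_text)]

    def replace(m: re.Match) -> str:
        if any(start <= m.start() < end for start, end in quoted):
            return m.group(0)  # inside a blockquote: no id, so it stays out of the TOC
        level, attrs, body = m.group(1), m.group(2) or "", m.group(3)
        if re.search(r"\sid\s*=", attrs):
            return m.group(0)  # heading already has an id: preserve it
        return f'<h{level}{attrs} id="{_simple_slug(body)}">{body}</h{level}>'

    return _HEADING.sub(replace, html_text)
```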

_extract_toc
def _extract_toc(self, html: str) -> str

Extract table of contents from HTML with anchored headings using fast regex (5-8x faster than BS4).

Builds a nested list of links to heading anchors. Expects headings to have IDs (anchors handled by theme).

Parameters 1
html str

HTML content with heading IDs and headerlinks

Returns

str

TOC as HTML (div.toc > ul > li > a structure)
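Here is a sketch of building the nested `div.toc > ul > li > a` structure from anchored h2 to h4 headings, using a regex and a depth counter. Two behaviors are assumptions about what is desired. Skipped levels (h2 followed directly by h4) are clamped to one extra nesting level. Any headerlink markup inside a heading is reduced to its text rather than removed.

```python
import re

# h2-h4 headings that already carry an id attribute.
_ANCHORED = re.compile(r'<h([234])[^>]*\sid="([^"]+)"[^>]*>(.*?)</h\1>', re.DOTALL | re.IGNORECASE)
_TAG = re.compile(r"<[^>]+>")


def extract_toc(html_text: str) -> str:
    headings = [
        (int(level), anchor, _TAG.sub("", body).strip())
        for level, anchor, body in _ANCHORED.findall(html_text)
    ]
    if not headings:
        return ""
    top = min(level for level, _, _ in headings)
    out = ['<div class="toc">']
    depth = 0  # number of currently open <ul> elements
    for level, anchor, text in headings:
        # Desired depth, clamped so a jump such as h2 -> h4 opens only one level.
        target = min(level - top + 1, depth + 1)
        if target > depth:
            out.append("<ul>")  # nested list opens inside the previous, still-open <li>
            depth += 1
        else:
            out.append("</li>")  # close the previous sibling item
            while depth > target:
                out.append("</ul></li>")  # close a deeper list and its parent item
                depth -= 1
        out.append(f'<li><a href="#{anchor}">{text}</a>')
    out.append("</li>")
    while depth > 1:
        out.append("</ul></li>")
        depth -= 1
    out.append("</ul></div>")
    return "".join(out)
```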

_slugify
def _slugify(self, text: str) -> str

Convert text to a URL-friendly slug. Matches python-markdown's default slugify behavior.

Uses bengal.utils.text.slugify with HTML unescaping enabled. Limits slug length to prevent overly long IDs from headings that contain code.

Parameters 1
text str

Text to slugify

Returns

str

Slugified text (max 100 characters)
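Here is a sketch of a python-markdown-style slugify with HTML unescaping and the 100-character cap. It mirrors python-markdown's default algorithm: NFKD normalization, ASCII folding, dropping punctuation, and collapsing whitespace and hyphens. Whether bengal.utils.text.slugify matches it character for character is an assumption.

```python
import html
import re
import unicodedata

MAX_SLUG_LENGTH = 100  # cap taken from the Returns note for _slugify


def slugify(text: str, separator: str = "-") -> str:
    text = html.unescape(text)  # "&amp;" becomes "&" before filtering
    text = unicodedata.normalize("NFKD", text)  # split accents from base letters
    text = text.encode("ascii", "ignore").decode("ascii")  # drop the accent marks
    text = re.sub(r"[^\w\s-]", "", text).strip().lower()  # remove punctuation
    text = re.sub(rf"[{separator}\s]+", separator, text)  # collapse runs into one separator
    return text[:MAX_SLUG_LENGTH].rstrip(separator)
```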