Classes

MistuneParser

Parser using mistune library. Faster with full documentation features.

Supported features:

Tables (GFM)
Fenced code blocks
Strikethrough
Task lists
Autolinks
TOC generation (custom implementation)
Admonitions (custom plugin)
Footnotes (custom plugin)
Definition lists (custom plugin)
Variable substitution (custom plugin) - NEW!

Inherits from BaseMarkdownParser

Methods 9

supports_ast property

def supports_ast(self) -> bool

Check if this parser supports true AST output.

Mistune natively supports AST output via renderer=None.

Returns

bool —

True - Mistune supports AST output

parse

def parse(self, content: str, metadata: dict[str, Any]) -> str

Parse Markdown content into HTML.

Parameters 2

`content`	`str`	Markdown content to parse
`metadata`	`dict[str, Any]`	Page metadata (includes source path for validation warnings)

Returns

str —

Rendered HTML string

parse_with_toc

def parse_with_toc(self, content: str, metadata: dict[str, Any]) -> tuple[str, str]

Parse Markdown content and extract table of contents.

Three-stage process:

1. Parse Markdown to HTML
2. Inject heading anchors (IDs and headerlinks)
3. Extract TOC from anchored headings

Parameters 2

`content`	`str`	Markdown content to parse
`metadata`	`dict[str, Any]`	Page metadata (includes source path for validation warnings)

Returns

tuple[str, str] —

Tuple of (HTML with anchored headings, TOC HTML)

parse_with_context

def parse_with_context(self, content: str, metadata: dict[str, Any], context: dict[str, Any]) -> str

Parse Markdown with variable substitution support.

Variable Substitution:

Enables {{ page.title }}, {{ site.baseurl }}, etc. in markdown content.
Uses a separate mistune instance (_md_with_vars) with preprocessing.

Lazy Initialization:

_md_with_vars is created on first use and cached thereafter.
This happens once per parser instance (i.e., once per thread).

Important: In parallel builds with max_workers=N:

N parser instances created (main: self.md)
N variable parser instances created (vars: self._md_with_vars)
Total: 2N mistune instances, but only 1 of each per thread
This is optimal - each thread uses its cached instances

Parser Reuse:

The parser with VariableSubstitutionPlugin is cached and reused.
Only the context is updated per page (fast operation).
This avoids expensive parser re-initialization (~10ms) for every page.

Parameters 3

`content`	`str`	Markdown content to parse
`metadata`	`dict[str, Any]`	Page metadata
`context`	`dict[str, Any]`	Variable context (page, site, config)

Returns

str —

Rendered HTML with variables substituted

Performance:

First call (per thread): Creates _md_with_vars (~10ms)
Subsequent calls: Reuses cached parser (~0ms overhead)
Variable preprocessing: ~0.5ms per page
Markdown parsing: ~1-5ms per page
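
The lazy-initialization and per-thread reuse described above can be sketched with the standard library alone. This is a minimal illustration, not the actual implementation: `SketchParser`, `get_parser`, and the `engines_built` counter are hypothetical names, and a plain dict stands in for the real mistune instance.

```python
import threading
from typing import Any, Optional


class SketchParser:
    """Hypothetical stand-in for a parser with an expensive, lazily built engine."""

    engines_built = 0  # counts how often the costly setup ran, for illustration
    _count_lock = threading.Lock()

    def __init__(self) -> None:
        self._md_with_vars: Optional[dict[str, Any]] = None  # not built yet

    def parse_with_context(self, content: str, context: dict[str, Any]) -> str:
        if self._md_with_vars is None:
            # First call on this instance: pay the setup cost once.
            with SketchParser._count_lock:
                SketchParser.engines_built += 1
            self._md_with_vars = {"context": {}}
        # Every call: only swap the context, which is cheap.
        self._md_with_vars["context"] = context
        title = str(self._md_with_vars["context"].get("title", ""))
        return content.replace("{{ title }}", title)


_local = threading.local()


def get_parser() -> SketchParser:
    """Return this thread's parser, creating it on first use."""
    if not hasattr(_local, "parser"):
        _local.parser = SketchParser()
    return _local.parser
```

Calling `get_parser()` from N worker threads would yield N `SketchParser` objects and at most N lazily built engines, which mirrors the 2N-instances accounting above.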

parse_with_toc_and_context

def parse_with_toc_and_context(self, content: str, metadata: dict[str, Any], context: dict[str, Any]) -> tuple[str, str]

Parse Markdown with variable substitution and extract TOC.

Single-pass parsing with VariableSubstitutionPlugin for {{ vars }}.

ARCHITECTURE DECISION: Separation of Concerns

SUPPORTED in markdown content:

{{ page.metadata.xxx }} - Variable substitution
{{ site.config.xxx }} - Site configuration access
Code blocks naturally stay literal (AST-level protection)

NOT SUPPORTED in markdown content:

{% if %} - Conditional blocks
{% for %} - Loop constructs
Complex Jinja2 logic

WHY: These belong in TEMPLATES, not markdown content.

Use conditionals and loops in your page templates:

<!-- templates/page.html -->
<article>
  {% if page.metadata.enterprise %}
  <div class="enterprise-badge">Enterprise</div>
  {% endif %}

  {{ content }}  <!-- Markdown renders here -->
</article>

This design:

Keeps parsing simple and fast (single pass)
Separates content parsing from template logic
Maintains performance (no separate Jinja2 preprocessing pass)
Makes code blocks work naturally

Parameters 3

`content`	`str`	Markdown content to parse
`metadata`	`dict[str, Any]`	Page metadata
`context`	`dict[str, Any]`	Variable context (page, site, config)

Returns

tuple[str, str] —

Tuple of (HTML with anchored headings, TOC HTML)
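
To make the supported/unsupported split concrete, here is a rough text-level sketch of `{{ dotted.path }}` substitution that leaves `{% ... %}` blocks untouched. It is an assumption-laden simplification: the real plugin is described as working at the AST level (which is what keeps code blocks literal), whereas this sketch operates on raw text and would also rewrite variables inside code. `VAR_RE`, `resolve`, and `substitute_variables` are hypothetical names.

```python
import re
from typing import Any

# Matches {{ name }} or {{ dotted.path }}; block tags such as {% if %} never match.
VAR_RE = re.compile(r"\{\{\s*([A-Za-z_][\w.]*)\s*\}\}")


def resolve(path: str, context: dict[str, Any]) -> Any:
    """Walk a dotted path through nested dicts or object attributes."""
    value: Any = context
    for part in path.split("."):
        if isinstance(value, dict):
            value = value[part]  # raises KeyError when the key is missing
        else:
            value = getattr(value, part)  # raises AttributeError when missing
    return value


def substitute_variables(text: str, context: dict[str, Any]) -> str:
    """Replace known variables; leave unknown ones visible in the output."""

    def replace(match: re.Match) -> str:
        try:
            return str(resolve(match.group(1), context))
        except (KeyError, AttributeError):
            return match.group(0)

    return VAR_RE.sub(replace, text)
```

Leaving unresolved variables in place is a design choice of this sketch; whether the real plugin does the same is not stated here.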

enable_cross_references

def enable_cross_references(self, xref_index: dict[str, Any]) -> None

Enable cross-reference support with [[link]] syntax.

Should be called after content discovery when xref_index is built. Creates CrossReferencePlugin for post-processing HTML output.

Also stores xref_index on the renderer for directive access (e.g., cards :pull:).

Performance: O(1), since it only stores a reference to the index.

Thread safety: each thread-local parser instance needs this called once.

Parameters 1

xref_index

dict[str, Any]

Pre-built cross-reference index from site discovery
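
As an illustration of post-processing `[[link]]` references in rendered HTML, the sketch below resolves targets against a small index. The index shape (`{"url": ..., "title": ...}` per target), the optional `[[target|label]]` form, and the name `resolve_xrefs` are all assumptions made for the example; the actual CrossReferencePlugin and xref_index layout may differ.

```python
import html
import re
from typing import Any

# [[target]] or [[target|label]]; the pipe form is an assumption of this sketch.
XREF_RE = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def resolve_xrefs(html_text: str, xref_index: dict[str, Any]) -> str:
    """Replace [[target]] markers with links; leave unknown targets as-is."""

    def replace(match: re.Match) -> str:
        entry = xref_index.get(match.group(1).strip())
        if entry is None:
            return match.group(0)
        if match.group(2):
            # The label comes from already-rendered HTML, so it is not re-escaped.
            label = match.group(2).strip()
        else:
            label = html.escape(entry["title"])
        return f'<a href="{html.escape(entry["url"])}">{label}</a>'

    return XREF_RE.sub(replace, html_text)
```

Each lookup is a single dict access, which is consistent with storing only a reference to the pre-built index.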

parse_to_ast

def parse_to_ast(self, content: str, metadata: dict[str, Any]) -> list[dict[str, Any]]

Parse Markdown content to AST tokens.

Uses Mistune's built-in AST support by parsing with renderer=None. The AST is a list of token dictionaries representing the document structure.

Performance:

Parsing cost is similar to parse() (same tokenization)
AST is more memory-efficient than HTML for caching
Multiple outputs can be generated from single AST

Parameters 2

`content`	`str`	Raw Markdown content
`metadata`	`dict[str, Any]`	Page metadata (unused, for interface compatibility)

Returns

list[dict[str, Any]] —

List of AST token dictionaries

render_ast

def render_ast(self, ast: list[dict[str, Any]]) -> str

Render AST tokens to HTML.

Uses Mistune's renderer to convert AST tokens back to HTML. This enables parse-once, render-many patterns.

Parameters 1

ast

list[dict[str, Any]]

List of AST token dictionaries from parse_to_ast()

Returns

str —

Rendered HTML string
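
The parse-once, render-many idea can be shown with a toy renderer over hand-written tokens. This sketch does not call Mistune's renderer; it assumes a token shape of `type`, optional `attrs`, `children`, and `raw` keys, and handles only a few token types. `plain_text`, `render_html`, and `heading_outline` are hypothetical helpers.

```python
import html
from typing import Any

Token = dict[str, Any]


def plain_text(tokens: list[Token]) -> str:
    """Concatenate the raw text of all tokens, depth-first."""
    parts = []
    for token in tokens:
        if "raw" in token:
            parts.append(token["raw"])
        parts.append(plain_text(token.get("children", [])))
    return "".join(parts)


def render_html(tokens: list[Token]) -> str:
    """Render a small subset of token types to HTML."""
    out = []
    for token in tokens:
        kind = token["type"]
        inner = render_html(token.get("children", []))
        if kind == "heading":
            level = token["attrs"]["level"]
            out.append(f"<h{level}>{inner}</h{level}>")
        elif kind == "paragraph":
            out.append(f"<p>{inner}</p>")
        elif kind == "strong":
            out.append(f"<strong>{inner}</strong>")
        elif kind == "text":
            out.append(html.escape(token["raw"], quote=False))
        else:
            out.append(inner)  # unknown types: keep their children only
    return "".join(out)


def heading_outline(tokens: list[Token]) -> list[tuple[int, str]]:
    """Derive (level, title) pairs from the same token list."""
    return [
        (t["attrs"]["level"], plain_text(t.get("children", [])))
        for t in tokens
        if t["type"] == "heading"
    ]
```

All three functions consume the same token list, so the Markdown source only has to be tokenized once.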

parse_with_ast

def parse_with_ast(self, content: str, metadata: dict[str, Any]) -> tuple[list[dict[str, Any]], str, str]

Parse content and return AST, HTML, and TOC together.

Single-pass parsing that returns all outputs efficiently. Use this when you need both AST (for caching) and HTML (for display).

Parameters 2

`content`	`str`	Raw Markdown content
`metadata`	`dict[str, Any]`	Page metadata

Returns

tuple[list[dict[str, Any]], str, str] —

Tuple of (AST tokens, HTML content, TOC HTML)

Performance:

Single parse pass for AST
Single render pass for HTML
TOC extracted from HTML (fast regex)
~30% overhead vs parse() alone, but saves re-parsing
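
Because the AST is a list of plain dictionaries, it can be cached in serialized form and reused without re-parsing. The sketch below is one possible approach, assuming the tokens are JSON-serializable; `AstCache` and `cache_key` are hypothetical names, and the `parse` callable stands in for a function such as `parse_to_ast`.

```python
import hashlib
import json
from typing import Any, Callable

Ast = list[dict[str, Any]]


def cache_key(content: str) -> str:
    """Key the cache on the content itself, so edits invalidate entries."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


class AstCache:
    """In-memory cache of serialized ASTs, keyed by content hash."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    def get_or_parse(self, content: str, parse: Callable[[str], Ast]) -> Ast:
        key = cache_key(content)
        if key not in self._store:
            self._store[key] = json.dumps(parse(content))  # parse only on a miss
        # Deserialize a fresh copy so callers cannot mutate the cached entry.
        return json.loads(self._store[key])
```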

Internal Methods 6

__init__

def __init__(self, enable_highlighting: bool = True) -> None

Initialize the mistune parser with plugins.

Parameters 1

enable_highlighting

bool

Enable Pygments syntax highlighting for code blocks (defaults to True for backward compatibility).

Parser Instances:

This parser is typically created via thread-local caching. With parallel builds (max_workers=N), N instances are created, one per worker thread. This is expected and optimal, not a bug.

Internal Structure:

self.md: Main mistune instance for standard parsing
self._md_with_vars: Created lazily for pages with {{ var }} syntax

Both instances share plugins (cross-references, etc.) but differ in preprocessing (variable substitution).

_create_syntax_highlighting_plugin

def _create_syntax_highlighting_plugin(self) -> Callable[[Any], None]

Create a Mistune plugin that adds Pygments syntax highlighting to code blocks.

Returns

Callable[[Any], None] —

Plugin function that modifies the renderer to add syntax highlighting

_escape_jinja_blocks

def _escape_jinja_blocks(self, html: str) -> str

Escape raw Jinja2 block delimiters in HTML content.

This converts "{%"/"%}" into HTML entities so any documentation examples do not appear as unrendered template syntax in the final HTML.

Parameters 1

html str

Returns

str
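
A minimal sketch of this kind of escaping, assuming numeric character references are used for the braces (the exact entities chosen by the real method are not stated here); `escape_jinja_blocks` is a free-function stand-in for the method:

```python
def escape_jinja_blocks(html_text: str) -> str:
    """Replace the braces of {% and %} with HTML entities.

    Browsers still display "{%" and "%}", but a later Jinja2 pass no longer
    sees block delimiters in the markup.
    """
    return html_text.replace("{%", "&#123;%").replace("%}", "%&#125;")
```

`&#123;` and `&#125;` are the character references for `{` and `}`, so the visible text is unchanged after rendering.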

_inject_heading_anchors

def _inject_heading_anchors(self, html: str) -> str

Inject IDs into heading tags using fast regex (5-10x faster than BeautifulSoup 4).

Excludes headings inside blockquotes from getting IDs (so they don't appear in TOC).

Single-pass regex replacement handles:

h2, h3, h4 headings (matching python-markdown's toc_depth)
Existing IDs (preserves them)
Heading content with nested HTML
Generates clean slugs from heading text
Skips headings inside <blockquote> tags

Parameters 1

html

str

HTML content from markdown parser

Returns

str —

HTML with heading IDs added (except those in blockquotes)
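
The behaviour listed above can be approximated with the `re` module. This is a simplified sketch, not the method's actual regex: it splits the HTML on blockquotes first rather than doing everything in one pass, it does not handle nested blockquotes, and `simple_slug` is a hypothetical stand-in for the real slug function.

```python
import html
import re

HEADING_RE = re.compile(r"<(h[234])\b([^>]*)>(.*?)</\1>", re.IGNORECASE | re.DOTALL)
BLOCKQUOTE_RE = re.compile(r"(<blockquote\b.*?</blockquote>)", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def simple_slug(text: str) -> str:
    """Hypothetical slug helper: strip tags, lowercase, join words with hyphens."""
    text = html.unescape(TAG_RE.sub("", text)).lower()
    return re.sub(r"[^\w]+", "-", text).strip("-")


def inject_heading_ids(html_text: str) -> str:
    """Add id attributes to h2-h4 headings that sit outside blockquotes."""

    def add_id(match: re.Match) -> str:
        tag, attrs, inner = match.group(1), match.group(2), match.group(3)
        if re.search(r"\bid\s*=", attrs):
            return match.group(0)  # preserve an existing ID
        return f'<{tag} id="{simple_slug(inner)}"{attrs}>{inner}</{tag}>'

    # Splitting on a capturing group keeps blockquote segments at odd indexes.
    parts = BLOCKQUOTE_RE.split(html_text)
    for i in range(0, len(parts), 2):  # even indexes lie outside blockquotes
        parts[i] = HEADING_RE.sub(add_id, parts[i])
    return "".join(parts)
```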

_extract_toc

def _extract_toc(self, html: str) -> str

Extract table of contents from HTML with anchored headings using fast regex (5-8x faster than BeautifulSoup 4).

Builds a nested list of links to heading anchors. Expects headings to have IDs (anchors handled by theme).

Parameters 1

html

str

HTML content with heading IDs and headerlinks

Returns

str —

TOC as HTML (div.toc > ul > li > a structure)
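
One way to build such a nested list from anchored headings is sketched below. It assumes headings carry double-quoted `id` attributes and covers h2 to h4 only; `extract_toc` here is a free function written for illustration, and its nesting rules are a simplification of whatever the real method does.

```python
import re

ANCHORED_HEADING_RE = re.compile(
    r'<h([234])[^>]*\bid="([^"]+)"[^>]*>(.*?)</h\1>', re.IGNORECASE | re.DOTALL
)
TAG_RE = re.compile(r"<[^>]+>")


def extract_toc(html_text: str) -> str:
    """Return a div.toc > ul > li > a structure, or "" when no headings match."""
    headings = [
        (int(level), anchor, TAG_RE.sub("", inner).strip())
        for level, anchor, inner in ANCHORED_HEADING_RE.findall(html_text)
    ]
    if not headings:
        return ""

    out = ['<div class="toc">', "<ul>"]
    stack = [headings[0][0]]  # heading levels of the currently open <ul> lists
    first = True
    for level, anchor, title in headings:
        if level > stack[-1]:
            out.append("<ul>")  # nest inside the still-open previous <li>
            stack.append(level)
        else:
            while len(stack) > 1 and level < stack[-1]:
                out.append("</li></ul>")  # close deeper lists
                stack.pop()
            if not first:
                out.append("</li>")  # close the previous sibling
        out.append(f'<li><a href="#{anchor}">{title}</a>')
        first = False
    out.append("</li></ul>" * len(stack))
    out.append("</div>")
    return "".join(out)
```

Headings without an `id` are skipped, which matches the expectation that anchors have already been injected.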

_slugify

def _slugify(self, text: str) -> str

Convert text to a URL-friendly slug. Matches python-markdown's default slugify behavior.

Uses bengal.utils.text.slugify with HTML unescaping enabled. Limits slug length to prevent overly long IDs from headers with code.

Parameters 1

text

str

Text to slugify

Returns

str —

Slugified text (max 100 characters)
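
For orientation, a slug function in the style of python-markdown's default slugify, with HTML unescaping and a length cap added, might look like the sketch below. It is an approximation written for this page, not the code of bengal.utils.text.slugify, and `slugify_sketch` is a hypothetical name.

```python
import html
import re
import unicodedata


def slugify_sketch(text: str, max_length: int = 100) -> str:
    """Unescape entities, drop non-ASCII and punctuation, hyphenate, truncate."""
    text = html.unescape(text)
    # Decompose accented characters, then drop anything outside ASCII.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Keep word characters, whitespace, and hyphens only.
    text = re.sub(r"[^\w\s-]", "", text).strip().lower()
    slug = re.sub(r"[-\s]+", "-", text)
    # Cap the length and avoid ending on a dangling hyphen.
    return slug[:max_length].rstrip("-")
```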

rendering.parsers.mistune
