Classes
MistuneParser
Parser using mistune library. Faster with full documentation features.
Supported features:
- Tables (GFM)
- Fenced code blocks
- Strikethrough
- Task lists
- Autolinks
- TOC generation (custom implementation)
- Admonitions (custom plugin)
- Footnotes (custom plugin)
- Definition lists (custom plugin)
- Variable substitution (custom plugin) - NEW!
BaseMarkdownParser
Methods 9
supports_ast
@property
def supports_ast(self) -> bool
Check if this parser supports true AST output.
Mistune natively supports AST output via renderer=None.
Returns
bool | True - Mistune supports AST output
parse
def parse(self, content: str, metadata: dict[str, Any]) -> str
Parse Markdown content into HTML.
Parameters 2
content | str | Markdown content to parse
metadata | dict[str, Any] | Page metadata (includes source path for validation warnings)
Returns
str | Rendered HTML string
parse_with_toc
def parse_with_toc(self, content: str, metadata: dict[str, Any]) -> tuple[str, str]
Parse Markdown content and extract table of contents.
Three-stage process:
1. Parse markdown to HTML
2. Inject heading anchors (IDs and headerlinks)
3. Extract TOC from anchored headings
Parameters 2
content | str | Markdown content to parse
metadata | dict[str, Any] | Page metadata (includes source path for validation warnings)
Returns
tuple[str, str] | Tuple of (HTML with anchored headings, TOC HTML)
parse_with_context
def parse_with_context(self, content: str, metadata: dict[str, Any], context: dict[str, Any]) -> str
Parse Markdown with variable substitution support.
Variable Substitution:
Enables {{ page.title }}, {{ site.baseurl }}, etc. in markdown content.
Uses a separate mistune instance (_md_with_vars) with preprocessing.
Lazy Initialization:
_md_with_vars is created on first use and cached thereafter.
This happens once per parser instance (i.e., once per thread).
Important: In parallel builds with max_workers=N:
- N parser instances created (main: self.md)
- N variable parser instances created (vars: self._md_with_vars)
- Total: 2N mistune instances, but only 1 of each per thread
- This is optimal - each thread uses its cached instances
Parser Reuse:
The parser with VariableSubstitutionPlugin is cached and reused.
Only the context is updated per page (fast operation).
This avoids expensive parser re-initialization (~10ms) for every page.
Parameters 3
content | str | Markdown content to parse
metadata | dict[str, Any] | Page metadata
context | dict[str, Any] | Variable context (page, site, config)
Returns
str | Rendered HTML with variables substituted
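The lazy initialization and per-page context update described above can be sketched roughly as follows. This is a minimal stand-in, not the parser's actual code: `VariableEngine`, `SketchParser`, and `engines_created` are hypothetical names, and a small regex takes the place of the real mistune instance with VariableSubstitutionPlugin.

```python
import re
from typing import Any, Optional

# Matches {{ page.title }}-style placeholders (dotted names only).
VAR = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


class VariableEngine:
    """Hypothetical stand-in for the cached variable-aware parser."""

    def __init__(self) -> None:
        self.context: dict[str, Any] = {}

    def render(self, content: str) -> str:
        def replace(match):
            value: Any = self.context
            for key in match.group(1).split("."):
                if not isinstance(value, dict) or key not in value:
                    return match.group(0)  # unknown variable: leave as written
                value = value[key]
            return str(value)

        return VAR.sub(replace, content)


class SketchParser:
    """Hypothetical parser showing create-once, update-context-per-page."""

    def __init__(self) -> None:
        self._with_vars: Optional[VariableEngine] = None
        self.engines_created = 0

    def parse_with_context(self, content: str, context: dict[str, Any]) -> str:
        if self._with_vars is None:  # expensive setup happens once
            self._with_vars = VariableEngine()
            self.engines_created += 1
        self._with_vars.context = context  # cheap per-page update
        return self._with_vars.render(content)
```

Calling `parse_with_context` for many pages on the same `SketchParser` creates only one engine; only the context reference changes between calls.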
parse_with_toc_and_context
def parse_with_toc_and_context(self, content: str, metadata: dict[str, Any], context: dict[str, Any]) -> tuple[str, str]
Parse Markdown with variable substitution and extract TOC.
Single-pass parsing with VariableSubstitutionPlugin for {{ vars }}.
ARCHITECTURE DECISION: Separation of Concerns
SUPPORTED in markdown content:
- {{ page.metadata.xxx }} - Variable substitution
- {{ site.config.xxx }} - Site configuration access
- Code blocks naturally stay literal (AST-level protection)
NOT SUPPORTED in markdown content:
- {% if %} - Conditional blocks
- {% for %} - Loop constructs
- Complex Jinja2 logic
WHY: These belong in TEMPLATES, not markdown content.
Use conditionals and loops in your page templates:
<!-- templates/page.html -->
<article>
  {% if page.metadata.enterprise %}
  <div class="enterprise-badge">Enterprise</div>
  {% endif %}
  {{ content }} <!-- Markdown renders here -->
</article>
This design:
- Keeps parsing simple and fast (single pass)
- Separates content parsing from template logic
- Maintains performance (no preprocessing overhead)
- Makes code blocks work naturally
Parameters 3
content | str | Markdown content to parse
metadata | dict[str, Any] | Page metadata
context | dict[str, Any] | Variable context (page, site, config)
Returns
tuple[str, str] | Tuple of (HTML with anchored headings, TOC HTML)
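To illustrate why code blocks stay literal when substitution happens at the token level, here is a small sketch. The token shape (`type` and `raw` keys) and the helper names are assumptions made for the example; the real plugin works on mistune's own tokens.

```python
import re
from typing import Any

VAR = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def lookup(path: str, context: dict[str, Any]) -> Any:
    """Resolve a dotted path such as 'site.config.version' in nested dicts."""
    value: Any = context
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def substitute_tokens(tokens: list[dict[str, Any]], context: dict[str, Any]) -> list[dict[str, Any]]:
    """Substitute variables in text tokens only; code tokens are not touched."""

    def replace(match):
        value = lookup(match.group(1), context)
        return match.group(0) if value is None else str(value)

    result = []
    for token in tokens:
        if token["type"] == "text":  # assumed token type name
            token = {**token, "raw": VAR.sub(replace, token["raw"])}
        result.append(token)
    return result
```

Because only `text` tokens are rewritten, a documentation example inside a code block keeps its `{{ ... }}` syntax without any escaping step.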
enable_cross_references
def enable_cross_references(self, xref_index: dict[str, Any]) -> None
Enable cross-reference support with [[link]] syntax.
Should be called after content discovery when xref_index is built. Creates CrossReferencePlugin for post-processing HTML output.
Also stores xref_index on the renderer for directive access (e.g., cards :pull:).
Performance: O(1) - just stores reference to index
Thread-safe: Each thread-local parser instance needs this called once
Parameters 1
xref_index | dict[str, Any] | Pre-built cross-reference index from site discovery
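As a rough picture of what post-processing [[link]] references in rendered HTML might look like, consider the sketch below. The index layout (a flat mapping from target to `url` and `title`), the optional `|label` form, and the function name are all assumptions for illustration; the real xref_index structure and CrossReferencePlugin behavior are not shown in this reference.

```python
import html
import re
from typing import Any

# [[target]] or [[target|label]]; the label form is an assumption.
XREF = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def resolve_xrefs(rendered: str, xref_index: dict[str, Any]) -> str:
    """Replace [[target]] markers with links, leaving unknown targets as-is."""

    def replace(match):
        entry = xref_index.get(match.group(1).strip())
        if entry is None:
            return match.group(0)
        label = (match.group(2) or entry["title"]).strip()
        href = html.escape(entry["url"], quote=True)
        return f'<a href="{href}">{html.escape(label)}</a>'

    return XREF.sub(replace, rendered)
```

The index is only read, never copied, which is consistent with the O(1) setup cost noted above.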
parse_to_ast
def parse_to_ast(self, content: str, metadata: dict[str, Any]) -> list[dict[str, Any]]
Parse Markdown content to AST tokens.
Uses Mistune's built-in AST support by parsing with renderer=None. The AST is a list of token dictionaries representing the document structure.
Performance:
- Parsing cost is similar to parse() (same tokenization)
- AST is more memory-efficient than HTML for caching
- Multiple outputs can be generated from single AST
Parameters 2
content | str | Raw Markdown content
metadata | dict[str, Any] | Page metadata (unused, for interface compatibility)
Returns
list[dict[str, Any]] | List of AST token dictionaries
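Since the AST is a list of plain dictionaries, it can be cached in serialized form. The sketch below shows one possible approach, keyed by a hash of the content; `AstCache` and the `parse` callable are hypothetical, and in practice the callable would be something like the parser's `parse_to_ast` with the metadata argument bound.

```python
import hashlib
import json
from typing import Any, Callable

Tokens = list[dict[str, Any]]


class AstCache:
    """Hypothetical in-memory cache of JSON-serialized AST tokens."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    def get_or_parse(self, content: str, parse: Callable[[str], Tokens]) -> Tokens:
        key = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if key not in self._store:
            # Parse only on a cache miss; store the tokens as JSON text.
            self._store[key] = json.dumps(parse(content))
        return json.loads(self._store[key])
```

This assumes every token value is JSON-serializable, which holds for nested dicts, lists, strings, and numbers.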
render_ast
def render_ast(self, ast: list[dict[str, Any]]) -> str
Render AST tokens to HTML.
Uses Mistune's renderer to convert AST tokens back to HTML. This enables parse-once, render-many patterns.
Parameters 1
ast | list[dict[str, Any]] | List of AST token dictionaries from parse_to_ast()
Returns
str | Rendered HTML string
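The parse-once, render-many idea can be illustrated with a toy renderer over an assumed token shape. The keys used here (`type`, `attrs`, `children`, `raw`) and all function names are assumptions for the sketch; actual token layout depends on the Mistune version, and the real method delegates to Mistune's renderer.

```python
import html
from typing import Any

Token = dict[str, Any]


def render_tokens(tokens: list[Token]) -> str:
    """Toy HTML renderer for heading, paragraph, and text tokens."""
    out = []
    for token in tokens:
        kind = token["type"]
        if kind == "text":
            out.append(html.escape(token["raw"]))
        elif kind == "heading":
            level = token["attrs"]["level"]
            out.append(f"<h{level}>{render_tokens(token['children'])}</h{level}>\n")
        elif kind == "paragraph":
            out.append(f"<p>{render_tokens(token['children'])}</p>\n")
    return "".join(out)


def plain_text(tokens: list[Token]) -> str:
    """Concatenate the raw text found anywhere inside the tokens."""
    return "".join(
        t["raw"] if t["type"] == "text" else plain_text(t.get("children", []))
        for t in tokens
    )


def heading_titles(tokens: list[Token]) -> list[str]:
    """A second output derived from the same tokens, without reparsing."""
    return [plain_text(t["children"]) for t in tokens if t["type"] == "heading"]
```

Both `render_tokens` and `heading_titles` consume the same token list, so the Markdown source is tokenized only once.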
parse_with_ast
def parse_with_ast(self, content: str, metadata: dict[str, Any]) -> tuple[list[dict[str, Any]], str, str]
Parse content and return AST, HTML, and TOC together.
Single-pass parsing that returns all outputs efficiently. Use this when you need both AST (for caching) and HTML (for display).
Parameters 2
content | str | Raw Markdown content
metadata | dict[str, Any] | Page metadata
Returns
tuple[list[dict[str, Any]], str, str] | Tuple of (AST tokens, HTML content, TOC HTML)
Internal Methods 6
__init__
def __init__(self, enable_highlighting: bool = True) -> None
Initialize the mistune parser with plugins.
Parameters 1
enable_highlighting | bool | Enable Pygments syntax highlighting for code blocks (defaults to True for backward compatibility)
Parser Instances:
This parser is typically created via thread-local caching. With parallel builds (max_workers=N), you'll see N instances created, one per worker thread. This is OPTIMAL, not a bug!
Internal Structure:
- self.md: Main mistune instance for standard parsing
- self._md_with_vars: Created lazily for pages with {{ var }} syntax
Both instances share plugins (cross-references, etc.) but have different preprocessing (variable substitution).
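The one-instance-per-worker-thread behavior can be pictured with `threading.local`. This is only a sketch of the general pattern; `get_thread_parser`, `SketchParser`, and `build` are hypothetical names, and the actual caching code lives outside this class.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_local = threading.local()
_created = []  # one entry per parser instance ever constructed
_lock = threading.Lock()


class SketchParser:
    """Hypothetical stand-in for an expensive-to-create parser."""

    def __init__(self) -> None:
        with _lock:
            _created.append(threading.get_ident())


def get_thread_parser() -> SketchParser:
    """Return this thread's parser, creating it on first use."""
    parser = getattr(_local, "parser", None)
    if parser is None:
        parser = SketchParser()
        _local.parser = parser
    return parser


def build(pages, max_workers: int = 4) -> int:
    """Process pages in parallel; return how many parsers were created."""
    before = len(_created)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(lambda page: get_thread_parser(), pages))
    return len(_created) - before
```

With `max_workers=4`, `build` creates at most four parsers no matter how many pages are processed, because each worker thread reuses its own instance.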
_create_syntax_highlighting_plugin
def _create_syntax_highlighting_plugin(self) -> Callable[[Any], None]
Create a Mistune plugin that adds Pygments syntax highlighting to code blocks.
Returns
Callable[[Any], None] | Plugin function that modifies the renderer to add syntax highlighting
_escape_jinja_blocks
def _escape_jinja_blocks(self, html: str) -> str
Escape raw Jinja2 block delimiters in HTML content.
This converts "{%"/"%}" into HTML entities so any documentation examples do not appear as unrendered template syntax in the final HTML.
Parameters 1
html |
str |
Returns
str
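A minimal sketch of this kind of escaping is shown below. The choice of numeric entities for the braces is an assumption; the reference does not state which entities the method emits.

```python
def escape_jinja_blocks(html_text: str) -> str:
    """Replace the braces of {% and %} with numeric HTML entities.

    Assumption: &#123; and &#125; (the entities for { and }) are used here
    purely for illustration.
    """
    return html_text.replace("{%", "&#123;%").replace("%}", "%&#125;")
```

Browsers display the entities as ordinary braces, while a later template pass no longer sees a block delimiter. Variable delimiters such as `{{ content }}` are left untouched by this sketch.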
_inject_heading_anchors
def _inject_heading_anchors(self, html: str) -> str
Inject IDs into heading tags using a fast regex (5-10x faster than BeautifulSoup 4).
Headings inside blockquotes do not get IDs, so they do not appear in the TOC.
The single-pass regex replacement:
- Targets h2, h3, and h4 headings (matching python-markdown's toc_depth)
- Preserves existing IDs
- Handles heading content with nested HTML
- Generates clean slugs from heading text
- Skips headings inside <blockquote> tags
Parameters 1
html | str | HTML content from markdown parser
Returns
str | HTML with heading IDs added (except those in blockquotes)
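The behavior listed above can be approximated with two regular expressions, as in this sketch. It is not the method's actual implementation: the function names and the simplified slug helper are assumptions, nested blockquotes are not handled, and headerlinks are omitted.

```python
import re

BLOCKQUOTE = re.compile(r"(<blockquote\b.*?</blockquote>)", re.S | re.I)
HEADING = re.compile(r"<(h[234])\b([^>]*)>(.*?)</\1>", re.S | re.I)
HAS_ID = re.compile(r"(?<![\w-])id\s*=", re.I)
TAG = re.compile(r"<[^>]+>")


def simple_slug(text: str) -> str:
    """Simplified slug helper (assumption, not the project's slugify)."""
    text = re.sub(r"[^\w\s-]", "", text).strip().lower()
    return re.sub(r"[-\s]+", "-", text)


def add_id(match) -> str:
    tag, attrs, inner = match.groups()
    if HAS_ID.search(attrs):
        return match.group(0)  # keep an existing id untouched
    slug = simple_slug(TAG.sub("", inner))  # slug from text, tags stripped
    return f'<{tag} id="{slug}"{attrs}>{inner}</{tag}>'


def inject_heading_anchors(html_text: str) -> str:
    # Splitting on a capturing group puts blockquotes at odd indexes.
    parts = BLOCKQUOTE.split(html_text)
    return "".join(
        part if index % 2 else HEADING.sub(add_id, part)
        for index, part in enumerate(parts)
    )
```

Only the text outside blockquotes is passed through the heading regex, which is how quoted headings end up without IDs.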
_extract_toc
def _extract_toc(self, html: str) -> str
Extract table of contents from HTML with anchored headings using a fast regex (5-8x faster than BeautifulSoup 4).
Builds a nested list of links to heading anchors. Expects headings to have IDs (anchors handled by theme).
Parameters 1
html | str | HTML content with heading IDs and headerlinks
Returns
str | TOC as HTML (div.toc > ul > li > a structure)
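One way to build the nested div.toc > ul > li > a structure from anchored headings is sketched below. The regex, the function name, and the nesting strategy are assumptions for illustration; headerlink removal and other details of the real method are left out.

```python
import re

# h2-h4 headings that already carry an id attribute.
HEADING = re.compile(r'<h([234])\b[^>]*\sid="([^"]+)"[^>]*>(.*?)</h\1>', re.S | re.I)
TAG = re.compile(r"<[^>]+>")


def extract_toc(html_text: str) -> str:
    items = [
        (int(level), anchor, TAG.sub("", inner).strip())
        for level, anchor, inner in HEADING.findall(html_text)
    ]
    if not items:
        return ""
    out = ['<div class="toc">']
    open_levels = []  # heading levels of the currently open <ul> lists
    for level, anchor, title in items:
        while open_levels and level < open_levels[-1]:
            out.append("</li></ul>")  # close deeper lists
            open_levels.pop()
        if open_levels and level == open_levels[-1]:
            out.append("</li>")  # sibling entry at the same level
        else:
            out.append("<ul>")  # first entry, or a deeper level
            open_levels.append(level)
        out.append(f'<li><a href="#{anchor}">{title}</a>')
    out.append("</li></ul>" * len(open_levels))
    out.append("</div>")
    return "".join(out)
```

Each `<li>` is left open until the next heading shows whether a nested list follows, which keeps child lists inside their parent item.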
_slugify
def _slugify(self, text: str) -> str
Convert text to a URL-friendly slug. Matches python-markdown's default slugify behavior.
Uses bengal.utils.text.slugify with HTML unescaping enabled. Limits slug length to prevent overly long IDs from headers with code.
Parameters 1
text | str | Text to slugify
Returns
str | Slugified text (max 100 characters)
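For a sense of what HTML unescaping plus a length limit looks like, here is a small stand-in. It does not reproduce bengal.utils.text.slugify; the regexes loosely follow the style of python-markdown's default slugify (without its ASCII normalization step), and the function name and `max_length` parameter are assumptions.

```python
import html
import re


def slugify_sketch(text: str, max_length: int = 100) -> str:
    """Unescape entities, slugify, and cap the result length."""
    text = html.unescape(text)  # "&amp;" becomes "&" before filtering
    text = re.sub(r"[^\w\s-]", "", text).strip().lower()
    slug = re.sub(r"[-\s]+", "-", text)
    # Truncate, then drop a trailing hyphen left by the cut.
    return slug[:max_length].rstrip("-")
```

Unescaping first matters for headings that contain entities: without it, `&amp;` would contribute the letters "amp" to the slug.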