API Reference

Functions, classes, and modules

Core API for parsing and rendering Markdown.

High-Level API

parse()

Parse Markdown source into a typed AST.

def parse(
    source: str,
    *,
    source_file: str | None = None,
    directive_registry: DirectiveRegistry | None = None,
    cache: ParseCache | None = None,
) -> Document

Parameters:

  • source: Markdown source text
  • source_file: Optional source file path for error messages
  • directive_registry: Custom directive registry (uses defaults if None)
  • cache: Optional content-addressed parse cache (see Parse Cache)

Returns: Document AST root node

Example:

from patitas import parse

doc = parse("# Hello **World**")
print(doc.children[0])  # Heading(level=1, ...)

# With parse cache (faster incremental builds)
from patitas import parse, DictParseCache
cache = DictParseCache()
doc1 = parse("# Hello", cache=cache)
doc2 = parse("# Hello", cache=cache)  # Cache hit, no re-parse

render()

Render a Patitas AST to HTML.

def render(
    doc: Document,
    *,
    source: str = "",
    highlight: bool = False,
    directive_registry: DirectiveRegistry | None = None,
) -> str

Parameters:

  • doc: Document AST to render
  • source: Original Markdown source for zero-copy extraction
  • highlight: Enable syntax highlighting for code blocks
  • directive_registry: Custom directive registry for rendering

Returns: Rendered HTML string

Example:

from patitas import parse, render

doc = parse("# Hello")
html = render(doc, source="# Hello")
print(html)  # <h1>Hello</h1>

render_llm()

Render a Patitas AST to structured plain text for LLM consumption. No HTML; explicit labels for code ([code:lang]), math ([math] ... [/math]), images ([image: alt]). Skips HtmlBlock and HtmlInline for safety. Useful for RAG retrieval, context windows, and model input.

def render_llm(doc: Document, *, source: str = "") -> str

Parameters:

  • doc: Document AST to render
  • source: Original Markdown source for FencedCode zero-copy extraction

Returns: Structured plain text string

Example:

from patitas import parse, render_llm

source = "# Hello **World**\n\n- item\n\n```python\nx = 1\n```"
doc = parse(source)
text = render_llm(doc, source=source)
# '# Hello World\n\n- item\n\n[code:python]\nx = 1\n[/code]\n\n'

See LLM Safety for the full parse → sanitize → render_llm pipeline.

extract_text()

Extract plain text from any AST node. Skips HtmlBlock and HtmlInline. Used for heading slugs, excerpts, and LLM pipelines.

def extract_text(node: Node, *, source: str = "") -> str

Parameters:

  • node: Any AST node (block or inline)
  • source: Original source (required for FencedCode zero-copy; use "" if unavailable)

Returns: Concatenated plain text from the node and its descendants

Example:

from patitas import parse, extract_text

doc = parse("# Hello **World**")
extract_text(doc.children[0])  # 'Hello World'

extract_excerpt() / extract_meta_description()

Structurally correct excerpt extraction that stops at block boundaries. Avoids mid-markdown truncation and properly handles headings, paragraphs, and lists.

def extract_excerpt(
    ast: Document | Sequence[Block],
    source: str = "",
    *,
    max_chars: int = 750,
    skip_leading_h1: bool = True,
    include_headings: bool = True,
    excerpt_as_html: bool = False,
) -> str

def extract_meta_description(
    ast: Document | Sequence[Block],
    source: str = "",
    *,
    max_chars: int = 160,
) -> str

extract_excerpt — Walks blocks in order, extracting text. Stops at block boundaries when max_chars is reached. Optional HTML output with <p>, <div class="excerpt-heading">.

extract_meta_description — SEO-friendly ~160 chars, truncated at sentence boundary.

Example:

from patitas import parse, extract_excerpt, extract_meta_description

source = "# Title\n\nFirst paragraph. Second sentence."
doc = parse(source)
extract_excerpt(doc, source)                        # Plain text
extract_excerpt(doc, source, excerpt_as_html=True) # HTML with structure
extract_meta_description(doc, source)               # ~160 chars for meta tags
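The sentence-boundary truncation that extract_meta_description performs can be sketched as a stand-alone function (an illustration of the idea, not patitas internals — the exact boundary rules patitas uses are an assumption here):

```python
import re

def truncate_at_sentence(text: str, max_chars: int = 160) -> str:
    """Truncate text at the last sentence boundary within max_chars."""
    if len(text) <= max_chars:
        return text
    window = text[:max_chars]
    # Find the last sentence-ending punctuation followed by whitespace/end.
    last = None
    for m in re.finditer(r"[.!?](?=\s|$)", window):
        last = m
    if last:
        return window[: last.end()]
    # No sentence boundary in the window: fall back to a word boundary.
    return window.rsplit(" ", 1)[0] + "…"

print(truncate_at_sentence("First sentence. Second sentence goes on.", 20))
# First sentence.
```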

sanitize()

Apply a composable sanitization policy to strip unsafe content before LLM consumption or web rendering. Policies compose via the | operator.

def sanitize(doc: Document, *, policy: Policy | Callable[[Document], Document]) -> Document

Parameters:

  • doc: Document to sanitize
  • policy: Policy instance or callable Document -> Document

Returns: Sanitized document (immutable; original unchanged)

Pre-built policies (from patitas.sanitize):

  • llm_safe — Strip HTML, dangerous URLs (javascript:, data:, vbscript:), zero-width/bidi chars (Trojan Source mitigation). Use for LLM context.
  • web_safe — Alias for llm_safe. Same policy for web display of untrusted content.
  • strict — llm_safe + strip images (replace with alt text) + strip raw code blocks

Composable policies: strip_html, strip_html_comments, strip_dangerous_urls, normalize_unicode, strip_images, strip_raw_code. Use allow_url_schemes(*schemes) for custom URL filtering.

Example:

from patitas import parse, sanitize
from patitas.sanitize import llm_safe, strip_html, strip_dangerous_urls

doc = parse("# Title\n\n<script>alert(1)</script>\n\n[link](javascript:void(0))")
clean = sanitize(doc, policy=llm_safe)

# Custom policy
custom = strip_html | strip_dangerous_urls
clean = sanitize(doc, policy=custom)

See LLM Safety for the full pipeline.
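The | composition can be modeled with a small wrapper class. A minimal stand-alone sketch of the pattern (not patitas' actual Policy class; plain strings stand in for Document, and the two toy transforms are placeholders for strip_html / strip_dangerous_urls):

```python
from typing import Callable

class Policy:
    """Wrap a doc -> doc transform; `|` chains policies left to right."""
    def __init__(self, fn: Callable[[str], str]) -> None:
        self.fn = fn

    def __call__(self, doc: str) -> str:
        return self.fn(doc)

    def __or__(self, other: "Policy") -> "Policy":
        # Apply self first, then other.
        return Policy(lambda doc: other(self(doc)))

# Toy transforms standing in for real sanitization steps.
strip_tags = Policy(lambda s: s.replace("<script>", "").replace("</script>", ""))
lowercase = Policy(str.lower)

custom = strip_tags | lowercase
print(custom("<script>HELLO</script>"))  # hello
```

Because each policy is itself a Document -> Document callable, the composed result can be passed straight to sanitize(doc, policy=...).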

parse_notebook()

Parse a Jupyter notebook (.ipynb) to Markdown content and metadata. Zero dependencies — uses stdlib json only. Supports nbformat 4 and 5.

def parse_notebook(
    content: str,
    source_path: Path | str | None = None,
) -> tuple[str, dict[str, Any]]

Parameters:

  • content: Raw JSON content of the .ipynb file (caller handles I/O)
  • source_path: Optional path for title fallback when notebook has no title

Returns: Tuple of (markdown_content, metadata_dict)

  • markdown_content: Markdown string — markdown cells as-is, code cells as fenced blocks, outputs as HTML
  • metadata: Dict with title, type: "notebook", notebook.kernel_name, notebook.cell_count, etc.

Raises:

  • json.JSONDecodeError: If content is not valid JSON
  • ValueError: If nbformat is 3 or older

Example:

from patitas import parse_notebook

with open("demo.ipynb") as f:
    content, metadata = parse_notebook(f.read(), "demo.ipynb")

# content: Markdown string ready for parse() or render
# metadata: title, type, notebook{kernel_name, cell_count}, etc.
print(metadata["notebook"]["kernel_name"])  # e.g. "python3"

Used by Bengal for native notebook rendering — drop .ipynb into content and build.
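The markdown-cell / code-cell transformation can be sketched with stdlib json alone. This is a simplified illustration of the kind of conversion parse_notebook performs (outputs and metadata extraction are omitted; the field names follow the nbformat 4 schema):

```python
import json

def notebook_to_markdown(content: str) -> str:
    """Convert nbformat-4 JSON: markdown cells as-is, code cells fenced."""
    nb = json.loads(content)
    if nb.get("nbformat", 0) < 4:
        raise ValueError("nbformat 3 or older is not supported")
    lang = nb.get("metadata", {}).get("kernelspec", {}).get("language", "")
    parts = []
    for cell in nb.get("cells", []):
        text = "".join(cell.get("source", []))
        if cell["cell_type"] == "markdown":
            parts.append(text)
        elif cell["cell_type"] == "code":
            parts.append(f"```{lang}\n{text}\n```")
    return "\n\n".join(parts)

nb_json = json.dumps({
    "nbformat": 4, "nbformat_minor": 5,
    "metadata": {"kernelspec": {"language": "python"}},
    "cells": [
        {"cell_type": "markdown", "source": ["# Demo"]},
        {"cell_type": "code", "source": ["x = 1"], "outputs": []},
    ],
})
md = notebook_to_markdown(nb_json)
# md is "# Demo" followed by a python-fenced "x = 1" block
```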

Markdown

High-level processor combining parsing and rendering.

class Markdown:
    def __init__(
        self,
        *,
        highlight: bool = False,
        plugins: list[str] | None = None,
        directive_registry: DirectiveRegistry | None = None,
    ) -> None: ...

    def __call__(self, source: str) -> str: ...
    def parse(self, source: str, *, source_file: str | None = None, cache: ParseCache | None = None) -> Document: ...
    def parse_many(self, sources: Iterable[str], *, source_file: str | None = None, cache: ParseCache | None = None) -> list[Document]: ...
    def render(self, doc: Document, *, source: str = "") -> str: ...

Example:

from patitas import Markdown

md = Markdown()
html = md("# Hello **World**")
print(html)  # <h1>Hello <strong>World</strong></h1>

# With plugins
md = Markdown(plugins=["table", "math", "strikethrough"])
html = md("| a | b |\n|---|---|\n| 1 | 2 |")

Parse Cache

Content-addressed cache for parsed ASTs. Key is (content_hash, config_hash); value is Document. Enables faster incremental builds (undo/revert, duplicate content) and can replace path-based snapshot caches in consumers like Bengal.

ParseCache protocol

class ParseCache(Protocol):
    def get(self, content_hash: str, config_hash: str) -> Document | None: ...
    def put(self, content_hash: str, config_hash: str, doc: Document) -> None: ...

DictParseCache

In-memory implementation. Not thread-safe — for parallel parsing, use a cache with internal locking.

from patitas import parse, DictParseCache

cache = DictParseCache()
doc = parse("# Hello", cache=cache)
# Second call with same source hits cache
doc2 = parse("# Hello", cache=cache)

hash_content() / hash_config()

Compute cache keys. hash_content(source) returns the SHA-256 of the source. hash_config(config) returns a config hash, or "" when text_transformer is set (cache bypassed).
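Since the docs describe hash_content as a SHA-256 of the source, an equivalent key can be computed with the stdlib when prepopulating an external cache (the hex encoding is an assumption here — check hash_content's actual output format before mixing the two):

```python
import hashlib

def content_key(source: str) -> str:
    # SHA-256 hex digest of the UTF-8 encoded source, matching the docs'
    # description of hash_content (hex encoding assumed).
    return hashlib.sha256(source.encode("utf-8")).hexdigest()

key = content_key("# Hello")
print(len(key))  # 64 hex characters
```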

See Performance for optimization details and Serialization for persistence patterns.

Serialization API

Convert AST nodes to/from JSON-compatible dicts and strings. Deterministic output for cache-key stability. Useful for caching parsed ASTs (Bengal incremental builds) and sending ASTs over the wire (Purr SSE).

to_dict() / from_dict()

In-memory dict format — use for caching or when you need to inspect or modify the structure before serializing to JSON.

from patitas import parse, to_dict, from_dict

doc = parse("# Hello **World**")
data = to_dict(doc)
restored = from_dict(data)
assert doc == restored

to_dict(node: Node) -> dict[str, Any]

  • node: Any AST node (Document, Heading, Paragraph, etc.)
  • Returns: JSON-compatible dict with _type discriminator

from_dict(data: dict[str, Any]) -> Node

  • data: Dict produced by to_dict
  • Returns: Reconstructed typed AST node

to_json() / from_json()

JSON string format — use for persistence, wire transfer, or human inspection.

from patitas import parse, to_json, from_json

doc = parse("# Hello **World**")
json_str = to_json(doc)
restored = from_json(json_str)
assert doc == restored

to_json(doc: Document, *, indent: int | None = None) -> str

  • doc: Document AST root
  • indent: Optional indent for pretty-printing (None = compact)

from_json(data: str) -> Document

  • data: JSON string from to_json
  • Returns: Reconstructed Document

See Serialization for caching and wire-transfer patterns.

Configuration API

Thread-local configuration for advanced use cases.

ParseConfig

Immutable configuration dataclass.

from patitas import ParseConfig

config = ParseConfig(
    tables_enabled=True,
    math_enabled=True,
    strikethrough_enabled=False,
    task_lists_enabled=False,
    footnotes_enabled=False,
    autolinks_enabled=False,
    directive_registry=None,
    strict_contracts=False,
    text_transformer=None,
)

ParseConfig.from_dict()

Create a ParseConfig from a dictionary. Unknown keys are silently ignored, making this safe for framework integration where config may come from YAML files or external sources.

from patitas import ParseConfig

config = ParseConfig.from_dict({
    "tables_enabled": True,
    "math_enabled": True,
    "unknown_key": "silently ignored",
})
# config.tables_enabled == True
# config.math_enabled == True
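The unknown-key filtering can be reproduced for any frozen dataclass with dataclasses.fields. An illustrative stand-alone version (not patitas' implementation; the Config class here is a placeholder):

```python
from dataclasses import dataclass, fields
from typing import Any

@dataclass(frozen=True)
class Config:
    tables_enabled: bool = False
    math_enabled: bool = False

def config_from_dict(data: dict[str, Any]) -> Config:
    # Keep only keys that match declared dataclass fields.
    known = {f.name for f in fields(Config)}
    return Config(**{k: v for k, v in data.items() if k in known})

cfg = config_from_dict({"tables_enabled": True, "unknown_key": "ignored"})
print(cfg.tables_enabled)  # True
```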

parse_config_context()

Context manager for temporary config changes.

from patitas import parse_config_context, ParseConfig, Parser

with parse_config_context(ParseConfig(tables_enabled=True)):
    parser = Parser("| a | b |")
    result = parser.parse()
# Config automatically reset after context

get/set/reset functions

from patitas import get_parse_config, set_parse_config, reset_parse_config

# Get current config
config = get_parse_config()

# Set custom config
set_parse_config(ParseConfig(math_enabled=True))

# Reset to defaults
reset_parse_config()

Low-Level API

Parser

The Markdown parser. Configuration is read from ContextVar.

from patitas import Parser, parse_config_context, ParseConfig

# Simple usage (uses default config)
parser = Parser(source, source_file="example.md")
doc = parser.parse()

# With custom config
with parse_config_context(ParseConfig(tables_enabled=True)):
    parser = Parser(source)
    doc = parser.parse()

Lexer

The state-machine lexer.

from patitas.lexer import Lexer

lexer = Lexer(source)
tokens = list(lexer)

HtmlRenderer

The HTML renderer.

from patitas.renderers.html import HtmlRenderer

renderer = HtmlRenderer(source=source)
html = renderer.render(doc)

LlmRenderer

The LLM-optimized renderer. Outputs structured plain text for model consumption.

from patitas.renderers.llm import LlmRenderer

renderer = LlmRenderer(source=source)
text = renderer.render(doc)

Extension Points

set_highlighter()

Set the global syntax highlighter. Accepts a Highlighter protocol implementation or a simple callable (code: str, language: str) -> str.

from patitas.highlighting import set_highlighter
import html

# Simple callable (escape the code before embedding it in HTML)
set_highlighter(lambda code, lang: f"<pre><code class='{lang}'>{html.escape(code)}</code></pre>")
# Or pass None to clear
set_highlighter(None)

set_icon_resolver()

Set the global icon resolver. Takes a callable (name: str) -> str | None.

from patitas.icons import set_icon_resolver

set_icon_resolver(lambda name: f"<span class='icon-{name}'></span>")
# Or pass None to clear
set_icon_resolver(None)