Core API for parsing and rendering Markdown.
## High-Level API

### parse()

Parse Markdown source into a typed AST.

```python
def parse(
    source: str,
    *,
    source_file: str | None = None,
    directive_registry: DirectiveRegistry | None = None,
    cache: ParseCache | None = None,
) -> Document
```

Parameters:

- `source`: Markdown source text
- `source_file`: Optional source file path for error messages
- `directive_registry`: Custom directive registry (uses defaults if `None`)
- `cache`: Optional content-addressed parse cache (see Parse Cache)

Returns: `Document` AST root node

Example:

```python
from patitas import parse, DictParseCache

doc = parse("# Hello **World**")
print(doc.children[0])  # Heading(level=1, ...)

# With parse cache (faster incremental builds)
cache = DictParseCache()
doc1 = parse("# Hello", cache=cache)
doc2 = parse("# Hello", cache=cache)  # Cache hit, no re-parse
```
### render()

Render a Patitas AST to HTML.

```python
def render(
    doc: Document,
    *,
    source: str = "",
    highlight: bool = False,
    directive_registry: DirectiveRegistry | None = None,
) -> str
```

Parameters:

- `doc`: Document AST to render
- `source`: Original Markdown source for zero-copy extraction
- `highlight`: Enable syntax highlighting for code blocks
- `directive_registry`: Custom directive registry for rendering

Returns: Rendered HTML string

Example:

```python
from patitas import parse, render

doc = parse("# Hello")
html = render(doc, source="# Hello")
print(html)  # <h1>Hello</h1>
```
### render_llm()

Render a Patitas AST to structured plain text for LLM consumption. No HTML; explicit labels for code (`[code:lang]`), math (`[math] ... [/math]`), and images (`[image: alt]`). Skips HtmlBlock and HtmlInline for safety. Useful for RAG retrieval, context windows, and model input.

```python
def render_llm(doc: Document, *, source: str = "") -> str
```

Parameters:

- `doc`: Document AST to render
- `source`: Original Markdown source for FencedCode zero-copy extraction

Returns: Structured plain text string

Example:

````python
from patitas import parse, render_llm

source = "# Hello **World**\n\n- item\n\n```python\nx = 1\n```"
doc = parse(source)
text = render_llm(doc, source=source)
# '# Hello World\n\n- item\n\n[code:python]\nx = 1\n[/code]\n\n'
````

See LLM Safety for the full parse → sanitize → render_llm pipeline.
### extract_text()

Extract plain text from any AST node. Skips HtmlBlock and HtmlInline. Used for heading slugs, excerpts, and LLM pipelines.

```python
def extract_text(node: Node, *, source: str = "") -> str
```

Parameters:

- `node`: Any AST node (block or inline)
- `source`: Original source (required for FencedCode zero-copy; use `""` if unavailable)

Returns: Concatenated plain text from the node and its descendants

Example:

```python
from patitas import parse, extract_text

doc = parse("# Hello **World**")
extract_text(doc.children[0])  # 'Hello World'
```
### extract_excerpt() / extract_meta_description()

Structurally correct excerpt extraction that stops at block boundaries. Avoids mid-Markdown truncation and properly handles headings, paragraphs, and lists.

```python
def extract_excerpt(
    ast: Document | Sequence[Block],
    source: str = "",
    *,
    max_chars: int = 750,
    skip_leading_h1: bool = True,
    include_headings: bool = True,
    excerpt_as_html: bool = False,
) -> str

def extract_meta_description(
    ast: Document | Sequence[Block],
    source: str = "",
    *,
    max_chars: int = 160,
) -> str
```

- `extract_excerpt` — Walks blocks in order, extracting text. Stops at block boundaries when `max_chars` is reached. Optional HTML output with `<p>`, `<div class="excerpt-heading">`.
- `extract_meta_description` — SEO-friendly ~160 chars, truncated at a sentence boundary.

Example:

```python
from patitas import parse, extract_excerpt, extract_meta_description

source = "# Title\n\nFirst paragraph. Second sentence."
doc = parse(source)
extract_excerpt(doc, source)                        # Plain text
extract_excerpt(doc, source, excerpt_as_html=True)  # HTML with structure
extract_meta_description(doc, source)               # ~160 chars for meta tags
```
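The sentence-boundary truncation behavior can be sketched in plain Python. This is an illustrative stand-alone helper (`truncate_at_sentence` is a made-up name, not the library's implementation):

```python
import re

def truncate_at_sentence(text: str, max_chars: int = 160) -> str:
    """Cut text to max_chars, preferring the last full sentence that fits."""
    if len(text) <= max_chars:
        return text
    window = text[:max_chars]
    # Sentence-ending punctuation followed by whitespace or end of window.
    ends = [m.end() for m in re.finditer(r"[.!?](?=\s|$)", window)]
    if ends:
        return window[: ends[-1]]
    # No full sentence fits: fall back to the last word boundary.
    return window.rsplit(" ", 1)[0] + "…"

print(truncate_at_sentence("First sentence. Second sentence runs much longer than the limit.", 40))
# 'First sentence.'
```

Stopping at a sentence (rather than a character count) is what keeps meta descriptions from ending mid-word in search results.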
### sanitize()

Apply a composable sanitization policy to strip unsafe content before LLM consumption or web rendering. Policies compose via the `|` operator.

```python
def sanitize(doc: Document, *, policy: Policy | Callable[[Document], Document]) -> Document
```

Parameters:

- `doc`: Document to sanitize
- `policy`: `Policy` instance or callable `Document -> Document`

Returns: Sanitized document (immutable; the original is unchanged)

Pre-built policies (from `patitas.sanitize`):

- `llm_safe` — Strip HTML, dangerous URLs (`javascript:`, `data:`, `vbscript:`), and zero-width/bidi characters (Trojan Source mitigation). Use for LLM context.
- `web_safe` — Alias for `llm_safe`. Same policy for web display of untrusted content.
- `strict` — `llm_safe` + strip images (replaced with alt text) + strip raw code blocks.

Composable policies: `strip_html`, `strip_html_comments`, `strip_dangerous_urls`, `normalize_unicode`, `strip_images`, `strip_raw_code`. Use `allow_url_schemes(*schemes)` for custom URL filtering.
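The `|` operator can be understood as plain function composition. A minimal sketch of the pattern on strings, with toy policies (`ChainPolicy` and the mini-policies are illustrative stand-ins, not patitas's `Policy` class, and left-to-right application order is an assumption):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class ChainPolicy:
    """Minimal |-composable policy: wraps a transform; a | b applies a, then b."""
    transform: Callable[[str], str]

    def __or__(self, other: "ChainPolicy") -> "ChainPolicy":
        return ChainPolicy(lambda s: other.transform(self.transform(s)))

    def __call__(self, s: str) -> str:
        return self.transform(s)

# Toy policies operating on strings (stand-ins for Document -> Document transforms)
strip_angle = ChainPolicy(lambda s: s.replace("<", "").replace(">", ""))
strip_js = ChainPolicy(lambda s: s.replace("javascript:", ""))

combined = strip_angle | strip_js
print(combined("<a href=javascript:alert(1)>x</a>"))  # a href=alert(1)x/a
```

Because each policy is pure, composed policies remain order-explicit and trivially testable in isolation.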
Example:

```python
from patitas import parse, sanitize
from patitas.sanitize import llm_safe, strip_html, strip_dangerous_urls

doc = parse("# Title\n\n<script>alert(1)</script>\n\n[link](javascript:void(0))")
clean = sanitize(doc, policy=llm_safe)

# Custom policy
custom = strip_html | strip_dangerous_urls
clean = sanitize(doc, policy=custom)
```

See LLM Safety for the full pipeline.
### parse_notebook()

Parse a Jupyter notebook (`.ipynb`) to Markdown content and metadata. Zero dependencies — uses stdlib `json` only. Supports nbformat 4 and 5.

```python
def parse_notebook(
    content: str,
    source_path: Path | str | None = None,
) -> tuple[str, dict[str, Any]]
```

Parameters:

- `content`: Raw JSON content of the `.ipynb` file (the caller handles I/O)
- `source_path`: Optional path for the title fallback when the notebook has no title

Returns: Tuple of `(markdown_content, metadata_dict)`

- `markdown_content`: Markdown string — markdown cells as-is, code cells as fenced blocks, outputs as HTML
- `metadata`: Dict with `title`, `type: "notebook"`, `notebook.kernel_name`, `notebook.cell_count`, etc.

Raises:

- `json.JSONDecodeError`: If `content` is not valid JSON
- `ValueError`: If nbformat is 3 or older

Example:

```python
from patitas import parse_notebook

with open("demo.ipynb") as f:
    content, metadata = parse_notebook(f.read(), "demo.ipynb")

# content: Markdown string ready for parse() or render
# metadata: title, type, notebook{kernel_name, cell_count}, etc.
print(metadata["notebook"]["kernel_name"])  # e.g. "python3"
```

Used by Bengal for native notebook rendering — drop `.ipynb` files into content and build.
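The cell walk described above (markdown cells pass through, code cells become fenced blocks) can be sketched with stdlib `json` alone. This is an illustrative reimplementation that ignores outputs and title fallback, not patitas's actual code:

```python
import json

def notebook_to_markdown(raw: str) -> str:
    """Convert nbformat-4 cells to Markdown: markdown cells as-is,
    code cells wrapped in fenced blocks tagged with the kernel language."""
    nb = json.loads(raw)
    if nb.get("nbformat", 0) < 4:
        raise ValueError("nbformat 3 or older is not supported")
    lang = nb.get("metadata", {}).get("kernelspec", {}).get("language", "")
    fence = "`" * 3
    parts = []
    for cell in nb.get("cells", []):
        src = "".join(cell.get("source", []))
        if cell["cell_type"] == "markdown":
            parts.append(src)
        elif cell["cell_type"] == "code":
            parts.append(f"{fence}{lang}\n{src}\n{fence}")
    return "\n\n".join(parts)

raw = json.dumps({
    "nbformat": 4,
    "metadata": {"kernelspec": {"language": "python", "name": "python3"}},
    "cells": [
        {"cell_type": "markdown", "source": ["# Demo"]},
        {"cell_type": "code", "source": ["x = 1"], "outputs": []},
    ],
})
print(notebook_to_markdown(raw))
```

Note that nbformat stores cell sources as lists of lines, which is why the sketch joins `cell["source"]` before emitting.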
### Markdown

High-level processor combining parsing and rendering.

```python
class Markdown:
    def __init__(
        self,
        *,
        highlight: bool = False,
        plugins: list[str] | None = None,
        directive_registry: DirectiveRegistry | None = None,
    ) -> None: ...
    def __call__(self, source: str) -> str: ...
    def parse(self, source: str, *, source_file: str | None = None, cache: ParseCache | None = None) -> Document: ...
    def parse_many(self, sources: Iterable[str], *, source_file: str | None = None, cache: ParseCache | None = None) -> list[Document]: ...
    def render(self, doc: Document, *, source: str = "") -> str: ...
```

Example:

```python
from patitas import Markdown

md = Markdown()
html = md("# Hello **World**")
print(html)  # <h1>Hello <strong>World</strong></h1>

# With plugins
md = Markdown(plugins=["table", "math", "strikethrough"])
html = md("| a | b |\n|---|---|\n| 1 | 2 |")
```
## Parse Cache

Content-addressed cache for parsed ASTs. The key is `(content_hash, config_hash)`; the value is a `Document`. Enables faster incremental builds (undo/revert, duplicate content) and can replace path-based snapshot caches in consumers like Bengal.

### ParseCache protocol

```python
class ParseCache(Protocol):
    def get(self, content_hash: str, config_hash: str) -> Document | None: ...
    def put(self, content_hash: str, config_hash: str, doc: Document) -> None: ...
```
### DictParseCache

In-memory implementation. Not thread-safe — for parallel parsing, use a cache with internal locking.

```python
from patitas import parse, DictParseCache

cache = DictParseCache()
doc = parse("# Hello", cache=cache)

# Second call with the same source hits the cache
doc2 = parse("# Hello", cache=cache)
```
### hash_content() / hash_config()

Compute cache keys. `hash_content(source)` returns the SHA-256 of the source. `hash_config(config)` returns a config hash, or `""` when `text_transformer` is set (the cache is bypassed).
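A content hash with the same shape can be computed from stdlib `hashlib`. This sketches the keying scheme (assuming UTF-8 encoding; not necessarily byte-for-byte identical to `hash_content`'s output):

```python
import hashlib

def sha256_key(source: str) -> str:
    """Stable content-addressed key: SHA-256 hex digest of the source text."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()

key = sha256_key("# Hello")
print(len(key))  # 64 hex characters, stable across processes
```

Hashing the content rather than the file path is what lets reverted or duplicated documents hit the cache.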
See Performance for optimization details and Serialization for persistence patterns.
## Serialization API

Convert AST nodes to/from JSON-compatible dicts and strings. Deterministic output for cache-key stability. Useful for caching parsed ASTs (Bengal incremental builds) and sending ASTs over the wire (Purr SSE).

### to_dict() / from_dict()

In-memory dict format — use for caching, or when you need to inspect or modify the structure before serializing to JSON.

```python
from patitas import parse, to_dict, from_dict

doc = parse("# Hello **World**")
data = to_dict(doc)
restored = from_dict(data)
assert doc == restored
```

`to_dict(node: Node) -> dict[str, Any]`

- `node`: Any AST node (Document, Heading, Paragraph, etc.)
- Returns: JSON-compatible dict with a `_type` discriminator

`from_dict(data: dict[str, Any]) -> Node`

- `data`: Dict produced by `to_dict`
- Returns: Reconstructed typed AST node

### to_json() / from_json()

JSON string format — use for persistence, wire transfer, or human inspection.

```python
from patitas import parse, to_json, from_json

doc = parse("# Hello **World**")
json_str = to_json(doc)
restored = from_json(json_str)
assert doc == restored
```

`to_json(doc: Document, *, indent: int | None = None) -> str`

- `doc`: Document AST root
- `indent`: Optional indent for pretty-printing (`None` = compact)

`from_json(data: str) -> Document`

- `data`: JSON string from `to_json`
- Returns: Reconstructed Document

See Serialization for caching and wire-transfer patterns.
## Configuration API

Thread-local configuration for advanced use cases.

### ParseConfig

Immutable configuration dataclass.

```python
from patitas import ParseConfig

config = ParseConfig(
    tables_enabled=True,
    math_enabled=True,
    strikethrough_enabled=False,
    task_lists_enabled=False,
    footnotes_enabled=False,
    autolinks_enabled=False,
    directive_registry=None,
    strict_contracts=False,
    text_transformer=None,
)
```

### ParseConfig.from_dict()

Create a `ParseConfig` from a dictionary. Unknown keys are silently ignored, making this safe for framework integration where config may come from YAML files or external sources.

```python
from patitas import ParseConfig

config = ParseConfig.from_dict({
    "tables_enabled": True,
    "math_enabled": True,
    "unknown_key": "silently ignored",
})
# config.tables_enabled == True
# config.math_enabled == True
```
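The "ignore unknown keys" behavior is the standard dataclass filtering pattern, sketched here on a stand-in `MiniConfig` (illustrative only, not ParseConfig's actual implementation):

```python
from dataclasses import dataclass, fields
from typing import Any

@dataclass(frozen=True)
class MiniConfig:
    tables_enabled: bool = False
    math_enabled: bool = False

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "MiniConfig":
        # Keep only keys that name real fields; everything else is dropped.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})

cfg = MiniConfig.from_dict({"tables_enabled": True, "unknown_key": "ignored"})
print(cfg)  # MiniConfig(tables_enabled=True, math_enabled=False)
```

Filtering against `fields(cls)` means new config keys in a YAML file never crash older library versions; they are simply ignored.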
### parse_config_context()

Context manager for temporary config changes.

```python
from patitas import parse_config_context, ParseConfig, Parser

with parse_config_context(ParseConfig(tables_enabled=True)):
    parser = Parser("| a | b |")
    result = parser.parse()
# Config automatically reset after the context exits
```

### get/set/reset functions

```python
from patitas import ParseConfig, get_parse_config, set_parse_config, reset_parse_config

# Get the current config
config = get_parse_config()

# Set a custom config
set_parse_config(ParseConfig(math_enabled=True))

# Reset to defaults
reset_parse_config()
```
## Low-Level API

### Parser

The Markdown parser. Configuration is read from a ContextVar.

```python
from patitas import Parser, parse_config_context, ParseConfig

# Simple usage (uses the default config)
parser = Parser(source, source_file="example.md")
doc = parser.parse()

# With a custom config
with parse_config_context(ParseConfig(tables_enabled=True)):
    parser = Parser(source)
    doc = parser.parse()
```

### Lexer

The state-machine lexer.

```python
from patitas.lexer import Lexer

lexer = Lexer(source)
tokens = list(lexer)
```

### HtmlRenderer

The HTML renderer.

```python
from patitas.renderers.html import HtmlRenderer

renderer = HtmlRenderer(source=source)
html = renderer.render(doc)
```

### LlmRenderer

The LLM-optimized renderer. Outputs structured plain text for model consumption.

```python
from patitas.renderers.llm import LlmRenderer

renderer = LlmRenderer(source=source)
text = renderer.render(doc)
```
## Extension Points

### set_highlighter()

Set the global syntax highlighter. Accepts a `Highlighter` protocol implementation or a simple callable `(code: str, language: str) -> str`.

```python
from patitas.highlighting import set_highlighter

# Simple callable
set_highlighter(lambda code, lang: f"<pre><code class='{lang}'>{code}</code></pre>")

# Or pass None to clear
set_highlighter(None)
```

### set_icon_resolver()

Set the global icon resolver. Takes a callable `(name: str) -> str | None`.

```python
from patitas.icons import set_icon_resolver

set_icon_resolver(lambda name: f"<span class='icon-{name}'></span>")

# Or pass None to clear
set_icon_resolver(None)
```