Core API for parsing and rendering Markdown.
## High-Level API

### parse()

Parse Markdown source into a typed AST.

```python
def parse(
    source: str,
    *,
    source_file: str | None = None,
    directive_registry: DirectiveRegistry | None = None,
    cache: ParseCache | None = None,
) -> Document
```

Parameters:

- `source`: Markdown source text
- `source_file`: Optional source file path for error messages
- `directive_registry`: Custom directive registry (uses defaults if `None`)
- `cache`: Optional content-addressed parse cache (see Parse Cache)

Returns: `Document` AST root node

Example:

```python
from patitas import parse, DictParseCache

doc = parse("# Hello **World**")
print(doc.children[0])  # Heading(level=1, ...)

# With parse cache (faster incremental builds)
cache = DictParseCache()
doc1 = parse("# Hello", cache=cache)
doc2 = parse("# Hello", cache=cache)  # Cache hit, no re-parse
```
### render()

Render a Patitas AST to HTML.

```python
def render(
    doc: Document,
    *,
    source: str = "",
    highlight: bool = False,
    directive_registry: DirectiveRegistry | None = None,
) -> str
```

Parameters:

- `doc`: Document AST to render
- `source`: Original Markdown source for zero-copy extraction
- `highlight`: Enable syntax highlighting for code blocks
- `directive_registry`: Custom directive registry for rendering

Returns: Rendered HTML string

Example:

```python
from patitas import parse, render

doc = parse("# Hello")
html = render(doc, source="# Hello")
print(html)  # <h1>Hello</h1>
```
### render_llm()

Render a Patitas AST to structured plain text for LLM consumption. No HTML; explicit labels for code (`[code:lang]`), math (`[math] ... [/math]`), and images (`[image: alt]`). Skips HtmlBlock and HtmlInline for safety. Useful for RAG retrieval, context windows, and model input.

```python
def render_llm(doc: Document, *, source: str = "") -> str
```

Parameters:

- `doc`: Document AST to render
- `source`: Original Markdown source for FencedCode zero-copy extraction

Returns: Structured plain text string

Example:

````python
from patitas import parse, render_llm

source = "# Hello **World**\n\n- item\n\n```python\nx = 1\n```"
doc = parse(source)
text = render_llm(doc, source=source)
# '# Hello World\n\n- item\n\n[code:python]\nx = 1\n[/code]\n\n'
````

See LLM Safety for the full parse → sanitize → render_llm pipeline.
### extract_text()

Extract plain text from any AST node. Skips HtmlBlock and HtmlInline. Used for heading slugs, excerpts, and LLM pipelines.

```python
def extract_text(node: Node, *, source: str = "") -> str
```

Parameters:

- `node`: Any AST node (block or inline)
- `source`: Original source (required for FencedCode zero-copy; use `""` if unavailable)

Returns: Concatenated plain text from the node and its descendants

Example:

```python
from patitas import parse, extract_text

doc = parse("# Hello **World**")
extract_text(doc.children[0])  # 'Hello World'
```
### extract_excerpt() / extract_meta_description()

Structurally correct excerpt extraction that stops at block boundaries. Avoids mid-Markdown truncation and properly handles headings, paragraphs, and lists.

```python
def extract_excerpt(
    ast: Document | Sequence[Block],
    source: str = "",
    *,
    max_chars: int = 750,
    skip_leading_h1: bool = True,
    include_headings: bool = True,
    excerpt_as_html: bool = False,
) -> str

def extract_meta_description(
    ast: Document | Sequence[Block],
    source: str = "",
    *,
    max_chars: int = 160,
) -> str
```

- `extract_excerpt` — Walks blocks in order, extracting text. Stops at block boundaries when `max_chars` is reached. Optional HTML output with `<p>`, `<div class="excerpt-heading">`.
- `extract_meta_description` — SEO-friendly ~160 chars, truncated at a sentence boundary.

Example:

```python
from patitas import parse, extract_excerpt, extract_meta_description

source = "# Title\n\nFirst paragraph. Second sentence."
doc = parse(source)
extract_excerpt(doc, source)                        # Plain text
extract_excerpt(doc, source, excerpt_as_html=True)  # HTML with structure
extract_meta_description(doc, source)               # ~160 chars for meta tags
```
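The sentence-boundary truncation behavior can be sketched in plain Python. This is an illustrative stand-alone helper (`truncate_at_sentence` is a made-up name, not the library's implementation):

```python
import re

def truncate_at_sentence(text: str, max_chars: int = 160) -> str:
    """Cut text to max_chars, preferring the last full sentence that fits."""
    if len(text) <= max_chars:
        return text
    window = text[:max_chars]
    # Sentence-ending punctuation followed by whitespace or end of window.
    ends = [m.end() for m in re.finditer(r"[.!?](?=\s|$)", window)]
    if ends:
        return window[: ends[-1]]
    # No full sentence fits: fall back to the last word boundary.
    return window.rsplit(" ", 1)[0] + "…"

print(truncate_at_sentence("First sentence. Second sentence runs much longer than the limit.", 40))
# 'First sentence.'
```

Stopping at a sentence (rather than a character count) is what keeps meta descriptions from ending mid-word in search results.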
### sanitize()

Apply a composable sanitization policy to strip unsafe content before LLM consumption or web rendering. Policies compose via the `|` operator.

```python
def sanitize(doc: Document, *, policy: Policy | Callable[[Document], Document]) -> Document
```

Parameters:

- `doc`: Document to sanitize
- `policy`: `Policy` instance or callable `Document -> Document`

Returns: Sanitized document (immutable; the original is unchanged)

Pre-built policies (from `patitas.sanitize`):

- `llm_safe` — Strip HTML, dangerous URLs (`javascript:`, `data:`, `vbscript:`), and zero-width/bidi characters (Trojan Source mitigation). Use for LLM context.
- `web_safe` — Alias for `llm_safe`. Same policy for web display of untrusted content.
- `strict` — `llm_safe` + strip images (replaced with alt text) + strip raw code blocks.

Composable policies: `strip_html`, `strip_html_comments`, `strip_dangerous_urls`, `normalize_unicode`, `strip_images`, `strip_raw_code`. Use `allow_url_schemes(*schemes)` for custom URL filtering.
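The `|` operator can be understood as plain function composition. A minimal sketch of the pattern on strings, with toy policies (`ChainPolicy` and the mini-policies are illustrative stand-ins, not patitas's `Policy` class, and left-to-right application order is an assumption):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class ChainPolicy:
    """Minimal |-composable policy: wraps a transform; a | b applies a, then b."""
    transform: Callable[[str], str]

    def __or__(self, other: "ChainPolicy") -> "ChainPolicy":
        return ChainPolicy(lambda s: other.transform(self.transform(s)))

    def __call__(self, s: str) -> str:
        return self.transform(s)

# Toy policies operating on strings (stand-ins for Document -> Document transforms)
strip_angle = ChainPolicy(lambda s: s.replace("<", "").replace(">", ""))
strip_js = ChainPolicy(lambda s: s.replace("javascript:", ""))

combined = strip_angle | strip_js
print(combined("<a href=javascript:alert(1)>x</a>"))  # a href=alert(1)x/a
```

Because each policy is pure, composed policies remain order-explicit and trivially testable in isolation.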
Example:

```python
from patitas import parse, sanitize
from patitas.sanitize import llm_safe, strip_html, strip_dangerous_urls

doc = parse("# Title\n\n<script>alert(1)</script>\n\n[link](javascript:void(0))")
clean = sanitize(doc, policy=llm_safe)

# Custom policy
custom = strip_html | strip_dangerous_urls
clean = sanitize(doc, policy=custom)
```

See LLM Safety for the full pipeline.
### parse_notebook()

Parse a Jupyter notebook (`.ipynb`) to Markdown content and metadata. Zero dependencies — uses stdlib `json` only. Supports nbformat 4 and 5.

```python
def parse_notebook(
    content: str,
    source_path: Path | str | None = None,
) -> tuple[str, dict[str, Any]]
```

Parameters:

- `content`: Raw JSON content of the `.ipynb` file (the caller handles I/O)
- `source_path`: Optional path for the title fallback when the notebook has no title

Returns: Tuple of `(markdown_content, metadata_dict)`

- `markdown_content`: Markdown string — markdown cells as-is, code cells as fenced blocks, outputs as HTML
- `metadata`: Dict with `title`, `type: "notebook"`, `notebook.kernel_name`, `notebook.cell_count`, etc.

Raises:

- `json.JSONDecodeError`: If `content` is not valid JSON
- `ValueError`: If nbformat is 3 or older

Example:

```python
from patitas import parse_notebook

with open("demo.ipynb") as f:
    content, metadata = parse_notebook(f.read(), "demo.ipynb")

# content: Markdown string ready for parse() or render
# metadata: title, type, notebook{kernel_name, cell_count}, etc.
print(metadata["notebook"]["kernel_name"])  # e.g. "python3"
```

Used by Bengal for native notebook rendering — drop `.ipynb` files into content and build.
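The cell walk described above (markdown cells pass through, code cells become fenced blocks) can be sketched with stdlib `json` alone. This is an illustrative reimplementation that ignores outputs and title fallback, not patitas's actual code:

```python
import json

def notebook_to_markdown(raw: str) -> str:
    """Convert nbformat-4 cells to Markdown: markdown cells as-is,
    code cells wrapped in fenced blocks tagged with the kernel language."""
    nb = json.loads(raw)
    if nb.get("nbformat", 0) < 4:
        raise ValueError("nbformat 3 or older is not supported")
    lang = nb.get("metadata", {}).get("kernelspec", {}).get("language", "")
    fence = "`" * 3
    parts = []
    for cell in nb.get("cells", []):
        src = "".join(cell.get("source", []))
        if cell["cell_type"] == "markdown":
            parts.append(src)
        elif cell["cell_type"] == "code":
            parts.append(f"{fence}{lang}\n{src}\n{fence}")
    return "\n\n".join(parts)

raw = json.dumps({
    "nbformat": 4,
    "metadata": {"kernelspec": {"language": "python", "name": "python3"}},
    "cells": [
        {"cell_type": "markdown", "source": ["# Demo"]},
        {"cell_type": "code", "source": ["x = 1"], "outputs": []},
    ],
})
print(notebook_to_markdown(raw))
```

Note that nbformat stores cell sources as lists of lines, which is why the sketch joins `cell["source"]` before emitting.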
### Markdown

High-level processor combining parsing and rendering.

```python
class Markdown:
    def __init__(
        self,
        *,
        highlight: bool = False,
        plugins: list[str] | None = None,
        directive_registry: DirectiveRegistry | None = None,
    ) -> None: ...
    def __call__(self, source: str) -> str: ...
    def parse(self, source: str, *, source_file: str | None = None, cache: ParseCache | None = None) -> Document: ...
    def parse_many(self, sources: Iterable[str], *, source_file: str | None = None, cache: ParseCache | None = None) -> list[Document]: ...
    def render(self, doc: Document, *, source: str = "") -> str: ...
```

Example:

```python
from patitas import Markdown

md = Markdown()
html = md("# Hello **World**")
print(html)  # <h1>Hello <strong>World</strong></h1>

# With plugins
md = Markdown(plugins=["table", "math", "strikethrough"])
html = md("| a | b |\n|---|---|\n| 1 | 2 |")
```
## Parse Cache

Content-addressed cache for parsed ASTs. The key is `(content_hash, config_hash)`; the value is a `Document`. Enables faster incremental builds (undo/revert, duplicate content) and can replace path-based snapshot caches in consumers like Bengal.

### ParseCache protocol

```python
class ParseCache(Protocol):
    def get(self, content_hash: str, config_hash: str) -> Document | None: ...
    def put(self, content_hash: str, config_hash: str, doc: Document) -> None: ...
```
### DictParseCache

In-memory implementation. Not thread-safe — for parallel parsing, use a cache with internal locking.

```python
from patitas import parse, DictParseCache

cache = DictParseCache()
doc = parse("# Hello", cache=cache)

# Second call with the same source hits the cache
doc2 = parse("# Hello", cache=cache)
```
### hash_content() / hash_config()

Compute cache keys. `hash_content(source)` returns the SHA-256 of the source. `hash_config(config)` returns a config hash, or `""` when `text_transformer` is set (the cache is bypassed).
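A content hash with the same shape can be computed from stdlib `hashlib`. This sketches the keying scheme (assuming UTF-8 encoding; not necessarily byte-for-byte identical to `hash_content`'s output):

```python
import hashlib

def sha256_key(source: str) -> str:
    """Stable content-addressed key: SHA-256 hex digest of the source text."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()

key = sha256_key("# Hello")
print(len(key))  # 64 hex characters, stable across processes
```

Hashing the content rather than the file path is what lets reverted or duplicated documents hit the cache.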
See Performance for optimization details and Serialization for persistence patterns.
## Serialization API

Convert AST nodes to/from JSON-compatible dicts and strings. Deterministic output for cache-key stability. Useful for caching parsed ASTs (Bengal incremental builds) and sending ASTs over the wire (Purr SSE).

### to_dict() / from_dict()

In-memory dict format — use for caching, or when you need to inspect or modify the structure before serializing to JSON.

```python
from patitas import parse, to_dict, from_dict

doc = parse("# Hello **World**")
data = to_dict(doc)
restored = from_dict(data)
assert doc == restored
```

`to_dict(node: Node) -> dict[str, Any]`

- `node`: Any AST node (Document, Heading, Paragraph, etc.)
- Returns: JSON-compatible dict with a `_type` discriminator

`from_dict(data: dict[str, Any]) -> Node`

- `data`: Dict produced by `to_dict`
- Returns: Reconstructed typed AST node

### to_json() / from_json()

JSON string format — use for persistence, wire transfer, or human inspection.

```python
from patitas import parse, to_json, from_json

doc = parse("# Hello **World**")
json_str = to_json(doc)
restored = from_json(json_str)
assert doc == restored
```

`to_json(doc: Document, *, indent: int | None = None) -> str`

- `doc`: Document AST root
- `indent`: Optional indent for pretty-printing (`None` = compact)

`from_json(data: str) -> Document`

- `data`: JSON string from `to_json`
- Returns: Reconstructed Document

See Serialization for caching and wire-transfer patterns.
## Configuration API

Thread-local configuration for advanced use cases.

### ParseConfig

Immutable configuration dataclass.

```python
from patitas import ParseConfig

config = ParseConfig(
    tables_enabled=True,
    math_enabled=True,
    strikethrough_enabled=False,
    task_lists_enabled=False,
    footnotes_enabled=False,
    autolinks_enabled=False,
    directive_registry=None,
    strict_contracts=False,
    text_transformer=None,
)
```

### ParseConfig.from_dict()

Create a `ParseConfig` from a dictionary. Unknown keys are silently ignored, making this safe for framework integration where config may come from YAML files or external sources.

```python
from patitas import ParseConfig

config = ParseConfig.from_dict({
    "tables_enabled": True,
    "math_enabled": True,
    "unknown_key": "silently ignored",
})
# config.tables_enabled == True
# config.math_enabled == True
```
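The "ignore unknown keys" behavior is the standard dataclass filtering pattern, sketched here on a stand-in `MiniConfig` (illustrative only, not ParseConfig's actual implementation):

```python
from dataclasses import dataclass, fields
from typing import Any

@dataclass(frozen=True)
class MiniConfig:
    tables_enabled: bool = False
    math_enabled: bool = False

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "MiniConfig":
        # Keep only keys that name real fields; everything else is dropped.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})

cfg = MiniConfig.from_dict({"tables_enabled": True, "unknown_key": "ignored"})
print(cfg)  # MiniConfig(tables_enabled=True, math_enabled=False)
```

Filtering against `fields(cls)` means new config keys in a YAML file never crash older library versions; they are simply ignored.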
### parse_config_context()

Context manager for temporary config changes.

```python
from patitas import parse_config_context, ParseConfig, Parser

with parse_config_context(ParseConfig(tables_enabled=True)):
    parser = Parser("| a | b |")
    result = parser.parse()
# Config automatically reset after the context exits
```

### get/set/reset functions

```python
from patitas import ParseConfig, get_parse_config, set_parse_config, reset_parse_config

# Get the current config
config = get_parse_config()

# Set a custom config
set_parse_config(ParseConfig(math_enabled=True))

# Reset to defaults
reset_parse_config()
```
## Low-Level API

### Parser

The Markdown parser. Configuration is read from a ContextVar.

```python
from patitas import Parser, parse_config_context, ParseConfig

# Simple usage (uses the default config)
parser = Parser(source, source_file="example.md")
doc = parser.parse()

# With a custom config
with parse_config_context(ParseConfig(tables_enabled=True)):
    parser = Parser(source)
    doc = parser.parse()
```

### Lexer

The state-machine lexer.

```python
from patitas.lexer import Lexer

lexer = Lexer(source)
tokens = list(lexer)
```

### HtmlRenderer

The HTML renderer.

```python
from patitas.renderers.html import HtmlRenderer

renderer = HtmlRenderer(source=source)
html = renderer.render(doc)
```

### LlmRenderer

The LLM-optimized renderer. Outputs structured plain text for model consumption.

```python
from patitas.renderers.llm import LlmRenderer

renderer = LlmRenderer(source=source)
text = renderer.render(doc)
```
## Extension Points

### set_highlighter()

Set the global syntax highlighter. Accepts a `Highlighter` protocol implementation or a simple callable `(code: str, language: str) -> str`.

```python
from patitas.highlighting import set_highlighter

# Simple callable
set_highlighter(lambda code, lang: f"<pre><code class='{lang}'>{code}</code></pre>")

# Or pass None to clear
set_highlighter(None)
```

### set_icon_resolver()

Set the global icon resolver. Takes a callable `(name: str) -> str | None`.

```python
from patitas.icons import set_icon_resolver

set_icon_resolver(lambda name: f"<span class='icon-{name}'></span>")

# Or pass None to clear
set_icon_resolver(None)
```