AST Processing - Patitas

Patitas provides two tools for working with the AST after parsing: BaseVisitor for reading the tree and transform()for rewriting it.

BaseVisitor

A visitor walks the AST and dispatches to typedvisit_*methods. Override the methods for node types you care about — everything else falls through tovisit_default.

Collect All Headings

from patitas import parse, BaseVisitor, Heading

class HeadingCollector(BaseVisitor[None]):
    def __init__(self) -> None:
        self.headings: list[Heading] = []

    def visit_heading(self, node: Heading) -> None:
        self.headings.append(node)

doc = parse(source)
collector = HeadingCollector()
collector.visit(doc)

for h in collector.headings:
    print(f"{'#' * h.level} (line {h.location.lineno})")

Extract All Links

from patitas import parse, BaseVisitor, Link

class LinkExtractor(BaseVisitor[None]):
    def __init__(self) -> None:
        self.urls: list[str] = []

    def visit_link(self, node: Link) -> None:
        self.urls.append(node.url)

doc = parse(source)
extractor = LinkExtractor()
extractor.visit(doc)

for url in extractor.urls:
    print(url)

Count All Nodes

from patitas import parse, BaseVisitor

class NodeCounter(BaseVisitor[None]):
    def __init__(self) -> None:
        self.count: int = 0

    def visit_default(self, node) -> None:
        self.count += 1

doc = parse(source)
counter = NodeCounter()
counter.visit(doc)
print(f"{counter.count} nodes in the AST")

How Dispatch Works

BaseVisitor uses matchto dispatch to the correct method:

Callvisit(node)on any node
The visitor matches the node type and callsvisit_heading, visit_paragraph, etc.
Children are walked automatically after the visit method returns
Unmatched types callvisit_default

Available visit methods (one per node type):

Block nodes	Inline nodes
`visit_document`	`visit_text`
`visit_heading`	`visit_emphasis`
`visit_paragraph`	`visit_strong`
`visit_fenced_code`	`visit_strikethrough`
`visit_indented_code`	`visit_link`
`visit_block_quote`	`visit_image`
`visit_list`	`visit_code_span`
`visit_list_item`	`visit_line_break`
`visit_thematic_break`	`visit_soft_break`
`visit_html_block`	`visit_html_inline`
`visit_directive`	`visit_role`
`visit_table`	`visit_math`
`visit_table_row`	`visit_footnote_ref`
`visit_table_cell`
`visit_math_block`
`visit_footnote_def`

Thread Safety

Visitors accumulate mutable state, so create a new instance per thread:

from concurrent.futures import ThreadPoolExecutor

def count_headings(source: str) -> int:
    doc = parse(source)
    collector = HeadingCollector()  # New instance per thread
    collector.visit(doc)
    return len(collector.headings)

with ThreadPoolExecutor(max_workers=4) as pool:
    counts = list(pool.map(count_headings, sources))

transform()

transform()rewrites the AST immutably. It applies a function to every node bottom-up (children first, then parent) and returns a new tree. The original is untouched.

def transform(doc: Document, fn: Callable[[Node], Node]) -> Document

Parameters:

doc: The Document to transform
fn: Function that receives a node and returns a (possibly new) node. Return the same node to keep it unchanged.

Returns: A new Document with the transformation applied.

Shift Heading Levels

Demote all headings by one level (useful when embedding a document as a subsection):

import dataclasses
from patitas import parse, transform, Heading
from patitas.nodes import Node

def shift_headings(node: Node) -> Node:
    if isinstance(node, Heading):
        return dataclasses.replace(node, level=min(node.level + 1, 6))
    return node

doc = parse("# Title\n## Section")
new_doc = transform(doc, shift_headings)
# Title is now level 2, Section is now level 3

Rewrite Links

Convert relative links to absolute:

import dataclasses
from patitas import parse, transform, Link
from patitas.nodes import Node

BASE_URL = "https://example.com"

def absolutize_links(node: Node) -> Node:
    if isinstance(node, Link) and node.url.startswith("/"):
        return dataclasses.replace(node, url=BASE_URL + node.url)
    return node

doc = parse("[About](/about)")
new_doc = transform(doc, absolutize_links)

How It Works

The function walks the tree bottom-up
Children are transformed first, then the parent
If a node is unchanged (fnreturns the same object), no allocation occurs
If children are all unchanged, the parent keeps its original tuple
Only modified paths allocate new nodes viadataclasses.replace()

This means an identity transform (returning every node unchanged) is fast — it walks the tree but allocates nothing.

Thread Safety

transform()is a pure function. The input tree is never modified, and the output is a new frozen tree. Safe to call from any thread.

Combining Visitor and Transform

A common pattern: use a visitor to analyze the tree, then a transform to rewrite it based on what you found.

from patitas import parse, render, BaseVisitor, transform, Heading
from patitas.nodes import Node

# Step 1: Find the deepest heading level
class MaxLevel(BaseVisitor[None]):
    def __init__(self) -> None:
        self.max: int = 0

    def visit_heading(self, node: Heading) -> None:
        self.max = max(self.max, node.level)

doc = parse(source)

finder = MaxLevel()
finder.visit(doc)

# Step 2: Normalize all headings to start at level 1
offset = finder.max - 1

def normalize(node: Node) -> Node:
    if isinstance(node, Heading) and node.level > 1:
        import dataclasses
        return dataclasses.replace(node, level=max(1, node.level - offset))
    return node

normalized = transform(doc, normalize)
html = render(normalized, source=source)

Performance

Both operations are fast relative to parsing:

Visitor: ~0.7 us/node — traversal with match dispatch
Identity transform: ~0.4 ms for a 1000-node tree (no allocations)
Rewriting transform: ~0.5 ms for a 1000-node tree (selective allocations)

Parse once, then visit and transform cheaply on the frozen AST.