Module

sanitize

Composable sanitization policies for Patitas AST.

Provides immutable transform policies for stripping unsafe content before LLM consumption or web rendering. Policies compose via the | operator.

Example:

>>> from patitas import parse, sanitize
>>> from patitas.sanitize import strip_html, strip_dangerous_urls, llm_safe
>>> doc = parse("# Hello\n\n<script>alert(1)</script>")
>>> clean = sanitize(doc, policy=llm_safe)

Classes

Policy 3
Wrapper for a Document -> Document transform that supports composition via |.

Methods

Internal Methods 3
__init__ 1
def __init__(self, fn: Callable[[Document], Document]) -> None
Parameters
Name Type Description
fn Callable[[Document], Document]
__call__ 1 Document
def __call__(self, doc: Document) -> Document
Parameters
Name Type Description
doc Document
Returns
Document
__or__ 1 Policy
Chain policies: (self | other)(doc) applies self then other.
def __or__(self, other: Policy) -> Policy
Parameters
Name Type Description
other Policy
Returns
Policy
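
The composition rule above can be illustrated with a minimal, self-contained sketch. It is not the Patitas implementation: the `Document` alias (a tuple of strings) and the two toy policies `drop_blank` and `upper` are hypothetical stand-ins, and only the `__init__`, `__call__`, and `__or__` shapes come from the signatures listed here.

```python
from dataclasses import dataclass
from typing import Callable

# Stand-in for the real Document type: a tuple of text lines (assumption).
Document = tuple[str, ...]


@dataclass(frozen=True)
class Policy:
    """Immutable wrapper around a Document -> Document transform."""

    fn: Callable[[Document], Document]

    def __call__(self, doc: Document) -> Document:
        return self.fn(doc)

    def __or__(self, other: "Policy") -> "Policy":
        # (self | other)(doc) applies self first, then other.
        return Policy(lambda doc: other(self(doc)))


# Two toy policies, hypothetical and for illustration only.
drop_blank = Policy(lambda doc: tuple(line for line in doc if line.strip()))
upper = Policy(lambda doc: tuple(line.upper() for line in doc))

combined = drop_blank | upper
result = combined(("hello", "   ", "world"))
```

Because each `|` returns a new `Policy` and never mutates its operands, the same base policies can be reused in several different chains.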

Functions

_is_dangerous_url 1 bool
Check if URL uses a dangerous scheme.
def _is_dangerous_url(url: str) -> bool
Parameters
Name Type Description
url str
Returns
bool
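
A scheme check of this kind might look like the sketch below. The function name `is_dangerous_url`, the constant `DANGEROUS_SCHEMES`, and the whitespace handling are assumptions for illustration; only the three scheme names are taken from the `_strip_dangerous_urls` summary on this page.

```python
# Schemes named in the _strip_dangerous_urls summary.
DANGEROUS_SCHEMES = frozenset({"javascript", "data", "vbscript"})


def is_dangerous_url(url: str) -> bool:
    """Hypothetical re-implementation: True if the URL's scheme is on the deny list."""
    # Drop spaces and ASCII control characters first, so that an obfuscated
    # scheme such as "java\tscript:" cannot slip past the comparison.
    cleaned = "".join(ch for ch in url if ch > " " and ch != "\x7f")
    scheme, sep, _rest = cleaned.partition(":")
    # Without a colon there is no scheme, so the URL is treated as relative.
    return bool(sep) and scheme.lower() in DANGEROUS_SCHEMES
```

The comparison is case-insensitive because URL schemes are case-insensitive, so `JavaScript:` is caught as well.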
_scheme_allowed 2 bool
Check if URL scheme is in allowed set.
def _scheme_allowed(url: str, allowed: frozenset[str]) -> bool
Parameters
Name Type Description
url str
allowed frozenset[str]
Returns
bool
_strip_html 1 Document
Remove all HtmlBlock and HtmlInline nodes.
def _strip_html(doc: Document) -> Document
Parameters
Name Type Description
doc Document
Returns
Document
_strip_html_comments 1 Document
Remove HtmlInline nodes where .html starts with <!--.
def _strip_html_comments(doc: Document) -> Document
Parameters
Name Type Description
doc Document
Returns
Document
_strip_dangerous_urls 1 Document
Remove Link and Image nodes whose URLs use the javascript:, data:, or vbscript: scheme.
def _strip_dangerous_urls(doc: Document) -> Document
Parameters
Name Type Description
doc Document
Returns
Document
_normalize_unicode 1 Document
Strip zero-width characters and bidi overrides from Text nodes.
def _normalize_unicode(doc: Document) -> Document
Parameters
Name Type Description
doc Document
Returns
Document
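
As a rough illustration of the text-level step, the sketch below strips invisible characters from a single string. The exact character set removed by the library is not listed on this page, so the code points chosen here, and the name `strip_invisible`, are assumptions.

```python
import re

# Zero-width characters: ZWSP, ZWNJ, ZWJ, word joiner, BOM / zero-width no-break space.
# Bidi controls: embeddings and overrides U+202A..U+202E, isolates U+2066..U+2069.
# This set is an assumption; the library may strip more or fewer code points.
_INVISIBLE = re.compile("[\u200b\u200c\u200d\u2060\ufeff\u202a-\u202e\u2066-\u2069]")


def strip_invisible(text: str) -> str:
    """Return `text` with zero-width and bidi control characters removed."""
    return _INVISIBLE.sub("", text)
```

In the real policy, a function like this would presumably be applied to the content of every Text node while leaving other node types untouched.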
_strip_images 1 Document
Replace Image nodes with Text nodes containing alt text.
def _strip_images(doc: Document) -> Document
Parameters
Name Type Description
doc Document
Returns
Document
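
The replacement step can be sketched on a toy tree. The `Text`, `Image`, and `Paragraph` dataclasses below are hypothetical stand-ins with made-up field names, not the Patitas node classes; the point is only that a new tree is built rather than the old one being modified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Text:
    content: str


@dataclass(frozen=True)
class Image:
    url: str
    alt: str


@dataclass(frozen=True)
class Paragraph:
    children: tuple  # of Text | Image


def strip_images(para: Paragraph) -> Paragraph:
    """Return a new paragraph with each Image replaced by a Text of its alt text."""
    replaced = tuple(
        Text(child.alt) if isinstance(child, Image) else child
        for child in para.children
    )
    return Paragraph(replaced)


para = Paragraph((Text("See "), Image("https://example.com/cat.png", "a cat")))
stripped = strip_images(para)
```

Keeping the alt text preserves the meaning of the sentence for a text-only consumer while removing the external URL.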
_strip_raw_code 1 Document
Remove FencedCode and IndentedCode blocks.
def _strip_raw_code(doc: Document) -> Document
Parameters
Name Type Description
doc Document
Returns
Document
allow_url_schemes 1 Policy
def allow_url_schemes(*schemes: str) -> Policy

Keep only Link/Image nodes with allowed URL schemes.

Default schemes: https, http, mailto.

Parameters
Name Type Description
*schemes str
Returns
Policy
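
The variadic signature with a default set can be sketched as a small factory. This version filters a plain list of URL strings instead of Link/Image nodes, and the name `allow_url_schemes_sketch` is hypothetical; how the real policy treats scheme-less (relative) URLs is not stated here, so rejecting them is an assumption.

```python
from urllib.parse import urlsplit

DEFAULT_SCHEMES = ("https", "http", "mailto")  # defaults stated in the description above


def allow_url_schemes_sketch(*schemes: str):
    """Hypothetical factory: returns a filter over a list of URL strings."""
    # Calling with no arguments falls back to the default schemes.
    allowed = frozenset(s.lower() for s in (schemes or DEFAULT_SCHEMES))

    def keep_allowed(urls: list[str]) -> list[str]:
        # urlsplit lowercases the scheme; relative URLs yield "" and are dropped here.
        return [u for u in urls if urlsplit(u).scheme in allowed]

    return keep_allowed


urls = ["https://a.example", "ftp://b.example", "mailto:c@example.com"]
default_kept = allow_url_schemes_sketch()(urls)
ftp_kept = allow_url_schemes_sketch("ftp")(urls)
```

An allow list is generally the stricter choice compared with a deny list, since any scheme not explicitly named is removed.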
limit_depth 1 Policy
def limit_depth(max_depth: int = 10) -> Policy

Placeholder for depth limiting (prevent adversarial nesting).

Intended to remove blocks nested more than max_depth levels deep. Currently a pass-through that returns the document unchanged; a full implementation would track depth during the transform.

Parameters
Name Type Description
max_depth int Default: 10
Returns
Policy
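
Since the function is currently a placeholder, the following is only one possible shape for the depth tracking it describes, not the library's behavior. The `Block` node and the `prune` helper are hypothetical, and counting the root as depth 1 is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical container node; the real Patitas node types differ."""

    label: str
    children: tuple = ()


def prune(block: Block, max_depth: int, depth: int = 1) -> Block:
    """Return a copy of `block` without descendants deeper than max_depth."""
    if depth >= max_depth:
        return Block(block.label)  # at the limit: drop all children
    kept = tuple(prune(child, max_depth, depth + 1) for child in block.children)
    return Block(block.label, kept)


tree = Block("a", (Block("b", (Block("c"),)),))
shallow = prune(tree, 2)
```

Because recursion stops once `depth` reaches `max_depth`, the call stack stays bounded even when the input is nested adversarially deep.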
sanitize 2 Document
Apply a sanitization policy to a document.
def sanitize(doc: Document, *, policy: Policy | Callable[[Document], Document]) -> Document
Parameters
Name Type Description
doc Document

Document to sanitize.

policy Policy | Callable[[Document], Document]

Policy or callable Document -> Document.

Returns
Document
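
The signature accepts either a Policy or any plain callable. A minimal sketch of why both work is shown below, assuming the function does little more than invoke the policy; `sanitize_sketch`, the tuple-based `Document` alias, and `drop_script_lines` are hypothetical names used for illustration.

```python
from typing import Callable

Document = tuple[str, ...]  # stand-in for the real Document type (assumption)


def sanitize_sketch(
    doc: Document, *, policy: Callable[[Document], Document]
) -> Document:
    """Sketch: a Policy is itself callable, so both cases reduce to one call."""
    return policy(doc)


def drop_script_lines(doc: Document) -> Document:
    # Hypothetical plain-function policy for illustration.
    return tuple(line for line in doc if "<script" not in line.lower())


clean = sanitize_sketch(
    ("# Hello", "<script>alert(1)</script>"), policy=drop_script_lines
)
```

Making `policy` keyword-only keeps call sites explicit about which transform is being applied.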