Classes
Policy
Wraps a Document -> Document transform; supports composition via |.
Methods
Internal Methods
__init__
def __init__(self, fn: Callable[[Document], Document]) -> None
Parameters
| Name | Type | Description |
|---|---|---|
| fn | Callable[[Document], Document] | |
__call__
def __call__(self, doc: Document) -> Document
Parameters
| Name | Type | Description |
|---|---|---|
| doc | Document | |
Returns
Document
__or__
def __or__(self, other: Policy) -> Policy
Chain policies: (self | other)(doc) applies self then other.
Parameters
| Name | Type | Description |
|---|---|---|
| other | Policy | |
Returns
Policy
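The composition rule above can be illustrated with a minimal sketch. It reimplements a Policy-like wrapper from the three documented signatures only. The `Document` dataclass is a hypothetical stand-in (a tuple of strings), not the library's real type, and the `drop_empty` and `upper` transforms are invented for the demonstration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Document:
    """Hypothetical stand-in for the real Document type."""
    children: tuple[str, ...] = ()


class Policy:
    """Sketch of a Document -> Document wrapper that composes with |."""

    def __init__(self, fn: Callable[[Document], Document]) -> None:
        self._fn = fn

    def __call__(self, doc: Document) -> Document:
        return self._fn(doc)

    def __or__(self, other: "Policy") -> "Policy":
        # (self | other)(doc) applies self first, then other.
        return Policy(lambda doc: other(self(doc)))


# Two illustrative transforms (assumptions, not part of the documented API).
drop_empty = Policy(lambda d: Document(tuple(c for c in d.children if c)))
upper = Policy(lambda d: Document(tuple(c.upper() for c in d.children)))

pipeline = drop_empty | upper
result = pipeline(Document(("a", "", "b")))
```

Because `__or__` returns another `Policy`, chains of any length read left to right in application order.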
Functions
_is_dangerous_url
def _is_dangerous_url(url: str) -> bool
Check if URL uses a dangerous scheme.
Parameters
| Name | Type | Description |
|---|---|---|
| url | str | |
Returns
bool
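A check of this kind might look like the sketch below. The scheme list is taken from the `_strip_dangerous_urls` description on this page (javascript:, data:, vbscript:). The public name `is_dangerous_url`, the whitespace handling, and the case folding are assumptions, since the actual implementation is not shown here.

```python
DANGEROUS_SCHEMES = frozenset({"javascript", "data", "vbscript"})


def is_dangerous_url(url: str) -> bool:
    """Return True if the URL's scheme is in DANGEROUS_SCHEMES."""
    # Assumption: drop spaces and ASCII control characters before parsing,
    # a conservative choice because some URL parsers ignore tabs and newlines.
    cleaned = "".join(ch for ch in url if ch > " " and ch != "\x7f")
    scheme, sep, _ = cleaned.partition(":")
    # No colon means no scheme, so the URL is treated as not dangerous.
    return bool(sep) and scheme.lower() in DANGEROUS_SCHEMES
```

Folding case and removing embedded whitespace first means obfuscated inputs such as `JaVa\tScript:` are matched as well.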
_scheme_allowed
def _scheme_allowed(url: str, allowed: frozenset[str]) -> bool
Check if URL scheme is in allowed set.
Parameters
| Name | Type | Description |
|---|---|---|
| url | str | |
| allowed | frozenset[str] | |
Returns
bool
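An allow-list check with this signature could be sketched with the standard library's `urllib.parse.urlsplit`. The function name `scheme_allowed` and the treatment of scheme-less (relative) URLs are assumptions. This page does not say how the real helper handles them.

```python
from urllib.parse import urlsplit


def scheme_allowed(url: str, allowed: frozenset[str]) -> bool:
    """Return True if the URL's scheme is in the allowed set."""
    # urlsplit lowercases the scheme; .lower() is kept for clarity.
    scheme = urlsplit(url.strip()).scheme.lower()
    # Assumption: relative URLs carry no scheme and are let through.
    return scheme == "" or scheme in allowed


allowed = frozenset({"https", "http", "mailto"})
ok = scheme_allowed("https://example.com", allowed)
blocked = scheme_allowed("javascript:alert(1)", allowed)
```

An allow-list is stricter than a deny-list: any scheme not named explicitly is rejected, including ones the author never anticipated.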
_strip_html
def _strip_html(doc: Document) -> Document
Remove all HtmlBlock and HtmlInline nodes.
Parameters
| Name | Type | Description |
|---|---|---|
| doc | Document | |
Returns
Document
_strip_html_comments
def _strip_html_comments(doc: Document) -> Document
Remove HtmlInline nodes where .html starts with <!--.
Parameters
| Name | Type | Description |
|---|---|---|
| doc | Document | |
Returns
Document
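The filtering rule can be sketched on a toy tree. The node classes below are hypothetical stand-ins that borrow the names `HtmlInline` and `.html` from the description. The real AST is likely richer, and the traversal here only handles one level of paragraphs.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Text:
    content: str


@dataclass(frozen=True)
class HtmlInline:
    html: str


@dataclass(frozen=True)
class Paragraph:
    children: tuple[Union[Text, HtmlInline], ...]


@dataclass(frozen=True)
class Document:
    children: tuple[Paragraph, ...]


def strip_html_comments(doc: Document) -> Document:
    """Drop HtmlInline nodes whose .html starts with '<!--'."""
    def keep(node: object) -> bool:
        return not (isinstance(node, HtmlInline) and node.html.startswith("<!--"))

    # Rebuild the tree instead of mutating it; the nodes are frozen.
    return Document(tuple(
        Paragraph(tuple(c for c in p.children if keep(c)))
        for p in doc.children
    ))


doc = Document((Paragraph((Text("a"), HtmlInline("<!-- hidden -->"), HtmlInline("<b>"))),))
cleaned = strip_html_comments(doc)
```

Non-comment inline HTML such as `<b>` survives this pass. Removing it is the job of `_strip_html`.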
_strip_dangerous_urls
def _strip_dangerous_urls(doc: Document) -> Document
Remove Link and Image nodes with javascript:, data:, vbscript: URLs.
Parameters
| Name | Type | Description |
|---|---|---|
| doc | Document | |
Returns
Document
_normalize_unicode
def _normalize_unicode(doc: Document) -> Document
Strip zero-width characters and bidi overrides from Text nodes.
Parameters
| Name | Type | Description |
|---|---|---|
| doc | Document | |
Returns
Document
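At the string level, the stripping step might be sketched as below. The exact character set is an assumption, since this page does not list which code points the library removes. The helper name `strip_invisible` is hypothetical, and applying it to each Text node is left out.

```python
# Assumed character sets; the library's actual list may differ.
ZERO_WIDTH = "\u200b\u200c\u200d\u2060\ufeff"
BIDI_CONTROLS = "\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069"

# str.translate deletes any code point that maps to None.
_STRIP_TABLE = {ord(ch): None for ch in ZERO_WIDTH + BIDI_CONTROLS}


def strip_invisible(text: str) -> str:
    """Remove zero-width and bidirectional control characters."""
    return text.translate(_STRIP_TABLE)


visible = strip_invisible("pay\u200bpal")
```

One trade-off to be aware of: U+200D (zero-width joiner) also appears in legitimate emoji sequences and some scripts, so a stricter or looser set may be appropriate depending on the content.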
_strip_images
def _strip_images(doc: Document) -> Document
Replace Image nodes with Text nodes containing alt text.
Parameters
| Name | Type | Description |
|---|---|---|
| doc | Document | |
Returns
Document
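The replacement described above is a node-for-node substitution rather than a removal. A minimal sketch on hypothetical stand-in node types follows. The field names `url`, `alt`, and `content` are assumptions, not the library's confirmed attributes.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Text:
    content: str


@dataclass(frozen=True)
class Image:
    url: str
    alt: str


@dataclass(frozen=True)
class Paragraph:
    children: tuple[Union[Text, Image], ...]


@dataclass(frozen=True)
class Document:
    children: tuple[Paragraph, ...]


def strip_images(doc: Document) -> Document:
    """Replace each Image with a Text node carrying its alt text."""
    def swap(node: Union[Text, Image]) -> Text:
        return Text(node.alt) if isinstance(node, Image) else node

    return Document(tuple(
        Paragraph(tuple(swap(c) for c in p.children))
        for p in doc.children
    ))


doc = Document((Paragraph((Text("see "), Image("https://example.com/y.png", "a chart"))),))
without_images = strip_images(doc)
```

Keeping the alt text preserves the meaning of the sentence while removing the remote fetch.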
_strip_raw_code
def _strip_raw_code(doc: Document) -> Document
Remove FencedCode and IndentedCode blocks.
Parameters
| Name | Type | Description |
|---|---|---|
| doc | Document | |
Returns
Document
allow_url_schemes
def allow_url_schemes(*schemes: str) -> Policy
Keep only Link/Image nodes with allowed URL schemes.
Default schemes: https, http, mailto.
Parameters
| Name | Type | Description |
|---|---|---|
| *schemes | str | |
Returns
Policy
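A factory of this shape can be sketched as a closure over the allowed set. The defaults come from the description above. Everything else is an assumption: the `Link` and `Document` stand-ins, dropping disallowed nodes outright, letting scheme-less URLs through, and returning a plain callable instead of a real `Policy`.

```python
from dataclasses import dataclass
from typing import Callable
from urllib.parse import urlsplit

DEFAULT_SCHEMES = ("https", "http", "mailto")  # defaults listed in the reference


@dataclass(frozen=True)
class Link:
    """Hypothetical stand-in for a Link node."""
    url: str


@dataclass(frozen=True)
class Document:
    """Hypothetical stand-in: a flat tuple of links."""
    children: tuple[Link, ...]


def allow_url_schemes(*schemes: str) -> Callable[[Document], Document]:
    # Fall back to the documented defaults when no schemes are passed.
    allowed = frozenset(s.lower() for s in (schemes or DEFAULT_SCHEMES))

    def transform(doc: Document) -> Document:
        def ok(link: Link) -> bool:
            scheme = urlsplit(link.url).scheme
            # Assumption: scheme-less (relative) URLs are kept.
            return scheme == "" or scheme in allowed

        return Document(tuple(n for n in doc.children if ok(n)))

    return transform


doc = Document((Link("https://a.example"), Link("ftp://b.example"), Link("mailto:c@example.com")))
default_result = allow_url_schemes()(doc)
ftp_only = allow_url_schemes("ftp")(doc)
```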
limit_depth
def limit_depth(max_depth: int = 10) -> Policy
Placeholder for depth limiting (prevent adversarial nesting).
Intended to remove blocks nested deeper than max_depth levels. Currently a pass-through; a full implementation would track depth during the transform.
Parameters
| Name | Type | Description |
|---|---|---|
| max_depth | int | Default: 10 |
Returns
Policy
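Since the function is currently a pass-through, the following is only a sketch of what depth tracking could look like, not the library's behavior. The `Block` type, the `prune` helper, and the convention that the root sits at depth 1 are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical stand-in for a nestable block node."""
    label: str
    children: tuple["Block", ...] = ()


def prune(block: Block, max_depth: int = 10, depth: int = 1) -> Block:
    """Return a copy of block with everything below max_depth removed."""
    if depth >= max_depth:
        # This block is at the limit: keep it, drop its descendants.
        return Block(block.label)
    return Block(
        block.label,
        tuple(prune(c, max_depth, depth + 1) for c in block.children),
    )


nested = Block("a", (Block("b", (Block("c"),)),))
shallow = prune(nested, max_depth=2)
```

The recursion stops at `max_depth` frames, so an adversarially deep input cannot exhaust the call stack during pruning.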
sanitize
def sanitize(doc: Document, *, policy: Policy | Callable[[Document], Document]) -> Document
Apply a sanitization policy to a document.
Parameters
| Name | Type | Description |
|---|---|---|
| doc | Document | Document to sanitize. |
| policy | Policy \| Callable[[Document], Document] | Policy or callable Document -> Document. |
Returns
Document
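The signature implies that `policy` is keyword-only and that any callable from Document to Document is accepted. A minimal sketch of that contract follows, using `str` as a hypothetical stand-in for `Document`. The body is an assumption based on the signature alone.

```python
from typing import Callable

Document = str  # hypothetical stand-in for the real Document type


def sanitize(doc: Document, *, policy: Callable[[Document], Document]) -> Document:
    """Apply policy to doc; the bare * makes policy keyword-only."""
    return policy(doc)


clean = sanitize("  hello  ", policy=str.strip)
```

Requiring the keyword keeps call sites explicit about which policy is in force, for example `sanitize(doc, policy=a | b)` with composed policies.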