Module

excerpt

Extract plain-text excerpts from a Patitas AST.

Provides structurally correct excerpt extraction that stops at block boundaries, so an excerpt is never cut off in the middle of Markdown syntax, and properly handles headings, paragraphs, and lists.

Example:

>>> from patitas import parse, extract_excerpt, extract_meta_description
>>> doc = parse("# Title\n\nFirst paragraph. Second sentence.")
>>> extract_excerpt(doc)
'First paragraph. Second sentence.'
>>> extract_meta_description(doc)
'First paragraph.'

Functions

_inline_text
Recursively extract plain text from inline nodes.
def _inline_text(node: Inline) -> str
Parameters
Name Type Description
node Inline
Returns
str
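This is a minimal sketch of the recursive flattening that _inline_text describes. The node classes (Text, Strong, Emphasis, Link) and their fields are hypothetical stand-ins, not Patitas's actual inline types. The only point illustrated is that container nodes contribute the text of their children and drop their own formatting.

```python
from dataclasses import dataclass, field


# Hypothetical stand-ins for Patitas inline nodes; the real classes may differ.
@dataclass
class Text:
    content: str


@dataclass
class Strong:
    children: list = field(default_factory=list)


@dataclass
class Emphasis:
    children: list = field(default_factory=list)


@dataclass
class Link:
    url: str
    children: list = field(default_factory=list)


def inline_text(node) -> str:
    """Flatten an inline node to plain text by recursing into its children."""
    if isinstance(node, Text):
        return node.content
    # Container nodes (Strong, Emphasis, Link) contribute only their children's
    # text; formatting and link targets are dropped.
    return "".join(inline_text(child) for child in getattr(node, "children", []))


para = [Text("See "), Link("https://example.com", [Strong([Text("the docs")])]), Text(".")]
print("".join(inline_text(n) for n in para))  # See the docs.
```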
_inline_text_html
def _inline_text_html(node: Inline) -> str

Recursively extract HTML from inline nodes (preserves strong, emphasis, links).

Parameters
Name Type Description
node Inline
Returns
str
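The HTML variant can be sketched the same way, again with hypothetical node classes. Text is HTML-escaped, and only strong, emphasis, and links become tags. The exact tag names (<strong>, <em>, <a href>) are an assumption drawn from the one-line description, not from the Patitas source.

```python
import html
from dataclasses import dataclass, field


# Hypothetical inline node classes; names and fields are illustrative only.
@dataclass
class Text:
    content: str


@dataclass
class Strong:
    children: list = field(default_factory=list)


@dataclass
class Emphasis:
    children: list = field(default_factory=list)


@dataclass
class Link:
    url: str
    children: list = field(default_factory=list)


def inline_html(node) -> str:
    """Render an inline node to HTML, keeping strong, emphasis, and links."""
    if isinstance(node, Text):
        return html.escape(node.content)  # raw text must not inject markup
    inner = "".join(inline_html(child) for child in getattr(node, "children", []))
    if isinstance(node, Strong):
        return f"<strong>{inner}</strong>"
    if isinstance(node, Emphasis):
        return f"<em>{inner}</em>"
    if isinstance(node, Link):
        return f'<a href="{html.escape(node.url, quote=True)}">{inner}</a>'
    return inner  # unknown containers: keep their content, drop the wrapper


print(inline_html(Strong([Text("a < b")])))  # <strong>a &lt; b</strong>
```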
_block_text
Extract plain text from a block node.
def _block_text(node: Block, source: str) -> str
Parameters
Name Type Description
node Block
source str
Returns
str
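_block_text receives the original source along with the node, and the extract_excerpt parameters link that source to zero-copy extraction of FencedCode. The sketch below shows one way this could work. It assumes, hypothetically, that a fenced code node records start and end offsets into the source instead of storing its own copy of the code. All node classes here are illustrative.

```python
from dataclasses import dataclass


# Hypothetical block nodes. FencedCode keeps offsets into the original source
# rather than a copy of its body (an assumption about what "zero-copy" means here).
@dataclass
class Heading:
    level: int
    text: str


@dataclass
class Paragraph:
    text: str


@dataclass
class ListBlock:
    items: list[str]


@dataclass
class FencedCode:
    content_start: int
    content_end: int


def block_text(node, source: str) -> str:
    """Return the plain text of one block node."""
    if isinstance(node, (Heading, Paragraph)):
        return node.text
    if isinstance(node, ListBlock):
        # One item per line keeps list items from running together.
        return "\n".join(node.items)
    if isinstance(node, FencedCode):
        # Slice the code body out of the source only when it is needed.
        return source[node.content_start:node.content_end]
    return ""


src = "~~~\nprint(1)\n~~~\n"
print(block_text(FencedCode(4, 12), src))  # print(1)
```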
_block_text_html
def _block_text_html(node: Block, source: str) -> str

Extract HTML from a block node (preserves inline formatting, uses block elements).

Parameters
Name Type Description
node Block
source str
Returns
str
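Here is a sketch of the HTML block variant. The source says only that _block_text_html "uses block elements". The specific elements below (<h1> to <h6>, <p>, <ul>/<ol> with <li>, <pre><code>) and the node classes are assumptions made for illustration. Each node is assumed to already hold rendered inline HTML, such as the output of _inline_text_html.

```python
import html
from dataclasses import dataclass


# Hypothetical block nodes holding already-rendered inline HTML; the real
# Patitas nodes carry inline children instead.
@dataclass
class Heading:
    level: int
    inner_html: str


@dataclass
class Paragraph:
    inner_html: str


@dataclass
class ListBlock:
    items_html: list[str]
    ordered: bool = False


@dataclass
class FencedCode:
    code: str


def block_html(node) -> str:
    """Wrap one block in a block-level HTML element."""
    if isinstance(node, Heading):
        return f"<h{node.level}>{node.inner_html}</h{node.level}>"
    if isinstance(node, Paragraph):
        return f"<p>{node.inner_html}</p>"
    if isinstance(node, ListBlock):
        tag = "ol" if node.ordered else "ul"
        items = "".join(f"<li>{item}</li>" for item in node.items_html)
        return f"<{tag}>{items}</{tag}>"
    if isinstance(node, FencedCode):
        # Code is raw text, so it is escaped before being wrapped.
        return f"<pre><code>{html.escape(node.code)}</code></pre>"
    return ""


print(block_html(ListBlock(["a", "<em>b</em>"])))  # <ul><li>a</li><li><em>b</em></li></ul>
```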
extract_excerpt
def extract_excerpt(ast: Document | Sequence[Block], source: str = '', *, max_chars: int = 750, skip_leading_h1: bool = True, include_headings: bool = True, excerpt_as_html: bool = False) -> str

Extract an excerpt from the AST, stopping only at block boundaries.

Walks the blocks in order and extracts their text, skipping a leading h1 by default. Extraction stops once the accumulated text reaches max_chars, and it always stops at a block boundary.

Parameters
Name Type Description
ast Document | Sequence[Block]

Document or sequence of Block nodes

source str

Original source (for FencedCode zero-copy extraction)

Default:''
max_chars int

Maximum characters (default 750)

Default:750
skip_leading_h1 bool

Skip first Heading(level=1) (default True)

Default:True
include_headings bool

Include heading text in excerpt (default True)

Default:True
excerpt_as_html bool

If True, output HTML block elements for structure, preserving strong, emphasis, and link formatting (default False)

Default:False
Returns
str
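The block-boundary rule in extract_excerpt can be sketched over a simplified, hypothetical block list. Two details here are assumptions, not documented behavior. First, the block that pushes the running length past max_chars is kept whole, so the result may exceed max_chars. Second, blocks are joined with a blank line.

```python
from dataclasses import dataclass


# Hypothetical simplified block: a kind tag, its text, and a heading level.
@dataclass
class Block:
    kind: str  # "heading", "paragraph", "list", ...
    text: str
    level: int = 0


def excerpt(blocks: list[Block], *, max_chars: int = 750,
            skip_leading_h1: bool = True, include_headings: bool = True) -> str:
    """Accumulate whole blocks until the running length reaches max_chars."""
    parts: list[str] = []
    total = 0  # separators are not counted toward the limit
    for i, block in enumerate(blocks):
        is_heading = block.kind == "heading"
        if i == 0 and skip_leading_h1 and is_heading and block.level == 1:
            continue  # skip a leading level-1 heading
        if is_heading and not include_headings:
            continue
        text = block.text.strip()
        if not text:
            continue
        parts.append(text)
        total += len(text)
        if total >= max_chars:
            break  # stop between blocks, never inside one
    return "\n\n".join(parts)


doc = [Block("heading", "Title", 1), Block("paragraph", "First paragraph. Second sentence.")]
print(excerpt(doc))  # First paragraph. Second sentence.
```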
_truncate_at_word
Truncate at word boundary within length.
def _truncate_at_word(text: str, length: int, suffix: str = '...') -> str
Parameters
Name Type Description
text str
length int
suffix str Default:'...'
Returns
str
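The following is a minimal sketch of word-boundary truncation that matches the _truncate_at_word signature. The source does not say whether the real function counts the suffix toward length. This version reserves room for the suffix, so the result never exceeds length.

```python
def truncate_at_word(text: str, length: int, suffix: str = "...") -> str:
    """Cut text to at most `length` characters without splitting a word."""
    if len(text) <= length:
        return text
    # Reserve room for the suffix (an assumption about the real behavior).
    limit = max(length - len(suffix), 0)
    # Last space at or before `limit`: text[:cut] then ends on a whole word.
    cut = text.rfind(" ", 0, limit + 1)
    if cut <= 0:
        cut = limit  # one over-long word: fall back to a hard cut
    return text[:cut].rstrip() + suffix


print(truncate_at_word("The quick brown fox", 12))  # The quick...
```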
_truncate_at_sentence
Truncate at sentence boundary. Falls back to word boundary if needed.
def _truncate_at_sentence(text: str, length: int = 160, min_ratio: float = 0.6) -> str
Parameters
Name Type Description
text str
length int Default:160
min_ratio float Default:0.6
Returns
str
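This sketch shows sentence-first truncation with a word-boundary fallback. The meaning of min_ratio is not documented. Here it is assumed to be the minimum fraction of length that a sentence cut must keep before it is accepted. The sentence-end pattern is also a simplification: it treats abbreviations such as "e.g." as sentence ends.

```python
import re

# Assumed sentence end: '.', '!' or '?' followed by whitespace or the end of the text.
_SENTENCE_END = re.compile(r"[.!?](?=\s|$)")


def truncate_at_word(text: str, length: int, suffix: str = "...") -> str:
    """Word-boundary fallback; reserves room for the suffix."""
    if len(text) <= length:
        return text
    limit = max(length - len(suffix), 0)
    cut = text.rfind(" ", 0, limit + 1)
    return text[: cut if cut > 0 else limit].rstrip() + suffix


def truncate_at_sentence(text: str, length: int = 160, min_ratio: float = 0.6) -> str:
    """Prefer ending on a complete sentence; otherwise cut at a word boundary."""
    if len(text) <= length:
        return text
    # Last sentence end that fits within `length`. Scanning one extra character
    # lets the lookahead see what follows a period at index length - 1.
    best = -1
    for match in _SENTENCE_END.finditer(text[: length + 1]):
        if match.end() <= length:
            best = match.end()
    # Assumed meaning of min_ratio: reject sentence cuts that would leave the
    # result shorter than this fraction of the allowed length.
    if best >= length * min_ratio:
        return text[:best]
    return truncate_at_word(text, length)


print(truncate_at_sentence("One sentence here. Another one follows it.", 25))  # One sentence here.
```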
extract_meta_description
def extract_meta_description(ast: Document | Sequence[Block], source: str = '', *, max_chars: int = 160) -> str

Extract SEO-friendly meta description from AST.

Same logic as extract_excerpt, but prefers a sentence boundary within max_chars (160 by default). Calls extract_excerpt with a larger buffer, then truncates at a sentence boundary.

Parameters
Name Type Description
ast Document | Sequence[Block]

Document or sequence of Block nodes

source str

Original source (for FencedCode)

Default:''
max_chars int

Maximum length (default 160, a common SEO guideline for meta descriptions)

Default:160
Returns
str
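The two-step approach described for extract_meta_description (gather a larger buffer, then truncate at a sentence) can be sketched over block text that has already been extracted. Three simplifications are assumptions, not documented behavior of the library: the 2x buffer factor, the whitespace collapsing, and the omission of min_ratio.

```python
import re

# Assumed sentence end: '.', '!' or '?' followed by whitespace or the end of the text.
_SENTENCE_END = re.compile(r"[.!?](?=\s|$)")


def meta_description(paragraphs: list[str], max_chars: int = 160) -> str:
    """Build a one-line description that ends on a sentence boundary when possible."""
    # Step 1: collect whole blocks into a buffer larger than max_chars.
    # The 2x factor is an assumption; the docs only say "larger buffer".
    parts: list[str] = []
    total = 0
    for para in paragraphs:
        parts.append(para)
        total += len(para)
        if total >= max_chars * 2:
            break
    # Collapse all whitespace: a meta description is a single line of text.
    buffer = " ".join(" ".join(parts).split())
    if len(buffer) <= max_chars:
        return buffer
    # Step 2: cut at the last sentence end that fits within max_chars...
    best = -1
    for match in _SENTENCE_END.finditer(buffer[: max_chars + 1]):
        if match.end() <= max_chars:
            best = match.end()
    if best > 0:
        return buffer[:best]
    # ...or, failing that, at the last word boundary, leaving room for "...".
    cut = buffer.rfind(" ", 0, max_chars - 2)
    return buffer[: cut if cut > 0 else max_chars - 3] + "..."


print(meta_description(["Short lead sentence. " + "filler " * 40], max_chars=60))  # Short lead sentence.
```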