Classes
PageContentMixin
Mixin providing AST-based content properties for pages.
This mixin handles content representation across multiple formats:
- AST (Abstract Syntax Tree) - structural representation (Phase 3)
- HTML - rendered for display
- Plain text - for search indexing and LLM consumption
All properties use lazy evaluation with caching for performance.
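The lazy-evaluation-with-caching pattern can be sketched as follows. This is a minimal illustration, not the mixin's actual code: the class `LazyContentSketch`, the `compute_count` counter, and the whitespace-collapsing body are all hypothetical, and only the `_plain_text_cache` attribute name comes from the attribute list on this page.

```python
class LazyContentSketch:
    """Hypothetical stand-in for a page class; not the real mixin."""

    def __init__(self, content: str) -> None:
        self.content = content
        self._plain_text_cache: str | None = None
        self.compute_count = 0  # illustration only: counts real computations

    @property
    def plain_text(self) -> str:
        # Compute on first access, then serve the cached value afterwards.
        if self._plain_text_cache is None:
            self.compute_count += 1
            # Placeholder computation (assumption): collapse whitespace.
            self._plain_text_cache = " ".join(self.content.split())
        return self._plain_text_cache


page = LazyContentSketch("Hello   lazy\nworld")
first = page.plain_text   # triggers the computation
second = page.plain_text  # served from the cache
```

With this shape, repeated reads cost one attribute check, and a cache can be invalidated by resetting it to `None` if the underlying content changes.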
Attributes
| Name | Type | Description |
|---|---|---|
| content | str | |
| parsed_ast | Any | |
| links | list[str] | |
| _ast_cache | list[dict[str, Any]] \| None | |
| _html_cache | str \| None | |
| _plain_text_cache | str \| None | |
Methods 3
ast
property def ast(self) -> list[dict[str, Any]] | None
True AST - list of tokens from markdown parser.
Returns the structural representation of content as parsed by the markdown engine. This enables efficient multi-output generation:
- HTML rendering
- Plain text extraction
- TOC generation
- Link extraction
Returns
list[dict[str, Any]] | None: List of AST tokens if available, or None if the parser doesn't support AST.
html
property def html(self) -> str
HTML content rendered from AST or legacy parser.
This is the preferred way to access rendered HTML content.
Use this instead of the deprecated `parsed_ast` field.
Returns
str: Rendered HTML string.
plain_text
property def plain_text(self) -> str
Plain text extracted from content (for search/LLM).
Strips HTML tags from rendered content to get clean text.
Uses the rendered HTML (which includes directive output) for accuracy.
Returns
str: Plain text content with HTML tags removed.
Internal Methods 4
_render_ast_to_html
def _render_ast_to_html(self) -> str
Render AST tokens to HTML.
Internal method used when true AST is available (Phase 3).
Returns
str: Rendered HTML string.
_extract_text_from_ast
def _extract_text_from_ast(self) -> str
Extract plain text from AST tokens.
Walks the AST tree and extracts all text content, ignoring structural elements like code blocks.
Returns
str: Plain text string.
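A recursive walk of this kind might look like the sketch below. It assumes Mistune 3.x-style tokens, where text nodes carry their content in `raw`, nested nodes live under `children`, and fenced code has type `block_code`. The function name `extract_text`, the `SKIPPED_TYPES` set, and the sample tokens are hypothetical and are not taken from the mixin's source.

```python
from typing import Any

# Token types ignored during extraction (an assumption for this sketch).
SKIPPED_TYPES = {"block_code"}


def extract_text(tokens: list[dict[str, Any]]) -> str:
    """Collect the text of all text nodes, skipping code blocks."""
    parts: list[str] = []

    def walk(nodes: list[dict[str, Any]]) -> None:
        for node in nodes:
            if node.get("type") in SKIPPED_TYPES:
                continue  # do not descend into or record code blocks
            if node.get("type") == "text":
                parts.append(node.get("raw", ""))
            walk(node.get("children", []))

    walk(tokens)
    # Join the pieces and normalise runs of whitespace to single spaces.
    return " ".join(" ".join(parts).split())


sample_tokens = [
    {"type": "heading", "attrs": {"level": 1},
     "children": [{"type": "text", "raw": "Title"}]},
    {"type": "block_code", "raw": "print('x')\n"},
    {"type": "paragraph", "children": [
        {"type": "text", "raw": "Some "},
        {"type": "strong", "children": [{"type": "text", "raw": "bold"}]},
        {"type": "text", "raw": " text."},
    ]},
]
extracted = extract_text(sample_tokens)
```

Here `extracted` is `"Title Some bold text."`: the heading and paragraph text survive, and the code block contributes nothing.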
_extract_links_from_ast
def _extract_links_from_ast(self) -> list[str]
Extract links from AST tokens.
Walks the AST tree and extracts all link URLs (Phase 3).
Handles Mistune 3.x AST format, where URLs are in `attrs.url`.
Returns
list[str]: List of link URLs.
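As a rough illustration of reading URLs from `attrs.url`, the sketch below walks hand-written tokens shaped like Mistune 3.x output. The function name `extract_links` and the sample tokens are hypothetical; the real method takes no arguments and presumably reads the page's cached AST instead.

```python
from typing import Any


def extract_links(tokens: list[dict[str, Any]]) -> list[str]:
    """Return link URLs in document order, including nested links."""
    links: list[str] = []
    for node in tokens:
        if node.get("type") == "link":
            # Mistune 3.x-style tokens keep the target under attrs.url.
            url = node.get("attrs", {}).get("url")
            if url:
                links.append(url)
        # Links can sit inside paragraphs, lists, emphasis, and so on.
        links.extend(extract_links(node.get("children", [])))
    return links


sample_tokens = [
    {"type": "paragraph", "children": [
        {"type": "text", "raw": "See "},
        {"type": "link", "attrs": {"url": "https://example.com"},
         "children": [{"type": "text", "raw": "the site"}]},
        {"type": "emphasis", "children": [
            {"type": "link", "attrs": {"url": "/docs/"},
             "children": [{"type": "text", "raw": "docs"}]},
        ]},
    ]},
]
found = extract_links(sample_tokens)
```

For the sample above, `found` is `["https://example.com", "/docs/"]`.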
_strip_html_to_text
def _strip_html_to_text(self, html: str) -> str
Strip HTML tags from content to get plain text.
Fallback method when AST is not available.
Parameters 1
| Name | Type | Description |
|---|---|---|
| html | str | HTML content |
Returns
str: Plain text with HTML tags removed.
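This page does not say how the tags are stripped. One possible approach, shown here only as a sketch, uses the standard library's `html.parser.HTMLParser` to keep text nodes and discard markup. The helper class `_TextCollector` and the free function `strip_html_to_text` are hypothetical names for illustration.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Hypothetical helper: records text nodes and ignores all tags."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decode entities like &amp;
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def strip_html_to_text(html: str) -> str:
    collector = _TextCollector()
    collector.feed(html)
    collector.close()  # flush any buffered trailing text
    # Collapse runs of whitespace left behind by removed markup.
    return " ".join("".join(collector.parts).split())


stripped = strip_html_to_text("<p>Hello <strong>world</strong> &amp; friends</p>")
```

Here `stripped` is `"Hello world & friends"`. Note that this sketch inserts no separator between adjacent block elements, so `<h1>A</h1><p>B</p>` would yield `"AB"`; a fuller version might add a space when a block-level tag closes.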