native_html

rendering.parsers.native_html

Native HTML parser for build-time validation and health checks.

This parser is used during `bengal build` for:

Health check validation (detecting unrendered directives, Jinja templates)
Text extraction from rendered HTML (excluding code blocks)
Performance-optimized alternative to BeautifulSoup4

Design:

Uses Python's stdlib html.parser (fast, zero dependencies)
Tracks state for code/script/style blocks to exclude from text extraction
Optimized for build-time validation, not complex DOM manipulation
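The design points above can be sketched with the standard library alone. This is a minimal illustration, not the Bengal source: the class name `SketchTextExtractor`, the `EXCLUDED_TAGS` set, and the depth-counter approach are assumptions made for the example.

```python
from html.parser import HTMLParser

# Tags whose contents are skipped during text extraction. The set is taken
# from the description above; the real parser's exact tag set is an assumption.
EXCLUDED_TAGS = {"code", "script", "style"}


class SketchTextExtractor(HTMLParser):
    """Hypothetical stand-in for illustration; not the Bengal implementation."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside an excluded element
        self._chunks = []     # text collected outside excluded elements

    def handle_starttag(self, tag, attrs):
        if tag in EXCLUDED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in EXCLUDED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def get_text(self) -> str:
        # Collapse runs of whitespace into single spaces.
        return " ".join("".join(self._chunks).split())


extractor = SketchTextExtractor()
extractor.feed("<p>Hello <code>world</code></p><script>var x = 1;</script>")
text = extractor.get_text()
print(text)
```

A depth counter rather than a boolean flag keeps nested excluded elements (for example `<code>` inside `<code>`) from re-enabling extraction too early.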

Performance:

~5-10x faster than BeautifulSoup4 for text extraction
Suitable for high-volume build-time validation

Classes

NativeHTMLParser

Fast HTML parser for build-time validation and text extraction.

This parser is the production parser used during `bengal build` for health checks and validation. It is optimized for speed over features, using Python's stdlib html.parser without external dependencies.

Primary use cases:

Health check validation (unrendered directives, Jinja templates)
Text extraction for search indexing
Link validation and content analysis
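As an illustration of the health-check use case, the sketch below looks for Jinja delimiters that survived rendering while ignoring anything inside code blocks. The helper `find_unrendered_jinja`, the `_VisibleText` class, and the regular expression are hypothetical and only cover Jinja syntax; how Bengal's own health checks are wired up is not shown in this reference.

```python
import re
from html.parser import HTMLParser

# Jinja delimiters that should not survive into rendered output.
UNRENDERED_JINJA = re.compile(r"\{\{.*?\}\}|\{%.*?%\}")


class _VisibleText(HTMLParser):
    """Hypothetical minimal extractor that skips code/script/style content."""

    SKIP = {"code", "script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if not self.depth:
            self.parts.append(data)


def find_unrendered_jinja(html: str) -> list:
    """Return Jinja-looking fragments found outside code blocks."""
    parser = _VisibleText()
    parser.feed(html)
    parser.close()
    return UNRENDERED_JINJA.findall("".join(parser.parts))


page = "<p>Hi {{ user.name }}</p><code>{{ documented_example }}</code>"
leaks = find_unrendered_jinja(page)
print(leaks)
```

Excluding code blocks matters here because documentation pages often show template syntax on purpose; only the fragment in the paragraph is reported.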

Performance:

~5-10x faster than BeautifulSoup4 for text extraction
Zero external dependencies (uses stdlib only)

Example:

    >>> parser = NativeHTMLParser()
    >>> result = parser.feed("<p>Hello <code>world</code></p>")
    >>> result.get_text()
    'Hello'  # Code block excluded

Inherits from HTMLParser

Methods 6

handle_starttag

def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None

Handle opening tags.

Parameters 2

`tag`	`str`
`attrs`	`list[tuple[str, str | None]]`
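The shape of `attrs` follows the stdlib `HTMLParser` contract: a list of `(name, value)` pairs with lowercased names, where `value` is `None` for attributes written without `=`. The sketch below uses a hypothetical `LinkCollector` subclass to show how a link-validation pass might read `href` values; it is not a description of what `NativeHTMLParser.handle_starttag` itself does.

```python
from __future__ import annotations

from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical subclass; shows how attrs arrive in handle_starttag."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        if tag != "a":
            return
        for name, value in attrs:
            # value is None for attributes written without "=", e.g. <a download>
            if name == "href" and value is not None:
                self.hrefs.append(value)


collector = LinkCollector()
collector.feed('<a href="/docs/">Docs</a> <a download>bare</a> <A HREF="/api/">API</A>')
hrefs = collector.hrefs
print(hrefs)
```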

handle_endtag

def handle_endtag(self, tag: str) -> None

Handle closing tags.

Parameters 1

`tag`	`str`

handle_data

def handle_data(self, data: str) -> None

Handle text data.

Parameters 1

`data`	`str`

feed

def feed(self, data: str) -> NativeHTMLParser

Parse HTML content and return self for chaining.

Parameters 1

`data`	`str`

Returns

NativeHTMLParser: self, so that calls can be chained as parser.feed(html).get_text()
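The stdlib `HTMLParser.feed()` returns `None`, so chaining requires an override that returns the instance. A minimal sketch of that pattern, using a hypothetical `ChainableParser` class rather than the real `NativeHTMLParser`:

```python
from html.parser import HTMLParser


class ChainableParser(HTMLParser):
    """Hypothetical illustration of a feed() that returns self."""

    def __init__(self) -> None:
        super().__init__()
        self.parts = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)

    def feed(self, data: str) -> "ChainableParser":
        super().feed(data)  # the stdlib feed() returns None
        return self

    def get_text(self) -> str:
        # Collapse runs of whitespace into single spaces.
        return " ".join("".join(self.parts).split())


text = ChainableParser().feed("<p>Hello   <b>world</b></p>").get_text()
print(text)
```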

get_text

def get_text(self) -> str

Get extracted text content (excluding code/script/style blocks).

Returns

str: Text content with whitespace normalized

reset

def reset(self) -> None

Reset parser state for reuse.
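One detail worth knowing when overriding `reset()` on an `HTMLParser` subclass: the stdlib `HTMLParser.__init__` calls `reset()` itself, so any custom state the override touches has to be created inside `reset()` (or before `super().__init__()` runs). The sketch below uses a hypothetical `ReusableParser` to show one way to reuse a single instance across documents; the attributes the real parser resets are not listed in this reference.

```python
from html.parser import HTMLParser


class ReusableParser(HTMLParser):
    """Hypothetical illustration of resetting custom state for reuse."""

    def reset(self) -> None:
        # HTMLParser.__init__ calls reset(), so custom state is created here.
        super().reset()
        self.parts = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


parser = ReusableParser()
results = []
for html in ["<p>first</p>", "<p>second</p>"]:
    parser.reset()  # discard text collected from the previous document
    parser.feed(html)
    results.append("".join(parser.parts))
print(results)
```

Reusing one instance avoids constructing a new parser per page, which can matter when a build validates many pages.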

Internal Methods 1

__init__

def __init__(self) -> None