Module

`lexers.html_sm`

Hand-written HTML lexer using state machine approach.

Guaranteed O(n) in the input length, no regular expressions, thread-safe.

Language Support:

Token Classification:

Performance:

~45µs per 100-line file.

Thread-Safety:

Uses only local variables in `tokenize()`.
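To illustrate why keeping all scan state in local variables makes concurrent use safe, here is a minimal sketch. `split_tags` and `lex_concurrently` are hypothetical stand-ins written for this example, not part of `lexers.html_sm`; the point is only that each call owns its own generator frame, so threads never share mutable state.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Iterator


def split_tags(code: str) -> Iterator[str]:
    # Hypothetical stand-in for a lexer's tokenize(): all scan state is local.
    pos = 0
    while pos < len(code):
        if code[pos] == "<":
            # Inside a tag: consume through the next ">" (or to the end).
            close = code.find(">", pos)
            nxt = len(code) if close == -1 else close + 1
        else:
            # Plain text: consume up to the next "<" (or to the end).
            close = code.find("<", pos)
            nxt = len(code) if close == -1 else close
        yield code[pos:nxt]
        pos = nxt


def lex_concurrently(sources: list[str], workers: int = 4) -> list[list[str]]:
    # Each call creates its own generator frame, so no locking is needed.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda s: list(split_tags(s)), sources))
```

Running the same inputs sequentially and through `lex_concurrently` should give identical results, since nothing is stored on a shared object between calls.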

See Also:

Classes

HtmlStateMachineLexer

HTML lexer with tag, attribute, and comment parsing.

Handles HTML5 syntax including comments, doctype, and tag attributes.

tokenize

def tokenize(self, code: str, config: LexerConfig | None = None, *, start: int = 0, end: int | None = None) -> Iterator[Token]

Iterator[Token]
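The state-machine approach described for this module can be sketched as a single forward scan that switches between comment, tag, and text states. This is an illustrative sketch only, not the module's implementation: the `Token` dataclass and its fields are assumptions (the real `Token` and `LexerConfig` types are not shown on this page), the `config` parameter is omitted, and attributes are not split into separate tokens.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class Token:
    # Hypothetical token shape, assumed for this sketch.
    kind: str
    value: str
    offset: int


def tokenize(code: str, *, start: int = 0, end: int | None = None) -> Iterator[Token]:
    """Single-pass scan of code[start:end] using only local variables."""
    pos = start
    stop = len(code) if end is None else end
    while pos < stop:
        if code.startswith("<!--", pos, stop):
            # Comment state: consume through the closing "-->" (or to the end).
            close = code.find("-->", pos + 4, stop)
            nxt = stop if close == -1 else close + 3
            kind = "comment"
        elif code[pos] == "<":
            # Tag state: consume through the next ">" (or to the end).
            close = code.find(">", pos + 1, stop)
            nxt = stop if close == -1 else close + 1
            head = code[pos:min(pos + 9, stop)].lower()
            kind = "doctype" if head == "<!doctype" else "tag"
        else:
            # Text state: consume up to the next "<" (or to the end).
            close = code.find("<", pos, stop)
            nxt = stop if close == -1 else close
            kind = "text"
        yield Token(kind, code[pos:nxt], pos)
        pos = nxt  # always advances, so each character is scanned once
```

For example, `list(tokenize("ab<p>cd", start=2, end=5))` yields a single `tag` token for `<p>` at offset 2, which shows how the `start` and `end` bounds restrict the scan to a slice of the input.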