Module

lexer.classifiers.html

HTML block classifier mixin.

Classes

HtmlClassifierMixin

Mixin providing HTML block classification.

Implements CommonMark 4.6 HTML block types 1-7.

Attributes

Name Type Description
_mode LexerMode
_html_block_type int
_html_block_content list[str]
_html_block_start int
_html_block_indent int
_pos int
_source_len int
_consumed_newline bool

Methods

Internal Methods
_location_from -> SourceLocation
Get source location from saved position. Implemented by Lexer.
def _location_from(self, start_pos: int, start_col: int | None = None, end_pos: int | None = None) -> SourceLocation
Parameters
Name Type Description
start_pos int
start_col int | None Default: None
end_pos int | None Default: None
Returns
SourceLocation
_try_classify_html_block_start -> Iterator[Token] | None
def _try_classify_html_block_start(self, content: str, line_start: int, full_line: str, indent: int = 0) -> Iterator[Token] | None

Try to classify content as HTML block start.

CommonMark 4.6 defines 7 types of HTML blocks.

Parameters
Name Type Description
content str Line content with leading whitespace stripped.
line_start int Position in the source where the line starts.
full_line str The full line, including leading whitespace.
indent int Number of leading spaces (for line_indent). Default: 0.
Returns
Iterator[Token] | None Iterator yielding an HTML_BLOCK token, or None if the content does not start an HTML block.
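The start conditions for HTML block types 1 through 6 can be illustrated with a small standalone function. This is a minimal sketch based on the CommonMark 4.6 start conditions, not the mixin's actual code: the function name html_block_start_type, the regexes, and the abbreviated TYPE6_TAGS_SUBSET are all assumptions made for illustration (the spec lists many more type 6 tag names). Type 7 needs a complete-tag check and is left out here.

```python
from __future__ import annotations

import re

# Type 1: <pre, <script, <style, <textarea followed by space, tab, '>' or end of line.
_TYPE1 = re.compile(r"<(?:pre|script|style|textarea)(?:[ \t>]|$)", re.IGNORECASE)
# Type 6: '<' or '</' plus a tag name, followed by space, tab, '>', '/>' or end of line.
_TYPE6 = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)(?:[ \t]|/?>|$)")

# Hypothetical, abbreviated subset of the type 6 tag names from the spec.
TYPE6_TAGS_SUBSET = frozenset(
    {"address", "blockquote", "div", "h1", "li", "ol", "p", "table", "ul"}
)


def html_block_start_type(content: str) -> int | None:
    """Return the HTML block type (1-6) that `content` starts, or None.

    `content` is assumed to be a single line with leading whitespace stripped.
    """
    if not content.startswith("<"):
        return None
    if _TYPE1.match(content):
        return 1
    if content.startswith("<!--"):
        return 2
    if content.startswith("<?"):
        return 3
    if content.startswith("<![CDATA["):
        return 5
    # Type 4: '<!' followed by an ASCII letter, e.g. <!DOCTYPE html>.
    if len(content) > 2 and content[1] == "!" and content[2].isascii() and content[2].isalpha():
        return 4
    m = _TYPE6.match(content)
    if m and m.group(1).lower() in TYPE6_TAGS_SUBSET:
        return 6
    return None
```

Checking the more specific prefixes first keeps the type 6 regex from having to reject comments, processing instructions, and declarations.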
_extract_html_tag_name -> str | None
Extract tag name from HTML opening or closing tag.
def _extract_html_tag_name(self, content: str) -> str | None
Parameters
Name Type Description
content str Line content starting with <.
Returns
str | None Tag name if found, None otherwise.
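A tag-name extractor of this shape might look like the following. It is a hedged sketch, not the library's code: the standalone function extract_html_tag_name and the lowercasing of the result are assumptions. The name pattern follows the CommonMark definition of a tag name (an ASCII letter followed by letters, digits, or hyphens).

```python
from __future__ import annotations

import re

# '<' or '</' followed by a CommonMark tag name.
_TAG_NAME = re.compile(r"</?([A-Za-z][A-Za-z0-9-]*)")


def extract_html_tag_name(content: str) -> str | None:
    """Return the tag name at the start of `content`, or None if there is none.

    Lowercasing is an assumption of this sketch; it makes later comparison
    against a set of known tag names case-insensitive.
    """
    m = _TAG_NAME.match(content)
    return m.group(1).lower() if m else None
```

Comments, declarations, and a `<` followed by whitespace all yield None because the character after `<` (or `</`) is not an ASCII letter.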
_is_complete_html_tag -> bool
def _is_complete_html_tag(self, content: str) -> bool

Check if content is a complete single HTML open/close tag.

Type 7 HTML blocks require a single complete tag that is the only content on the line: an open tag (optionally with attributes or self-closing) or a closing tag, not a tag followed by content.

The tag name must not be one of the type 6 block-level tags, and the content must not match an autolink such as http://... or email@domain.

CommonMark strict attribute validation:

  • Attribute name: [a-zA-Z_:][a-zA-Z0-9_.:-]*
  • Attribute value: unquoted (no special chars), 'single', or "double" quoted
  • Space required between attributes (but not after final attribute before > or />)
Parameters
Name Type Description
content str Line content.
Returns
bool True if this is a complete HTML tag.
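The rules above can be sketched with two regular expressions and a full-line match. This is an illustrative approximation under stated assumptions: the function name is_complete_html_tag, the regexes, and the short BLOCK_TAGS_SUBSET standing in for the full type 6 list are hypothetical and do not come from the module's source.

```python
import re

# One attribute: required leading whitespace, a name, and an optional value
# that is unquoted, 'single'-quoted, or "double"-quoted.
_ATTR = r"""[ \t]+[A-Za-z_:][A-Za-z0-9_.:-]*(?:[ \t]*=[ \t]*(?:[^ \t"'=<>`]+|'[^']*'|"[^"]*"))?"""
_OPEN_TAG = re.compile(rf"<([A-Za-z][A-Za-z0-9-]*)(?:{_ATTR})*[ \t]*/?>")
_CLOSE_TAG = re.compile(r"</([A-Za-z][A-Za-z0-9-]*)[ \t]*>")

# Hypothetical, abbreviated stand-in for the type 6 block-level tag names.
BLOCK_TAGS_SUBSET = frozenset({"div", "p", "table", "ul", "li"})


def is_complete_html_tag(content: str) -> bool:
    """Return True if `content` is exactly one open or closing tag."""
    line = content.strip(" \t\r\n")
    # fullmatch rejects anything after the tag, e.g. "<span>text".
    m = _OPEN_TAG.fullmatch(line) or _CLOSE_TAG.fullmatch(line)
    if m is None:
        return False
    # Block-level names are excluded, as the description above requires.
    return m.group(1).lower() not in BLOCK_TAGS_SUBSET
```

Autolinks fail naturally in this sketch: in `<http://example.com>` the character after the candidate tag name is `:`, which is neither whitespace nor the end of the tag, and `@` breaks an email address the same way.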
_validate_html_attributes -> bool
Validate HTML attribute string per CommonMark spec.
def _validate_html_attributes(self, attrs_str: str) -> bool
Parameters
Name Type Description
attrs_str str The portion after tag name and before > (without leading )
Returns
bool True if attributes are valid per CommonMark 6.8.
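An attribute validator along these lines could be written as a small scanner. The sketch below is an assumption-laden illustration, not the mixin's implementation: validate_html_attributes is a hypothetical standalone function, and it assumes attrs_str keeps the whitespace that precedes the first attribute and that any self-closing `/` has already been removed by the caller.

```python
import re

_NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9_.:-]*")
# Unquoted value (no whitespace, quotes, '=', '<', '>', backtick),
# or a single-quoted or double-quoted value.
_VALUE = re.compile(r"[^ \t\n\"'=<>`]+|'[^']*'|\"[^\"]*\"")


def _skip_ws(s: str, pos: int) -> int:
    """Return the index of the first non-space, non-tab character at or after pos."""
    while pos < len(s) and s[pos] in " \t":
        pos += 1
    return pos


def validate_html_attributes(attrs_str: str) -> bool:
    """Return True if `attrs_str` is a valid run of zero or more attributes."""
    s = attrs_str.rstrip(" \t")  # trailing whitespace before '>' is allowed
    pos = 0
    while pos < len(s):
        # Every attribute must be preceded by at least one space or tab.
        start = _skip_ws(s, pos)
        if start == pos:
            return False
        name = _NAME.match(s, start)
        if name is None:
            return False
        pos = name.end()
        # Optional value specification: [ws] '=' [ws] value.
        eq = _skip_ws(s, pos)
        if eq < len(s) and s[eq] == "=":
            value = _VALUE.match(s, _skip_ws(s, eq + 1))
            if value is None:
                return False
            pos = value.end()
    return True
```

Because the loop insists on whitespace before each name, input such as `a="1"b="2"` is rejected at the second attribute, which is the spacing rule the attribute grammar calls for.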
_emit_html_block -> Iterator[Token]
Emit accumulated HTML block as a single token.
def _emit_html_block(self) -> Iterator[Token]
Returns
Iterator[Token]