Module

parsing.dispatch

Parser dispatch based on token pattern analysis.

Pre-classifies documents by token pattern to select the optimal parsing strategy. Only 57 unique token patterns occur in CommonMark input, so each pattern can be given its own optimized path.

Complexity Levels:

  • ULTRA_SIMPLE (47.5%): Pure inline (PARAGRAPH_LINE, BLANK_LINE only)
  • SIMPLE (26.2%): No containers; may include headings, code blocks, and HTML
  • MODERATE (10.0%): Single container type, shallow nesting
  • COMPLEX (16.3%): Multiple containers, deep nesting

This pre-classification enables 3-10x speedups for simple documents.
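A minimal sketch of what dispatching on the complexity level could look like, assuming a table that maps each level to a parser entry point and falls back to a general parser. The names PARSERS, parse_ultra_simple, parse_general, and dispatch are hypothetical, and the fast path works on raw lines (not lexer tokens) purely for illustration: because an ULTRA_SIMPLE document contains only paragraph and blank lines, it can be parsed by grouping runs of non-blank lines.

```python
from enum import Enum, auto
from typing import Callable


class ComplexityLevel(Enum):
    ULTRA_SIMPLE = auto()
    SIMPLE = auto()
    MODERATE = auto()
    COMPLEX = auto()


def parse_ultra_simple(lines: list[str]) -> list[str]:
    """Hypothetical fast path: only paragraph and blank lines are present,
    so each run of non-blank lines becomes one paragraph."""
    paragraphs: list[str] = []
    current: list[str] = []
    for line in lines:
        if line.strip():
            # Simplification: strip leading indentation, join with soft breaks.
            current.append(line.lstrip())
        elif current:
            paragraphs.append("\n".join(current))
            current = []
    if current:
        paragraphs.append("\n".join(current))
    return paragraphs


def parse_general(lines: list[str]) -> list[str]:
    """Hypothetical stand-in for the full block parser."""
    raise NotImplementedError("full CommonMark block parsing")


# Levels without a specialized parser fall back to parse_general.
PARSERS: dict[ComplexityLevel, Callable[[list[str]], list[str]]] = {
    ComplexityLevel.ULTRA_SIMPLE: parse_ultra_simple,
}


def dispatch(level: ComplexityLevel, lines: list[str]) -> list[str]:
    return PARSERS.get(level, parse_general)(lines)
```

For example, dispatch(ComplexityLevel.ULTRA_SIMPLE, ["Hello", "  world", "", "Second"]) returns ["Hello\nworld", "Second"] without ever touching the general parser. Keeping the fallback in the table lookup means new fast paths can be added one level at a time.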

Classes

ComplexityLevel
Document complexity classification.
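A minimal sketch of the class, assuming it is an ordered enum. The use of IntEnum, the member values, and the helper needs_container_handling are assumptions for illustration; the member names and their meanings come from the module description above. Ordering the levels lets callers compare them or combine several classifications with max().

```python
from enum import IntEnum


class ComplexityLevel(IntEnum):
    """Document complexity classification (sketch; values are assumed)."""

    ULTRA_SIMPLE = 0  # PARAGRAPH_LINE and BLANK_LINE only
    SIMPLE = 1        # no containers; headings, code blocks, HTML allowed
    MODERATE = 2      # single container type, shallow nesting
    COMPLEX = 3       # multiple containers, deep nesting


def needs_container_handling(level: ComplexityLevel) -> bool:
    """Hypothetical helper: only MODERATE and above contain containers."""
    return level >= ComplexityLevel.MODERATE
```

With an ordered enum, the overall level of a document assembled from parts is simply max(part_levels), and threshold checks such as needs_container_handling stay one comparison.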

Functions

def classify_complexity(tokens: list[Token]) -> ComplexityLevel

Classify document complexity based on token pattern.

Performs a single O(n) scan of the tokens to determine the optimal parsing strategy. The cost of this extra pass is recovered by the faster parsing path it selects.

Parameters
Name     Type         Description
tokens   list[Token]  List of tokens from the lexer

Returns
ComplexityLevel: the complexity class used to select the parsing strategy.
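A minimal sketch of the classification rules, assuming a simplified Token with a type and a container nesting depth. Only PARAGRAPH_LINE and BLANK_LINE are named in these docs; the other TokenType members, the depth field, the CONTAINERS set, and the MAX_SHALLOW_DEPTH cutoff for "shallow nesting" are hypothetical stand-ins chosen to mirror the four level descriptions.

```python
from dataclasses import dataclass
from enum import Enum, auto


class TokenType(Enum):
    # Hypothetical subset of lexer token types.
    PARAGRAPH_LINE = auto()
    BLANK_LINE = auto()
    ATX_HEADING = auto()
    FENCED_CODE_START = auto()
    HTML_BLOCK = auto()
    BLOCK_QUOTE_MARKER = auto()
    LIST_ITEM_MARKER = auto()


class ComplexityLevel(Enum):
    ULTRA_SIMPLE = auto()
    SIMPLE = auto()
    MODERATE = auto()
    COMPLEX = auto()


@dataclass(frozen=True)
class Token:
    type: TokenType
    depth: int = 0  # hypothetical: container nesting depth from the lexer


INLINE_ONLY = frozenset({TokenType.PARAGRAPH_LINE, TokenType.BLANK_LINE})
CONTAINERS = frozenset({TokenType.BLOCK_QUOTE_MARKER, TokenType.LIST_ITEM_MARKER})
MAX_SHALLOW_DEPTH = 1  # assumed cutoff for "shallow nesting"


def classify_complexity(tokens: list[Token]) -> ComplexityLevel:
    seen: set[TokenType] = set()
    max_depth = 0
    for tok in tokens:  # single O(n) pass collects everything needed
        seen.add(tok.type)
        max_depth = max(max_depth, tok.depth)

    if seen <= INLINE_ONLY:
        return ComplexityLevel.ULTRA_SIMPLE
    containers = seen & CONTAINERS
    if not containers:
        return ComplexityLevel.SIMPLE
    if len(containers) == 1 and max_depth <= MAX_SHALLOW_DEPTH:
        return ComplexityLevel.MODERATE
    return ComplexityLevel.COMPLEX
```

The checks run from cheapest to most general, so the common inline-only case returns after one subset test. An empty token list is treated as ULTRA_SIMPLE here, which is a choice of this sketch.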
def get_token_pattern(tokens: list[Token]) -> tuple[TokenType, ...]

Get unique token type pattern for pattern matching.

Returns a sorted tuple of unique token types, which can be used as a key for pattern-specific optimizations.

Parameters
Name     Type         Description
tokens   list[Token]  List of tokens from the lexer

Returns
tuple[TokenType, ...]: sorted tuple of the distinct token types present in tokens.
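A minimal sketch of computing the pattern and using it as a lookup key, assuming a simplified Token and TokenType. Enum members are not orderable by default, so this sketch sorts by member name; the real sort key is not stated in these docs. The handler table and select_handler are hypothetical and return labels in place of real parser functions.

```python
from enum import Enum, auto
from typing import NamedTuple


class TokenType(Enum):
    # Hypothetical subset of lexer token types.
    PARAGRAPH_LINE = auto()
    BLANK_LINE = auto()
    ATX_HEADING = auto()


class Token(NamedTuple):
    type: TokenType
    value: str = ""


def get_token_pattern(tokens: list[Token]) -> tuple[TokenType, ...]:
    # Deduplicate, then sort (by name, an assumed key) so that the same set
    # of types always yields the same hashable tuple, regardless of order.
    return tuple(sorted({tok.type for tok in tokens}, key=lambda t: t.name))


# Hypothetical pattern-specific handlers keyed by the pattern tuple.
_PATTERN_HANDLERS: dict[tuple[TokenType, ...], str] = {
    (TokenType.BLANK_LINE, TokenType.PARAGRAPH_LINE): "paragraphs-only fast path",
}


def select_handler(tokens: list[Token]) -> str:
    return _PATTERN_HANDLERS.get(get_token_pattern(tokens), "general parser")
```

Because the tuple is hashable and order-independent, a document of paragraphs separated by blank lines maps to the same key however many lines it has, which is what makes a small fixed set of patterns practical to specialize.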