Module

lexer.core

State-machine lexer with O(n) guaranteed performance.

Implements a window-based approach: scan entire lines, classify, then commit. This eliminates position rewinds and guarantees forward progress.

No regex in the hot path. Zero ReDoS vulnerability by construction.
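To illustrate what regex-free classification can look like, here is a minimal sketch of a hypothetical helper, `is_atx_heading`, which is not taken from lexer.core. It assumes CommonMark's ATX heading rules and uses only bounded `str` operations, so its cost is linear in the line length and no input can trigger catastrophic backtracking.

```python
def is_atx_heading(line: str) -> bool:
    """Return True if line opens an ATX heading, using plain str methods only.

    Hypothetical helper following the CommonMark ATX rule: 0-3 spaces of
    indentation, 1-6 '#' characters, then a space, a tab, or end of line.
    """
    # At most three leading spaces are allowed.
    stripped = line.lstrip(" ")
    if len(line) - len(stripped) > 3:
        return False
    # Length of the run of '#' characters at the start of the content.
    hashes = len(stripped) - len(stripped.lstrip("#"))
    if not 1 <= hashes <= 6:
        return False
    rest = stripped[hashes:]
    # The run must be followed by whitespace or the end of the line.
    return rest == "" or rest[0] in " \t"
```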

Thread Safety:

Lexer instances are single-use. Create one per source string. All state is instance-local; no shared mutable state.

Classes

Lexer

State-machine lexer with O(n) guaranteed performance.

Uses a window-based approach for block scanning:
1. Scan to end of line (find window)
2. Classify the line (pure logic, no position changes)
3. Commit position (always advances)

This eliminates rewinds and guarantees forward progress.
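The three steps above can be sketched as a standalone loop. This is a simplified illustration, not the lexer's actual code: `Token`, `classify`, and `tokenize` here are hypothetical, classification is deliberately crude, and columns are omitted. The point is the shape: classification never touches `pos`, and every iteration strictly advances it.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class Token:
    # Hypothetical token shape: kind, raw text, 1-indexed line number.
    kind: str
    value: str
    line: int


def classify(line: str) -> str:
    """Pure classification: inspects the line, never changes position."""
    if line.strip() == "":
        return "BLANK_LINE"
    if line.startswith("#"):
        return "ATX_HEADING"
    return "PARAGRAPH_LINE"


def tokenize(source: str) -> Iterator[Token]:
    pos = 0
    lineno = 1
    while pos < len(source):
        # 1. Find the window: the end of the current line.
        end = source.find("\n", pos)
        if end == -1:
            end = len(source)
        line = source[pos:end]
        # 2. Classify the line without moving pos.
        kind = classify(line)
        yield Token(kind, "" if kind == "BLANK_LINE" else line, lineno)
        # 3. Commit: pos always moves forward, past the newline if present.
        if end < len(source):
            pos = end + 1
            lineno += 1
        else:
            pos = end
    yield Token("EOF", "", lineno)
```

On `"# Hello\n\nWorld"` this yields the same token kinds, in the same order, as the usage example below.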

Usage:
    >>> lexer = Lexer("# Hello\n\nWorld")
    >>> for token in lexer.tokenize():
    ...     print(token)
    Token(ATX_HEADING, '# Hello', 1:1)
    Token(BLANK_LINE, '', 2:1)
    Token(PARAGRAPH_LINE, 'World', 3:1)
    Token(EOF, '', 3:6)

Thread Safety:
    Lexer instances are single-use. Create one per source string.
    All state is instance-local; no shared mutable state.

Methods

tokenize
Tokenize source into token stream.
def tokenize(self) -> Iterator[Token]
Returns
Iterator[Token] Tokens in source order, ending with an EOF token.
Internal Methods
__init__
Initialize lexer with source text.
def __init__(self, source: str, source_file: str | None = None, text_transformer: Callable[[str], str] | None = None) -> None
Parameters
    source: str
        Markdown source text.
    source_file: str | None, default None
        Optional source file path for error messages.
    text_transformer: Callable[[str], str] | None, default None
        Optional callback to transform plain text lines.
_dispatch_mode
Dispatch to appropriate scanner based on current mode.
def _dispatch_mode(self) -> Iterator[Token]
Returns
Iterator[Token]
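Mode-based dispatch can be sketched as a table lookup from mode to scanner. Everything here is hypothetical: the `Mode` values, `MiniLexer`, and its scanners are invented for illustration, since the actual modes in lexer.core are not documented on this page. `~~~` fences stand in for whatever block constructs switch modes.

```python
from enum import Enum, auto
from typing import Callable, Iterator


class Mode(Enum):
    # Hypothetical modes; the real set in lexer.core is not listed here.
    BLOCK = auto()
    FENCED_CODE = auto()


class MiniLexer:
    def __init__(self, lines: list[str]) -> None:
        self.lines = lines
        self.index = 0
        self.mode = Mode.BLOCK
        # One dict lookup per dispatch instead of an if/elif chain.
        self._scanners: dict[Mode, Callable[[], Iterator[str]]] = {
            Mode.BLOCK: self._scan_block,
            Mode.FENCED_CODE: self._scan_fenced_code,
        }

    def _dispatch_mode(self) -> Iterator[str]:
        return self._scanners[self.mode]()

    def _scan_block(self) -> Iterator[str]:
        line = self.lines[self.index]
        self.index += 1
        if line.startswith("~~~"):
            self.mode = Mode.FENCED_CODE  # switch scanners for later lines
            yield "FENCE_START"
        else:
            yield "PARAGRAPH_LINE"

    def _scan_fenced_code(self) -> Iterator[str]:
        line = self.lines[self.index]
        self.index += 1
        if line.startswith("~~~"):
            self.mode = Mode.BLOCK
            yield "FENCE_END"
        else:
            yield "CODE_LINE"

    def tokenize(self) -> Iterator[str]:
        while self.index < len(self.lines):
            yield from self._dispatch_mode()
```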
_find_line_end
def _find_line_end(self) -> int

Find the end of the current line (position of \n or EOF).

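Uses str.find, which is implemented in C: each call costs time proportional to the distance scanned, with a low constant factor. Because each scan starts at the committed position, the total across all lines stays O(n).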

Returns
int Position of newline or end of source.
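A minimal sketch of this lookup as a hypothetical free function, `find_line_end`, that takes the position explicitly. It assumes that "end of source" means `len(source)`, mapping `str.find`'s `-1` result onto it.

```python
def find_line_end(source: str, pos: int) -> int:
    """Return the index of the next newline at or after pos, or len(source) at EOF."""
    end = source.find("\n", pos)  # C-level scan, linear in the distance searched
    return len(source) if end == -1 else end
```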
_calc_indent
def _calc_indent(self, line: str) -> tuple[int, int]

Calculate indent level and content start position.

Spaces count as 1, tabs expand to next multiple of 4.

Parameters
    line: str
        Line content.
Returns
tuple[int, int] (indent_spaces, content_start_index)
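A minimal sketch of the documented rule as a hypothetical free function, `calc_indent`. It assumes columns count from 0 at the start of the line and that a whitespace-only line reports its length as the content start.

```python
def calc_indent(line: str) -> tuple[int, int]:
    """Return (indent_spaces, content_start_index).

    A space counts as 1 column; a tab advances to the next multiple of 4.
    """
    col = 0
    for i, ch in enumerate(line):
        if ch == " ":
            col += 1
        elif ch == "\t":
            col += 4 - (col % 4)  # jump to the next tab stop
        else:
            return col, i  # first non-indent character
    # Empty or whitespace-only line: content starts at the end.
    return col, len(line)
```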
_expand_tabs
Expand tabs in text to spaces based on start_col (1-indexed).
def _expand_tabs(self, text: str, start_col: int = 1) -> str
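Parameters
    text: str
        Text whose tabs are expanded.
    start_col: int, default 1
        Column (1-indexed) at which text begins, used to place tab stops.
Returns
str Text with tabs expanded to spaces.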
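The reason for `start_col` is that a tab's width depends on where the text sits in the line. Below is a minimal sketch as a hypothetical free function, `expand_tabs`. It assumes a tab width of 4 (borrowed from `_calc_indent`) and single-line input, so it does not reset the column on newlines.

```python
def expand_tabs(text: str, start_col: int = 1, tab_width: int = 4) -> str:
    """Replace tabs with spaces, treating text as starting at start_col (1-indexed)."""
    out: list[str] = []
    col = start_col - 1  # convert to a 0-indexed column
    for ch in text:
        if ch == "\t":
            n = tab_width - (col % tab_width)  # spaces to the next tab stop
            out.append(" " * n)
            col += n
        else:
            out.append(ch)
            col += 1
    return "".join(out)
```

For `start_col=1` and single-line text, this matches `str.expandtabs(4)`. The offset is what `str.expandtabs` cannot express, for example when expanding the content that follows a list marker.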
_commit_to
def _commit_to(self, line_end: int) -> None

Commit position to line_end, consuming newline if present.

Sets self._consumed_newline to indicate whether a newline was consumed. Uses bulk string operations instead of a character-by-character loop.

Parameters
    line_end: int
        Position to commit to.
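A minimal sketch of a commit step, using a hypothetical `Cursor` class that holds position, 1-indexed line and column, and the newline flag. It assumes the span `[pos, line_end)` is a single line window with no embedded newline, which is what lets the column advance by arithmetic instead of a loop.

```python
class Cursor:
    """Hypothetical position state: offset, 1-indexed line/col, newline flag."""

    def __init__(self, source: str) -> None:
        self.source = source
        self.pos = 0
        self.line = 1
        self.col = 1
        self._consumed_newline = False

    def commit_to(self, line_end: int) -> None:
        # Advance by arithmetic; valid because [pos, line_end) holds no newline.
        self.col += line_end - self.pos
        self.pos = line_end
        # Consume the newline if one sits at line_end (False past the end).
        self._consumed_newline = self.source.startswith("\n", line_end)
        if self._consumed_newline:
            self.pos += 1
            self.line += 1
            self.col = 1
```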

_peek
Peek at current character without advancing.
def _peek(self) -> str
Returns
str Current character or empty string at end of input.
_advance
Advance position by one character. Updates line/column tracking.
def _advance(self) -> str
Returns
str The consumed character.
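A minimal sketch of peek/advance with line and column tracking, using a hypothetical `CharReader` class. The behavior of `advance` at end of input is not documented above. Here it is assumed to return an empty string and leave the position unchanged.

```python
class CharReader:
    """Hypothetical reader tracking 1-indexed line and column."""

    def __init__(self, source: str) -> None:
        self.source = source
        self.pos = 0
        self.line = 1
        self.col = 1

    def peek(self) -> str:
        # Slicing never raises: yields "" once pos reaches the end.
        return self.source[self.pos:self.pos + 1]

    def advance(self) -> str:
        ch = self.peek()
        if not ch:
            return ""  # assumption: at end of input nothing is consumed
        self.pos += 1
        if ch == "\n":
            self.line += 1
            self.col = 1
        else:
            self.col += 1
        return ch
```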
_save_location
def _save_location(self) -> None

Save current location for O(1) token location creation.

Call this at the START of scanning a line, before any position changes.

_location
Get current source location.
def _location(self) -> SourceLocation
Returns
SourceLocation SourceLocation at current position.
_location_from
def _location_from(self, start_pos: int, start_col: int | None = None, end_pos: int | None = None) -> SourceLocation

Get source location from saved position.

O(1): uses the location saved by the preceding _save_location() call.

Parameters
    start_pos: int
        Start position in source.
    start_col: int | None, default None
        Optional column override (1-indexed).
    end_pos: int | None, default None
        Optional end position override.
Returns
SourceLocation Location spanning from start_pos to the current position, or to end_pos if given.
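The save-then-reuse pattern behind `_save_location` and `_location_from` can be sketched as follows. `SourceLocation`'s fields and the `LocTracker` class are hypothetical; the real `SourceLocation` may carry different fields. The O(1) cost comes from reusing the snapshot rather than recounting newlines up to `start_pos`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SourceLocation:
    # Hypothetical fields; the real SourceLocation may differ.
    line: int
    column: int
    offset: int
    end_offset: int


class LocTracker:
    def __init__(self) -> None:
        self.pos = 0
        self.line = 1
        self.col = 1
        self._saved_line = 1
        self._saved_col = 1

    def save_location(self) -> None:
        # Snapshot line/col at the start of a line, before pos moves.
        self._saved_line = self.line
        self._saved_col = self.col

    def location_from(self, start_pos: int, start_col: int | None = None,
                      end_pos: int | None = None) -> SourceLocation:
        # O(1): reuse the snapshot instead of rescanning the source.
        return SourceLocation(
            line=self._saved_line,
            column=self._saved_col if start_col is None else start_col,
            offset=start_pos,
            end_offset=self.pos if end_pos is None else end_pos,
        )
```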