Classes
Lexer
State-machine lexer with O(n) guaranteed performance.
Uses a window-based approach for block scanning:
1. Scan to end of line (find window)
2. Classify the line (pure logic, no position changes)
3. Commit position (always advances)
This eliminates rewinds and guarantees forward progress.
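The three steps above can be sketched as a small standalone loop. This is a minimal illustration, not the Lexer's actual code: `scan_lines` and its three-way classification are hypothetical and far simpler than real Markdown block rules.

```python
from typing import Iterator


def scan_lines(source: str) -> Iterator[tuple[str, str]]:
    """Yield (kind, line) pairs using a find-window / classify / commit loop."""
    pos = 0
    while pos < len(source):
        # 1. Find the window: end of the current line (newline or EOF).
        line_end = source.find("\n", pos)
        if line_end == -1:
            line_end = len(source)
        line = source[pos:line_end]

        # 2. Classify the line; this step never modifies pos.
        if not line.strip():
            kind = "BLANK_LINE"
        elif line.startswith("#"):
            kind = "ATX_HEADING"  # toy rule, assumed for illustration only
        else:
            kind = "PARAGRAPH_LINE"
        yield kind, line

        # 3. Commit: step past the line and its newline, so pos always grows.
        pos = line_end + 1
```

Because the position only changes in step 3 and always increases, each character is visited a bounded number of times, which is where the linear bound comes from.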
Usage:
>>> lexer = Lexer("# Hello\n\nWorld")
>>> for token in lexer.tokenize():
...     print(token)
Token(ATX_HEADING, '# Hello', 1:1)
Token(BLANK_LINE, '', 2:1)
Token(PARAGRAPH_LINE, 'World', 3:1)
Token(EOF, '', 3:6)
Thread Safety:
Lexer instances are single-use. Create one per source string.
All state is instance-local; no shared mutable state.
Methods
tokenize
def tokenize(self) -> Iterator[Token]
Tokenize source into token stream.
Returns
Iterator[Token]
Internal Methods
__init__
def __init__(self, source: str, source_file: str | None = None, text_transformer: Callable[[str], str] | None = None) -> None
Initialize lexer with source text.
Parameters
| Name | Type | Description |
|---|---|---|
| source | str | Markdown source text |
| source_file | str \| None | Optional source file path for error messages. Default: None |
| text_transformer | Callable[[str], str] \| None | Optional callback to transform plain text lines. Default: None |
_dispatch_mode
def _dispatch_mode(self) -> Iterator[Token]
Dispatch to the appropriate scanner based on the current mode.
Returns
Iterator[Token]
_find_line_end
def _find_line_end(self) -> int
Find the end of the current line (position of \n or EOF).
Uses str.find, which is O(n) with a low constant factor (C implementation).
Returns
int
Position of newline or end of source.
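A minimal sketch of this lookup, written as a free function for illustration. The name `find_line_end` and the explicit `source`/`pos` parameters are assumptions; the real method reads them from instance state.

```python
def find_line_end(source: str, pos: int) -> int:
    """Return the index of the next newline at or after pos, or len(source)."""
    idx = source.find("\n", pos)
    # str.find returns -1 when no newline remains, which means EOF here.
    return idx if idx != -1 else len(source)
```

For example, `find_line_end("ab\ncd", 0)` gives the newline's index, while a call starting on the last line gives the length of the source.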
_calc_indent
def _calc_indent(self, line: str) -> tuple[int, int]
Calculate indent level and content start position.
Spaces count as 1, tabs expand to next multiple of 4.
Parameters
| Name | Type | Description |
|---|---|---|
| line | str | Line content |
Returns
tuple[int, int]
(indent_spaces, content_start_index)
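The indent rule can be illustrated with a short standalone function. This is a sketch under stated assumptions: `calc_indent` and its `tab_width` parameter are hypothetical names, and `content_start_index` is taken to mean the index of the first character that is neither a space nor a tab.

```python
def calc_indent(line: str, tab_width: int = 4) -> tuple[int, int]:
    """Return (indent_spaces, content_start_index) for a single line."""
    indent = 0
    for i, ch in enumerate(line):
        if ch == " ":
            indent += 1
        elif ch == "\t":
            # A tab jumps to the next multiple of tab_width.
            indent += tab_width - (indent % tab_width)
        else:
            return indent, i
    # The line holds only spaces and tabs (or is empty).
    return indent, len(line)
```

Note that a space followed by a tab yields an indent of 4, not 5, because the tab only fills the remaining columns up to the tab stop.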
_expand_tabs
def _expand_tabs(self, text: str, start_col: int = 1) -> str
Expand tabs in text to spaces based on start_col (1-indexed).
Parameters
| Name | Type | Description |
|---|---|---|
| text | str | Text whose tabs are expanded |
| start_col | int | Column at which text starts (1-indexed). Default: 1 |
Returns
str
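One way column-aware tab expansion might look is sketched below. The function name `expand_tabs` and the `tab_width` parameter are assumptions; the tab width of 4 is borrowed from the `_calc_indent` description and is not stated for this method.

```python
def expand_tabs(text: str, start_col: int = 1, tab_width: int = 4) -> str:
    """Replace each tab with spaces up to the next tab stop."""
    out: list[str] = []
    col = start_col  # 1-indexed column of the next character
    for ch in text:
        if ch == "\t":
            # Tab stops sit at columns 1, 5, 9, ...; pad up to the next one.
            width = tab_width - ((col - 1) % tab_width)
            out.append(" " * width)
            col += width
        else:
            out.append(ch)
            col += 1
    return "".join(out)
```

The `start_col` argument matters when the text begins partway through a line: the same tab expands to fewer spaces if it starts closer to the next tab stop.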
_commit_to
def _commit_to(self, line_end: int) -> None
Commit position to line_end, consuming newline if present.
Sets self._consumed_newline to indicate whether a newline was consumed. Uses optimized string operations instead of a character-by-character loop.
Parameters
| Name | Type | Description |
|---|---|---|
| line_end | int | Position to commit to. |
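The commit step can be sketched as a pure function that returns the new position together with the newline flag. The name `commit_to` and the tuple return are assumptions made so the example is self-contained; the real method mutates instance state and returns None.

```python
def commit_to(source: str, line_end: int) -> tuple[int, bool]:
    """Return (new_pos, consumed_newline) after committing to line_end."""
    # startswith with a start offset checks one position without a loop.
    consumed_newline = source.startswith("\n", line_end)
    new_pos = line_end + 1 if consumed_newline else line_end
    return new_pos, consumed_newline
```

At end of input there is no newline to consume, so the position stays at `line_end` and the flag is False.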
_peek
def _peek(self) -> str
Peek at current character without advancing.
Returns
str
Current character or empty string at end of input.
_advance
def _advance(self) -> str
Advance position by one character.
Updates line/column tracking.
Returns
str
The consumed character.
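The peek/advance pair with line and column tracking can be sketched with a small stand-in class. `Cursor` and its attribute names are hypothetical, and returning an empty string from `advance` at end of input is an assumption; only `_peek` is documented as doing so.

```python
class Cursor:
    """Hypothetical stand-in for the lexer's position tracking."""

    def __init__(self, source: str) -> None:
        self.source = source
        self.pos = 0
        self.line = 1  # 1-indexed
        self.col = 1   # 1-indexed

    def peek(self) -> str:
        """Return the current character, or '' at end of input."""
        return self.source[self.pos] if self.pos < len(self.source) else ""

    def advance(self) -> str:
        """Consume one character and update line/column tracking."""
        ch = self.peek()
        if not ch:
            return ""  # assumed behavior at end of input
        self.pos += 1
        if ch == "\n":
            self.line += 1
            self.col = 1
        else:
            self.col += 1
        return ch
```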
_save_location
def _save_location(self) -> None
Save current location for O(1) token location creation.
Call this at the START of scanning a line, before any position changes.
_location
def _location(self) -> SourceLocation
Get current source location.
Returns
SourceLocation
SourceLocation at current position.
_location_from
def _location_from(self, start_pos: int, start_col: int | None = None, end_pos: int | None = None) -> SourceLocation
Get source location from saved position.
O(1): uses the location saved by the preceding _save_location() call.
Parameters
| Name | Type | Description |
|---|---|---|
| start_pos | int | Start position in source. |
| start_col | int \| None | Optional column override (1-indexed). Default: None |
| end_pos | int \| None | Optional end position override. Default: None |
Returns
SourceLocation
SourceLocation spanning from start_pos to current or end_pos.