Classes
Lexer
State-machine lexer with O(n) guaranteed performance.
Uses a window-based approach for block scanning:
1. Scan to end of line (find window)
2. Classify the line (pure logic, no position changes)
3. Commit position (always advances)
This eliminates rewinds and guarantees forward progress.
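The three-step loop above can be sketched as follows (a minimal illustration with hypothetical names; the real lexer classifies many more line types):

```python
def tokenize_lines(source: str):
    """Yield (kind, line) pairs using the scan/classify/commit pattern."""
    pos = 0
    n = len(source)
    while pos < n:
        # 1. Scan: find the window (end of the current line).
        line_end = source.find("\n", pos)
        if line_end == -1:
            line_end = n
        line = source[pos:line_end]
        # 2. Classify: pure logic, no position changes.
        if not line.strip():
            kind = "BLANK_LINE"
        elif line.lstrip().startswith("#"):
            kind = "ATX_HEADING"
        else:
            kind = "PARAGRAPH_LINE"
        yield kind, line
        # 3. Commit: always advances past the window, guaranteeing progress.
        pos = line_end + 1
```

Because step 3 always moves `pos` past `line_end`, the loop cannot revisit a position, which is what rules out rewinds.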
Usage:
>>> lexer = Lexer("# Hello\nWorld")
>>> for token in lexer.tokenize():
...     print(token)
Token(ATX_HEADING, '# Hello', 1:1)
Token(BLANK_LINE, '', 2:1)
Token(PARAGRAPH_LINE, 'World', 3:1)
Token(EOF, '', 3:6)
Thread Safety:
Lexer instances are single-use. Create one per source string.
All state is instance-local; no shared mutable state.
Methods
tokenize

Tokenize source into token stream.

def tokenize(self) -> Iterator[Token]
Returns
Iterator[Token]
Internal Methods
__init__

Initialize lexer with source text.

def __init__(self, source: str, source_file: str | None = None, text_transformer: Callable[[str], str] | None = None) -> None
Parameters
| Name | Type | Description |
|---|---|---|
| source | str | Markdown source text |
| source_file | str \| None | Optional source file path for error messages. Default: None |
| text_transformer | Callable[[str], str] \| None | Optional callback to transform plain text lines. Default: None |
_dispatch_mode

Dispatch to appropriate scanner based on current mode.

def _dispatch_mode(self) -> Iterator[Token]
Returns
Iterator[Token]
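A table-driven dispatch is one common way to implement this. A sketch under assumed names (NORMAL and FENCED_CODE are illustrative, not necessarily the lexer's actual modes):

```python
from enum import Enum, auto
from typing import Callable, Iterator


class Mode(Enum):
    NORMAL = auto()
    FENCED_CODE = auto()  # hypothetical mode names


class Dispatcher:
    def __init__(self) -> None:
        self._mode = Mode.NORMAL
        # One scanner per mode; dispatch is a single dict lookup.
        self._scanners: dict[Mode, Callable[[], Iterator[str]]] = {
            Mode.NORMAL: self._scan_normal,
            Mode.FENCED_CODE: self._scan_fenced,
        }

    def dispatch(self) -> Iterator[str]:
        return self._scanners[self._mode]()

    def _scan_normal(self) -> Iterator[str]:
        yield "normal-line"

    def _scan_fenced(self) -> Iterator[str]:
        yield "code-line"
```

A dict lookup keeps dispatch O(1) regardless of how many modes are added, and each scanner stays a small, testable unit.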
_find_line_end

Find the end of the current line (position of \n or EOF).
Uses str.find for O(n) with low constant factor (C implementation).

def _find_line_end(self) -> int
Returns
int
Position of newline or end of source.
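A standalone sketch of the same idea: str.find returns -1 past the last newline, which maps to the end of the source.

```python
def find_line_end(source: str, pos: int) -> int:
    """Return the index of the '\\n' ending the line at pos, or len(source)."""
    idx = source.find("\n", pos)  # C-level scan, no Python-level loop
    return len(source) if idx == -1 else idx
```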
_calc_indent

Calculate indent level and content start position.
Spaces count as 1, tabs expand to next multiple of 4.

def _calc_indent(self, line: str) -> tuple[int, int]
Parameters
| Name | Type | Description |
|---|---|---|
| line | str | Line content |
Returns
tuple[int, int]
(indent_spaces, content_start_index)
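The indent rule can be sketched as follows (tab size of 4 per the docstring; the function name is illustrative):

```python
def calc_indent(line: str, tab_size: int = 4) -> tuple[int, int]:
    """Return (indent_spaces, content_start_index) for a line."""
    indent = 0
    for i, ch in enumerate(line):
        if ch == " ":
            indent += 1
        elif ch == "\t":
            # A tab advances the column to the next multiple of tab_size.
            indent += tab_size - (indent % tab_size)
        else:
            return indent, i
    return indent, len(line)  # line is all whitespace
```

Note that indent level and content index diverge whenever tabs are involved: a single tab yields an indent of 4 but a content index of 1, which is why both values are returned.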
_expand_tabs

Expand tabs in text to spaces based on start_col (1-indexed).

def _expand_tabs(self, text: str, start_col: int = 1) -> str
Parameters
| Name | Type | Description |
|---|---|---|
| text | str | Text to expand |
| start_col | int | Column at which text starts (1-indexed). Default: 1 |
Returns
str
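A column-aware expansion might look like this (a sketch; tab size of 4 assumed to match _calc_indent):

```python
def expand_tabs(text: str, start_col: int = 1, tab_size: int = 4) -> str:
    """Expand tabs to spaces, aligned to columns relative to start_col."""
    out: list[str] = []
    col = start_col - 1  # 0-indexed column of the next character
    for ch in text:
        if ch == "\t":
            # Pad out to the next tab stop from the current column.
            pad = tab_size - (col % tab_size)
            out.append(" " * pad)
            col += pad
        else:
            out.append(ch)
            col += 1
    return "".join(out)
```

Passing start_col matters because a tab's width depends on where the text begins: the same "\t" expands to 4 spaces at column 1 but only 2 spaces at column 3.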
_commit_to

Commit position to line_end, consuming newline if present.
Sets self._consumed_newline to indicate whether a newline was consumed. Uses optimized string operations instead of a character-by-character loop.

def _commit_to(self, line_end: int) -> None
Parameters
| Name | Type | Description |
|---|---|---|
| line_end | int | Position to commit to. |
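Expressed as a pure function, the commit step might look like this (a hypothetical standalone form of the method):

```python
def commit_to(source: str, line_end: int) -> tuple[int, bool]:
    """Advance past line_end, consuming a trailing newline if present.

    Returns (new_pos, consumed_newline).
    """
    if line_end < len(source) and source[line_end] == "\n":
        return line_end + 1, True  # skip the newline in one step
    return line_end, False  # EOF: nothing left to consume
```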
_save_location

Save current location for O(1) token location creation.
Call this at the START of scanning a line, before any position changes.

def _save_location(self) -> None
_make_token

Create a Token with raw coordinates (lazy SourceLocation).
O(1): uses the pre-saved location from the _save_location() call. Avoids SourceLocation allocation until token.location is accessed.

def _make_token(self, token_type: TokenType, value: str, start_pos: int, *, start_col: int | None = None, end_pos: int | None = None, line_indent: int = -1) -> Token
Parameters
| Name | Type | Description |
|---|---|---|
| token_type | TokenType | The token type. |
| value | str | The raw string value. |
| start_pos | int | Start position in source. |
| start_col | int \| None | Optional column override (1-indexed). Default: None |
| end_pos | int \| None | Optional end position override. Default: None |
| line_indent | int | Pre-computed indent level. Default: -1 |
Returns
Token
Token with raw coordinates; the SourceLocation is created lazily when token.location is accessed.
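The lazy-location trick can be sketched like this (names are illustrative; the real Token and SourceLocation shapes may differ):

```python
from dataclasses import dataclass
from functools import cached_property


@dataclass(frozen=True)
class SourceLocation:
    line: int
    column: int


class Token:
    """Token storing raw coordinates; SourceLocation is built only on access."""

    def __init__(self, token_type: str, value: str, line: int, col: int) -> None:
        self.type = token_type
        self.value = value
        self._line = line
        self._col = col

    @cached_property
    def location(self) -> SourceLocation:
        # Allocated on first access and then cached, so tokens whose
        # location is never inspected pay no allocation cost.
        return SourceLocation(self._line, self._col)
```

Since most tokens in a typical parse never have their location inspected, deferring the allocation saves one object per token on the hot path.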
_make_token_at_current

Create a Token at current position (for EOF and similar).

def _make_token_at_current(self, token_type: TokenType, value: str, *, line_indent: int = 0) -> Token
Parameters
| Name | Type | Description |
|---|---|---|
| token_type | TokenType | The token type. |
| value | str | The raw string value. |
| line_indent | int | Pre-computed indent level. Default: 0 |
Returns
Token
Token at current position with pre-created SourceLocation.