Classes
Lexer
State-machine lexer with O(n) guaranteed performance.
Uses a window-based approach for block scanning:
1. Scan to end of line (find window)
2. Classify the line (pure logic, no position changes)
3. Commit position (always advances)
This eliminates rewinds and guarantees forward progress.
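The three-step loop above can be sketched as follows (a minimal illustration with hypothetical names; the real lexer classifies many more line types):

```python
def tokenize_lines(source: str):
    """Yield (kind, line) pairs using the scan/classify/commit pattern."""
    pos = 0
    n = len(source)
    while pos < n:
        # 1. Scan: find the window (end of the current line).
        line_end = source.find("\n", pos)
        if line_end == -1:
            line_end = n
        line = source[pos:line_end]
        # 2. Classify: pure logic, no position changes.
        if not line.strip():
            kind = "BLANK_LINE"
        elif line.lstrip().startswith("#"):
            kind = "ATX_HEADING"
        else:
            kind = "PARAGRAPH_LINE"
        yield kind, line
        # 3. Commit: always advances past the window, guaranteeing progress.
        pos = line_end + 1
```

Because step 3 always moves `pos` past `line_end`, the loop cannot revisit a position, which is what rules out rewinds.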
Usage:
>>> lexer = Lexer("# Hello\nWorld")
>>> for token in lexer.tokenize():
...     print(token)
Token(ATX_HEADING, '# Hello', 1:1)
Token(BLANK_LINE, '', 2:1)
Token(PARAGRAPH_LINE, 'World', 3:1)
Token(EOF, '', 3:6)
Thread Safety:
Lexer instances are single-use. Create one per source string.
All state is instance-local; no shared mutable state.
Methods
tokenize

Tokenize source into token stream.

def tokenize(self) -> Iterator[Token]
Returns
Iterator[Token]
Internal Methods
__init__

Initialize lexer with source text.

def __init__(self, source: str, source_file: str | None = None, text_transformer: Callable[[str], str] | None = None) -> None
Parameters
| Name | Type | Description |
|---|---|---|
| source | str | Markdown source text |
| source_file | str \| None | Optional source file path for error messages. Default: None |
| text_transformer | Callable[[str], str] \| None | Optional callback to transform plain text lines. Default: None |
_dispatch_mode

Dispatch to appropriate scanner based on current mode.

def _dispatch_mode(self) -> Iterator[Token]
Returns
Iterator[Token]
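A table-driven dispatch is one common way to implement this. A sketch under assumed names (NORMAL and FENCED_CODE are illustrative, not necessarily the lexer's actual modes):

```python
from enum import Enum, auto
from typing import Callable, Iterator


class Mode(Enum):
    NORMAL = auto()
    FENCED_CODE = auto()  # hypothetical mode names


class Dispatcher:
    def __init__(self) -> None:
        self._mode = Mode.NORMAL
        # One scanner per mode; dispatch is a single dict lookup.
        self._scanners: dict[Mode, Callable[[], Iterator[str]]] = {
            Mode.NORMAL: self._scan_normal,
            Mode.FENCED_CODE: self._scan_fenced,
        }

    def dispatch(self) -> Iterator[str]:
        return self._scanners[self._mode]()

    def _scan_normal(self) -> Iterator[str]:
        yield "normal-line"

    def _scan_fenced(self) -> Iterator[str]:
        yield "code-line"
```

A dict lookup keeps dispatch O(1) regardless of how many modes are added, and each scanner stays a small, testable unit.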
_find_line_end

Find the end of the current line (position of \n or EOF).
Uses str.find for O(n) with low constant factor (C implementation).

def _find_line_end(self) -> int
Returns
int
Position of newline or end of source.
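A standalone sketch of the same idea: str.find returns -1 past the last newline, which maps to the end of the source.

```python
def find_line_end(source: str, pos: int) -> int:
    """Return the index of the '\\n' ending the line at pos, or len(source)."""
    idx = source.find("\n", pos)  # C-level scan, no Python-level loop
    return len(source) if idx == -1 else idx
```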
_calc_indent

Calculate indent level and content start position.
Spaces count as 1, tabs expand to next multiple of 4.

def _calc_indent(self, line: str) -> tuple[int, int]
Parameters
| Name | Type | Description |
|---|---|---|
| line | str | Line content |
Returns
tuple[int, int]
(indent_spaces, content_start_index)
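The indent rule can be sketched as follows (tab size of 4 per the docstring; the function name is illustrative):

```python
def calc_indent(line: str, tab_size: int = 4) -> tuple[int, int]:
    """Return (indent_spaces, content_start_index) for a line."""
    indent = 0
    for i, ch in enumerate(line):
        if ch == " ":
            indent += 1
        elif ch == "\t":
            # A tab advances the column to the next multiple of tab_size.
            indent += tab_size - (indent % tab_size)
        else:
            return indent, i
    return indent, len(line)  # line is all whitespace
```

Note that indent level and content index diverge whenever tabs are involved: a single tab yields an indent of 4 but a content index of 1, which is why both values are returned.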
_expand_tabs

Expand tabs in text to spaces based on start_col (1-indexed).

def _expand_tabs(self, text: str, start_col: int = 1) -> str
Parameters
| Name | Type | Description |
|---|---|---|
| text | str | Text to expand |
| start_col | int | Column at which text starts (1-indexed). Default: 1 |
Returns
str
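A column-aware expansion might look like this (a sketch; tab size of 4 assumed to match _calc_indent):

```python
def expand_tabs(text: str, start_col: int = 1, tab_size: int = 4) -> str:
    """Expand tabs to spaces, aligned to columns relative to start_col."""
    out: list[str] = []
    col = start_col - 1  # 0-indexed column of the next character
    for ch in text:
        if ch == "\t":
            # Pad out to the next tab stop from the current column.
            pad = tab_size - (col % tab_size)
            out.append(" " * pad)
            col += pad
        else:
            out.append(ch)
            col += 1
    return "".join(out)
```

Passing start_col matters because a tab's width depends on where the text begins: the same "\t" expands to 4 spaces at column 1 but only 2 spaces at column 3.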
_commit_to

Commit position to line_end, consuming newline if present.
Sets self._consumed_newline to indicate whether a newline was consumed. Uses optimized string operations instead of a character-by-character loop.

def _commit_to(self, line_end: int) -> None
Parameters
| Name | Type | Description |
|---|---|---|
| line_end | int | Position to commit to. |
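Expressed as a pure function, the commit step might look like this (a hypothetical standalone form of the method):

```python
def commit_to(source: str, line_end: int) -> tuple[int, bool]:
    """Advance past line_end, consuming a trailing newline if present.

    Returns (new_pos, consumed_newline).
    """
    if line_end < len(source) and source[line_end] == "\n":
        return line_end + 1, True  # skip the newline in one step
    return line_end, False  # EOF: nothing left to consume
```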
_save_location

Save current location for O(1) token location creation.
Call this at the START of scanning a line, before any position changes.

def _save_location(self) -> None
_make_token

Create a Token with raw coordinates (lazy SourceLocation).
O(1): uses the pre-saved location from the _save_location() call. Avoids SourceLocation allocation until token.location is accessed.

def _make_token(self, token_type: TokenType, value: str, start_pos: int, *, start_col: int | None = None, end_pos: int | None = None, line_indent: int = -1) -> Token
Parameters
| Name | Type | Description |
|---|---|---|
| token_type | TokenType | The token type. |
| value | str | The raw string value. |
| start_pos | int | Start position in source. |
| start_col | int \| None | Optional column override (1-indexed). Default: None |
| end_pos | int \| None | Optional end position override. Default: None |
| line_indent | int | Pre-computed indent level. Default: -1 |
Returns
Token
Token with raw coordinates; the SourceLocation is created lazily when token.location is accessed.
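The lazy-location trick can be sketched like this (names are illustrative; the real Token and SourceLocation shapes may differ):

```python
from dataclasses import dataclass
from functools import cached_property


@dataclass(frozen=True)
class SourceLocation:
    line: int
    column: int


class Token:
    """Token storing raw coordinates; SourceLocation is built only on access."""

    def __init__(self, token_type: str, value: str, line: int, col: int) -> None:
        self.type = token_type
        self.value = value
        self._line = line
        self._col = col

    @cached_property
    def location(self) -> SourceLocation:
        # Allocated on first access and then cached, so tokens whose
        # location is never inspected pay no allocation cost.
        return SourceLocation(self._line, self._col)
```

Since most tokens in a typical parse never have their location inspected, deferring the allocation saves one object per token on the hot path.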
_make_token_at_current

Create a Token at current position (for EOF and similar).

def _make_token_at_current(self, token_type: TokenType, value: str, *, line_indent: int = 0) -> Token
Parameters
| Name | Type | Description |
|---|---|---|
| token_type | TokenType | The token type. |
| value | str | The raw string value. |
| line_indent | int | Pre-computed indent level. Default: 0 |
Returns
Token
Token at current position with pre-created SourceLocation.