Module

lexer

Kida lexer — tokenizes template source code into a token stream.

The lexer scans template source and produces Token objects that the Parser consumes. It operates in four modes based on the current context (see the sketch after the list below):

Modes:

  • DATA: Outside template constructs; collects raw text
  • VARIABLE: Inside {{ }}; tokenizes the expression
  • BLOCK: Inside {% %}; tokenizes the statement
  • COMMENT: Inside {# #}; skips to the closing delimiter
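
For example, a template mixing all three constructs exercises every mode. A minimal sketch (it only collects the stream; the exact token sequence is not asserted here):

>>> from kida.lexer import tokenize
>>> src = "Hi {# greeting #}{% if name %}{{ name }}{% endif %}!"
>>> pairs = [(t.type.name, t.value) for t in tokenize(src)]
>>> # pairs interleaves DATA, BLOCK_*, and VARIABLE_* tokens; nothing is
>>> # emitted for the {# greeting #} comment, per the COMMENT mode above.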

Token Types:

  • Delimiters: BLOCK_BEGIN, BLOCK_END, VARIABLE_BEGIN, VARIABLE_END
  • Literals: STRING, INTEGER, FLOAT
  • Identifiers: NAME (includes keywords like 'if', 'for', 'and')
  • Operators: ADD, SUB, MUL, DIV, EQ, NE, LT, GT, etc.
  • Punctuation: DOT, COMMA, COLON, PIPE, LPAREN, RPAREN, etc.
  • Data: DATA (raw text between template constructs)

Whitespace Control:

Supports Jinja2-style whitespace trimming (see the sketch after this list):

  • {{- expr }}: Strip whitespace before
  • {{ expr -}}: Strip whitespace after
  • {%- stmt %}/{% stmt -%}: Same for blocks
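
A minimal sketch of the trimming behavior (nothing is asserted here; per the rules above, the whitespace adjacent to the trimmed sides should disappear from the neighboring DATA tokens):

>>> from kida.lexer import tokenize
>>> tokens = tokenize("Hello,   {{- name -}}   !")
>>> data_values = [t.value for t in tokens if t.type.name == "DATA"]
>>> # Per the rules above, data_values should be ['Hello,', '!'] rather
>>> # than ['Hello,   ', '   !'].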

Performance:

  • Compiled regex: Patterns are class-level, compiled once
  • O(1) operator lookup: Dict-based, not list iteration
  • Single-pass scanning: No backtracking
  • Generator-based: Memory-efficient for large templates

Thread-Safety: Lexer instances are single-use. Create one per tokenization. The resulting token list is immutable.

Example:

>>> from kida.lexer import Lexer, tokenize
>>> lexer = Lexer("Hello, {{ name }}!")
>>> tokens = list(lexer.tokenize())
>>> [(t.type.name, t.value) for t in tokens]
[('DATA', 'Hello, '), ('VARIABLE_BEGIN', '{{'), ('NAME', 'name'), ('VARIABLE_END', '}}'), ('DATA', '!'), ('EOF', '')]

Convenience function:

>>> tokens = tokenize("{{ x | upper }}")

Classes

LexerMode

Lexer operating mode.

LexerConfig

Lexer configuration for delimiter customization and whitespace control.

Allows customizing template delimiters and enabling automatic whitespace trimming. Frozen for thread-safety (immutable after creation).

Attributes

Name Type Description
block_start str

Block tag opening delimiter (default: '{%')

block_end str

Block tag closing delimiter (default: '%}')

variable_start str

Variable tag opening delimiter (default: '{{')

variable_end str

Variable tag closing delimiter (default: '}}')

comment_start str

Comment opening delimiter (default: '{#')

comment_end str

Comment closing delimiter (default: '#}')

line_statement_prefix str | None

Line statement prefix, e.g., '#' (default: None)

line_comment_prefix str | None

Line comment prefix, e.g., '##' (default: None)

trim_blocks bool

Remove first newline after block tags (default: False)

lstrip_blocks bool

Strip leading whitespace before block tags (default: False)
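
A hedged construction sketch. It assumes LexerConfig is exported from kida.lexer and accepts the attribute names above as keyword arguments (the constructor signature is not shown in these docs):

>>> from kida.lexer import Lexer, LexerConfig
>>> config = LexerConfig(
...     variable_start="<<", variable_end=">>",
...     trim_blocks=True, lstrip_blocks=True,
... )
>>> tokens = list(Lexer("Hello, << name >>!", config=config).tokenize())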

LexerError

Lexer error with source location.

Methods

Internal Methods
__init__
def __init__(self, message: str, source: str, lineno: int, col_offset: int, suggestion: str | None = None)
Parameters
Name Type Description
message
source
lineno
col_offset
suggestion Default: None
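
A hedged handling sketch. It assumes LexerError is exported from kida.lexer and that the constructor arguments above are also available as attributes on the raised error; only the formatted message is documented explicitly:

    from kida.lexer import Lexer, LexerError

    try:
        list(Lexer('{% set x = "hello %}').tokenize())
    except LexerError as exc:
        # The formatted message includes the source snippet, caret, and suggestion.
        print(exc)
        # Attribute access below is an assumption based on the __init__ parameters.
        print(exc.lineno, exc.col_offset, exc.suggestion)
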
Lexer

Template lexer that transforms source into a token stream.

The Lexer is the first stage of template compilation. It scans source text and yields Token objects representing literals, operators, identifiers, and template delimiters.

Thread-Safety: Instance state is mutable during tokenization (position tracking). Create one Lexer per source string; do not reuse across threads.

Operator Lookup: Uses O(1) dict lookup instead of O(k) list iteration:

    _OPERATORS_2CHAR = {"**": TokenType.POW, "//": TokenType.FLOORDIV, ...}
    _OPERATORS_1CHAR = {"+": TokenType.ADD, "-": TokenType.SUB, ...}

Whitespace Control: Handles {{-, -}}, {%-, and -%} modifiers:

  • Left modifier ({{-, {%-): Strips trailing whitespace from the preceding DATA
  • Right modifier (-}}, -%}): Strips leading whitespace from the following DATA

Error Handling: LexerError includes a source snippet with a caret and a suggestion:

    Lexer Error: Unterminated string literal
      --> line 3:15
       |
     3 | {% set x = "hello %}
       |               ^
    Suggestion: Add closing " to end the string

Attributes

Name Type Description
_OPERATORS_3CHAR dict[str, TokenType]
_OPERATORS_2CHAR dict[str, TokenType]
_OPERATORS_1CHAR dict[str, TokenType]

Methods

tokenize
Tokenize the source and yield tokens.
def tokenize(self) -> Iterator[Token]
Returns
Iterator[Token]
Internal Methods
_get_delimiter_pattern
staticmethod
def _get_delimiter_pattern(config: LexerConfig) -> re.Pattern[str]

Get compiled delimiter pattern for config (cached).

Compiles a single regex that matches any of the three delimiter types. Result is cached per unique config for O(1) subsequent lookups.

Performance: Single regex search is 5-24x faster than 3x str.find() (validated in benchmarks/test_benchmark_lexer.py).

Parameters
Name Type Description
config
Returns
re.Pattern[str]
__init__
Initialize lexer with source code.
def __init__(self, source: str, config: LexerConfig | None = None)
Parameters
Name Type Description
source

Template source code

config

Lexer configuration (uses defaults if None)

Default: None
_tokenize_data
Tokenize raw data outside template constructs.
def _tokenize_data(self) -> Iterator[Token]
Returns
Iterator[Token]
_tokenize_code
Tokenize code inside {{ }} or {% %}.
def _tokenize_code(self, end_delimiter: str, end_token_type: TokenType) -> Iterator[Token]
Parameters
Name Type Description
end_delimiter
end_token_type
Returns
Iterator[Token]
_tokenize_comment
Skip comment content until closing delimiter.
def _tokenize_comment(self) -> Iterator[Token]
Returns
Iterator[Token]
_next_code_token
def _next_code_token(self) -> Token

Get the next token from code content.

Complexity: O(1) for operator lookup (dict-based).

Returns
Token
_scan_string
Scan a string literal.
def _scan_string(self) -> Token
Returns
Token
_scan_number
def _scan_number(self) -> Token

Scan a number literal (integer or float).

Note: Special handling for range literals (1..10, 1...11). If we see digits followed by '..' or '...', treat the digits as an integer, not a float, so the range operator can be parsed.

Returns
Token
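
A hedged illustration of the range-literal note above (the token types used for the '..' operator are not documented here, so only the call and the expected integer handling are shown):

>>> from kida.lexer import tokenize
>>> tokens = tokenize("{% for i in 1..10 %}{{ i }}{% endfor %}")
>>> # '1' and '10' should lex as INTEGER tokens rather than a single FLOAT,
>>> # leaving '..' for the parser's range handling.
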
_scan_name
Scan a name or keyword.
def _scan_name(self) -> Token
Returns
Token
_find_next_construct
def _find_next_construct(self) -> tuple[str, int] | None

Find the next template construct ({{ }}, {% %}, or {# #}).

Uses a single compiled regex search instead of 3x str.find() calls. The regex is cached per LexerConfig for O(1) subsequent lookups.

Performance: 5-24x faster than the previous str.find() approach (validated in benchmarks/test_benchmark_lexer.py).

Returns
tuple[str, int] | None
_emit_delimiter
Emit a delimiter token and advance position.
def _emit_delimiter(self, delimiter: str, token_type: TokenType) -> Token
Parameters
Name Type Description
delimiter
token_type
Returns
Token
_skip_whitespace
Skip whitespace characters.
def _skip_whitespace(self) -> None
_advance
def _advance(self, count: int) -> None

Advance position by count characters, tracking line/column.

Optimized to use batch processing with count() for newline detection instead of character-by-character iteration. Provides ~15-20% speedup for templates with long DATA nodes.

Parameters
Name Type Description
count

Functions

tokenize
Convenience function to tokenize source into a list.
def tokenize(source: str, config: LexerConfig | None = None) -> list[Token]
Parameters
Name Type Description
source str

Template source code

config LexerConfig | None

Optional lexer configuration

Default: None
Returns
list[Token]
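
A usage sketch combining the convenience function with a custom configuration (the LexerConfig keyword arguments are an assumption carried over from the earlier sketch):

>>> from kida.lexer import LexerConfig, tokenize
>>> tokens = tokenize(
...     "{% if user %}\n  Hi, {{ user }}!\n{% endif %}\n",
...     config=LexerConfig(trim_blocks=True, lstrip_blocks=True),
... )
>>> # tokens is a flat list[Token]; Lexer.tokenize() yields lazily instead.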