Classes
LexerMode
Lexer operating mode.
LexerConfig
Lexer configuration for delimiter customization and whitespace control.
Allows customizing template delimiters and enabling automatic whitespace trimming. Frozen for thread-safety (immutable after creation).
Attributes
| Name | Type | Description |
|---|---|---|
| block_start | str | Block tag opening delimiter (default: '{%') |
| block_end | str | Block tag closing delimiter (default: '%}') |
| variable_start | str | Variable tag opening delimiter (default: '{{') |
| variable_end | str | Variable tag closing delimiter (default: '}}') |
| comment_start | str | Comment opening delimiter (default: '{#') |
| comment_end | str | Comment closing delimiter (default: '#}') |
| line_statement_prefix | str \| None | Line statement prefix, e.g., '#' (default: None) |
| line_comment_prefix | str \| None | Line comment prefix, e.g., '##' (default: None) |
| trim_blocks | bool | Remove first newline after block tags (default: False) |
| lstrip_blocks | bool | Strip leading whitespace before block tags (default: False) |
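A construction sketch, assuming LexerConfig accepts its attributes as keyword arguments (the import path is illustrative):

```python
# Illustrative import path; adjust to the actual package layout.
from template_engine.lexer import LexerConfig, tokenize

# Square-bracket delimiters plus '#' line statements; trim_blocks drops the
# first newline after each block tag.
config = LexerConfig(
    block_start="[%",
    block_end="%]",
    variable_start="[[",
    variable_end="]]",
    line_statement_prefix="#",
    trim_blocks=True,
)

tokens = tokenize("[[ user.name ]]", config)
```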
LexerError
Lexer error with source location.
Methods
Internal Methods
__init__
def __init__(self, message: str, source: str, lineno: int, col_offset: int, suggestion: str | None = None)
Parameters
| Name | Type | Description |
|---|---|---|
| message | str | Error message |
| source | str | Template source code |
| lineno | int | Line number where the error occurred |
| col_offset | int | Column offset where the error occurred |
| suggestion | str \| None | Optional fix suggestion (default: None) |
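A handling sketch, assuming the constructor arguments are also exposed as attributes on the raised error (import path illustrative):

```python
from template_engine.lexer import Lexer, LexerError  # illustrative import path

source = '{% set x = "hello %}'
try:
    tokens = list(Lexer(source).tokenize())
except LexerError as exc:
    # Assumes lineno, col_offset, and suggestion are stored as attributes.
    print(f"lexer error at {exc.lineno}:{exc.col_offset}: {exc}")
    if exc.suggestion:
        print(f"suggestion: {exc.suggestion}")
```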
Lexer
Template lexer that transforms source into a token stream.
The Lexer is the first stage of template compilation. It scans source text and yields Token objects representing literals, operators, identifiers, and template delimiters.
Thread-Safety: Instance state is mutable during tokenization (position tracking). Create one Lexer per source string; do not reuse across threads.
Operator Lookup:
Uses O(1) dict lookup instead of O(k) list iteration:
```python
_OPERATORS_2CHAR = {"**": TokenType.POW, "//": TokenType.FLOORDIV, ...}
_OPERATORS_1CHAR = {"+": TokenType.ADD, "-": TokenType.SUB, ...}
```
Whitespace Control:
Handles `{{-`, `-}}`, `{%-`, `-%}` modifiers (sketched below):
- Left modifier (`{{-`, `{%-`): Strips trailing whitespace from preceding DATA
- Right modifier (`-}}`, `-%}`): Strips leading whitespace from following DATA
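For example (a sketch; the import path is illustrative):

```python
from template_engine.lexer import tokenize  # illustrative import path

tokens = tokenize("Hello   {{- name -}}   !")
# The left '-' strips the trailing spaces from the preceding DATA "Hello   ",
# and the right '-' strips the leading spaces from the following DATA "   !",
# so the DATA token values come out as "Hello" and "!".
```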
Error Handling:
LexerError includes a source snippet with caret and suggestions:
```
Lexer Error: Unterminated string literal
  --> line 3:15
   |
 3 | {% set x = "hello %}
   |               ^
Suggestion: Add closing " to end the string
```
Attributes
| Name | Type | Description |
|---|---|---|
| _OPERATORS_3CHAR | dict[str, TokenType] | — |
| _OPERATORS_2CHAR | dict[str, TokenType] | — |
| _OPERATORS_1CHAR | dict[str, TokenType] | — |
Methods
tokenize
def tokenize(self) -> Iterator[Token]
Tokenize the source and yield tokens.
Returns
Iterator[Token]
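A usage sketch (import path illustrative; assumes Token exposes type and value attributes):

```python
from template_engine.lexer import Lexer  # illustrative import path

lexer = Lexer("Hello {{ name }}!")
for token in lexer.tokenize():
    print(token.type, repr(token.value))
```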
Internal Methods
staticmethod
_get_delimiter_pattern
def _get_delimiter_pattern(config: LexerConfig) -> re.Pattern[str]
Get compiled delimiter pattern for config (cached).
Compiles a single regex that matches any of the three delimiter types. Result is cached per unique config for O(1) subsequent lookups.
Performance: Single regex search is 5-24x faster than 3x str.find() (validated in benchmarks/test_benchmark_lexer.py).
Parameters
| Name | Type | Description |
|---|---|---|
| config | LexerConfig | Lexer configuration whose delimiters are compiled into the pattern |
Returns
re.Pattern[str]
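A minimal sketch of the cached single-regex approach, not the library's actual implementation (the real method keys the cache on a LexerConfig rather than on individual delimiter strings):

```python
import re
from functools import lru_cache

@lru_cache(maxsize=None)
def delimiter_pattern_sketch(variable_start: str, block_start: str, comment_start: str) -> re.Pattern[str]:
    # One alternation that matches any opening delimiter; compiled once per
    # unique delimiter triple thanks to lru_cache.
    alternation = "|".join(re.escape(d) for d in (variable_start, block_start, comment_start))
    return re.compile(alternation)

pattern = delimiter_pattern_sketch("{{", "{%", "{#")
m = pattern.search("Hello {{ name }}")
print(m.start(), m.group())  # 6 '{{'
```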
__init__
def __init__(self, source: str, config: LexerConfig | None = None)
Initialize lexer with source code.
Parameters
| Name | Type | Description |
|---|---|---|
| source | str | Template source code |
| config | LexerConfig \| None | Lexer configuration (uses defaults if None; default: None) |
_tokenize_data
def _tokenize_data(self) -> Iterator[Token]
Tokenize raw data outside template constructs.
Returns
Iterator[Token]
_tokenize_code
def _tokenize_code(self, end_delimiter: str, end_token_type: TokenType) -> Iterator[Token]
Tokenize code inside {{ }} or {% %}.
Parameters
| Name | Type | Description |
|---|---|---|
| end_delimiter | str | — |
| end_token_type | TokenType | — |
Returns
Iterator[Token]
_tokenize_comment
def _tokenize_comment(self) -> Iterator[Token]
Skip comment content until closing delimiter.
Returns
Iterator[Token]
_next_code_token
def _next_code_token(self) -> Token
Get the next token from code content.
Complexity: O(1) for operator lookup (dict-based).
Returns
Token
_scan_string
def _scan_string(self) -> Token
Scan a string literal.
Returns
Token
_scan_number
def _scan_number(self) -> Token
Scan a number literal (integer or float).
Note: Special handling for range literals (1..10, 1...11). If we see digits followed by '..' or '...', treat the digits as an integer, not a float, so the range operator can be parsed.
Returns
Token
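A minimal standalone sketch of the lookahead described in the note above, not the library's actual method:

```python
def scan_number_sketch(text: str, pos: int) -> tuple[str, int]:
    """Return (lexeme, new_pos) for the number starting at pos.

    Stops before '..' / '...' so a range literal such as 1..10 is not
    misread as the float '1.' followed by '.10'.
    """
    start = pos
    while pos < len(text) and text[pos].isdigit():
        pos += 1
    # Consume a fractional part only if the '.' is NOT the start of '..'.
    if pos < len(text) and text[pos] == "." and text[pos:pos + 2] != "..":
        pos += 1
        while pos < len(text) and text[pos].isdigit():
            pos += 1
    return text[start:pos], pos

print(scan_number_sketch("1..10", 0))  # ('1', 1)   -> integer, '..' left for the range operator
print(scan_number_sketch("3.14", 0))   # ('3.14', 4) -> float
```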
_scan_name
def _scan_name(self) -> Token
Scan a name or keyword.
Returns
Token
_find_next_construct
def _find_next_construct(self) -> tuple[str, int] | None
Find the next template construct ({{ }}, {% %}, or {# #}).
Uses a single compiled regex search instead of 3x str.find() calls. The regex is cached per LexerConfig for O(1) subsequent lookups.
Performance: 5-24x faster than the previous str.find() approach (validated in benchmarks/test_benchmark_lexer.py).
Returns
tuple[str, int] | None
_emit_delimiter
def _emit_delimiter(self, delimiter: str, token_type: TokenType) -> Token
Emit a delimiter token and advance position.
Parameters
| Name | Type | Description |
|---|---|---|
| delimiter | str | — |
| token_type | TokenType | — |
Returns
Token
_skip_whitespace
def _skip_whitespace(self) -> None
Skip whitespace characters.
_advance
def _advance(self, count: int) -> None
Advance position by count characters, tracking line/column.
Optimized to use batch processing with count() for newline detection instead of character-by-character iteration. Provides ~15-20% speedup for templates with long DATA nodes.
Parameters
| Name | Type | Description |
|---|---|---|
| count | int | Number of characters to advance |
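A minimal sketch of the batch newline counting described above (the function and field names are illustrative, not the library's actual attributes):

```python
def advance_sketch(source: str, pos: int, lineno: int, col: int, count: int) -> tuple[int, int, int]:
    """Advance pos by count characters, updating (pos, lineno, col) in one batch."""
    chunk = source[pos:pos + count]
    newlines = chunk.count("\n")               # one C-level scan instead of a Python loop
    if newlines:
        lineno += newlines
        col = count - (chunk.rfind("\n") + 1)  # columns after the last newline
    else:
        col += count
    return pos + count, lineno, col

print(advance_sketch("ab\ncdef", 0, 1, 0, 7))  # (7, 2, 4)
```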
Functions
tokenize
def tokenize(source: str, config: LexerConfig | None = None) -> list[Token]
Convenience function to tokenize source into a list.
Parameters
| Name | Type | Description |
|---|---|---|
| source | str | Template source code |
| config | LexerConfig \| None | Optional lexer configuration (default: None) |
Returns
list[Token]
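A usage sketch of the convenience wrapper (import path illustrative):

```python
from template_engine.lexer import tokenize  # illustrative import path

# Unlike Lexer.tokenize(), which yields tokens lazily, this returns a fully
# materialized list.
tokens = tokenize("Hello {{ name }}!")
print(len(tokens))
```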