# lexer

URL: /api/lexer/
Section: api

--------------------------------------------------------------------------------
## Module lexer

Kida lexer — tokenizes template source code into a token stream.

The lexer scans template source and produces Token objects that the Parser consumes. It operates in four modes based on the current context:

**Modes:**

- `DATA`: outside template constructs; collects raw text
- `VARIABLE`: inside `{{ }}`; tokenizes an expression
- `BLOCK`: inside `{% %}`; tokenizes a statement
- `COMMENT`: inside `{# #}`; skips to the closing delimiter

**Token Types:**

- Delimiters: `BLOCK_BEGIN`, `BLOCK_END`, `VARIABLE_BEGIN`, `VARIABLE_END`
- Literals: `STRING`, `INTEGER`, `FLOAT`
- Identifiers: `NAME` (includes keywords like `if`, `for`, `and`)
- Operators: `ADD`, `SUB`, `MUL`, `DIV`, `EQ`, `NE`, `LT`, `GT`, etc.
- Punctuation: `DOT`, `COMMA`, `COLON`, `PIPE`, `LPAREN`, `RPAREN`, etc.
- Data: `DATA` (raw text between template constructs)

**Whitespace Control:**

Supports Jinja2-style whitespace trimming (demonstrated in the sketch after this section):

- `{{- expr }}`: strip whitespace before
- `{{ expr -}}`: strip whitespace after
- `{%- stmt %}` / `{% stmt -%}`: the same for blocks

**Performance:**

- Compiled regex: patterns are class-level, compiled once
- O(1) operator lookup: dict-based, not list iteration
- Single-pass scanning: no backtracking
- Generator-based: memory-efficient for large templates

**Thread-Safety:** Lexer instances are single-use. Create one per tokenization. The resulting token list is immutable.

**Example:**

```python
>>> from kida.lexer import Lexer, tokenize
>>> lexer = Lexer("Hello, {{ name }}!")
>>> tokens = list(lexer.tokenize())
>>> [(t.type.name, t.value) for t in tokens]
[('DATA', 'Hello, '), ('VARIABLE_BEGIN', '{{'), ('NAME', 'name'), ('VARIABLE_END', '}}'), ('DATA', '!'), ('EOF', '')]
```

Convenience function:

```python
>>> tokens = tokenize("{{ x | upper }}")
```
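To make the trimming rules concrete, here is a small sketch using the `tokenize` convenience function. The expected `DATA` values follow from the Jinja2-style rules above; treat the exact output as an assumption to verify against your build.

```python
from kida.lexer import tokenize

# "{{-" should trim trailing whitespace from the preceding DATA token, and
# "-}}" leading whitespace from the following one (Jinja2-style semantics).
tokens = tokenize("Hello,   {{- name -}}   !")

data_values = [t.value for t in tokens if t.type.name == "DATA"]
print(data_values)
# Expected per the rules above (assumption): ['Hello,', '!']
```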
## Classes

### LexerMode

Lexer operating mode.

### LexerConfig

Lexer configuration for delimiter customization and whitespace control. Allows customizing template delimiters and enabling automatic whitespace trimming. Frozen for thread-safety (immutable after creation). A usage sketch appears after the `tokenize` function at the end of this page.

**Attributes**

| Name | Type | Description |
| --- | --- | --- |
| `block_start` | `str` | Block tag opening delimiter (default: `'{%'`) |
| `block_end` | `str` | Block tag closing delimiter (default: `'%}'`) |
| `variable_start` | `str` | Variable tag opening delimiter (default: `'{{'`) |
| `variable_end` | `str` | Variable tag closing delimiter (default: `'}}'`) |
| `comment_start` | `str` | Comment opening delimiter (default: `'{#'`) |
| `comment_end` | `str` | Comment closing delimiter (default: `'#}'`) |
| `line_statement_prefix` | `str \| None` | Line statement prefix, e.g. `'#'` (default: `None`) |
| `line_comment_prefix` | `str \| None` | Line comment prefix, e.g. `'##'` (default: `None`) |
| `trim_blocks` | `bool` | Remove first newline after block tags (default: `False`) |
| `lstrip_blocks` | `bool` | Strip leading whitespace before block tags (default: `False`) |

### LexerError

Lexer error with source location.

**Internal Methods**

#### `__init__`

```python
def __init__(self, message: str, source: str, lineno: int, col_offset: int, suggestion: str | None = None)
```

### Lexer

Template lexer that transforms source into a token stream.

The Lexer is the first stage of template compilation. It scans source text and yields Token objects representing literals, operators, identifiers, and template delimiters.

**Thread-Safety:** Instance state is mutable during tokenization (position tracking). Create one Lexer per source string; do not reuse across threads.

**Operator Lookup:** Uses O(1) dict lookup instead of O(k) list iteration:

```python
_OPERATORS_2CHAR = {"**": TokenType.POW, "//": TokenType.FLOORDIV, ...}
_OPERATORS_1CHAR = {"+": TokenType.ADD, "-": TokenType.SUB, ...}
```

**Whitespace Control:** Handles `{{-`, `-}}`, `{%-`, `-%}` modifiers:

- Left modifier (`{{-`, `{%-`): strips trailing whitespace from the preceding DATA token
- Right modifier (`-}}`, `-%}`): strips leading whitespace from the following DATA token

**Error Handling:** LexerError includes a source snippet with a caret and a suggestion:

```text
Lexer Error: Unterminated string literal
 --> line 3:15
  |
3 | {% set x = "hello %}
  |            ^
Suggestion: Add closing " to end the string
```

**Attributes**

| Name | Type |
| --- | --- |
| `_OPERATORS_3CHAR` | `dict[str, TokenType]` |
| `_OPERATORS_2CHAR` | `dict[str, TokenType]` |
| `_OPERATORS_1CHAR` | `dict[str, TokenType]` |

**Methods**

#### `tokenize`

Tokenize the source and yield tokens.

```python
def tokenize(self) -> Iterator[Token]
```

Returns: `Iterator[Token]`

**Internal Methods**

#### `_get_delimiter_pattern` (staticmethod)

Get compiled delimiter pattern for config (cached). Compiles a single regex that matches any of the three delimiter types. The result is cached per unique config for O(1) subsequent lookups.

Performance: a single regex search is 5-24x faster than 3x `str.find()` (validated in `benchmarks/test_benchmark_lexer.py`).

```python
@staticmethod
def _get_delimiter_pattern(config: LexerConfig) -> re.Pattern[str]
```

Returns: `re.Pattern[str]`

#### `__init__`

Initialize lexer with source code.

```python
def __init__(self, source: str, config: LexerConfig | None = None)
```

**Parameters**

| Name | Description |
| --- | --- |
| `source` | Template source code |
| `config` | Lexer configuration (uses defaults if `None`; default: `None`) |
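The single-pattern technique behind `_get_delimiter_pattern` above (and `_find_next_construct` below) can be sketched as follows. This is an illustrative reconstruction, not the actual implementation: the function name and the cache keying on individual delimiter strings (rather than a whole `LexerConfig`) are assumptions.

```python
import re
from functools import lru_cache

@lru_cache(maxsize=None)
def delimiter_pattern(variable_start: str, block_start: str, comment_start: str) -> re.Pattern[str]:
    # One alternation matching any of the three opening delimiters; a single
    # re.search() then replaces three separate str.find() scans per step.
    return re.compile(
        "|".join(re.escape(d) for d in (variable_start, block_start, comment_start))
    )

pattern = delimiter_pattern("{{", "{%", "{#")
match = pattern.search("Hello {# note #} {{ name }}")
if match:
    # Earliest construct and its offset; prints: {# 6
    print(match.group(0), match.start())
```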
#### `_tokenize_data`

Tokenize raw data outside template constructs.

```python
def _tokenize_data(self) -> Iterator[Token]
```

Returns: `Iterator[Token]`

#### `_tokenize_code`

Tokenize code inside `{{ }}` or `{% %}`.

```python
def _tokenize_code(self, end_delimiter: str, end_token_type: TokenType) -> Iterator[Token]
```

Returns: `Iterator[Token]`

#### `_tokenize_comment`

Skip comment content until the closing delimiter.

```python
def _tokenize_comment(self) -> Iterator[Token]
```

Returns: `Iterator[Token]`

#### `_next_code_token`

Get the next token from code content. Complexity: O(1) for operator lookup (dict-based).

```python
def _next_code_token(self) -> Token
```

Returns: `Token`

#### `_scan_string`

Scan a string literal.

```python
def _scan_string(self) -> Token
```

Returns: `Token`

#### `_scan_number`

Scan a number literal (integer or float).

Note: special handling for range literals (`1..10`, `1...11`). If we see digits followed by `..` or `...`, treat the digits as an integer, not a float, so the range operator can be parsed (sketched after this method list).

```python
def _scan_number(self) -> Token
```

Returns: `Token`

#### `_scan_name`

Scan a name or keyword.

```python
def _scan_name(self) -> Token
```

Returns: `Token`

#### `_find_next_construct`

Find the next template construct (`{{ }}`, `{% %}`, or `{# #}`). Uses a single compiled regex search instead of 3x `str.find()` calls. The regex is cached per LexerConfig for O(1) subsequent lookups.

Performance: 5-24x faster than the previous `str.find()` approach (validated in `benchmarks/test_benchmark_lexer.py`).

```python
def _find_next_construct(self) -> tuple[str, int] | None
```

Returns: `tuple[str, int] | None`

#### `_emit_delimiter`

Emit a delimiter token and advance position.

```python
def _emit_delimiter(self, delimiter: str, token_type: TokenType) -> Token
```

Returns: `Token`

#### `_skip_whitespace`

Skip whitespace characters.

```python
def _skip_whitespace(self) -> None
```

#### `_advance`

Advance position by `count` characters, tracking line/column. Optimized to use batch processing with `count()` for newline detection instead of character-by-character iteration. Provides a ~15-20% speedup for templates with long DATA nodes.

```python
def _advance(self, count: int) -> None
```
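To illustrate the batching in `_advance`, here is a self-contained sketch. The class and attribute names (`_Cursor`, `pos`, `lineno`, `col_offset`) are stand-ins for illustration; the real Lexer's internal fields may differ.

```python
class _Cursor:
    # Hypothetical stand-in for the lexer's position state.
    def __init__(self, source: str) -> None:
        self.source = source
        self.pos = 0
        self.lineno = 1
        self.col_offset = 0

    def advance(self, count: int) -> None:
        chunk = self.source[self.pos : self.pos + count]
        newlines = chunk.count("\n")  # batch newline detection, no per-char loop
        if newlines:
            self.lineno += newlines
            # Column restarts after the last newline in the skipped chunk.
            self.col_offset = count - (chunk.rfind("\n") + 1)
        else:
            self.col_offset += count
        self.pos += count

c = _Cursor("line one\nline two\n  {{ x }}")
c.advance(len("line one\nline two\n  "))
print(c.lineno, c.col_offset)  # prints: 3 2 (third line, two columns in)
```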
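And the range-literal lookahead described under `_scan_number` above, as a standalone sketch. The real method operates on the lexer's internal state and returns `Token` objects; this version works on a plain string to keep the lookahead logic visible.

```python
def scan_number(src: str, pos: int) -> tuple[str, str, int]:
    """Return (token_type, lexeme, new_pos) for the number starting at pos."""
    start = pos
    while pos < len(src) and src[pos].isdigit():
        pos += 1
    # A single '.' normally starts the fractional part, but '..' / '...' is
    # a range operator, so only treat '.' as a decimal point when it is not
    # followed by another '.'.
    if pos < len(src) and src[pos] == "." and not src.startswith("..", pos):
        pos += 1
        while pos < len(src) and src[pos].isdigit():
            pos += 1
        return ("FLOAT", src[start:pos], pos)
    return ("INTEGER", src[start:pos], pos)

print(scan_number("1..10", 0))  # ('INTEGER', '1', 1): leaves '..' for the parser
print(scan_number("1.5", 0))    # ('FLOAT', '1.5', 3)
```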
## Functions

### `tokenize`

Convenience function to tokenize source into a list.

```python
def tokenize(source: str, config: LexerConfig | None = None) -> list[Token]
```

**Parameters**

| Name | Type | Description |
| --- | --- | --- |
| `source` | `str` | Template source code |
| `config` | `LexerConfig \| None` | Optional lexer configuration (default: `None`) |

Returns: `list[Token]`
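As a closing usage sketch, the `LexerConfig` class documented above can be combined with this convenience function to tokenize templates with custom delimiters. This assumes `LexerConfig` accepts its documented attributes as constructor keyword arguments (typical for a frozen dataclass-style config); verify against your build.

```python
from kida.lexer import LexerConfig, tokenize

# Assumption: documented attributes double as constructor keywords.
config = LexerConfig(variable_start="[[", variable_end="]]")

tokens = tokenize("Hello, [[ name ]]!", config=config)
print([(t.type.name, t.value) for t in tokens])
# Expected shape (per the module example above):
# [('DATA', 'Hello, '), ('VARIABLE_BEGIN', '[['), ('NAME', 'name'),
#  ('VARIABLE_END', ']]'), ('DATA', '!'), ('EOF', '')]
```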