Classes
Lexer
Protocol for tokenizers.
Implementations must be thread-safe — no mutable shared state. The tokenize method should only use local variables.
Thread-Safety Contract:
- tokenize() must use only local variables
- No instance state mutation during tokenization
- Class-level constants (KEYWORDS, etc.) must be immutable (frozenset)
Performance Contract:
- O(n) time complexity guaranteed (no backtracking)
- Single pass through input (no lookahead beyond current position)
- Streaming output (yield tokens as found)
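Below is a minimal sketch of a lexer that satisfies both contracts. The Token and TokenType stand-ins, their KEYWORD/NAME members, and the Token(type, value, line, column) shape are assumptions made only so the example runs on its own; the library's real types may differ.

```python
from enum import Enum, auto
from typing import Iterator, NamedTuple

class TokenType(Enum):      # stand-in; the library provides the real TokenType
    KEYWORD = auto()
    NAME = auto()

class Token(NamedTuple):    # stand-in; the library provides the real Token
    type: TokenType
    value: str
    line: int
    column: int

class WordLexer:
    """Illustrative lexer: immutable constants, no instance state, one O(n) pass."""

    KEYWORDS = frozenset({"def", "class", "return"})  # class-level constant, immutable

    @property
    def name(self) -> str:
        return "word"

    @property
    def aliases(self) -> tuple[str, ...]:
        return ("words",)

    @property
    def filenames(self) -> tuple[str, ...]:
        return ("*.txt",)

    @property
    def mimetypes(self) -> tuple[str, ...]:
        return ("text/plain",)

    def tokenize(self, code: str, config=None, start: int = 0,
                 end: int | None = None) -> Iterator[Token]:
        # Only local variables are touched; the instance is never mutated here.
        stop = len(code) if end is None else end
        i, line, col = start, 1, 1
        while i < stop:                     # single forward pass, no backtracking
            j = i
            while j < stop and not code[j].isspace():
                j += 1
            if j > i:                       # a run of non-space characters
                word = code[i:j]
                kind = TokenType.KEYWORD if word in self.KEYWORDS else TokenType.NAME
                yield Token(kind, word, line, col)  # streamed as soon as it is found
                col += j - i
            else:                           # a single whitespace character
                if code[i] == "\n":
                    line, col = line + 1, 1
                else:
                    col += 1
                j = i + 1
            i = j

    def tokenize_fast(self, code: str, start: int = 0,
                      end: int | None = None) -> Iterator[tuple[TokenType, str]]:
        # Same scan, but bare (type, value) tuples: no Token objects, no positions.
        stop = len(code) if end is None else end
        for word in code[start:stop].split():
            kind = TokenType.KEYWORD if word in self.KEYWORDS else TokenType.NAME
            yield (kind, word)
```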
Methods
name
property
The canonical name of this lexer (e.g., 'python').
def name(self) -> str
Returns
str
aliases
property
Alternative names for this lexer (e.g., ('py', 'python3')).
def aliases(self) -> tuple[str, ...]
Returns
tuple[str, ...]
filenames
property
Glob patterns for files this lexer handles (e.g., ('*.py',)).
def filenames(self) -> tuple[str, ...]
Returns
tuple[str, ...]
mimetypes
property
MIME types this lexer handles.
def mimetypes(self) -> tuple[str, ...]
Returns
tuple[str, ...]
tokenize
Tokenize source code into a stream of tokens.
def tokenize(self, code: str, config: LexerConfig | None = None, start: int = 0, end: int | None = None) -> Iterator[Token]
Parameters
| Name | Type | Description |
|---|---|---|
| code | str | The source code to tokenize. |
| config | LexerConfig \| None | Optional lexer configuration. Default: None |
| start | int | Starting index in the source string. Default: 0 |
| end | int \| None | Optional ending index in the source string. Default: None |
Returns
Iterator[Token]
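For illustration, iterating the stream from the WordLexer sketch above prints one token at a time; the type, value, line, and column attributes reflect the stand-in Token and are an assumption about the real type.

```python
lexer = WordLexer()
for token in lexer.tokenize("def add(a, b):\n    return a + b\n"):
    print(token.type, repr(token.value), token.line, token.column)
```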
tokenize_fast
Fast tokenization without position tracking.
Yields minimal (type, value) tuples for maximum speed. Use when line/column info is not needed.
def tokenize_fast(self, code: str, start: int = 0, end: int | None = None) -> Iterator[tuple[TokenType, str]]
Parameters
| Name | Type | Description |
|---|---|---|
| code | str | The source code to tokenize. |
| start | int | Starting index in the source string. Default: 0 |
| end | int \| None | Optional ending index in the source string. Default: None |
Returns
Iterator[tuple[TokenType, str]]
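When only token kinds matter, for instance for rough statistics, the fast path sidesteps Token construction entirely (again using the WordLexer sketch):

```python
from collections import Counter

source = "def add(a, b):\n    return a + b\n"
counts = Counter(kind for kind, _value in WordLexer().tokenize_fast(source))
print(counts.most_common())
```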
Formatter
Protocol for output formatters.
Implementations must be thread-safe — use only local variables in format(). Formatter instances should be immutable (frozen dataclasses recommended).
Thread-Safety Contract:
- format() must use only local variables
- Instance state must be immutable after construction
- No side effects (file I/O, network, etc.)
Streaming Contract:
- format() yields chunks as they're ready (generator)
- format_string() convenience method joins chunks
- Callers can start processing before full output is ready
Fast Path:
- format_fast() accepts (TokenType, value) tuples instead of Token objects
- Avoids Token construction overhead when position info not needed
- ~20% faster for simple highlighting without line numbers
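A minimal sketch of a formatter honouring these contracts, built on the Token/TokenType stand-ins from the lexer sketch above. The frozen dataclass, the css_class option, and the per-token <span> HTML are illustrative choices, not the library's actual output format.

```python
import html
from dataclasses import dataclass
from typing import Iterator

@dataclass(frozen=True)           # instance state is immutable after construction
class SpanFormatter:
    css_class: str = "highlight"  # illustrative option, fixed when the instance is built

    @property
    def name(self) -> str:
        return "span"

    def format(self, tokens, config=None) -> Iterator[str]:
        # Streaming: each chunk is yielded as soon as it is ready; no file or network I/O.
        yield f'<pre class="{self.css_class}">'
        for token in tokens:
            yield (f'<span class="{token.type.name.lower()}">'
                   f'{html.escape(token.value)}</span> ')
        yield "</pre>"

    def format_fast(self, tokens, config=None) -> Iterator[str]:
        # Fast path: bare (TokenType, value) tuples, so no Token objects are ever built.
        yield f'<pre class="{self.css_class}">'
        for kind, value in tokens:
            yield f'<span class="{kind.name.lower()}">{html.escape(value)}</span> '
        yield "</pre>"

    def format_string(self, tokens, config=None) -> str:
        return "".join(self.format(tokens, config))       # convenience: join the stream

    def format_string_fast(self, tokens, config=None) -> str:
        return "".join(self.format_fast(tokens, config))  # convenience: join the fast stream
```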
Methods
name
property
The canonical name of this formatter (e.g., 'html').
def name(self) -> str
Returns
str
format
Format tokens into output chunks.
def format(self, tokens: Iterator[Token], config: FormatConfig | None = None) -> Iterator[str]
Parameters
| Name | Type | Description |
|---|---|---|
| tokens | Iterator[Token] | Stream of tokens to format. |
| config | FormatConfig \| None | Optional formatter configuration. Default: None |
Returns
Iterator[str]
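Because format() is a generator, output can be consumed incrementally, for example written to a file as it is produced (instances and source reuse the sketches above):

```python
lexer, formatter = WordLexer(), SpanFormatter()
source = "def add(a, b):\n    return a + b\n"

with open("out.html", "w", encoding="utf-8") as fh:
    for chunk in formatter.format(lexer.tokenize(source)):
        fh.write(chunk)   # writing starts before the full output is assembled
```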
format_fast
Fast formatting without position tracking.
def format_fast(self, tokens: Iterator[tuple[TokenType, str]], config: FormatConfig | None = None) -> Iterator[str]
Parameters
| Name | Type | Description |
|---|---|---|
| tokens | Iterator[tuple[TokenType, str]] | Stream of (TokenType, value) tuples. |
| config | FormatConfig \| None | Optional formatter configuration. Default: None |
Returns
Iterator[str]
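The two fast paths compose directly: tuples from tokenize_fast() feed straight into format_fast(), and chunks can be handled as they arrive (instances reuse the sketches above):

```python
import sys

lexer, formatter = WordLexer(), SpanFormatter()
for chunk in formatter.format_fast(lexer.tokenize_fast("def add(a, b): return a + b")):
    sys.stdout.write(chunk)   # nothing is buffered beyond the current chunk
```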
format_string
Format tokens and return as a single string.
Convenience method that joins format() output.
def format_string(self, tokens: Iterator[Token], config: FormatConfig | None = None) -> str
Parameters
| Name | Type | Description |
|---|---|---|
| tokens | Iterator[Token] | Stream of tokens to format. |
| config | FormatConfig \| None | Optional formatter configuration. Default: None |
Returns
str
Complete formatted string.
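When the whole result is wanted at once, the convenience method collapses the pipeline into a single expression; as described above, it is equivalent to joining the streamed chunks (instances reuse the sketches above):

```python
lexer, formatter = WordLexer(), SpanFormatter()
source = "def add(a, b): return a + b"

html_out = formatter.format_string(lexer.tokenize(source))
# Per the description above, this equals the joined streaming output.
assert html_out == "".join(formatter.format(lexer.tokenize(source)))
```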
format_string_fast
Fast format and return as a single string.
Convenience method that joins format_fast() output.
def format_string_fast(self, tokens: Iterator[tuple[TokenType, str]], config: FormatConfig | None = None) -> str
Parameters
| Name | Type | Description |
|---|---|---|
| tokens | Iterator[tuple[TokenType, str]] | Stream of (TokenType, value) tuples. |
| config | FormatConfig \| None | Optional formatter configuration. Default: None |
Returns
str
Complete formatted string.
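Its fast-path counterpart mirrors that call, pairing naturally with tokenize_fast() when positions are not needed (same sketch types as above):

```python
lexer, formatter = WordLexer(), SpanFormatter()
# One-expression fast-path highlight; no Token objects and no positions involved.
fast_out = formatter.format_string_fast(lexer.tokenize_fast("def add(a, b): return a + b"))
```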