Classes
PythonStateMachineLexer

Hand-written Python 3 lexer.

O(n) guaranteed, zero regex, thread-safe. Handles all Python 3.x syntax, including f-strings, type hints, and the walrus operator.

This is the reference implementation for Rosettes lexers. Use it as a template when adding new language support.

Performance: ~50µs per 100-line file, ~500 tokens/ms throughput.
Attributes

| Name | Type | Description |
|---|---|---|
| `name` | — | Canonical language name (`"python"`) |
| `aliases` | — | Alternative names for registry lookup (`"py"`, `"python3"`, `"py3"`) |
| `filenames` | — | Glob patterns for file detection (`"*.py"`, `"*.pyw"`, `"*.pyi"`) |
| `mimetypes` | — | MIME types (`"text/x-python"`, `"application/x-python"`) |

Thread-safety: all class attributes are frozen (`frozenset`). The `tokenize()` method uses only local variables for state (`pos`, `line`, `col`).
Methods
tokenize

```python
def tokenize(self, code: str, config: LexerConfig | None = None) -> Iterator[Token]
```

Tokenize Python source code.

Single-pass, character-by-character. O(n) guaranteed.

Parameters

| Name | Type | Description |
|---|---|---|
| `code` | `str` | Python source code to tokenize |
| `config` | `LexerConfig \| None` | Optional lexer configuration. Default: `None` |

Returns

`Iterator[Token]`
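The single-pass, character-by-character approach can be sketched in miniature. This is an illustrative toy, not the Rosettes implementation: the real `tokenize` yields `Token` objects, honors `LexerConfig`, and covers far more syntax. The keyword set and tuple-shaped tokens here are assumptions for the sketch.

```python
# Minimal single-pass lexer loop: one cursor, each character visited once,
# so the scan is O(n) with no regex and no shared mutable state.
from typing import Iterator

KEYWORDS = frozenset({"def", "return", "if", "else", "for", "while"})

def tokenize(code: str) -> Iterator[tuple[str, str]]:
    pos = 0
    n = len(code)
    while pos < n:
        ch = code[pos]
        if ch.isspace():                      # skip whitespace
            pos += 1
        elif ch.isalpha() or ch == "_":       # identifier or keyword
            start = pos
            while pos < n and (code[pos].isalnum() or code[pos] == "_"):
                pos += 1
            word = code[start:pos]
            yield ("keyword" if word in KEYWORDS else "name", word)
        elif ch.isdigit():                    # integer literal
            start = pos
            while pos < n and code[pos].isdigit():
                pos += 1
            yield ("number", code[start:pos])
        else:                                 # everything else: one-char operator
            yield ("op", ch)
            pos += 1
```

Because all state lives in local variables (`pos`, `start`), concurrent calls on the same instance cannot interfere, which is the thread-safety property the class documents.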
Internal Methods (8)
_scan_string_literal

```python
def _scan_string_literal(self, code: str, pos: int) -> tuple[TokenType, int, int]
```

Scan a string literal with optional prefix.

Returns (token_type, end_position, newline_count).

Parameters

| Name | Type | Description |
|---|---|---|
| `code` | `str` | Source being scanned |
| `pos` | `int` | Start position of the scan |

Returns

`tuple[TokenType, int, int]`
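A hedged sketch of the documented contract — return the token type, the position just past the closing quote, and how many newlines the literal spans. The real method also handles string prefixes (`r`, `b`, `f`) and triple quotes; this sketch covers only plain single- and double-quoted strings, and uses string token names in place of the `TokenType` enum.

```python
# Scan a quoted string starting at pos; mirrors the
# (token_type, end_position, newline_count) return shape documented above.
def scan_string_literal(code: str, pos: int) -> tuple[str, int, int]:
    quote = code[pos]
    i = pos + 1
    newlines = 0
    while i < len(code):
        ch = code[i]
        if ch == "\\":                 # skip the escaped character
            i += 2
            continue
        if ch == "\n":
            newlines += 1
        if ch == quote:                # closing quote found
            return ("string", i + 1, newlines)
        i += 1
    return ("error", i, newlines)      # unterminated string
```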
_scan_number

```python
def _scan_number(self, code: str, pos: int) -> tuple[TokenType, int]
```

Scan a numeric literal.

Returns (token_type, end_position).

Parameters

| Name | Type | Description |
|---|---|---|
| `code` | `str` | Source being scanned |
| `pos` | `int` | Start position of the scan |

Returns

`tuple[TokenType, int]`
_scan_digits_with_underscore

```python
def _scan_digits_with_underscore(self, code: str, pos: int) -> int
```

Scan digits with optional underscores.

Parameters

| Name | Type | Description |
|---|---|---|
| `code` | `str` | Source being scanned |
| `pos` | `int` | Start position of the scan |

Returns

`int`
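The underscore-tolerant digit scan can be sketched as below, assuming the return value is the position just past the run, matching the `int` return type documented above. Python permits separators like `1_000_000`; a production lexer would also reject doubled or trailing underscores, which this sketch skips for brevity.

```python
# Advance past a run of decimal digits and underscore separators,
# returning the first position that is neither.
def scan_digits_with_underscore(code: str, pos: int) -> int:
    while pos < len(code) and (code[pos].isdigit() or code[pos] == "_"):
        pos += 1
    return pos
```

The hex, octal, and binary variants documented below follow the same shape with a different membership test (e.g. `code[pos] in "0123456789abcdefABCDEF_"` for hex).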
_scan_hex_digits

```python
def _scan_hex_digits(self, code: str, pos: int) -> int
```

Scan hex digits with optional underscores.

Parameters

| Name | Type | Description |
|---|---|---|
| `code` | `str` | Source being scanned |
| `pos` | `int` | Start position of the scan |

Returns

`int`
_scan_octal_digits

```python
def _scan_octal_digits(self, code: str, pos: int) -> int
```

Scan octal digits with optional underscores.

Parameters

| Name | Type | Description |
|---|---|---|
| `code` | `str` | Source being scanned |
| `pos` | `int` | Start position of the scan |

Returns

`int`
_scan_binary_digits

```python
def _scan_binary_digits(self, code: str, pos: int) -> int
```

Scan binary digits with optional underscores.

Parameters

| Name | Type | Description |
|---|---|---|
| `code` | `str` | Source being scanned |
| `pos` | `int` | Start position of the scan |

Returns

`int`
_scan_exponent

```python
def _scan_exponent(self, code: str, pos: int) -> int
```

Scan optional exponent part of number.

Parameters

| Name | Type | Description |
|---|---|---|
| `code` | `str` | Source being scanned |
| `pos` | `int` | Start position of the scan |

Returns

`int`
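Because the exponent is optional, the scan must leave `pos` untouched when no well-formed exponent follows. A sketch under that assumption (not the Rosettes source):

```python
# Consume an optional exponent: "e"/"E", optional sign, then digits.
# Returns the position past the exponent, or the original pos if the
# characters at pos do not form a complete exponent.
def scan_exponent(code: str, pos: int) -> int:
    if pos < len(code) and code[pos] in "eE":
        i = pos + 1
        if i < len(code) and code[i] in "+-":
            i += 1                       # optional sign
        if i < len(code) and code[i].isdigit():
            while i < len(code) and code[i].isdigit():
                i += 1
            return i
    return pos                           # no exponent present; don't consume "e"
```

Not consuming a lone `e` matters: in `1e`, the `e` should lex as the start of an identifier, not be swallowed by the number.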
_classify_word

```python
def _classify_word(self, word: str) -> TokenType
```

Classify an identifier into the appropriate token type.

Parameters

| Name | Type | Description |
|---|---|---|
| `word` | `str` | Identifier text to classify |

Returns

`TokenType`
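Classification is a chain of `frozenset` membership tests — the same frozen-collection pattern the class attributes use for thread safety. A sketch with illustrative token-type strings standing in for the real `TokenType` enum (the word sets here are small samples, not the full keyword list):

```python
# Frozen sets make lookups O(1) and the tables immutable, so concurrent
# calls never observe partially updated state.
KEYWORDS = frozenset({"def", "class", "return", "if", "elif", "else",
                      "for", "while", "import", "from", "lambda"})
BUILTINS = frozenset({"print", "len", "range", "isinstance"})

def classify_word(word: str) -> str:
    if word in KEYWORDS:
        return "keyword"
    if word in BUILTINS:
        return "builtin"
    return "name"
```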