Classes
_Chunk

A chunk of source code with position metadata.

Attributes

| Name | Type | Description |
|---|---|---|
| text | str | The chunk's source text. |
| start_offset | int | Character offset at which the chunk starts in the original source. |
| start_line | int | Line number at which the chunk starts in the original source. |
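A minimal sketch of how such a record might be declared, assuming a plain dataclass (the actual definition may differ):

```python
from dataclasses import dataclass


@dataclass
class _Chunk:
    """A chunk of source code with position metadata."""

    text: str          # the chunk's source text
    start_offset: int  # character offset of the chunk in the full source
    start_line: int    # line number at which the chunk starts
```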
Functions
is_free_threaded

def is_free_threaded() -> bool

Check if running on free-threaded Python (3.14t+).

Returns

bool
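A sketch of one common way to detect a free-threaded interpreter, combining the sysconfig build flag with sys._is_gil_enabled() (available since CPython 3.13); the project's actual check may differ:

```python
import sys
import sysconfig


def is_free_threaded() -> bool:
    """Check if running on free-threaded Python (illustrative sketch)."""
    # Free-threaded builds are compiled with Py_GIL_DISABLED.
    if not sysconfig.get_config_var("Py_GIL_DISABLED"):
        return False
    # Even on a free-threaded build the GIL can be re-enabled at runtime,
    # so also consult sys._is_gil_enabled() when it exists.
    is_gil_enabled = getattr(sys, "_is_gil_enabled", None)
    return is_gil_enabled is None or not is_gil_enabled()
```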
_find_safe_splits

def _find_safe_splits(code: str, target_chunk_size: int) -> list[int]

Find safe split points (newlines) for parallel tokenization.

We split at newlines to avoid splitting in the middle of tokens. This is a heuristic that works for most languages.

Parameters

| Name | Type | Description |
|---|---|---|
| code | str | Source code to split. |
| target_chunk_size | int | Target size for each chunk. |

Returns

list[int]
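A sketch of the newline heuristic described above: advance through the source in target_chunk_size steps and snap each candidate boundary forward to the next newline. This is an illustration, not necessarily the project's implementation:

```python
def _find_safe_splits(code: str, target_chunk_size: int) -> list[int]:
    """Find split positions that fall just after a newline (sketch)."""
    splits: list[int] = []
    pos = target_chunk_size
    while pos < len(code):
        # Move the boundary to the next newline so no token is cut in half.
        newline = code.find("\n", pos)
        if newline == -1:
            break  # no further newline: the remainder becomes the last chunk
        splits.append(newline + 1)  # split just after the newline
        pos = newline + 1 + target_chunk_size
    return splits
```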
_make_chunks

def _make_chunks(code: str, splits: list[int]) -> list[_Chunk]

Split code into chunks at the given positions.

Parameters

| Name | Type | Description |
|---|---|---|
| code | str | Source code to split. |
| splits | list[int] | List of positions to split at. |

Returns

list[_Chunk]
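A sketch of how the chunks could be cut at those positions while tracking each chunk's offset and (assumed 1-based) starting line; it builds on the _Chunk dataclass sketch above:

```python
def _make_chunks(code: str, splits: list[int]) -> list[_Chunk]:
    """Slice the source at each split position (sketch)."""
    chunks: list[_Chunk] = []
    start = 0
    line = 1
    for end in [*splits, len(code)]:
        text = code[start:end]
        chunks.append(_Chunk(text=text, start_offset=start, start_line=line))
        # The next chunk starts after however many newlines this one contains.
        line += text.count("\n")
        start = end
    return chunks
```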
tokenize_parallel

def tokenize_parallel(lexer: StateMachineLexer, code: str) -> Iterator[Token]

Parallel tokenization for large files.

Only beneficial on free-threaded Python (3.14t+). Falls back to sequential on GIL Python.

Parameters

| Name | Type | Description |
|---|---|---|
| lexer | StateMachineLexer | The lexer to use. |
| code | str | Source code to tokenize. |

Returns

Iterator[Token]
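A sketch of how a function of this shape could be wired together from the helpers above, assuming a lexer.tokenize(code) sequential entry point, a 64 KiB target chunk size, and that StateMachineLexer and Token come from the surrounding module; none of these specifics are documented here:

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Iterator


def tokenize_parallel(lexer: StateMachineLexer, code: str) -> Iterator[Token]:
    """Chunk-and-merge parallel tokenization (illustrative sketch)."""
    if not is_free_threaded():
        # On GIL builds, threads add overhead without parallel speedup,
        # so fall back to plain sequential tokenization.
        yield from lexer.tokenize(code)  # assumed sequential entry point
        return

    target_chunk_size = 64 * 1024  # assumed tuning constant
    splits = _find_safe_splits(code, target_chunk_size)
    chunks = _make_chunks(code, splits)

    def tokenize_chunk(chunk: _Chunk) -> list[Token]:
        # Each worker lexes its chunk independently; a real implementation
        # would also shift token positions by chunk.start_offset and
        # chunk.start_line so they refer to the original source.
        return list(lexer.tokenize(chunk.text))

    with ThreadPoolExecutor() as pool:
        # map() preserves chunk order, so tokens come out in source order.
        for tokens in pool.map(tokenize_chunk, chunks):
            yield from tokens
```

Because the splits fall only on newlines and map() yields results in submission order, the merged token stream keeps the same ordering a sequential pass would produce.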