Module

_parallel

Parallel tokenization for free-threaded Python (3.14t+).

Enables true parallel tokenization of large files by splitting at safe boundaries and processing chunks concurrently.

Design Philosophy:

This module exists for one purpose: maximum throughput on Python 3.14t.

On GIL Python (3.13 and earlier), threads cannot truly parallelize CPU-bound work. On free-threaded Python 3.14t they can, and because Rosettes lexers use only local variables, chunks can be tokenized truly in parallel.

Architecture:

  1. Split Detection: Find safe split points (newlines) to avoid cutting tokens in half
  2. Chunking: Divide code into ~64KB chunks with position metadata
  3. Parallel Execution: Tokenize chunks using ThreadPoolExecutor
  4. Line Adjustment: Fix line numbers for chunks after the first
  5. Ordered Merge: Yield tokens in original source order
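
The five steps above compose roughly as in the sketch below. The ~64KB chunk size comes from the list above; the lexer method name (lexer.tokenize) and the omitted token line fields are assumptions for illustration, not the module's exact internals.

    from concurrent.futures import ThreadPoolExecutor

    def _pipeline_sketch(lexer, code):
        splits = _find_safe_splits(code, target_chunk_size=64 * 1024)  # 1. split detection
        chunks = _make_chunks(code, splits)                            # 2. chunking
        with ThreadPoolExecutor() as pool:                             # 3. parallel execution
            # pool.map() preserves input order, which gives the ordered merge for free.
            results = list(pool.map(lambda c: list(lexer.tokenize(c.text)), chunks))
        for chunk, tokens in zip(chunks, results):
            # 4. line adjustment: tokens from chunks after the first need their
            #    line numbers shifted by chunk.start_line - 1 (token fields omitted here)
            yield from tokens                                          # 5. ordered merge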

When to Use:

  • ✅ Large files (>128KB) on Python 3.14t

  • ✅ Batch processing many files with highlight_many()

  • ❌ Small files (<128KB) — sequential is faster (thread overhead)

  • ❌ GIL Python — no parallelism benefit

Performance:

  • Sequential: ~50µs per 100-line file
  • Parallel (4 workers, 3.14t): ~15µs per file for batches of 100+

The crossover point is ~8 items or ~128KB of code.
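
A caller-side dispatch based on those crossover figures might look like the sketch below. The 128KB threshold is the one quoted above; tokenize_best is a hypothetical helper name and the sequential lexer.tokenize call is an assumption.

    PARALLEL_THRESHOLD = 128 * 1024  # the ~128KB crossover quoted above

    def tokenize_best(lexer, code):
        # Hypothetical dispatch helper: go parallel only when it pays off.
        if is_free_threaded() and len(code) >= PARALLEL_THRESHOLD:
            return tokenize_parallel(lexer, code)
        return lexer.tokenize(code)  # sequential path (method name assumed)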

Thread-Safety:

Safe by design:

  • Lexers use only local variables
  • Chunks are independent (no shared state)
  • Token lists are created per-chunk, then merged

Limitations:

  • Splitting at newlines may not be safe for all languages (e.g., heredocs spanning lines; see the example after this list). This is rare in practice.
  • Memory: Holds all chunk results before yielding
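
To illustrate the first limitation, here is a small Python snippet in which a triple-quoted string plays the role of a heredoc:

    query = """
    SELECT *
    FROM users
    """
    # A chunk boundary at any newline between the opening and closing quotes
    # would start the next chunk inside the string; tokenized on its own, that
    # chunk reads as bare SQL followed by an unterminated string.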

See Also:

  • rosettes.highlight_many: High-level parallel API
  • rosettes.tokenize_many: Parallel tokenization without formatting

Classes

_Chunk

A chunk of source code with position metadata.

Attributes

Name          Type  Description
text          str   This chunk's slice of the source code.
start_offset  int   Character offset of the chunk within the original source.
start_line    int   Line number at which the chunk starts in the original source.
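
A minimal sketch of what this container might look like, assuming a frozen dataclass; the real class may differ in details such as slots or defaults.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class _Chunk:
        text: str          # this chunk's slice of the source code
        start_offset: int  # character offset of the chunk in the original source
        start_line: int    # line number (assumed 1-based) where the chunk begins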

Functions

is_free_threaded

def is_free_threaded() -> bool

Check if running on free-threaded Python (3.14t+).

Returns
bool
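
A minimal sketch of how such a check can be written, using CPython's sys._is_gil_enabled() (added in 3.13); the actual implementation may differ.

    import sys

    def is_free_threaded() -> bool:
        # sys._is_gil_enabled() exists on CPython 3.13+; on older interpreters
        # assume the GIL is enabled.
        gil_check = getattr(sys, "_is_gil_enabled", None)
        return gil_check is not None and not gil_check()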
_find_safe_splits

def _find_safe_splits(code: str, target_chunk_size: int) -> list[int]

Find safe split points (newlines) for parallel tokenization.

We split at newlines to avoid splitting in the middle of tokens. This is a heuristic that works for most languages.

Parameters
Name               Type  Description
code               str   Source code to split.
target_chunk_size  int   Target size for each chunk.

Returns
list[int]
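
A hedged sketch of this heuristic: walk forward in target_chunk_size steps and snap each boundary to the next newline. The real implementation may scan differently or handle edge cases it does not show.

    def _find_safe_splits(code: str, target_chunk_size: int) -> list[int]:
        splits: list[int] = []
        pos = target_chunk_size
        while pos < len(code):
            newline = code.find("\n", pos)  # snap forward to the next newline
            if newline == -1:
                break                       # no newline left; last chunk runs to the end
            splits.append(newline + 1)      # split just after the newline
            pos = newline + 1 + target_chunk_size
        return splits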
_make_chunks

def _make_chunks(code: str, splits: list[int]) -> list[_Chunk]

Split code into chunks at the given positions.

Parameters
Name    Type       Description
code    str        Source code to split.
splits  list[int]  List of positions to split at.

Returns
list[_Chunk]
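
A sketch of how chunking with position metadata might look, pairing each slice with its offset and starting line; it assumes the _Chunk shape sketched above.

    def _make_chunks(code: str, splits: list[int]) -> list[_Chunk]:
        boundaries = [0, *splits, len(code)]
        chunks: list[_Chunk] = []
        for start, end in zip(boundaries, boundaries[1:]):
            chunks.append(
                _Chunk(
                    text=code[start:end],
                    start_offset=start,
                    # Lines assumed 1-based: count newlines before the chunk starts.
                    start_line=code.count("\n", 0, start) + 1,
                )
            )
        return chunks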
tokenize_parallel

def tokenize_parallel(lexer: StateMachineLexer, code: str) -> Iterator[Token]

Parallel tokenization for large files.

Only beneficial on free-threaded Python (3.14t+). Falls back to sequential on GIL Python.

Parameters
Name   Type               Description
lexer  StateMachineLexer  The lexer to use.
code   str                Source code to tokenize.

Returns
Iterator[Token]
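
A hedged usage sketch. Constructing a lexer is outside this module, so my_lexer and big_module.py below are placeholders for a StateMachineLexer instance and input file your code already has.

    from pathlib import Path

    code = Path("big_module.py").read_text(encoding="utf-8")

    # my_lexer: any StateMachineLexer instance your application already uses.
    for token in tokenize_parallel(my_lexer, code):
        ...

On GIL Python this falls back to sequential tokenization, so the call is safe either way.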