Rosettes is probably the simplest threading story in the stack.
Every lexer in Rosettes is thread-safe by default, not because of locks, but because its mutable state lives in local variables.
That is the whole model. If the hot path uses only local variables, thread-safety is almost free: no locks, no ContextVar, no copy-on-write. A syntax highlighter is small enough to get this right all the way through.
Series context
Part 4 of 6 — Free-Threading in the Bengal Ecosystem. Rosettes is the syntax highlighting layer — used by Patitas for code blocks and Bengal for build-time highlighting.
Run it
uv python install 3.14t
uv run --python=3.14t python -c "
from rosettes import highlight_many
blocks = [
    ('def foo(): pass', 'python'),
    ('const x = 1;', 'javascript'),
    ('fn main() {}', 'rust'),
] * 20
results = highlight_many(blocks)
print(f'Highlighted {len(results)} blocks')
"
For 8+ blocks, highlight_many() uses ThreadPoolExecutor. On Python 3.14t, that becomes real parallelism without the caller having to learn a different API.
Performance
Rosettes uses 8 blocks as the threshold for switching to parallel execution. Below that, thread overhead outweighs any gain, so the library stays simple and cheap.
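The threshold logic can be sketched roughly as follows. This is not the Rosettes implementation: `highlight_one`, `highlight_many_sketch`, and `PARALLEL_THRESHOLD` are hypothetical names, and the stand-in highlighter just wraps the code in a tag so the example runs on its own.

```python
from concurrent.futures import ThreadPoolExecutor
from html import escape

PARALLEL_THRESHOLD = 8  # value from the text; the constant name is an assumption


def highlight_one(code: str, language: str) -> str:
    """Stand-in for a real highlighter (hypothetical): wraps the code in a <pre>."""
    return f'<pre class="{language}">{escape(code)}</pre>'


def highlight_many_sketch(blocks: list[tuple[str, str]]) -> list[str]:
    """Run sequentially below the threshold, in a thread pool at or above it."""
    if len(blocks) < PARALLEL_THRESHOLD:
        return [highlight_one(code, language) for code, language in blocks]
    with ThreadPoolExecutor() as pool:
        # map() yields results in input order, so output lines up with blocks.
        return list(pool.map(lambda block: highlight_one(*block), blocks))
```

The caller sees one function either way. Only the batch size decides whether a pool is created.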
Local variables only
The key design rule is simple: lexer state lives in local variables, never in self.
# Unsafe: cursor state is stored on the instance
def tokenize(self, code):
    self.pos = 0         # Shared across threads
    self.line = 1        # Race condition
    self.line_start = 0  # Data corruption
    while self.pos < len(code):
        ...

# Safe: cursor state is held in locals
def tokenize(self, code, start=0, end=None):
    pos = start  # Local to this call
    # Test against None explicitly so that end=0 is respected
    length = end if end is not None else len(code)
    line = 1
    line_start = start
    while pos < length:
        char = code[pos]
        col = pos - line_start + 1
        ...
pos, line, and line_start are all local. Multiple threads can call tokenize() on the same lexer instance concurrently because the instance is effectively stateless during tokenization.
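A small experiment makes the claim concrete. The `LineCountingLexer` below is a hypothetical toy, not a Rosettes lexer, but it follows the same rule: every piece of cursor state is a local. One shared instance is called from several threads and the results are compared with a sequential run.

```python
from concurrent.futures import ThreadPoolExecutor


class LineCountingLexer:
    """Toy lexer (hypothetical): keeps no per-call state on self."""

    def tokenize(self, code: str) -> list[tuple[int, int, str]]:
        pos = 0         # all cursor state is local to this call
        line = 1
        line_start = 0
        tokens = []
        length = len(code)
        while pos < length:
            char = code[pos]
            if char == "\n":
                line += 1
                line_start = pos + 1
            elif not char.isspace():
                col = pos - line_start + 1
                tokens.append((line, col, char))
            pos += 1
        return tokens


lexer = LineCountingLexer()  # one instance shared by every worker
sources = ["a b\nc", "x\ny\nz", "  q"] * 50

with ThreadPoolExecutor(max_workers=8) as pool:
    threaded = list(pool.map(lexer.tokenize, sources))

sequential = [lexer.tokenize(source) for source in sources]
assert threaded == sequential
```

Passing this check does not prove thread-safety in general. Here the safety follows from the structure of the code: `tokenize()` never writes to `self`, so there is nothing for two calls to race on.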
Frozen lookup tables
Keyword and character-set lookups use frozenset:
_KEYWORDS: frozenset[str] = frozenset(
    {"def", "class", "if", "else", "return", "match", "case", "type", ...}
)
DIGITS: frozenset[str] = frozenset("0123456789")
IDENT_START: frozenset[str] = frozenset(
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_"
)
That gives O(1) membership checks with immutable data structures that are safe to share across threads without protection. A frozenset has no mutating methods, so no thread can change one in place at runtime.
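As a quick illustration of why these tables need no lock, here is a minimal sketch using an abbreviated, illustrative keyword list rather than the real Rosettes table:

```python
# Abbreviated, illustrative keyword table (not the full Rosettes list).
KEYWORDS: frozenset[str] = frozenset({"def", "class", "return"})

# Membership is a hash lookup, so cost does not grow with the table size.
print("def" in KEYWORDS)  # True
print("foo" in KEYWORDS)  # False

# frozenset has no add(), remove(), or update(), so in-place mutation fails.
try:
    KEYWORDS.add("async")
except AttributeError as exc:
    print(f"rejected: {exc}")

# Set operations return new objects and leave the shared table untouched.
extended = KEYWORDS | {"async"}
```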
No regex — scan_while instead
Like Patitas, Rosettes avoids regex in the hot path. The core building block is scan_while:
def scan_while(code: str, pos: int, char_set: frozenset[str]) -> int:
    """Advance position while characters are in char_set."""
    length = len(code)
    while pos < length and code[pos] in char_set:
        pos += 1
    return pos
Single pass. No backtracking. O(n) guaranteed. Tests run pathological inputs, including nested parentheses and repeated escapes, under a 1-second timeout to catch any departure from linear scaling.
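To show how a lexer loop might be built on top of `scan_while`, here is a minimal sketch. `tiny_tokenize`, `IDENT_CONT`, the two-word keyword table, and the token kind names are all assumptions made for illustration; the real lexers are more involved.

```python
DIGITS = frozenset("0123456789")
IDENT_START = frozenset("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_")
IDENT_CONT = IDENT_START | DIGITS       # hypothetical helper set
KEYWORDS = frozenset({"def", "return"})  # abbreviated for the example


def scan_while(code: str, pos: int, char_set: frozenset[str]) -> int:
    """Advance position while characters are in char_set."""
    length = len(code)
    while pos < length and code[pos] in char_set:
        pos += 1
    return pos


def tiny_tokenize(code: str) -> list[tuple[str, str]]:
    """Split code into (kind, text) pairs in one left-to-right pass."""
    tokens = []
    pos = 0
    length = len(code)
    while pos < length:
        char = code[pos]
        if char in IDENT_START:
            end = scan_while(code, pos + 1, IDENT_CONT)
            kind = "keyword" if code[pos:end] in KEYWORDS else "name"
        elif char in DIGITS:
            end = scan_while(code, pos + 1, DIGITS)
            kind = "number"
        else:
            end = pos + 1  # single-character fallback
            kind = "space" if char.isspace() else "punct"
        tokens.append((kind, code[pos:end]))
        pos = end  # always moves forward, so each character is visited once
    return tokens
```

Every branch sets `end` to at least `pos + 1`, so the loop cannot stall or revisit input, which is where the linear bound comes from.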
Warning
Syntax highlighters are an overlooked ReDoS vector. If your highlighter runs server-side on user-submitted code — documentation sites, paste services, code review tools — a regex-based lexer is an attack surface. Rosettes eliminates that by construction.
Immutable output
Tokens, config, and formatters are all frozen:
from dataclasses import dataclass, field
from typing import NamedTuple

class Token(NamedTuple):
    type: TokenType
    value: str
    line: int = 1
    column: int = 1

@dataclass(frozen=True, slots=True)
class HtmlFormatter:
    config: FormatConfig = field(default_factory=FormatConfig)
    ...
That means no defensive copying when passing tokens between workers. The formatter receives tokens, formats them, and returns a string. No shared mutable state at any layer.
What this means in practice
Rosettes is small: about 55 language lexers, pure Python, zero dependencies. Its threading model is the simplest in the stack. Local variables handle mutable state, frozenset handles lookup tables, and frozen structures handle config and output.
When the hot path is already stateless, thread-safety stops being a feature you add later and becomes the natural consequence of the design.
Further reading
- Rosettes documentation — language support and formatter reference
- Rosettes source
- Next in series: Pounce — Thread-Based ASGI Workers
Related
- The Python Free-Threading Ecosystem in 2026 — who's ready for NoGIL