Thread Safety

Thread-safe design and free-threading support

Rosettes is thread-safe by design, with explicit support for Python's free-threading mode (PEP 703, available in 3.13t+).

Thread-Safe Guarantees

All public APIs are safe for concurrent use:

Component          Thread Safety Mechanism
highlight()        Uses only local variables
tokenize()         Uses only local variables
highlight_many()   Thread pool with isolated workers
Token              Immutable NamedTuple
get_lexer()        functools.cache memoization

How It Works

1. Immutable Tokens

The Token type is a NamedTuple, which is immutable:

from typing import NamedTuple

class Token(NamedTuple):
    type: TokenType  # Rosettes' token-category type
    value: str
    line: int = 1
    column: int = 1

Tokens cannot be modified after creation, so any number of threads can read the same token without locks and without risk of a data race.
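A minimal sketch of what this immutability means in practice. It copies the Token definition above and adds a hypothetical two-member TokenType enum as a stand-in for Rosettes' own, so it runs without the library installed:

```python
from enum import Enum, auto
from typing import NamedTuple


class TokenType(Enum):
    # Hypothetical stand-in for Rosettes' TokenType; two members suffice here.
    KEYWORD = auto()
    NAME = auto()


class Token(NamedTuple):
    type: TokenType
    value: str
    line: int = 1
    column: int = 1


tok = Token(TokenType.KEYWORD, "def")

# Attribute assignment is rejected: NamedTuple fields are read-only properties.
try:
    tok.value = "class"
except AttributeError as exc:
    print(f"rejected: {exc}")

# "Changing" a token means building a new one; the original is untouched.
renamed = tok._replace(value="class")
print(tok.value, renamed.value)  # def class
```

Because an update always produces a new object, a thread holding `tok` can never observe a half-modified token.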

2. Local-Only Lexer State

Lexers use only local variables during tokenization:

def tokenize(self, code: str) -> Iterator[Token]:
    # All state is local
    state = State.INITIAL
    pos = 0
    buffer = []
    
    while pos < len(code):
        # Process character
        ...

No instance variables or global state are modified during tokenization, so a single lexer instance can tokenize different inputs in several threads at once.
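The local-state pattern can be sketched with a toy lexer. `ToyLexer` and `Tok` are hypothetical names, not Rosettes' StateMachineLexer; the point is only that the cursor and every slice live in the call's own frame. One shared instance tokenizes 100 inputs from eight threads and the result matches a serial run. This demonstrates the pattern rather than proving thread safety in general.

```python
from collections.abc import Iterator
from concurrent.futures import ThreadPoolExecutor
from typing import NamedTuple


class Tok(NamedTuple):
    kind: str
    value: str


class ToyLexer:
    """Hypothetical word/whitespace lexer illustrating local-only state."""

    def tokenize(self, code: str) -> Iterator[Tok]:
        # All mutable state (pos, start, n) is local to this call.
        pos = 0
        n = len(code)
        while pos < n:
            start = pos
            if code[pos].isspace():
                while pos < n and code[pos].isspace():
                    pos += 1
                yield Tok("ws", code[start:pos])
            else:
                while pos < n and not code[pos].isspace():
                    pos += 1
                yield Tok("word", code[start:pos])


lexer = ToyLexer()  # one shared instance; tokenize() never writes to self
sources = [f"def f{i}(x): return x" for i in range(100)]

with ThreadPoolExecutor(max_workers=8) as pool:
    parallel = list(pool.map(lambda s: list(lexer.tokenize(s)), sources))

serial = [list(lexer.tokenize(s)) for s in sources]
print(parallel == serial)  # True
```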

3. Cached Registry

The lexer registry uses a two-layer design with functools.cache for thread-safe memoization:

from functools import cache
from importlib import import_module

# _normalize_name, _LEXER_SPECS and StateMachineLexer are Rosettes internals.

def get_lexer(name: str) -> StateMachineLexer:
    """Public API - normalizes name, delegates to cached loader."""
    canonical = _normalize_name(name)
    return _get_lexer_by_canonical(canonical)

@cache
def _get_lexer_by_canonical(canonical: str) -> StateMachineLexer:
    """Internal cached loader - lazily imports and instantiates."""
    spec = _LEXER_SPECS[canonical]
    module = import_module(spec.module)
    return getattr(module, spec.class_name)()

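functools.cache keeps its internal cache consistent under concurrent calls, and once a lexer is cached, the same instance is returned for that name in every thread. Lexers are loaded lazily on first access. functools.cache does not hold a lock while the loader runs, so if several threads request an uncached name at the same moment, the loader may run more than once and only one of the resulting instances stays in the cache. Because lexers keep no mutable state, any of those instances is safe to use.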

4. Immutable Configuration

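All configuration classes are frozen dataclasses: frozen=True makes instances immutable, so one config object can be shared across threads, and slots=True reduces per-instance memory.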

from dataclasses import dataclass

@dataclass(frozen=True, slots=True)
class FormatConfig:
    css_class: str = "highlight"
    wrap_code: bool = True
    class_prefix: str = ""
    data_language: str | None = None

Free-Threading Support (PEP 703)

Rosettes declares itself safe for free-threaded Python via the _Py_mod_gil attribute:

def __getattr__(name: str) -> object:
    if name == "_Py_mod_gil":
        return 0  # Py_MOD_GIL_NOT_USED
    raise AttributeError(f"module 'rosettes' has no attribute {name!r}")

This tells free-threaded Python (3.13t+) that Rosettes:

  • Does not require the GIL
  • Can run with true parallelism
  • Is safe for concurrent access without locks
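The snippet above relies on a module-level `__getattr__` (PEP 562), which is consulted only when normal attribute lookup fails. The sketch below shows that lookup behavior on a throwaway module named `demo_rosettes` (a hypothetical name). It illustrates the hook itself and makes no claim about how the interpreter consumes the value:

```python
import types

# Throwaway module used only to demonstrate PEP 562 lookup behavior.
demo = types.ModuleType("demo_rosettes")


def _module_getattr(name: str) -> object:
    if name == "_Py_mod_gil":
        return 0  # value the docs map to Py_MOD_GIL_NOT_USED
    raise AttributeError(f"module 'demo_rosettes' has no attribute {name!r}")


# Equivalent to defining __getattr__ at the top level of a module file.
demo.__getattr__ = _module_getattr

print(demo._Py_mod_gil)                      # 0
print(getattr(demo, "missing", "fallback"))  # fallback
```

The attribute is computed on demand, so it never appears in the module's `__dict__`.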

Concurrent Usage Patterns

Safe: Multiple Threads Highlighting

from concurrent.futures import ThreadPoolExecutor
from rosettes import highlight

def highlight_page(content: str) -> str:
    # Highlight one source string as Python; each call is independent
    return highlight(content, "python")

with ThreadPoolExecutor(max_workers=4) as executor:
    pages = ["x = 1", "def f(): pass", "import os", "print('hi')"]
    results = list(executor.map(highlight_page, pages))

Safe: Shared Lexer Instance

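from concurrent.futures import ThreadPoolExecutor
from rosettes import get_lexer

# Same instance returned (cached)
lexer = get_lexer("python")

# Safe to use from multiple threads: tokenize() keeps its state local
def process(code: str) -> list:
    return list(lexer.tokenize(code))

with ThreadPoolExecutor(max_workers=4) as executor:
    token_lists = list(executor.map(process, ["x = 1", "y = 2"]))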

Safe: highlight_many()

from rosettes import highlight_many

# (code, language) pairs, highlighted by an internal thread pool
blocks = [
    ("def greet(): return 'hi'", "python"),
    ("const x = 1;", "javascript"),
]
results = highlight_many(blocks)
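The "thread pool with isolated workers" idea can be sketched as follows. `highlight_batch` and `fake_render` are hypothetical helpers, not Rosettes' implementation, and the real highlight_many() may choose worker counts or batching differently. Each task receives only its own (code, language) pair, and Executor.map returns results in input order:

```python
from collections.abc import Callable, Iterable
from concurrent.futures import ThreadPoolExecutor


def highlight_batch(
    blocks: Iterable[tuple[str, str]],
    render: Callable[[str, str], str],
    max_workers: int = 4,
) -> list[str]:
    """Hypothetical helper: render (code, language) pairs concurrently."""
    pairs = list(blocks)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # No state is shared between tasks; map preserves input order.
        return list(pool.map(lambda pair: render(*pair), pairs))


def fake_render(code: str, language: str) -> str:
    # Stand-in for a real highlighter so the sketch runs without Rosettes.
    return f"<pre data-language='{language}'>{code}</pre>"


out = highlight_batch([("x = 1", "python"), ("let y = 2;", "javascript")], fake_render)
print(out[1])  # <pre data-language='javascript'>let y = 2;</pre>
```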

What NOT to Do

Don't: Modify Tokens

# Tokens are immutable - this fails
token = Token(TokenType.KEYWORD, "def")
token.value = "class"  # ❌ AttributeError

Don't: Rely on Global State

# Don't do this - Rosettes has no global mutable state
import rosettes
rosettes.SOME_SETTING = True  # ❌ No effect, not supported

Performance on Free-Threaded Python

On free-threaded Python (3.13t+), highlight_many() provides true parallelism:

Scenario      GIL Python   Free-Threading   Speedup
10 blocks     15 ms        12 ms            1.25x
50 blocks     75 ms        42 ms            1.79x
100 blocks    150 ms       78 ms            1.92x

Numbers are illustrative. Actual performance varies by hardware, code complexity, and Python version. See Performance for benchmarking details.

The speedup comes from true parallel execution without GIL contention.
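Speedup in the table is GIL time divided by free-threaded time (for example 150 ms / 78 ms ≈ 1.92x). A rough harness for measuring the same ratio on your own machine is sketched below. It uses a hypothetical pure-Python CPU-bound function (`cpu_work`) as a stand-in for highlighting a block, so its numbers are not Rosettes benchmarks. On a GIL build expect a ratio near 1; on a free-threaded build it should rise with the worker count.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def cpu_work(n: int) -> int:
    # Pure-Python CPU-bound stand-in for highlighting one block.
    total = 0
    for i in range(n):
        total += i * i % 7
    return total


def timed(workers: int, jobs: list[int]) -> float:
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(cpu_work, jobs))
    return time.perf_counter() - start


jobs = [50_000] * 16
serial = timed(1, jobs)
parallel = timed(4, jobs)
print(f"1 worker: {serial * 1000:.0f} ms, 4 workers: {parallel * 1000:.0f} ms, "
      f"speedup: {serial / parallel:.2f}x")
```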


Verifying Free-Threading

Check if you're running free-threaded Python:

import sys

if hasattr(sys, "_is_gil_enabled"):
    if sys._is_gil_enabled():
        print("GIL is enabled")
    else:
        print("Free-threading active!")
else:
    print("Python < 3.13 (always has GIL)")
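The check above reports the runtime state only. A free-threaded build can still run with the GIL, for example when started with PYTHON_GIL=1 or after importing an extension module that does not declare GIL-free support. The sketch below also reads the standard Py_GIL_DISABLED build variable to tell those cases apart; `gil_status` is a hypothetical helper name:

```python
import sys
import sysconfig


def gil_status() -> str:
    # Py_GIL_DISABLED is 1 on free-threaded builds, 0 or None otherwise.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    if not free_threaded_build:
        return "standard build (GIL always on)"
    # sys._is_gil_enabled() exists on 3.13+, so it is safe to call here.
    if sys._is_gil_enabled():
        return "free-threaded build, GIL re-enabled at runtime"
    return "free-threaded build, GIL disabled"


print(gil_status())
```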

Next Steps