# Thread Safety

URL: /docs/about/thread-safety/
Section: about
Tags: threading, safety

--------------------------------------------------------------------------------

# Thread Safety

Rosettes is thread-safe by design, with explicit support for Python's free-threading mode (PEP 703, available in 3.13t+).

## Thread-Safe Guarantees

All public APIs are safe for concurrent use:

| Component | Thread Safety Mechanism |
|-----------|-------------------------|
| `highlight()` | Uses only local variables |
| `tokenize()` | Uses only local variables |
| `highlight_many()` | Thread pool with isolated workers |
| `Token` | Immutable `NamedTuple` |
| `get_lexer()` | `functools.cache` memoization |

---

## How It Works

### 1. Immutable Tokens

The `Token` type is a `NamedTuple`, which is immutable:

```python
class Token(NamedTuple):
    type: TokenType
    value: str
    line: int = 1
    column: int = 1
```

Tokens cannot be modified after creation, eliminating data races.

### 2. Local-Only Lexer State

Lexers use only local variables during tokenization:

```python
def tokenize(self, code: str) -> Iterator[Token]:
    # All state is local
    state = State.INITIAL
    pos = 0
    buffer = []

    while pos < len(code):
        # Process character
        ...
```

No instance variables or global state are modified during tokenization.

### 3. Cached Registry

The lexer registry uses a two-layer design with `functools.cache` for thread-safe memoization:

```python
def get_lexer(name: str) -> StateMachineLexer:
    """Public API - normalizes name, delegates to cached loader."""
    canonical = _normalize_name(name)
    return _get_lexer_by_canonical(canonical)


@cache
def _get_lexer_by_canonical(canonical: str) -> StateMachineLexer:
    """Internal cached loader - lazily imports and instantiates."""
    spec = _LEXER_SPECS[canonical]
    module = import_module(spec.module)
    return getattr(module, spec.class_name)()
```

This provides thread-safe memoization: the same lexer instance is returned for the same name across all threads. Lexers are loaded lazily on first access.
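One way to observe this guarantee is to compare the identity of the lexer that different threads receive. The following is a minimal sketch, not part of the documented API; the `lexer_id` helper is purely illustrative, and only the `get_lexer("python")` call comes from the examples on this page:

```python
from concurrent.futures import ThreadPoolExecutor

from rosettes import get_lexer


def lexer_id(_: int) -> int:
    # Ask the registry for the "python" lexer and report its object id
    return id(get_lexer("python"))


with ThreadPoolExecutor(max_workers=8) as executor:
    ids = set(executor.map(lexer_id, range(32)))

# Every worker saw the same cached instance
assert len(ids) == 1
```

If every worker reports the same object id, the `functools.cache` layer is handing one shared lexer instance to all threads, which is safe because tokenization touches only local state.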
### 4. Immutable Configuration

All configuration classes are frozen dataclasses with slots for memory efficiency:

```python
@dataclass(frozen=True, slots=True)
class FormatConfig:
    css_class: str = "highlight"
    wrap_code: bool = True
    class_prefix: str = ""
    data_language: str | None = None
```

---

## Free-Threading Support (PEP 703)

Rosettes declares itself safe for free-threaded Python via the `_Py_mod_gil` attribute:

```python
def __getattr__(name: str) -> object:
    if name == "_Py_mod_gil":
        return 0  # Py_MOD_GIL_NOT_USED
    raise AttributeError(f"module 'rosettes' has no attribute {name!r}")
```

This tells free-threaded Python (3.13t+) that Rosettes:

- Does not require the GIL
- Can run with true parallelism
- Is safe for concurrent access without locks

---

## Concurrent Usage Patterns

### Safe: Multiple Threads Highlighting

```python
from concurrent.futures import ThreadPoolExecutor

from rosettes import highlight


def highlight_page(content: str) -> str:
    # Each call is independent - safe to run concurrently
    return highlight(content, "python")


with ThreadPoolExecutor(max_workers=4) as executor:
    pages = ["code1", "code2", "code3", "code4"]
    results = list(executor.map(highlight_page, pages))
```

### Safe: Shared Lexer Instance

```python
from rosettes import get_lexer

# Same instance returned (cached)
lexer = get_lexer("python")


# Safe to use from multiple threads
def process(code: str) -> list:
    return list(lexer.tokenize(code))
```

### Safe: highlight_many()

```python
from rosettes import highlight_many

# Designed for parallel execution
blocks = [(code, lang) for code, lang in code_blocks]
results = highlight_many(blocks)  # Thread pool internally
```

---

## What NOT to Do

### Don't: Modify Tokens

```python
# Tokens are immutable - this fails
token = Token(TokenType.KEYWORD, "def")
token.value = "class"  # ❌ AttributeError
```

### Don't: Rely on Global State

```python
# Don't do this - Rosettes has no global mutable state
import rosettes

rosettes.SOME_SETTING = True  # ❌ No effect, not supported
```

---

## Performance on Free-Threaded Python

On free-threaded Python (3.13t+), `highlight_many()` provides true parallelism:

| Scenario | GIL Python | Free-Threading | Speedup |
|----------|------------|----------------|---------|
| 10 blocks | 15ms | 12ms | 1.25x |
| 50 blocks | 75ms | 42ms | 1.78x |
| 100 blocks | 150ms | 78ms | 1.92x |

*Numbers are illustrative. Actual performance varies by hardware, code complexity, and Python version. See [[docs/about/performance|Performance]] for benchmarking details.*

The speedup comes from true parallel execution without GIL contention.

---

## Verifying Free-Threading

Check whether you are running free-threaded Python:

```python
import sys

if hasattr(sys, "_is_gil_enabled"):
    if sys._is_gil_enabled():
        print("GIL is enabled")
    else:
        print("Free-threading active!")
else:
    print("Python < 3.13 (always has GIL)")
```

---

## Next Steps

- [[docs/highlighting/parallel|Parallel Processing]] — Using `highlight_many()`
- [[docs/about/performance|Performance]] — Benchmarks and optimization

--------------------------------------------------------------------------------

Metadata:

- Word Count: 661
- Reading Time: 3 minutes