Module _registry

Lazy lexer registry for Rosettes.

All lexers are hand-written state machines with guaranteed O(n) performance and no ReDoS vulnerability. Lexers are loaded on demand using functools.cache for thread-safe memoization.

Design Philosophy:

The registry uses lazy loading with caching to balance startup time and runtime performance:

  1. Zero startup cost: No lexers imported at module load time
  2. O(1) lookup: Pre-computed alias table for instant name resolution
  3. Single instance: functools.cache ensures one lexer per language
  4. Thread-safe: cache is thread-safe; lexers are stateless

Architecture:

  • _LEXER_SPECS: Static registry mapping names to (module, class) specs
  • _ALIAS_TO_NAME: Pre-computed case-insensitive alias lookup table
  • _get_lexer_by_canonical: Cached lexer instantiation (one per language)

Performance Notes:

  • First call: ~1ms (module import + class instantiation)
  • Subsequent calls: ~100ns (dict lookup + cache hit)
  • Memory: ~500 bytes per loaded lexer

Common Mistakes:

# ❌ WRONG: Caching lexer instances yourself
lexer_cache = {}
if lang not in lexer_cache:
    lexer_cache[lang] = get_lexer(lang)

# ✅ CORRECT: Just call get_lexer() — it's already cached
lexer = get_lexer(lang)

# ❌ WRONG: Checking support by catching exceptions
try:
    lexer = get_lexer(lang)
except LookupError:
    lexer = None

# ✅ CORRECT: Use supports_language() for checks
if supports_language(lang):
    lexer = get_lexer(lang)

Adding New Languages:

To add a new language, create a lexer in rosettes/lexers/ and add an entry to _LEXER_SPECS below. See rosettes/lexers/_state_machine.py for the base class and helper functions.

See Also:

  • rosettes.lexers._state_machine: Base class for lexer implementations
  • rosettes._protocol.Lexer: Protocol that all lexers must satisfy
  • rosettes._formatter_registry: Similar pattern for formatters

Classes

LexerSpec

Specification for lazy-loading a lexer.

Used internally by the registry to defer module imports until first use. This keeps import rosettes fast (~5ms) even with 50+ language support.

Attributes

Name Type Description
module str

Full module path (e.g., 'rosettes.lexers.python_sm').

class_name str

Name of the lexer class in the module.

aliases tuple[str, ...]

Alternative names for lookup (e.g., 'py' for 'python').

Functions

_normalize_name
Normalize a language name to its canonical form. O(1) lookup.
def _normalize_name(name: str) -> str
Parameters
Name Type Description
name str

Language name or alias.

Returns
str
get_lexer
def get_lexer(name: str) -> StateMachineLexer

Get a lexer instance by name or alias.

All lexers are hand-written state machines with guaranteed O(n) performance and no ReDoS vulnerability.

Uses functools.cache for thread-safe memoization. Lexers are loaded lazily on first access.

Parameters
Name Type Description
name str

Language name or alias (e.g., 'python', 'py', 'js').

Returns
StateMachineLexer
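The resolution flow (alias to canonical name, then cached instantiation) can be sketched with stand-in data; the fake lexer class below is illustrative, not a Rosettes class:

```python
from functools import cache

# Illustrative alias table; see _ALIAS_TO_NAME above.
_ALIAS_TO_NAME = {"python": "python", "py": "python"}

class _FakeLexer:  # stand-in for a hand-written state-machine lexer
    pass

@cache
def _get_lexer_by_canonical(canonical: str) -> _FakeLexer:
    return _FakeLexer()  # the real loader imports the module lazily here

def get_lexer(name: str) -> _FakeLexer:
    canonical = _ALIAS_TO_NAME[name.lower()]   # O(1) alias resolution
    return _get_lexer_by_canonical(canonical)  # cached: one instance per language
```

Resolving aliases before the cache lookup means 'py' and 'python' share a single cached instance.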
_get_lexer_by_canonical
Internal cached loader, keyed by canonical name.
def _get_lexer_by_canonical(canonical: str) -> StateMachineLexer
Parameters
Name Type Description
canonical str
Returns
StateMachineLexer
list_languages
List all supported language names. O(1).
def list_languages() -> list[str]
Returns
list[str]
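A plausible shape, assuming the function simply reads the static spec table without importing any lexer module; the table contents below are illustrative:

```python
# Illustrative spec table; values elided because only the keys matter here.
_LEXER_SPECS = {"python": None, "javascript": None}

def list_languages() -> list[str]:
    return sorted(_LEXER_SPECS)  # canonical names only, aliases excluded
```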
supports_language
Check if a language is supported.
def supports_language(name: str) -> bool
Parameters
Name Type Description
name str

Language name or alias.

Returns
bool
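A minimal sketch, assuming the check is a membership test against the pre-computed alias table (table contents illustrative):

```python
# Illustrative alias table; see _ALIAS_TO_NAME above.
_ALIAS_TO_NAME = {"python": "python", "py": "python", "js": "javascript"}

def supports_language(name: str) -> bool:
    # Pure dict membership: no module import, no exception handling needed.
    return name.lower() in _ALIAS_TO_NAME
```

This is why the "Common Mistakes" section above prefers supports_language() over catching LookupError: the check never triggers a module import.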