Module _registry

Lazy lexer registry for Rosettes.

All lexers are hand-written state machines with guaranteed O(n) performance and no ReDoS vulnerability. Lexers are loaded on demand using functools.cache for thread-safe memoization.

Design Philosophy:

The registry uses lazy loading with caching to balance startup time and runtime performance:

  1. Zero startup cost: No lexers imported at module load time
  2. O(1) lookup: Pre-computed alias table for instant name resolution
  3. Single instance: functools.cache ensures one lexer per language
  4. Thread-safe: cache is thread-safe; lexers are stateless

Architecture:

  • _LEXER_SPECS: Static registry mapping names to (module, class) specs
  • _ALIAS_TO_NAME: Pre-computed case-insensitive alias lookup table
  • _get_lexer_by_canonical: Cached lexer instantiation (one per language)

Performance Notes:

  • First call: ~1ms (module import + class instantiation)
  • Subsequent calls: ~100ns (dict lookup + cache hit)
  • Memory: ~500 bytes per loaded lexer

Common Mistakes:

# ❌ WRONG: Caching lexer instances yourself
lexer_cache = {}
if lang not in lexer_cache:
    lexer_cache[lang] = get_lexer(lang)

# ✅ CORRECT: Just call get_lexer() — it's already cached
lexer = get_lexer(lang)

# ❌ WRONG: Checking support by catching exceptions
try:
    lexer = get_lexer(lang)
except LookupError:
    lexer = None

# ✅ CORRECT: Use supports_language() for checks
if supports_language(lang):
    lexer = get_lexer(lang)

Adding New Languages:

To add a new language, create a lexer in rosettes/lexers/ and add an entry to _LEXER_SPECS below. See rosettes/lexers/_state_machine.py for the base class and helper functions.

See Also:

  • rosettes.lexers._state_machine: Base class for lexer implementations
  • rosettes._protocol.Lexer: Protocol that all lexers must satisfy
  • rosettes._formatter_registry: Similar pattern for formatters

Classes

LexerSpec

Specification for lazy-loading a lexer.

Used internally by the registry to defer module imports until first use. This keeps import rosettes fast (~5ms) even with 50+ language support.

Attributes

Name Type Description
module str

Full module path (e.g., 'rosettes.lexers.python_sm').

class_name str

Name of the lexer class in the module.

aliases tuple[str, ...]

Alternative names for lookup (e.g., 'py' for 'python').

Functions

_normalize_name
Normalize a language name to its canonical form. O(1) lookup.
def _normalize_name(name: str) -> str
Parameters
Name Type Description
name str

Language name or alias.

Returns
str
get_lexer
def get_lexer(name: str) -> StateMachineLexer

Get a lexer instance by name or alias.

All lexers are hand-written state machines with guaranteed O(n) performance and no ReDoS vulnerability.

Uses functools.cache for thread-safe memoization. Lexers are loaded lazily on first access.

Parameters
Name Type Description
name str

Language name or alias (e.g., 'python', 'py', 'js').

Returns
StateMachineLexer
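The resolution flow (alias to canonical name, then cached instantiation) can be sketched with stand-in data; the fake lexer class below is illustrative, not a Rosettes class:

```python
from functools import cache

# Illustrative alias table; see _ALIAS_TO_NAME above.
_ALIAS_TO_NAME = {"python": "python", "py": "python"}

class _FakeLexer:  # stand-in for a hand-written state-machine lexer
    pass

@cache
def _get_lexer_by_canonical(canonical: str) -> _FakeLexer:
    return _FakeLexer()  # the real loader imports the module lazily here

def get_lexer(name: str) -> _FakeLexer:
    canonical = _ALIAS_TO_NAME[name.lower()]   # O(1) alias resolution
    return _get_lexer_by_canonical(canonical)  # cached: one instance per language
```

Resolving aliases before the cache lookup means 'py' and 'python' share a single cached instance.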
_get_lexer_by_canonical
Internal cached loader, keyed by canonical name.
def _get_lexer_by_canonical(canonical: str) -> StateMachineLexer
Parameters
Name Type Description
canonical str
Returns
StateMachineLexer
list_languages
List all supported language names. O(1).
def list_languages() -> list[str]
Returns
list[str]
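A plausible shape, assuming the function simply reads the static spec table without importing any lexer module; the table contents below are illustrative:

```python
# Illustrative spec table; values elided because only the keys matter here.
_LEXER_SPECS = {"python": None, "javascript": None}

def list_languages() -> list[str]:
    return sorted(_LEXER_SPECS)  # canonical names only, aliases excluded
```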
supports_language
Check if a language is supported.
def supports_language(name: str) -> bool
Parameters
Name Type Description
name str

Language name or alias.

Returns
bool
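A minimal sketch, assuming the check is a membership test against the pre-computed alias table (table contents illustrative):

```python
# Illustrative alias table; see _ALIAS_TO_NAME above.
_ALIAS_TO_NAME = {"python": "python", "py": "python", "js": "javascript"}

def supports_language(name: str) -> bool:
    # Pure dict membership: no module import, no exception handling needed.
    return name.lower() in _ALIAS_TO_NAME
```

This is why the "Common Mistakes" section above prefers supports_language() over catching LookupError: the check never triggers a module import.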