Module

delegate

LexerDelegate implementation using rosettes.

Enables Zero-Copy Lexer Handoff (ZCLH) by bridging Patitas coordinate handoff to Rosettes state-machine lexers.

Design Philosophy:

Zero-Copy Lexer Handoff (ZCLH) is a performance pattern where:

  1. Coordinate Handoff: The markdown parser (Patitas) identifies fenced code blocks and records (start, end) positions
  2. Zero Copy: Instead of extracting substrings, we pass the entire source string with start/end indices
  3. Delegate Pattern: RosettesDelegate bridges the parser to the syntax highlighter without tight coupling

This eliminates string allocation for code content:

  • Traditional:code_block = source[start:end]→ allocates new string
  • ZCLH:tokenize(source, start=start, end=end)→ no allocation

Performance Impact:

For a 10KB markdown file with 50 code blocks:

  • Traditional extraction: ~5ms (50 allocations, GC pressure)
  • ZCLH: ~3ms (0 allocations for content)

Thread-Safety:

RosettesDelegate is stateless — all methods use only their arguments. Safe for concurrent use from multiple threads on Python 3.14t.

Integration:

This delegate is used by Patitas (the markdown parser) and Bengal (the static site generator) to highlight fenced code blocks.

Example:

>>> delegate = RosettesDelegate()
>>> source = "# Header\n```python\ndef foo(): pass\n```"
>>> # Parser identifies code block at positions 19-34
>>> if delegate.supports_language("python"):
...     tokens = list(delegate.tokenize_range(source, 19, 34, "python"))

See Also:

  • rosettes.get_lexer: Lexer lookup used internally
  • rosettes.lexers._state_machine: How start/end are handled

Classes

RosettesDelegate 2
LexerDelegate implementation using rosettes. Thread-safe: All state is local to method calls. Desi…

LexerDelegate implementation using rosettes.

Thread-safe: All state is local to method calls. Designed for Python 3.14t free-threading.

This class bridges markdown parsers (like Patitas) to Rosettes syntax highlighting, enabling zero-copy lexer handoff.

Methods

tokenize_range 4 Iterator[Token]
Tokenize a range of source code using rosettes state-machine lexer. This is th…
def tokenize_range(self, source: str, start: int, end: int, language: str) -> Iterator[Token]

Tokenize a range of source code using rosettes state-machine lexer.

This is the core ZCLH method: the source string is passed by reference, and only the (start, end) range is tokenized. No substring allocation.

Performance: O(end - start) guaranteed. Zero allocations for code content. The lexer reads characters directly from source[start:end].

Parameters
Name Type Description
source

The complete source string (not just the code block).

start

Starting index of the code block in source.

end

Ending index (exclusive) of the code block.

language

Language name or alias (e.g., 'python', 'js').

Returns
Iterator[Token]
supports_language 1 bool
Check if rosettes supports the given language. Use this before calling tokeniz…
def supports_language(self, language: str) -> bool

Check if rosettes supports the given language.

Use this before calling tokenize_range() to handle unsupported languages gracefully (e.g., fall back to plain text).

Parameters
Name Type Description
language

Language name or alias to check.

Returns
bool True if the language is supported, False otherwise.