Module

_types

Core types for Rosettes syntax highlighting.

Thread-safe, immutable types for tokenization.

Design Philosophy:

Types in this module are designed for maximum performance and safety:

1. **Immutable**: Token is a NamedTuple, TokenType is a StrEnum
2. **Minimal memory**: Token is ~64 bytes (vs ~200 bytes for a regular class instance)
3. **Hashable**: Tokens can be used in sets/dicts for deduplication
4. **Thread-safe**: Immutability means no synchronization needed
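
The immutability and hashability properties listed above can be checked with a small sketch. The `TokenType` and `Token` classes below are stand-ins written for illustration, not the Rosettes definitions; the stand-in uses a `str` + `Enum` mixin so that it also runs on Python versions without `StrEnum`.

```python
from enum import Enum
from typing import NamedTuple


# Stand-in definitions for illustration only; the real classes live in rosettes.
class TokenType(str, Enum):  # the module describes a StrEnum; str + Enum is close enough here
    KEYWORD = "k"
    NAME_FUNCTION = "nf"


class Token(NamedTuple):
    type: TokenType
    value: str
    line: int
    column: int


tok = Token(TokenType.KEYWORD, "def", 1, 1)

# Immutable: assigning to a NamedTuple field raises AttributeError.
try:
    tok.value = "class"
    mutated = True
except AttributeError:
    mutated = False

# Hashable: equal tokens collapse to a single entry in a set.
unique = {
    tok,
    Token(TokenType.KEYWORD, "def", 1, 1),  # equal to tok
    Token(TokenType.NAME_FUNCTION, "my_func", 1, 5),
}

print(mutated)      # False
print(len(unique))  # 2
```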

What Goes in TokenType:

✅ DO include:
  • Language keywords (KEYWORD, KEYWORD_DECLARATION, etc.)
  • Operators and punctuation (OPERATOR, PUNCTUATION)
  • Literals (STRING, NUMBER, etc.)
  • Comments (COMMENT, COMMENT_MULTILINE, etc.)
  • Names (NAME, NAME_FUNCTION, NAME_CLASS, etc.)

❌ DON'T include:
  • Formatting hints (indentation level, line breaks)
  • Editor-specific tokens (folding markers, etc.)
  • Language-specific tokens (use generic categories)

Pygments Compatibility:

TokenType values are the CSS class suffixes used by Pygments themes.
This means existing Pygments stylesheets work with Rosettes output:
  • TokenType.KEYWORD = "k" → def is styled by the .k rule
  • TokenType.NAME_FUNCTION = "nf" → my_func is styled by the .nf rule

Use css_class_style="pygments" in highlight() for this compatibility.
Use css_class_style="semantic" for readable classes like .syntax-function.
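
To make the value-to-class relationship concrete, here is a minimal sketch of how a token's value could end up as a CSS class. `render_span` is a hypothetical helper, not part of Rosettes, and the markup the real HTML formatter emits may differ; only the two enum values come from the text above.

```python
from enum import Enum
from html import escape


class TokenType(str, Enum):  # stand-in; only these two values are taken from the docs
    KEYWORD = "k"
    NAME_FUNCTION = "nf"


def render_span(token_type: TokenType, text: str) -> str:
    """Hypothetical helper: wrap text in a span whose class is the TokenType value."""
    return f'<span class="{token_type.value}">{escape(text)}</span>'


html = render_span(TokenType.KEYWORD, "def") + " " + render_span(TokenType.NAME_FUNCTION, "my_func")
print(html)
# <span class="k">def</span> <span class="nf">my_func</span>
```

Because the class names match the short names Pygments stylesheets target, a rule such as `.k { font-weight: bold }` would apply to this output unchanged.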

See Also:

rosettes.themes._roles: Higher-level semantic roles for theming
rosettes.themes._mapping: TokenType → SyntaxRole mapping
rosettes.formatters.html: How TokenTypes become CSS classes

Classes

TokenType

Semantic token types with Pygments-compatible CSS class names.

Each value is the CSS class suffix used by Pygments themes. This ensures drop-in compatibility with existing Pygments stylesheets.

Categories:

Keywords: KEYWORD, KEYWORD_CONSTANT, KEYWORD_DECLARATION, etc.
Names: NAME, NAME_FUNCTION, NAME_CLASS, NAME_BUILTIN, etc.
Literals: STRING, NUMBER, NUMBER_FLOAT, etc.
Operators: OPERATOR, OPERATOR_WORD
Punctuation: PUNCTUATION, PUNCTUATION_MARKER
Comments: COMMENT, COMMENT_SINGLE, COMMENT_MULTILINE, etc.
Generic: TEXT, WHITESPACE, ERROR (for diffs, errors, etc.)

Usage:

>>> from rosettes import TokenType
>>> TokenType.KEYWORD
<TokenType.KEYWORD: 'k'>
>>> TokenType.KEYWORD.value  # CSS class suffix
'k'

Token

Immutable token — thread-safe, minimal memory.

A Token represents a single lexical unit from source code. Tokens are immutable NamedTuples for thread-safety and memory efficiency.

Memory: Each Token uses ~64 bytes (NamedTuple overhead + references). A typical 100-line Python file produces ~500 tokens (~32KB).

Thread-Safety: Tokens are immutable and can be safely shared across threads. No defensive copying needed when passing tokens between workers.

Fast Path: When position info is not needed, use tokenize_fast(), which yields (TokenType, str) tuples instead of Token objects for a ~20% speedup.
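
As a rough illustration of the thread-safety and fast-path notes, the sketch below shares one tuple of tokens across worker threads without copying, then drops position info to get the `(TokenType, str)` shape that tokenize_fast() is described as yielding. The classes are stand-ins and `count_type` is a hypothetical helper; the call signature of tokenize_fast() is not shown in this section, so it is not called here.

```python
from concurrent.futures import ThreadPoolExecutor
from enum import Enum
from typing import NamedTuple


class TokenType(str, Enum):  # stand-in for the Rosettes enum
    KEYWORD = "k"
    NAME_FUNCTION = "nf"


class Token(NamedTuple):  # stand-in for the Rosettes Token
    type: TokenType
    value: str
    line: int
    column: int


# One shared, immutable sequence; workers only read from it.
tokens = (
    Token(TokenType.KEYWORD, "def", 1, 1),
    Token(TokenType.NAME_FUNCTION, "my_func", 1, 5),
    Token(TokenType.KEYWORD, "return", 2, 5),
)


def count_type(wanted: TokenType) -> int:
    """Hypothetical worker: count tokens of one type without copying the input."""
    return sum(1 for t in tokens if t.type is wanted)


with ThreadPoolExecutor(max_workers=2) as pool:
    counts = dict(zip(TokenType, pool.map(count_type, TokenType)))

# The same data without positions, as plain (TokenType, str) pairs.
pairs = [(t.type, t.value) for t in tokens]

print(counts[TokenType.KEYWORD])  # 2
print(pairs[0][1])                # def
```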

Attributes

| Name | Type | Description |
|------|------|-------------|
| type | TokenType | The semantic type of the token (e.g., TokenType.KEYWORD). |
| value | str | The actual text content of the token (e.g., "def"). |
| line | int | 1-based line number where the token starts. |
| column | int | 1-based column number where the token starts. |
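
The 1-based convention for line and column can be sketched as follows. `position_of` is a hypothetical helper that converts a 0-based character offset into a 1-based (line, column) pair; it is not a Rosettes function, and the classes are stand-ins for illustration.

```python
from enum import Enum
from typing import NamedTuple


class TokenType(str, Enum):  # stand-in for the Rosettes enum
    KEYWORD = "k"


class Token(NamedTuple):  # stand-in with the four documented attributes
    type: TokenType
    value: str
    line: int
    column: int


def position_of(source: str, offset: int) -> tuple[int, int]:
    """Hypothetical helper: map a 0-based offset to a 1-based (line, column)."""
    line = source.count("\n", 0, offset) + 1
    last_newline = source.rfind("\n", 0, offset)  # -1 when offset is on the first line
    column = offset - last_newline
    return line, column


source = "def f():\n    return 1\n"
tok = Token(TokenType.KEYWORD, "return", *position_of(source, source.index("return")))

print(tok.line, tok.column)    # 2 5
print(position_of(source, 0))  # (1, 1)
```

The first character of a file is therefore at line 1, column 1, and a token preceded by four spaces of indentation starts at column 5.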