Basic Usage

Using highlight() and tokenize() for syntax highlighting


highlight() and tokenize() are the two primary functions for syntax highlighting.

highlight()

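Generate syntax-highlighted output from source code. The default output is HTML; other formatters, such as "terminal", can be selected with the formatter parameter.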

from rosettes import highlight

html = highlight("def hello(): pass", "python")

# Use terminal output
ansi = highlight("def hello(): pass", "python", formatter="terminal")

Parameters

Parameter        Type             Default     Description
code             str              required    Source code to highlight
language         str              required    Language name or alias
formatter        str | Formatter  "html"      Formatter name or instance
hl_lines         set[int]         None        1-based line numbers to highlight
show_linenos     bool             False       Include line numbers
css_class        str              None        Container CSS class (HTML only)
css_class_style  str              "semantic"  "semantic" or "pygments" (HTML only)
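
Because hl_lines takes a set of 1-based line numbers, it can be convenient to build that set from a compact range string. The sketch below shows one way to do this. parse_line_spec is a hypothetical helper written for illustration and is not part of rosettes.

```python
def parse_line_spec(spec: str) -> set[int]:
    """Convert a spec such as "1,3-5" into a set of 1-based line numbers.

    Hypothetical helper for building an hl_lines argument; not part of rosettes.
    """
    lines: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate empty segments such as a trailing comma
        if "-" in part:
            start, end = (int(p) for p in part.split("-", 1))
            lines.update(range(start, end + 1))  # ranges are inclusive
        else:
            lines.add(int(part))
    return lines


hl = parse_line_spec("1,3-5")
print(sorted(hl))  # [1, 3, 4, 5]
```

The resulting set could then be passed along with the documented parameters, for example highlight(code, "python", hl_lines=hl, show_linenos=True).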

Language Aliases

Languages accept multiple aliases:

# These are equivalent
highlight(code, "python")
highlight(code, "py")
highlight(code, "python3")

# JavaScript aliases
highlight(code, "javascript")
highlight(code, "js")

CSS Class Styles

Semantic (default) — readable class names:

html = highlight(code, "python")  # css_class_style="semantic"
# <span class="syntax-keyword">def</span>
# <span class="syntax-function">hello</span>

Pygments — compatible with Pygments themes:

html = highlight(code, "python", css_class_style="pygments")
# <span class="k">def</span>
# <span class="nf">hello</span>

Container Class

The output is wrapped in a container <div>:

# Default: "rosettes" for semantic, "highlight" for pygments
html = highlight(code, "python")
# <div class="rosettes" data-language="python">...

html = highlight(code, "python", css_class_style="pygments")
# <div class="highlight" data-language="python">...

# Custom class
html = highlight(code, "python", css_class="my-code")
# <div class="my-code" data-language="python">...

tokenize()

Get raw tokens without formatting. Useful for custom output formats or analysis.

from rosettes import tokenize

tokens = tokenize("x = 42", "python")
for token in tokens:
    print(f"{token.type.name}: {token.value!r}")

Output:

NAME: 'x'
WHITESPACE: ' '
OPERATOR: '='
WHITESPACE: ' '
NUMBER_INTEGER: '42'

Token Structure

Each token is a NamedTuple with:

Attribute  Type       Description
type       TokenType  Semantic token type
value      str        The actual text
line       int        1-based line number
column     int        1-based column number

token = tokens[0]
print(token.type)    # TokenType.NAME
print(token.value)   # 'x'
print(token.line)    # 1
print(token.column)  # 1
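
The line attribute makes it straightforward to regroup a flat token stream by source line. The following sketch assumes only the four attributes documented above. The Token and TokenType classes are hand-written stand-ins so that the example runs without rosettes installed, and the sample values are written by hand rather than produced by tokenize().

```python
from collections import defaultdict
from enum import Enum, auto
from typing import NamedTuple


class TokenType(Enum):
    # Stand-in for the library's TokenType (assumption: only these members).
    NAME = auto()
    WHITESPACE = auto()
    OPERATOR = auto()
    NUMBER_INTEGER = auto()


class Token(NamedTuple):
    # Stand-in mirroring the documented attributes.
    type: TokenType
    value: str
    line: int
    column: int


def tokens_by_line(tokens):
    """Group token values by their 1-based line number."""
    grouped = defaultdict(list)
    for tok in tokens:
        grouped[tok.line].append(tok.value)
    return dict(grouped)


# Hand-written sample for "x = 42"
sample = [
    Token(TokenType.NAME, "x", 1, 1),
    Token(TokenType.WHITESPACE, " ", 1, 2),
    Token(TokenType.OPERATOR, "=", 1, 3),
    Token(TokenType.WHITESPACE, " ", 1, 4),
    Token(TokenType.NUMBER_INTEGER, "42", 1, 5),
]
print(tokens_by_line(sample))  # {1: ['x', ' ', '=', ' ', '42']}
```

Since a NamedTuple is also an ordinary tuple, a token can be unpacked positionally, as in type_, value, line, column = token.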

Use Cases

  • Custom formatters: Build terminal, LaTeX, or other output formats
  • Analysis: Count tokens, find patterns, compute metrics
  • Testing: Verify lexer behavior
  • Transformations: Modify code based on token structure
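
Two of these use cases, analysis and custom formatting, can be sketched in a few lines. The functions below rely only on token.type.name and token.value. fake_token is a hypothetical stand-in that builds token-like objects so the example runs on its own; in practice the tokens would come from tokenize().

```python
from collections import Counter
from types import SimpleNamespace


def count_token_types(tokens, skip=("WHITESPACE",)):
    """Analysis: count tokens per type name, ignoring the names in skip."""
    return Counter(t.type.name for t in tokens if t.type.name not in skip)


def render_tagged(tokens):
    """Custom formatter: wrap each non-whitespace token as [TYPE:value]."""
    return "".join(
        t.value if t.type.name == "WHITESPACE" else f"[{t.type.name}:{t.value}]"
        for t in tokens
    )


def fake_token(type_name, value):
    # Stand-in exposing only .type.name and .value (assumption).
    return SimpleNamespace(type=SimpleNamespace(name=type_name), value=value)


sample = [
    fake_token("NAME", "x"),
    fake_token("WHITESPACE", " "),
    fake_token("OPERATOR", "="),
    fake_token("WHITESPACE", " "),
    fake_token("NUMBER_INTEGER", "42"),
]

counts = count_token_types(sample)
print(counts["NAME"], counts["WHITESPACE"])  # 1 0
print(render_tagged(sample))  # [NAME:x] [OPERATOR:=] [NUMBER_INTEGER:42]
```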

Error Handling

Both functions raise LookupError for unsupported languages:

from rosettes import highlight, supports_language

# Check before highlighting
if supports_language("python"):
    html = highlight(code, "python")

# Or handle the exception
try:
    html = highlight(code, "unknown")
except LookupError as e:
    print(f"Unsupported language: {e}")

Next Steps