Parallel Processing

Highlight multiple code blocks concurrently with highlight_many()

2 min read 475 words

For sites with many code blocks,highlight_many()provides concurrent processing with 1.5-2x speedup on Python 3.14t.

When to Use

Scenario Recommendation
< 8 blocks Usehighlight()in a loop
8+ blocks Usehighlight_many()
50+ blocks on 3.14t Significant speedup

The overhead of thread management makeshighlight_many()slower for small batches. Rosettes automatically falls back to sequential processing for < 8 blocks.


highlight_many()

Highlight multiple code blocks in parallel.

from rosettes import highlight_many

blocks = [
    ("def foo(): pass", "python"),
    ("const x = 1;", "javascript"),
    ("fn main() {}", "rust"),
    ('{"key": "value"}', "json"),
]

results = highlight_many(blocks)
# Returns list of HTML strings in same order as input

Parameters

Parameter Type Default Description
items Iterable[tuple[str, str]] required (code, language) tuples
max_workers int min(4, cpu_count) Thread count
css_class_style str "semantic" "semantic"or"pygments"

Worker Count

The default of 4 workers is optimal based on benchmarking:

# Default: 4 workers (optimal)
results = highlight_many(blocks)

# Custom worker count
results = highlight_many(blocks, max_workers=8)

Note

More workers doesn't always mean faster. Thread overhead and memory contention can reduce performance beyond 4-8 workers.


tokenize_many()

Parallel tokenization for raw token access.

from rosettes import tokenize_many

blocks = [
    ("x = 1", "python"),
    ("let y = 2;", "javascript"),
]

results = tokenize_many(blocks)
# Returns list of token lists

for i, tokens in enumerate(results):
    print(f"Block {i}: {len(tokens)} tokens")

Free-Threading Performance

On Python 3.14t with free-threading enabled (PEP 703),highlight_many()provides true parallelism:

Blocks GIL Python Free-Threading Speedup
10 15ms 12ms 1.25x
50 75ms 42ms 1.78x
100 150ms 78ms 1.92x

Why It Works

Rosettes is thread-safe by design:

  1. Immutable tokens:Tokenis aNamedTuple
  2. Local-only state: Lexers use only local variables during tokenization
  3. No shared mutable data: No global state to contend for
  4. PEP 703 declaration: Module declares itself safe for free-threading

Example: Static Site Generator

from rosettes import highlight_many
from pathlib import Path

def highlight_all_code_blocks(pages: list[dict]) -> list[dict]:
    """Highlight all code blocks across all pages."""
    
    # Collect all code blocks
    blocks = []
    block_locations = []  # Track which page/block each belongs to
    
    for page_idx, page in enumerate(pages):
        for block_idx, block in enumerate(page["code_blocks"]):
            blocks.append((block["code"], block["language"]))
            block_locations.append((page_idx, block_idx))
    
    # Highlight in parallel
    results = highlight_many(blocks)
    
    # Assign results back to pages
    for (page_idx, block_idx), html in zip(block_locations, results):
        pages[page_idx]["code_blocks"][block_idx]["html"] = html
    
    return pages

Next Steps