# Parallel Processing

URL: /docs/highlighting/parallel/
Section: highlighting
Tags: parallel, performance

---

For sites with many code blocks, `highlight_many()` provides concurrent processing with a 1.5-2x speedup on Python 3.14t.

## When to Use

| Scenario | Recommendation |
|----------|----------------|
| < 8 blocks | Use `highlight()` in a loop |
| 8+ blocks | Use `highlight_many()` |
| 50+ blocks on 3.14t | Significant speedup |

The overhead of thread management makes `highlight_many()` slower for small batches, so Rosettes automatically falls back to sequential processing for fewer than 8 blocks.

---

## `highlight_many()`

Highlight multiple code blocks in parallel.

```python
from rosettes import highlight_many

blocks = [
    ("def foo(): pass", "python"),
    ("const x = 1;", "javascript"),
    ("fn main() {}", "rust"),
    ('{"key": "value"}', "json"),
]

results = highlight_many(blocks)
# Returns a list of HTML strings in the same order as the input
```

### Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `items` | `Iterable[tuple[str, str]]` | required | (code, language) tuples |
| `max_workers` | `int` | `min(4, cpu_count)` | Thread count |
| `css_class_style` | `str` | `"semantic"` | `"semantic"` or `"pygments"` |

### Worker Count

The default of 4 workers was chosen based on benchmarking and performs well for typical workloads:

```python
# Default: 4 workers (optimal in benchmarks)
results = highlight_many(blocks)

# Custom worker count
results = highlight_many(blocks, max_workers=8)
```

:::{note}
More workers do not always mean faster results. Thread overhead and memory contention can reduce performance beyond 4-8 workers.
:::

---

## `tokenize_many()`

Parallel tokenization for raw token access.

```python
from rosettes import tokenize_many

blocks = [
    ("x = 1", "python"),
    ("let y = 2;", "javascript"),
]

results = tokenize_many(blocks)
# Returns a list of token lists

for i, tokens in enumerate(results):
    print(f"Block {i}: {len(tokens)} tokens")
```

---

## Free-Threading Performance

On the free-threaded Python 3.14t build (PEP 703), `highlight_many()` provides true parallelism:

| Blocks | GIL Python | Free-Threading | Speedup |
|--------|------------|----------------|---------|
| 10 | 15ms | 12ms | 1.25x |
| 50 | 75ms | 42ms | 1.78x |
| 100 | 150ms | 78ms | 1.92x |

### Why It Works

Rosettes is thread-safe by design:

1. **Immutable tokens**: `Token` is a `NamedTuple`
2. **Local-only state**: Lexers use only local variables during tokenization
3. **No shared mutable data**: No global state to contend for
4. **PEP 703 declaration**: The module declares itself safe for free-threading
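### Measuring the Speedup

To see where the crossover lands on your own hardware, a quick timing comparison between a sequential `highlight()` loop and `highlight_many()` is enough. The sketch below is illustrative rather than part of the Rosettes API: the sample snippets and the `timed()` helper are invented for the example, and it assumes `highlight()` accepts the same `(code, language)` pair that `highlight_many()` takes as tuples. Expect numbers in the ballpark of the table above only on a free-threaded 3.14t build.

```python
# Illustrative benchmark sketch: absolute timings vary by machine and Python build.
import time

from rosettes import highlight, highlight_many

# Hypothetical workload: 100 small Python snippets.
blocks = [(f"def f_{i}(): return {i}", "python") for i in range(100)]


def timed(label: str, fn) -> None:
    """Run fn once and print its wall-clock time in milliseconds."""
    start = time.perf_counter()
    fn()
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{label}: {elapsed_ms:.1f} ms")


# Sequential baseline: one highlight() call per block
# (assumes highlight() takes a (code, language) pair, matching the tuples above).
timed("sequential highlight()   ", lambda: [highlight(code, lang) for code, lang in blocks])

# Parallel: highlight_many() with the default worker count.
timed("parallel highlight_many()", lambda: highlight_many(blocks))
```

On a GIL build the two timings should land close together, since only one thread can tokenize at a time; the gap in favour of `highlight_many()` should only open up on 3.14t, in line with the table above.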
---

## Example: Static Site Generator

```python
from rosettes import highlight_many


def highlight_all_code_blocks(pages: list[dict]) -> list[dict]:
    """Highlight all code blocks across all pages."""
    # Collect all code blocks
    blocks = []
    block_locations = []  # Track which page/block each belongs to

    for page_idx, page in enumerate(pages):
        for block_idx, block in enumerate(page["code_blocks"]):
            blocks.append((block["code"], block["language"]))
            block_locations.append((page_idx, block_idx))

    # Highlight in parallel
    results = highlight_many(blocks)

    # Assign results back to pages
    for (page_idx, block_idx), html in zip(block_locations, results):
        pages[page_idx]["code_blocks"][block_idx]["html"] = html

    return pages
```

---

## Next Steps

- [[docs/about/thread-safety|Thread Safety]] — How Rosettes achieves thread safety
- [[docs/about/performance|Performance]] — Benchmarks and optimization tips