Kida provides a workload-aware worker pool toolkit for framework authors who need to parallelize template rendering. It is calibrated for free-threaded Python (3.14t) where CPU-bound rendering achieves true parallelism.
## Quick Start

```python
from concurrent.futures import ThreadPoolExecutor

from kida.utils.workers import get_optimal_workers, should_parallelize

contexts = [{"name": f"User {i}"} for i in range(100)]

if should_parallelize(len(contexts)):
    workers = get_optimal_workers(len(contexts))
    with ThreadPoolExecutor(max_workers=workers) as executor:
        results = list(executor.map(template.render, contexts))
else:
    results = [template.render(**ctx) for ctx in contexts]
```
## Core Functions

### should_parallelize

Determine if parallelization is worthwhile. Thread pool overhead (~1-2 ms per task) only pays off above a threshold.

```python
from kida.utils.workers import should_parallelize

should_parallelize(5)    # False — below threshold
should_parallelize(100)  # True — above threshold

# With work size estimate (bytes of template output)
should_parallelize(100, total_work_estimate=500)  # False — too small
```
### get_optimal_workers

Calculate the optimal worker count based on workload type, environment, CPU cores, and free-threading status.

```python
from kida.utils.workers import get_optimal_workers, WorkloadType

# Template rendering (default)
get_optimal_workers(100)  # 4 (local, free-threading)

# Template compilation
get_optimal_workers(100, workload_type=WorkloadType.COMPILE)  # 2

# Override auto-tuning
get_optimal_workers(100, config_override=16)  # 16

# Weight heavy templates higher
get_optimal_workers(50, task_weight=2.0)  # Adjusts for heavy work
```
## Workload Types

| Type | Use Case | Parallelism |
|---|---|---|
| `WorkloadType.RENDER` | Template rendering (CPU-bound) | High — benefits from free-threading |
| `WorkloadType.COMPILE` | Template compilation (CPU-bound) | Moderate — shared cache limits scaling |
| `WorkloadType.IO_BOUND` | File loading, network | High — threads wait on I/O |
## Environment Detection

The toolkit auto-detects the execution environment to tune worker counts:

| Environment | Detection | Worker Strategy |
|---|---|---|
| CI | `CI`, `GITHUB_ACTIONS`, etc. | Conservative (2 workers max) |
| Local | Default | Moderate (up to 4 workers) |
| Production | `KIDA_ENV=production` | Aggressive (up to 8 workers) |

Override detection with the `KIDA_ENV` environment variable:

```shell
export KIDA_ENV=production  # or "ci" or "local"
```
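A plausible sketch of the detection order described above: the explicit override wins, then common CI indicators are checked, and local is the fallback. The exact set of variables kida inspects beyond `CI` and `GITHUB_ACTIONS` is not documented here:

```python
import os

def detect_environment(environ=os.environ) -> str:
    # Explicit override always wins.
    override = environ.get("KIDA_ENV")
    if override in ("ci", "local", "production"):
        return override
    # Common CI indicators (the real list may be longer).
    if environ.get("CI") or environ.get("GITHUB_ACTIONS"):
        return "ci"
    return "local"
```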
## Free-Threading Detection

The toolkit detects whether the GIL is disabled and scales worker counts accordingly:

```python
from kida.utils.workers import is_free_threading_enabled

if is_free_threading_enabled():
    print("GIL disabled — true parallelism available")
```

On free-threaded Python, render workloads get a 1.5x multiplier on the CPU-based worker count.
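One way such a check can be implemented on CPython 3.13+ is via `sys._is_gil_enabled()` (a private interpreter API); this is a sketch, not necessarily how kida does it:

```python
import sys

def gil_disabled() -> bool:
    # sys._is_gil_enabled() exists on CPython 3.13+; on older interpreters
    # (where it is absent) the GIL is always enabled.
    check = getattr(sys, "_is_gil_enabled", None)
    return check is not None and not check()
```

On a free-threaded build this returns True, which is where the 1.5x render multiplier would kick in.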
## Template Scheduling

For optimal throughput, schedule heavy templates first to avoid the "straggler effect" where one slow render delays overall completion.

### estimate_template_weight

Estimate relative complexity of a template:

```python
from kida.utils.workers import estimate_template_weight

weight = estimate_template_weight(template)
# 1.0 = average, >1 = heavy, <1 = light (capped at 5.0)
```

Weight factors:

- Source size: +0.5 per 5KB above 5KB threshold
- Block count: +0.1 per block above 3
- Macro count: +0.2 per macro
- Inheritance: +0.5 if extends another template
- Includes: +0.15 per include statement
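The factors above can be sketched as follows. `TemplateInfo` is a hypothetical stand-in for whatever metadata kida extracts from a parsed template; field names here are illustrative only:

```python
from dataclasses import dataclass

@dataclass
class TemplateInfo:
    """Hypothetical stand-in for parsed-template metadata."""
    source_size: int = 0  # bytes of template source
    block_count: int = 0
    macro_count: int = 0
    extends_parent: bool = False
    include_count: int = 0

def estimate_weight(t: TemplateInfo) -> float:
    weight = 1.0
    # +0.5 per 5 KB of source above the 5 KB threshold
    weight += 0.5 * max(0, t.source_size - 5_000) / 5_000
    # +0.1 per block above 3
    weight += 0.1 * max(0, t.block_count - 3)
    # +0.2 per macro
    weight += 0.2 * t.macro_count
    # +0.5 for template inheritance
    if t.extends_parent:
        weight += 0.5
    # +0.15 per include statement
    weight += 0.15 * t.include_count
    return min(weight, 5.0)  # capped at 5.0
```

A bare template scores the baseline 1.0; a template with inheritance alone scores 1.5; pathological templates saturate at the 5.0 cap.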
### order_by_complexity

Sort templates for optimal parallel execution:

```python
from kida.utils.workers import order_by_complexity

# Heavy templates first (default — best for parallel execution)
ordered = order_by_complexity(templates)

# Light templates first (useful for testing)
ordered = order_by_complexity(templates, descending=False)
```
## Workload Profiles

Inspect the tuning parameters for any workload/environment combination:

```python
from kida.utils.workers import get_profile, WorkloadType

profile = get_profile(WorkloadType.RENDER)
print(profile.parallel_threshold)         # 10
print(profile.max_workers)                # 4
print(profile.free_threading_multiplier)  # 1.5
```
### WorkloadProfile Fields

| Field | Type | Description |
|---|---|---|
| `parallel_threshold` | `int` | Minimum tasks before parallelizing |
| `min_workers` | `int` | Floor for worker count |
| `max_workers` | `int` | Ceiling for worker count |
| `cpu_fraction` | `float` | Fraction of cores to use (0.0-1.0) |
| `free_threading_multiplier` | `float` | Extra scaling when GIL is disabled |
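These fields might combine into a worker count roughly as follows; this is a sketch under the field semantics in the table, not kida's actual formula:

```python
import math
import os

def workers_from_profile(
    task_count: int,
    *,
    min_workers: int = 1,
    max_workers: int = 4,
    cpu_fraction: float = 0.5,
    free_threading_multiplier: float = 1.5,
    gil_disabled: bool = False,
) -> int:
    # Start from a fraction of the available cores.
    target = (os.cpu_count() or 1) * cpu_fraction
    # Scale up when the GIL is disabled and threads truly run in parallel.
    if gil_disabled:
        target *= free_threading_multiplier
    # Never exceed the task count or the profile ceiling...
    capped = min(math.ceil(target), task_count, max_workers)
    # ...and never drop below the profile floor.
    return max(capped, min_workers)
```

Clamping against `task_count` matters in practice: spawning eight workers for three renders only adds pool overhead.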
## Complete Example

```python
from concurrent.futures import ThreadPoolExecutor

from kida import Environment, FileSystemLoader
from kida.utils.workers import (
    get_optimal_workers,
    order_by_complexity,
    should_parallelize,
    WorkloadType,
)

env = Environment(loader=FileSystemLoader("templates/"))

# Load and schedule templates
templates = [env.get_template(name) for name in env.loader.list_templates()]
ordered = order_by_complexity(templates)

# Build render tasks
tasks = [(tmpl, {"page": page}) for tmpl, page in zip(ordered, pages, strict=True)]

if should_parallelize(len(tasks)):
    workers = get_optimal_workers(
        len(tasks),
        workload_type=WorkloadType.RENDER,
    )
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda t: t[0].render(**t[1]), tasks))
else:
    results = [tmpl.render(**ctx) for tmpl, ctx in tasks]
```
## See Also

- Thread Safety — Free-threading design
- Performance — Concurrent benchmark results
- Static Analysis — Template complexity analysis