# Worker Auto-Tuning

URL: /docs/advanced/workers/
Section: advanced
Tags: advanced, performance, threading

--------------------------------------------------------------------------------

Worker Auto-Tuning Kida provides a workload-aware worker pool toolkit for framework authors who need to parallelize template rendering. It is calibrated for free-threaded Python (3.14t) where CPU-bound rendering achieves true parallelism. from kida.utils.workers import get_optimal_workers, should_parallelize Quick Start from concurrent.futures import ThreadPoolExecutor from kida.utils.workers import get_optimal_workers, should_parallelize contexts = [{&quot;name&quot;: f&quot;User {i}&quot;} for i in range(100)] if should_parallelize(len(contexts)): workers = get_optimal_workers(len(contexts)) with ThreadPoolExecutor(max_workers=workers) as executor: results = list(executor.map(template.render, contexts)) else: results = [template.render(**ctx) for ctx in contexts] Core Functions should_parallelize Determine if parallelization is worthwhile. Thread pool overhead (~1-2ms per task) only pays off above a threshold. from kida.utils.workers import should_parallelize should_parallelize(5) # False — below threshold should_parallelize(100) # True — above threshold # With work size estimate (bytes of template output) should_parallelize(100, total_work_estimate=500) # False — too small get_optimal_workers Calculate the optimal worker count based on workload type, environment, CPU cores, and free-threading status. from kida.utils.workers import get_optimal_workers, WorkloadType # Template rendering (default) get_optimal_workers(100) # 4 (local, free-threading) # Template compilation get_optimal_workers(100, workload_type=WorkloadType.COMPILE) # 2 # Override auto-tuning get_optimal_workers(100, config_override=16) # 16 # Weight heavy templates higher get_optimal_workers(50, task_weight=2.0) # Adjusts for heavy work Workload Types Type Use Case Parallelism WorkloadType.RENDER Template rendering (CPU-bound) High — benefits from free-threading WorkloadType.COMPILE Template compilation (CPU-bound) Moderate — shared cache limits scaling WorkloadType.IO_BOUND File loading, network High — threads wait on I/O Environment Detection The toolkit auto-detects the execution environment to tune worker counts: Environment Detection Worker Strategy CI CI, GITHUB_ACTIONS, etc. Conservative (2 workers max) Local Default Moderate (up to 4 workers) Production KIDA_ENV=production Aggressive (up to 8 workers) Override detection with the KIDA_ENV environment variable: export KIDA_ENV=production # or &quot;ci&quot; or &quot;local&quot; Free-Threading Detection The toolkit detects whether the GIL is disabled and scales worker counts accordingly: from kida.utils.workers import is_free_threading_enabled if is_free_threading_enabled(): print(&quot;GIL disabled — true parallelism available&quot;) On free-threaded Python, render workloads get a 1.5x multiplier on the CPU-based worker count. Template Scheduling For optimal throughput, schedule heavy templates first to avoid the &quot;straggler effect&quot; where one slow render delays overall completion. estimate_template_weight Estimate relative complexity of a template: from kida.utils.workers import estimate_template_weight weight = estimate_template_weight(template) # 1.0 = average, &gt;1 = heavy, &lt;1 = light (capped at 5.0) Weight factors: Source size: +0.5 per 5KB above 5KB threshold Block count: +0.1 per block above 3 Macro count: +0.2 per macro Inheritance: +0.5 if extends another template Includes: +0.15 per include statement order_by_complexity Sort templates for optimal parallel execution: from kida.utils.workers import order_by_complexity # Heavy templates first (default — best for parallel execution) ordered = order_by_complexity(templates) # Light templates first (useful for testing) ordered = order_by_complexity(templates, descending=False) Workload Profiles Inspect the tuning parameters for any workload/environment combination: from kida.utils.workers import get_profile, WorkloadType profile = get_profile(WorkloadType.RENDER) print(profile.parallel_threshold) # 10 print(profile.max_workers) # 4 print(profile.free_threading_multiplier) # 1.5 WorkloadProfile Fields Field Type Description parallel_threshold int Minimum tasks before parallelizing min_workers int Floor for worker count max_workers int Ceiling for worker count cpu_fraction float Fraction of cores to use (0.0-1.0) free_threading_multiplier float Extra scaling when GIL is disabled Complete Example from concurrent.futures import ThreadPoolExecutor from kida import Environment, FileSystemLoader from kida.utils.workers import ( get_optimal_workers, order_by_complexity, should_parallelize, WorkloadType, ) env = Environment(loader=FileSystemLoader(&quot;templates/&quot;)) # Load and schedule templates templates = [env.get_template(name) for name in env.loader.list_templates()] ordered = order_by_complexity(templates) # Build render tasks tasks = [(tmpl, {&quot;page&quot;: page}) for tmpl, page in zip(ordered, pages, strict=True)] if should_parallelize(len(tasks)): workers = get_optimal_workers( len(tasks), workload_type=WorkloadType.RENDER, ) with ThreadPoolExecutor(max_workers=workers) as pool: results = list(pool.map(lambda t: t[0].render(**t[1]), tasks)) else: results = [tmpl.render(**ctx) for tmpl, ctx in tasks] See Also Thread Safety — Free-threading design Performance — Concurrent benchmark results Static Analysis — Template complexity analysis

--------------------------------------------------------------------------------

Metadata:
- Word Count: 594
- Reading Time: 3 minutes