Module

utils.workers

Worker pool auto-tuning utilities for free-threaded Python.

Provides workload-aware worker count calculation for ThreadPoolExecutor usage. Calibrated for Python 3.14t (free-threading) where CPU-bound template rendering can achieve true parallelism without GIL contention.

Key Features:

  • Environment detection (CI vs local vs production)
  • Free-threading detection (GIL status)
  • Workload type profiles calibrated for no-GIL execution
  • Template complexity estimation for optimal scheduling

Example:

>>> from concurrent.futures import ThreadPoolExecutor
>>> from kida.utils.workers import get_optimal_workers, should_parallelize
>>> contexts = [{"name": f"User {i}"} for i in range(100)]
>>> if should_parallelize(len(contexts)):
...     workers = get_optimal_workers(len(contexts))
...     with ThreadPoolExecutor(max_workers=workers) as executor:
...         results = list(executor.map(template.render, contexts))

Note:

Profiles are calibrated for free-threaded Python (3.14t+). On GIL-enabled Python, CPU-bound parallelism is limited.

Classes

WorkloadType

Workload characteristics for auto-tuning.

On free-threaded Python, CPU-bound work can now parallelize effectively. This changes optimal worker counts compared to GIL-enabled Python.

Attributes

Name Type Description
RENDER

Template rendering (CPU-bound, string operations). Primary workload for Kida. Benefits significantly from free-threading.

COMPILE

Template compilation/parsing (CPU-bound, AST operations). Moderate parallelism benefit due to shared cache access.

IO_BOUND

File loading, network operations. Can use more workers as threads wait on I/O.

Environment

Execution environment for tuning profiles.

Attributes

Name Type Description
CI

Constrained CI runner (typically 2-4 vCPU). Use minimal workers to avoid resource contention.

LOCAL

Developer machine (typically 8-16 cores). Use moderate workers for good performance.

PRODUCTION

Server deployment (16+ cores). Can use more workers for high throughput.

WorkloadProfile

Tuning profile for a workload type.

Attributes

Name Type Description
parallel_threshold int

Minimum tasks before parallelizing. Below this, thread overhead exceeds benefit.

min_workers int

Floor for worker count.

max_workers int

Ceiling for worker count.

cpu_fraction float

Fraction of cores to use (0.0-1.0).

free_threading_multiplier float

Extra scaling when GIL is disabled.
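In code, such a profile maps naturally onto a frozen dataclass. The values below are illustrative only, not the module's actual calibration:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WorkloadProfile:
    parallel_threshold: int           # minimum tasks before parallelizing
    min_workers: int                  # floor for worker count
    max_workers: int                  # ceiling for worker count
    cpu_fraction: float               # fraction of cores to use (0.0-1.0)
    free_threading_multiplier: float  # extra scaling when GIL is disabled

# Hypothetical RENDER profile for a LOCAL environment.
RENDER_LOCAL = WorkloadProfile(
    parallel_threshold=8,
    min_workers=2,
    max_workers=12,
    cpu_fraction=0.75,
    free_threading_multiplier=1.5,
)
```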

Functions

is_free_threading_enabled
Check if Python is running with the GIL disabled.
def is_free_threading_enabled() -> bool
Returns
bool
True if the GIL is disabled, False otherwise.
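The check can be sketched with CPython's own introspection hook (a plausible sketch, not necessarily the module's implementation):

```python
import sys

def is_free_threading_enabled() -> bool:
    """Best-effort check for free-threaded CPython."""
    # sys._is_gil_enabled() exists only on builds that support
    # free-threading (3.13+); older interpreters always hold the GIL.
    check = getattr(sys, "_is_gil_enabled", None)
    if check is None:
        return False
    return not check()
```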
detect_environment
def detect_environment() -> Environment

Auto-detect execution environment for tuning.

Detection order:

  1. Explicit KIDA_ENV environment variable
  2. CI environment variables (GitHub Actions, GitLab CI, etc.)
  3. Default to LOCAL
Returns
Environment
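The detection order might look roughly like this (a sketch; the concrete set of CI indicator variables checked here is an assumption):

```python
import os
from enum import Enum

class Environment(Enum):
    CI = "ci"
    LOCAL = "local"
    PRODUCTION = "production"

_CI_VARS = ("CI", "GITHUB_ACTIONS", "GITLAB_CI")  # assumed indicator set

def detect_environment() -> Environment:
    # 1. An explicit KIDA_ENV override wins.
    explicit = os.environ.get("KIDA_ENV", "").lower()
    if explicit in (e.value for e in Environment):
        return Environment(explicit)
    # 2. Common CI environment indicators.
    if any(os.environ.get(v) for v in _CI_VARS):
        return Environment.CI
    # 3. Fall back to a developer machine.
    return Environment.LOCAL
```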
get_optimal_workers
def get_optimal_workers(task_count: int) -> int

Calculate optimal worker count based on workload characteristics.

Auto-scales based on:

  • Workload type (render vs compile vs I/O)
  • Environment (CI vs local vs production)
  • Free-threading status (GIL enabled/disabled)
  • Available CPU cores (fraction based on workload)
  • Task count (no point having more workers than tasks)
  • Optional task weight for heavy/light work estimation
Parameters
Name Type Description
task_count int

Number of tasks to process (e.g., contexts to render)

Returns
int
Optimal worker count for the workload, never exceeding task_count.
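A plausible sketch of that calculation, with the profile values passed in explicitly for illustration (the real function reads them from the active WorkloadProfile):

```python
import os

def get_optimal_workers(
    task_count: int,
    *,
    cpu_fraction: float = 0.75,   # assumed profile values
    min_workers: int = 2,
    max_workers: int = 8,
    free_threading: bool = False,
    ft_multiplier: float = 1.5,
) -> int:
    cores = os.cpu_count() or 1
    workers = max(1, int(cores * cpu_fraction))
    if free_threading:
        # Free-threaded builds can push harder on CPU-bound work.
        workers = int(workers * ft_multiplier)
    # Clamp to the profile's floor and ceiling...
    workers = max(min_workers, min(workers, max_workers))
    # ...and never spawn more workers than there are tasks.
    return min(workers, task_count)
```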
should_parallelize
def should_parallelize(task_count: int) -> bool

Determine if parallelization is worthwhile for this workload.

Thread pool overhead (~1-2ms per task) only pays off above threshold. This function helps avoid the overhead for small workloads.

Parameters
Name Type Description
task_count int

Number of tasks to process

Returns
bool
True if the task count is large enough to benefit from parallel execution.
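The threshold check itself is then a one-liner (the default threshold here is an assumption; the real one comes from the workload profile):

```python
def should_parallelize(task_count: int, threshold: int = 8) -> bool:
    # Below the threshold, per-task thread-pool overhead (~1-2 ms)
    # costs more than the parallel speedup returns.
    return task_count >= threshold
```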
estimate_template_weight
def estimate_template_weight(template: Template) -> float

Estimate relative complexity of a template for worker scheduling.

Heavy templates (many blocks, macros, filters) get higher weights, causing them to be scheduled earlier to avoid straggler effect.

Weight factors:

  • Source size: +0.5 per 5KB above 5KB threshold
  • Block count: +0.1 per block above 3
  • Macro count: +0.2 per macro
  • Inheritance: +0.5 if extends another template
Parameters
Name Type Description
template Template

Template instance to estimate

Returns
float
Relative weight; heavier templates receive higher values.
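The listed factors reduce to straightforward arithmetic. The sketch below takes the raw attributes as parameters, since the real Template internals aren't shown here:

```python
def estimate_weight(source_len: int, blocks: int, macros: int, extends: bool) -> float:
    weight = 1.0
    size_kb = source_len / 1024
    if size_kb > 5:
        weight += 0.5 * ((size_kb - 5) / 5)   # +0.5 per 5KB above 5KB
    if blocks > 3:
        weight += 0.1 * (blocks - 3)          # +0.1 per block above 3
    weight += 0.2 * macros                    # +0.2 per macro
    if extends:
        weight += 0.5                         # inheritance penalty
    return weight
```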
order_by_complexity
def order_by_complexity(templates: list[Template]) -> list[Template]

Order templates by estimated complexity for optimal worker utilization.

Scheduling heavy templates first reduces the "straggler effect" where one slow render delays overall completion.

Parameters
Name Type Description
templates list[Template]

List of templates to order

Returns
list[Template]
Templates ordered heaviest-first.
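Given a weight function, the ordering is just a reverse sort (a sketch; the module presumably keys on estimate_template_weight):

```python
def order_by_complexity(templates, weight_key):
    # Heaviest first: slow renders start early, so no single straggler
    # is left running after every other worker has finished.
    return sorted(templates, key=weight_key, reverse=True)
```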
get_profile
Get the workload profile for inspection or testing.
def get_profile(workload_type: WorkloadType, environment: Environment | None = None) -> WorkloadProfile
Parameters
Name Type Description
workload_type WorkloadType

Type of work

environment Environment | None

Execution environment (auto-detected if None)

Default:None
Returns
WorkloadProfile
The tuning profile for the given workload type and environment.