Thread Safety

Free-threading and concurrent rendering


Kida is designed for concurrent rendering in free-threaded Python.

Free-Threading Support

Kida declares GIL-independence via PEP 703:

# In kida/__init__.py
def __getattr__(name):
    if name == "_Py_mod_gil":
        return 0  # Py_MOD_GIL_NOT_USED

This signals that Kida is safe for true parallel execution in Python 3.14t+.
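To confirm at runtime whether the interpreter is actually running without the GIL (importing any extension that lacks the PEP 703 slot silently re-enables it), a small check like the following can help; `sys._is_gil_enabled()` is available on Python 3.13+:

```python
import sys

def gil_status() -> str:
    """Report whether this interpreter is currently holding the GIL."""
    runtime_check = getattr(sys, "_is_gil_enabled", None)
    if runtime_check is None:
        # Pre-3.13 interpreters always hold the GIL.
        return "GIL enabled"
    # On 3.13+ this reflects the live state, including the case where a
    # free-threaded build re-enabled the GIL for a non-PEP-703 extension.
    return "GIL disabled" if not runtime_check() else "GIL enabled"

print(gil_status())
```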

Thread-Safe Design

Immutable Configuration

Environment configuration is frozen after construction:

env = Environment(
    loader=FileSystemLoader("templates/"),
    autoescape=True,
)
# Configuration is now immutable
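A common way to enforce freeze-after-construction in Python is a `__setattr__` guard; the sketch below illustrates the pattern only and is not Kida's actual implementation:

```python
class FrozenConfig:
    """Sketch of freeze-after-construction: writes after __init__ raise."""

    def __init__(self, autoescape: bool = True):
        self.autoescape = autoescape
        self._frozen = True  # last permitted write

    def __setattr__(self, name, value):
        if getattr(self, "_frozen", False):
            raise AttributeError(f"config is frozen; cannot set {name!r}")
        super().__setattr__(name, value)

cfg = FrozenConfig(autoescape=True)
# cfg.autoescape = False  -> would raise AttributeError
```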

Copy-on-Write Updates

Adding filters/tests creates new dictionaries:

def add_filter(self, name, func):
    # Copy-on-write: no locking needed
    new_filters = self._filters.copy()
    new_filters[name] = func
    self._filters = new_filters
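The key property of this pattern is that the new dictionary is built completely before the single atomic rebind, so a reader holding the old reference never observes a half-finished update. A minimal standalone sketch (not Kida's internals):

```python
class FilterRegistry:
    """Copy-on-write mapping: readers never see a half-updated dict."""

    def __init__(self):
        self._filters = {}

    def add_filter(self, name, func):
        # Build the new dict fully, then rebind in one atomic store.
        new_filters = self._filters.copy()
        new_filters[name] = func
        self._filters = new_filters

    def lookup(self, name):
        # Snapshot the reference once; later writes cannot mutate it.
        return self._filters.get(name)

reg = FilterRegistry()
snapshot = reg._filters           # reference taken before the write
reg.add_filter("upper", str.upper)
assert reg.lookup("upper") is str.upper
assert "upper" not in snapshot    # the old snapshot is untouched
```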

RenderContext Isolation

Each render() call creates an isolated RenderContext via ContextVar:

from kida.render_context import render_context

def render(self, **context):
    with render_context(template_name=self._name) as ctx:
        _out = []  # Local buffer
        # ctx.line updated during render for error tracking
        # No internal keys pollute user context
        return "".join(_out)

Benefits:

  • Thread isolation: ContextVars are thread-local by design
  • Async safety: Propagates correctly to asyncio.to_thread() in Python 3.14
  • Clean user context: No internal keys (_template, _line) injected

No shared mutable state between render calls.
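The thread-local behavior of ContextVar can be demonstrated directly: each thread sets and reads its own value without interference from the others:

```python
import threading
from contextvars import ContextVar

# Each thread starts with its own context, so set() never leaks across threads.
current_template: ContextVar[str] = ContextVar("current_template")

results: dict[str, str] = {}

def render(name: str) -> None:
    current_template.set(name)            # visible only to this thread
    results[name] = current_template.get()

threads = [
    threading.Thread(target=render, args=(f"page{i}.html",)) for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every thread saw exactly the value it set.
assert results == {f"page{i}.html": f"page{i}.html" for i in range(4)}
```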

Thread-Safe Caching

LRU caches use an internal RLock for safe concurrent access:

# Thread-safe cache access (RLock-protected internally)
cached = self._cache.get(name)
self._cache.set(name, template)
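As a rough illustration of the locking involved (a sketch, not Kida's actual cache class), an RLock-guarded LRU built on OrderedDict might look like:

```python
import threading
from collections import OrderedDict

class LRUCache:
    """Minimal RLock-guarded LRU cache (illustrative sketch)."""

    def __init__(self, maxsize: int = 128):
        self._data: OrderedDict = OrderedDict()
        self._maxsize = maxsize
        self._lock = threading.RLock()  # re-entrant: safe if get() calls set()

    def get(self, key):
        with self._lock:
            if key not in self._data:
                return None
            self._data.move_to_end(key)  # mark as most recently used
            return self._data[key]

    def set(self, key, value):
        with self._lock:
            self._data[key] = value
            self._data.move_to_end(key)
            if len(self._data) > self._maxsize:
                self._data.popitem(last=False)  # evict least recently used

cache = LRUCache(maxsize=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")      # touch "a" so "b" becomes the eviction candidate
cache.set("c", 3)   # evicts "b"
assert cache.get("b") is None
assert cache.get("a") == 1
```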

Concurrency Model

flowchart TB
    subgraph Thread1 [Thread 1]
        T1Env[Environment]
        T1Template[Template]
        T1Ctx1[RenderContext via ContextVar]
        T1Buf1[Local buf list]
    end
    subgraph Thread2 [Thread 2]
        T2Env[Environment]
        T2Template[Template]
        T2Ctx2[RenderContext via ContextVar]
        T2Buf2[Local buf list]
    end
    Cache[(LRU Cache RLock)]
    T1Env --> Cache
    T2Env --> Cache
    T1Env --> T1Template
    T2Env --> T2Template
    T1Template --> T1Ctx1
    T2Template --> T2Ctx2
    T1Ctx1 --> T1Buf1
    T2Ctx2 --> T2Buf2
  • Template: Immutable after construction; safe to share across threads.
  • RenderContext: Isolated per render via ContextVar; no cross-thread leakage.
  • Cache: Protected by internal RLock; concurrent get/set is safe.

When to Use Locks

If you add custom filters or globals that touch shared mutable state, you must protect that state:

import threading

_shared_counter_lock = threading.Lock()
_shared_counter = 0

def counting_filter(value):
    global _shared_counter
    with _shared_counter_lock:
        _shared_counter += 1
    return str(value)

env.add_filter("counted", counting_filter)

Guidance:

  • Prefer stateless filters: same inputs always produce same output.
  • If state is needed, use threading.Lock or contextvars for isolation.
  • Avoid module-level mutable dicts/lists that filters modify without protection.
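If only a per-context count is needed (rather than a true global total), contextvars can replace the lock entirely; the hypothetical filter below keeps its counter in a ContextVar, so concurrent renders running in separate contexts never contend:

```python
from contextvars import ContextVar

# Per-context counter: each thread (and each contextvars.copy_context()
# run) gets its own count, so no lock is needed. Hypothetical filter,
# not one of Kida's built-ins.
_call_count: ContextVar[int] = ContextVar("_call_count", default=0)

def counting_filter(value):
    _call_count.set(_call_count.get() + 1)
    return str(value)

def calls_so_far() -> int:
    return _call_count.get()

counting_filter("a")
counting_filter("b")
assert calls_so_far() == 2
```

Note the semantic difference from the locked version above: this counts per context, not across the whole process.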

Macro and Import Isolation

When using {% extends %} or {% from X import y %}, each child template gets an isolated copy of import_stack and template_stack. No shared mutable state flows across the extends/import chain. This ensures:

  • Parallel renders of different pages do not interfere.
  • Nested macro calls have correct attribution in error traces.
  • Circular import detection works per-render without cross-thread races.
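The copy-on-fork behavior can be sketched with a toy state object (field names mirror the docs; the real RenderContext may differ):

```python
from dataclasses import dataclass, field

@dataclass
class RenderState:
    """Sketch of copy-on-fork: children copy mutable stacks, never alias them."""
    import_stack: list = field(default_factory=list)
    template_stack: list = field(default_factory=list)

    def fork(self, child_template: str) -> "RenderState":
        # Copy, don't share: the child's pushes never mutate the parent.
        child = RenderState(
            import_stack=list(self.import_stack),
            template_stack=list(self.template_stack),
        )
        child.template_stack.append(child_template)
        return child

parent = RenderState(template_stack=["base.html"])
child = parent.fork("page.html")
assert child.template_stack == ["base.html", "page.html"]
assert parent.template_stack == ["base.html"]  # parent unchanged
```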

Free-Threading Design Principles

Kida's concurrency model follows these principles. When extending or modifying Kida, preserve them:

  • Copy on fork: When creating child contexts (includes, extends, imports), copy mutable state (e.g. import_stack) instead of sharing. Each level of the extends/import chain must have isolated state. Sharing mutable state across parallel renders can cause cross-thread interference.

  • No shared mutable state in hot paths: Caches use locks or per-call isolation. Render state lives in ContextVar, not globals. Avoid unprotected shared dicts in render paths.

  • ContextVar for per-call state: All render-scoped state (template name, line, blocks, import stack) lives in RenderContext via ContextVar. This ensures each render call has isolated state regardless of thread or async context.

Concurrent Rendering

With ThreadPoolExecutor

from concurrent.futures import ThreadPoolExecutor
from kida import Environment, FileSystemLoader

env = Environment(loader=FileSystemLoader("templates/"))
template = env.get_template("page.html")

def render_page(context):
    return template.render(**context)

contexts = [{"name": f"User {i}"} for i in range(100)]

# On Python 3.14t, this runs with true parallelism
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(render_page, contexts))

With asyncio

import asyncio

async def render_many(env):
    template = env.get_template("page.html")

    # Use asyncio.to_thread() for true parallel rendering on 3.14t
    tasks = [
        asyncio.to_thread(template.render, user=f"User {i}")
        for i in range(100)
    ]
    return await asyncio.gather(*tasks)

What's Safe

| Operation | Thread-Safe |
| --- | --- |
| get_template() | ✅ Yes |
| from_string() | ✅ Yes |
| template.render() | ✅ Yes |
| template.render_stream() | ✅ Yes |
| add_filter() | ✅ Yes (copy-on-write) |
| add_test() | ✅ Yes (copy-on-write) |
| add_global() | ✅ Yes (copy-on-write) |
| clear_cache() | ✅ Yes |

Concurrent render() and render_stream() on the same template from different threads is safe. BytecodeCache and Environment copy-on-write are tested under concurrent get/set.

Component Concurrency Matrix

| Component | Concurrent Reads | Concurrent Writes | Notes |
| --- | --- | --- | --- |
| Environment.get_template | Yes | Yes (LRU locked) | Cache dicts protected by _cache_lock |
| Template.render | Yes | N/A | Per-call state via ContextVar |
| CachedBlocksDict | Yes | Stats safe | Stats updates use a lock when shared |
| Compiler.compile | No | No | One compile at a time per Compiler instance |

Best Practices

Create Environment Once

# ✅ Create once, reuse everywhere
env = Environment(loader=FileSystemLoader("templates/"))

def handle_request(request):
    template = env.get_template(request.path)
    return template.render(**request.context)

Macro Import Patterns

  • Use {% from "partials/x.html" import macro_name %} — Ensure the imported template defines the requested macro. If the macro is missing, Kida raises TemplateRuntimeError with ErrorCode.MACRO_NOT_FOUND at import time.
  • Extends + import — When using {% extends %} and {% from %}, each child gets an isolated import_stack; no shared mutable state. See "Copy on fork" in Free-Threading Design Principles.
  • Import macros only — {% from "x" import y %} expects y to be a macro (callable). Do not import filters or other globals this way.

Don't Mutate During Rendering

# ❌ Don't add filters during concurrent rendering
def render_with_filter(value):
    env.add_filter("custom", custom_func)  # Race condition!
    return template.render(value=value)

# ✅ Add filters at startup
env.add_filter("custom", custom_func)

def render(value):
    return template.render(value=value)

Use Template Caching

# Templates are compiled once, then cached
# Concurrent get_template() calls for the same name
# wait for the first compilation to complete
template = env.get_template("page.html")
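The wait-for-first-compilation behavior described above is classically implemented as a double-checked lookup around the compile step; a minimal sketch (illustrative, not Kida's code):

```python
import threading

class CompileOnce:
    """Sketch: concurrent get() calls for the same name compile exactly once."""

    def __init__(self, compile_fn):
        self._compile = compile_fn
        self._cache: dict = {}
        self._lock = threading.Lock()

    def get(self, name: str):
        tmpl = self._cache.get(name)       # fast path: no lock on cache hit
        if tmpl is not None:
            return tmpl
        with self._lock:                   # slow path: serialize compilation
            tmpl = self._cache.get(name)   # re-check: another thread may have won
            if tmpl is None:
                tmpl = self._compile(name)
                self._cache[name] = tmpl
            return tmpl

compiles = []
loader = CompileOnce(lambda name: (compiles.append(name), f"<compiled {name}>")[1])
a = loader.get("page.html")
b = loader.get("page.html")
assert a == b == "<compiled page.html>"
assert compiles == ["page.html"]  # compiled exactly once
```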

Performance with Free-Threading

Numbers from benchmarks/test_benchmark_full_comparison.py (Python 3.14.2 free-threading, Apple Silicon).

Kida Scaling (vs single-threaded baseline)

| Workers | Time | Speedup |
| --- | --- | --- |
| 1 | 1.80ms | 1.0x |
| 2 | 1.12ms | 1.61x |
| 4 | 1.62ms | 1.11x |
| 8 | 1.76ms | 1.02x |

Kida vs Jinja2 (Concurrent)

| Workers | Kida | Jinja2 | Kida Advantage |
| --- | --- | --- | --- |
| 1 | 1.80ms | 1.80ms | ~same |
| 2 | 1.12ms | 1.15ms | ~same |
| 4 | 1.62ms | 1.90ms | 1.17x |
| 8 | 1.76ms | 1.97ms | 1.12x |

Key insight: Jinja2 shows negative scaling at 4+ workers (slower than 1 worker), indicating internal contention. Kida's thread-safe design avoids this.
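To reproduce a scaling table like the ones above on your own machine, a tiny harness along these lines works; absolute numbers will vary by hardware and by GIL vs free-threaded build (the render function below is a stand-in, not Kida's):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def measure_scaling(render, contexts, worker_counts=(1, 2, 4, 8)):
    """Time `render` over `contexts` at several worker counts.

    Returns (workers, elapsed_seconds, speedup_vs_1_worker) rows.
    """
    baseline = None
    rows = []
    for workers in worker_counts:
        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=workers) as pool:
            list(pool.map(render, contexts))
        elapsed = time.perf_counter() - start
        if baseline is None:
            baseline = elapsed
        rows.append((workers, elapsed, baseline / elapsed))
    return rows

# Example with a stand-in render function:
rows = measure_scaling(
    lambda ctx: f"hello {ctx['name']}",
    [{"name": str(i)} for i in range(100)],
)
```

On a standard (GIL) build the speedup column stays near 1.0x for CPU-bound rendering; on 3.14t it should rise with worker count until contention or core limits dominate.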

Code References

| Pattern | File |
| --- | --- |
| PEP 703 declaration | src/kida/__init__.py |
| RenderContext (ContextVar) | src/kida/template/core.py |
| Copy-on-write filters | src/kida/environment/core.py |
| Free-threading detection | src/kida/utils/workers.py |

See Also