The Problem
Traditional template rendering waits for all data before sending anything. If your dashboard fetches stats, recent activity, and notifications, the user stares at a blank page until the slowest query finishes.
The Solution
Stream renders template sections as they complete. The browser receives the page shell immediately and content fills in progressively:
from chirp import Stream

@app.route("/dashboard")
async def dashboard():
    return Stream("dashboard.html",
        header=get_header(),            # Available immediately
        stats=await load_stats(),       # Streams when ready
        activity=await load_activity(), # Streams when ready
    )
The HTTP response uses chunked transfer encoding. The browser renders progressively as chunks arrive -- no JavaScript loading states, no skeleton screens.
How It Works
1. Compile streaming renderer -- Kida's compiler generates a streaming renderer alongside the standard renderer (same compilation pass, no performance impact).
2. Send chunked response -- When Stream is returned, Chirp sends the response with Transfer-Encoding: chunked.
3. Yield HTML chunks -- Kida's render_stream() yields HTML chunks as template sections complete.
4. Stream to client -- Each chunk is sent to the client immediately via ASGI body messages.
5. Progressive render -- The browser renders each chunk as it arrives.
Template: <html> ... {% block header %} ... {% block stats %} ... {% block activity %}
Chunks: ────────→ ──────────────────→ ──────────────→ ─────────────────────────→
Time: 0ms 50ms 200ms 800ms
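The five steps above can be approximated with a plain asyncio generator. This is a minimal sketch of the pipeline shape, not Chirp or Kida internals; the loaders and their timings are hypothetical stand-ins.

```python
import asyncio

# Hypothetical data loaders with different latencies (stand-ins, not Chirp APIs).
async def load_stats():
    await asyncio.sleep(0.02)
    return "<section id=\"stats\">42 users</section>"

async def load_activity():
    await asyncio.sleep(0.05)
    return "<section id=\"activity\">3 events</section>"

async def render_stream():
    # Yield HTML chunks as sections complete, shell first (steps 1-3).
    yield "<html><body><header>Dashboard</header>"
    yield await load_stats()
    yield await load_activity()
    yield "</body></html>"

async def collect():
    # In ASGI, each chunk would be sent as a separate body message (step 4);
    # here we just collect them in arrival order.
    return [chunk async for chunk in render_stream()]

chunks = asyncio.run(collect())
```

The browser can start painting as soon as the first chunk lands, which is why the shell is yielded before any data is awaited.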
Template Structure for Streaming
Design templates with independent sections that can render in any order:
{# dashboard.html #}
{% extends "base.html" %}
{% block content %}
<header>{{ header }}</header>
<section id="stats">
{% block stats %}
{% for stat in stats %}
<div class="stat">{{ stat.label }}: {{ stat.value }}</div>
{% endfor %}
{% endblock %}
</section>
<section id="activity">
{% block activity %}
{% for event in activity %}
<div class="event">{{ event.description }}</div>
{% endfor %}
{% endblock %}
</section>
{% endblock %}
Error Handling
If an error occurs mid-stream, Chirp injects an HTML comment with the error details and closes the stream gracefully:
<!-- Stream error: DatabaseConnectionError: connection timed out -->
The already-sent content remains visible. This is better than a full-page error for partial failures.
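The comment-injection behavior can be sketched with a wrapper generator. The names safe_stream and failing_chunks below are illustrative, not Chirp internals.

```python
# Once headers and earlier chunks are on the wire, a 500 page is impossible,
# so a mid-stream error becomes an HTML comment and the stream closes cleanly.
def safe_stream(chunks):
    try:
        for chunk in chunks:
            yield chunk
    except Exception as exc:
        yield f"<!-- Stream error: {type(exc).__name__}: {exc} -->"

def failing_chunks():
    yield "<header>Dashboard</header>"
    yield "<section id=\"stats\">42 users</section>"
    raise ConnectionError("connection timed out")

output = "".join(safe_stream(failing_chunks()))
```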
StreamingResponse
Under the hood, Stream produces a StreamingResponse -- a peer to Response with the same chainable API:
# StreamingResponse supports .with_*() methods
return Stream("dashboard.html", data=data)
# Internally becomes:
# StreamingResponse(generator, status=200, headers=...)
Middleware can add headers to streaming responses the same way as regular responses.
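As an illustration of the chainable idea, an immutable response object can expose header-adding methods that middleware calls without ever buffering the body. This is a toy model of the pattern, not Chirp's actual StreamingResponse class.

```python
from dataclasses import dataclass, replace
from typing import Iterable

# Toy model: each .with_header() call returns a copy with one more header,
# while the body stays a lazy iterable that is never materialized.
@dataclass(frozen=True)
class StreamingResponseModel:
    body: Iterable[str]
    status: int = 200
    headers: tuple = ()

    def with_header(self, name, value):
        return replace(self, headers=self.headers + ((name, value),))

resp = StreamingResponseModel(body=iter(["<html>", "</html>"]))
resp = resp.with_header("cache-control", "no-store").with_header("x-request-id", "abc123")
```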
Suspense: Instant First Paint with Deferred Blocks
Suspense takes streaming further. Instead of waiting for all data before rendering anything, it sends the page shell immediately with skeleton content, then fills in blocks independently as their async data resolves:
from chirp import Suspense

@app.route("/dashboard")
async def dashboard():
    return Suspense("dashboard.html",
        header=site_header(), # sync -- in the shell
        stats=load_stats(),   # awaitable -- shows skeleton first
        feed=load_feed(),     # awaitable -- shows skeleton first
    )
Middleware-provided helpers such as get_user() and csrf_token() are ContextVar-backed. Capture those values in the handler before returning Stream, TemplateStream, Suspense, or EventStream; do not call them during streamed template rendering or inside SSE generators. The request object itself is restored for chunk iteration, so this warning is about middleware state such as auth/session/CSRF, not get_request().
@app.route("/dashboard")
def dashboard():
    user = get_user()
    token = csrf_token()
    return Suspense(
        "dashboard.html",
        current_user=user,
        csrf_token_value=token,
        stats=load_stats(),
    )
Then the template reads current_user / csrf_token_value from plain context instead of calling the ContextVar-backed helpers during the stream.
Use {% if stats is not none %} to distinguish loaded from loading -- not bare {% if stats %}, which stays falsy for an empty tuple/list/""/0 after resolution and can look like a perpetual skeleton. Optionally, branch on "stats" in __chirp_defer_pending__, a frozenset injected only by Suspense: it holds the pending key names in the shell and is empty after resolution (the Python constant is CHIRP_DEFER_PENDING_KEY). The block must still reference the context key (e.g. stats) somewhere so block_metadata().depends_on can associate the block with that deferred key; membership in __chirp_defer_pending__ alone is not enough for discovery.
{% block stats %}
  {% if stats is not none %}
    {% for s in stats %}<div class="stat">{{ s.label }}: {{ s.value }}</div>{% end %}
  {% else %}
    <div class="skeleton">Loading stats...</div>
  {% end %}
{% end %}
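The truthiness pitfall is easy to reproduce in plain Python; the two helpers below mirror the two template checks.

```python
# After a deferred value resolves to an empty collection, bare truthiness
# still looks like "loading" -- only a None check tells the states apart.
def skeleton_by_truthiness(stats):
    return "skeleton" if not stats else "loaded"

def skeleton_by_none_check(stats):
    return "skeleton" if stats is None else "loaded"

# Pending: both checks show the skeleton.
assert skeleton_by_truthiness(None) == "skeleton"
assert skeleton_by_none_check(None) == "skeleton"

# Resolved to an empty list: truthiness is stuck on the skeleton.
assert skeleton_by_truthiness([]) == "skeleton"  # perpetual skeleton bug
assert skeleton_by_none_check([]) == "loaded"    # correct
```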
How it works:
1. Render shell with skeletons -- Sync context values render in the shell; awaitable values are set to None, and __chirp_defer_pending__ lists their names until they resolve (use is not none or membership in that set for skeleton vs loaded -- not truthiness alone).
2. Send first chunk -- The shell is sent immediately as the first chunk (instant first paint).
3. Resolve awaitables -- Awaitables resolve concurrently in the background.
4. Find affected blocks -- Blocks to re-render are discovered via block_metadata().depends_on: Kida's static analysis traces which blocks reference the deferred keys. Ancestor blocks whose dependency set is a strict superset of a leaf block's are pruned (they would produce wasteful OOB chunks targeting non-existent DOM ids). When static analysis misses a block (e.g. deferred values passed through macro arguments), set defer_blocks to list them explicitly:

   return Suspense("page.html",
       defer_blocks=("hero_stats", "sidebar_stats"),
       stats=load_stats(),
   )

   Use defer_map to remap block names to different DOM ids for the OOB swap target:

   return Suspense("page.html",
       defer_map={"stats": "stats-panel"},
       stats=load_stats(),
   )

5. Stream OOB swaps -- Each affected block is re-rendered with real data and sent as an out-of-band swap.
6. Client receives updates -- For htmx navigations: OOB swaps via hx-swap-oob. For initial page loads: <template> + inline <script> pairs.
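The ancestor-pruning rule in step 4 can be sketched in a few lines, assuming dependency analysis yields a mapping from block name to the set of deferred keys it references (the mapping shape is an assumption for illustration, not Kida's real data structure).

```python
# Drop any block whose dependency set is a strict superset of another block's:
# the leaf block already covers the update, and re-rendering the ancestor would
# target a DOM id the shell never rendered as a standalone element.
def prune_ancestors(deps):
    return {
        name: keys
        for name, keys in deps.items()
        if not any(other < keys for other in deps.values())
    }

deps = {
    "content": {"stats", "feed"},  # ancestor block referencing both deferred keys
    "stats": {"stats"},
    "feed": {"feed"},
}
pruned = prune_ancestors(deps)
```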
No client-side framework needed. The browser renders the shell, and blocks fill in as data arrives.
Reuse deferred values with DeferredCache
Use DeferredCache when the same deferred value is needed by multiple blocks or nearby page navigations and the value can be reused for a short TTL window. The cache is explicit app or route state: there is no process-wide default.
from chirp import DeferredCache, Suspense

stars_cache = DeferredCache(default_ttl=300)

@app.route("/")
def home():
    return Suspense(
        "home.html",
        stars=stars_cache.get_or_defer(
            "gh:lbliii/chirp",
            lambda: fetch_github_stars_label("lbliii", "chirp"),
        ),
    )
On a cache miss, get_or_defer() returns an awaitable, so Suspense renders the skeleton and streams the resolved block later. On a warm hit, it returns the cached value directly, so the value renders in the initial shell and no OOB chunk is needed. Only successful results are cached; exceptions continue through Suspense's existing error fallback path. The factory must return an awaitable, not a pre-created coroutine, so warm cache hits do not allocate unused coroutine objects. DeferredCache does not create a browser-side store and does not push real-time updates.
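The miss-vs-warm-hit contract can be modeled with a small synchronous sketch. TtlCacheSketch is a hypothetical illustration of the shape described above, not Chirp's DeferredCache; in the real API a miss hands back the factory's awaitable rather than a resolved value.

```python
import time

class TtlCacheSketch:
    def __init__(self, default_ttl):
        self.default_ttl = default_ttl
        self._store = {}  # key -> (value, expires_at)

    def get_or_defer(self, key, factory):
        hit = self._store.get(key)
        if hit is not None and hit[1] > time.monotonic():
            return hit[0]  # warm hit: value renders in the initial shell
        value = factory()  # miss: in the real API this returns an awaitable
        # Only reached on success -- a raising factory caches nothing.
        self._store[key] = (value, time.monotonic() + self.default_ttl)
        return value

cache = TtlCacheSketch(default_ttl=300)
calls = []

def factory():
    calls.append(1)
    return "1.2k stars"

first = cache.get_or_defer("gh:lbliii/chirp", factory)
second = cache.get_or_defer("gh:lbliii/chirp", factory)  # warm: factory not re-run
```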
When using mount_pages, Suspense receives the layout chain automatically. The first chunk is wrapped in your _layout.html shell (head, CSS, sidebar), and OOB swaps target block IDs inside the page. Fragment-only requests skip the layout (same as Page).
Alpine.js: Streaming responses are still HTML documents. When AppConfig(alpine=True), AlpineInject rewrites the chunk stream so the Alpine bundle is inserted before </body> in the final output -- same deduplication rules as buffered pages -- so shell-first routes (Suspense, skeletons) keep interactive components working without inlining scripts in layouts. If use_chirp_ui(app) is active, the shared chirpui-alpine.js runtime is also injected on full-page streaming HTML, so named chirp-ui controllers remain available there too.
When to Use Each
Use Suspense when:
- A page has independent data sources with different load times
- You want instant first paint with skeleton/loading states
- Some sections load fast (navigation, layout) while others are slow (analytics, feeds)
Use Stream when:
- A page has multiple independent data sources with varying load times
- You want top-to-bottom progressive rendering
- Time-to-first-byte matters more than total render time
Use Template when:
- All data is available quickly
- The template is simple
- You need the complete response for caching or processing
Next Steps
- Server-Sent Events -- Real-time push updates
- Return Values -- All return types
- Rendering -- Standard template rendering