# Streaming HTML

URL: /chirp/docs/build-apps/streaming-updates/html-streaming/
Section: streaming-updates
Tags: streaming, html, progressive, chunked

--------------------------------------------------------------------------------

## The Problem

Traditional template rendering waits for all data before sending anything. If your dashboard fetches stats, recent activity, and notifications, the user stares at a blank page until the slowest query finishes.

## The Solution

`Stream` renders template sections as they complete. The browser receives the page shell immediately and content fills in progressively:

```python
from chirp import Stream

@app.route("/dashboard")
async def dashboard():
    return Stream("dashboard.html",
        header=get_header(),             # Available immediately
        stats=await load_stats(),        # Streams when ready
        activity=await load_activity(),  # Streams when ready
    )
```

The HTTP response uses chunked transfer encoding. The browser renders progressively as chunks arrive -- no JavaScript loading states, no skeleton screens.

## How It Works

1. **Compile streaming renderer.** Kida's compiler generates a streaming renderer alongside the standard renderer (same compilation pass, no performance impact).
2. **Send chunked response.** When `Stream` is returned, Chirp sends the response with `Transfer-Encoding: chunked`.
3. **Yield HTML chunks.** Kida's `render_stream()` yields HTML chunks as template sections complete.
4. **Stream to client.** Each chunk is sent to the client immediately via ASGI body messages.
5. **Progressive render.** The browser renders each chunk as it arrives.

```
Template:  <html> ...   {% block header %} ...   {% block stats %} ...   {% block activity %}
Chunks:    ────────→    ──────────────────→      ──────────────→         ─────────────────────────→
Time:      0ms          50ms                     200ms                   800ms
```

## Template Structure for Streaming

Design templates with independent sections that can render in any order:

```html
{# dashboard.html #}
{% extends "base.html" %}

{% block content %}
<header>{{ header }}</header>

<section id="stats">
  {% block stats %}
  {% for stat in stats %}
    <div class="stat">{{ stat.label }}: {{ stat.value }}</div>
  {% endfor %}
  {% endblock %}
</section>

<section id="activity">
  {% block activity %}
  {% for event in activity %}
    <div class="event">{{ event.description }}</div>
  {% endfor %}
  {% endblock %}
</section>
{% endblock %}
```

## Error Handling

If an error occurs mid-stream, Chirp injects an HTML comment with the error details and closes the stream gracefully:

```html
<!-- Stream error: DatabaseConnectionError: connection timed out -->
```

The already-sent content remains visible. This is better than a full-page error for partial failures.

## StreamingResponse

Under the hood, `Stream` produces a `StreamingResponse` -- a peer to `Response` with the same chainable API:

```python
# StreamingResponse supports .with_*() methods
return Stream("dashboard.html", data=data)

# Internally becomes:
# StreamingResponse(generator, status=200, headers=...)
```

Middleware can add headers to streaming responses the same way as regular responses.

## Suspense: Instant First Paint with Deferred Blocks

Suspense takes streaming further.
Instead of waiting for all data before rendering anything, it sends the page shell immediately with skeleton content, then fills in blocks independently as their async data resolves:

```python
from chirp import Suspense

@app.route("/dashboard")
async def dashboard():
    return Suspense("dashboard.html",
        header=site_header(),  # sync -- in the shell
        stats=load_stats(),    # awaitable -- shows skeleton first
        feed=load_feed(),      # awaitable -- shows skeleton first
    )
```

Middleware-provided helpers such as `get_user()` and `csrf_token()` are ContextVar-backed. Capture those values in the handler before returning `Stream`, `TemplateStream`, `Suspense`, or `EventStream`; do not call them during streamed template rendering or inside SSE generators. The request object itself is restored for chunk iteration, so this warning is about middleware state such as auth/session/CSRF, not `get_request()`.

```python
@app.route("/dashboard")
def dashboard():
    user = get_user()
    token = csrf_token()
    return Suspense(
        "dashboard.html",
        current_user=user,
        csrf_token_value=token,
        stats=load_stats(),
    )
```

Then the template reads `current_user` / `csrf_token_value` from plain context instead of calling the ContextVar-backed helpers during the stream.

Use `{% if stats is not none %}` to distinguish loaded from loading, not a bare `{% if stats %}`, which stays falsy for an empty tuple, list, `""`, or `0` after resolution and can look like a perpetual skeleton. Optionally branch on `"stats" in __chirp_defer_pending__` (a frozenset injected only by Suspense: pending key names in the shell, empty after resolution). The Python constant is `CHIRP_DEFER_PENDING_KEY`. The block must still reference the context key (e.g. `stats`) somewhere so `block_metadata().depends_on` can associate the block with that deferred key; membership in `__chirp_defer_pending__` alone is not enough for discovery.
```html
{% block stats %}
  {% if stats is not none %}
    {% for s in stats %}<div class="stat">{{ s.label }}: {{ s.value }}</div>{% end %}
  {% else %}
    <div class="skeleton">Loading stats...</div>
  {% end %}
{% end %}
```

How it works:

1. **Render shell with skeletons.** Sync context values render in the shell; awaitable values are set to `None`, and `__chirp_defer_pending__` lists their names until they resolve (use `is not none` or membership in that set to tell skeleton from loaded, not truthiness alone).
2. **Send first chunk.** The shell is sent immediately as the first chunk (instant first paint).
3. **Resolve awaitables.** Awaitables resolve concurrently in the background.
4. **Find affected blocks.** Blocks to re-render are discovered via `block_metadata().depends_on`: Kida's static analysis traces which blocks reference the deferred keys. Ancestor blocks whose dependency set is a strict superset of a leaf block are pruned (they'd produce wasteful OOB chunks targeting non-existent DOM ids).

   When static analysis misses a block (e.g. deferred values passed through macro arguments), set `defer_blocks` to list them explicitly:

   ```python
   return Suspense("page.html",
       defer_blocks=("hero_stats", "sidebar_stats"),
       stats=load_stats(),
   )
   ```

   Use `defer_map` to remap block names to different DOM ids for the OOB swap target:

   ```python
   return Suspense("page.html",
       defer_map={"stats": "stats-panel"},
       stats=load_stats(),
   )
   ```

5. **Stream OOB swaps.** Each affected block is re-rendered with real data and sent as an out-of-band swap.
6. **Client receives updates.** For htmx navigations: OOB swaps via `hx-swap-oob`. For initial page loads: `<template>` + inline `<script>` pairs.

No client-side framework needed. The browser renders the shell, and blocks fill in as data arrives.
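
The shell-then-swap flow described above can be imitated with plain `asyncio`. This is only a sketch under stated assumptions, not Chirp's or Kida's implementation: `render_block` and `suspense_chunks` are hypothetical stand-ins, every context key is treated as its own block, and the follow-up chunks use the htmx `hx-swap-oob` attribute purely for illustration.

```python
import asyncio
import inspect
from html import escape
from typing import Any, AsyncIterator


def render_block(name: str, value: Any) -> str:
    """Hypothetical stand-in for rendering one template block."""
    if value is None:  # deferred key: render the skeleton
        return f'<div id="{name}" class="skeleton">Loading {name}...</div>'
    return f'<div id="{name}">{escape(str(value))}</div>'


async def suspense_chunks(**context: Any) -> AsyncIterator[str]:
    # Awaitable values are deferred; everything else belongs to the shell.
    deferred = {k: v for k, v in context.items() if inspect.isawaitable(v)}
    shell = {k: (None if k in deferred else v) for k, v in context.items()}

    # First chunk: the whole shell, with skeletons for deferred keys.
    yield "".join(render_block(k, v) for k, v in shell.items())

    async def resolve(key: str, aw: Any) -> tuple[str, Any]:
        return key, await aw

    # Resolve all awaitables concurrently; emit one swap chunk per result,
    # in completion order rather than declaration order.
    tasks = [asyncio.create_task(resolve(k, aw)) for k, aw in deferred.items()]
    for finished in asyncio.as_completed(tasks):
        key, value = await finished
        yield f'<div id="{key}" hx-swap-oob="true">{escape(str(value))}</div>'


async def load_stats() -> str:  # hypothetical slow data source
    await asyncio.sleep(0.05)
    return "42 users"


async def load_feed() -> str:  # hypothetical faster data source
    await asyncio.sleep(0.01)
    return "3 new posts"


async def collect() -> list[str]:
    gen = suspense_chunks(header="Site", stats=load_stats(), feed=load_feed())
    return [chunk async for chunk in gen]


chunks = asyncio.run(collect())
```

The first element of `chunks` is the shell with two skeletons; the remaining elements arrive in the order the data sources finish, which is why the faster `feed` swap precedes `stats` even though it was declared later.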
### Reuse deferred values with DeferredCache

Use `DeferredCache` when the same deferred value is needed by multiple blocks or nearby page navigations and the value can be reused for a short TTL window. The cache is explicit app or route state: there is no process-wide default.

```python
from chirp import DeferredCache, Suspense

stars_cache = DeferredCache(default_ttl=300)

@app.route("/")
def home():
    return Suspense(
        "home.html",
        stars=stars_cache.get_or_defer(
            "gh:lbliii/chirp",
            lambda: fetch_github_stars_label("lbliii", "chirp"),
        ),
    )
```

On a cache miss, `get_or_defer()` returns an awaitable, so Suspense renders the skeleton and streams the resolved block later. On a warm hit, it returns the cached value directly, so the value renders in the initial shell and no OOB chunk is needed. Only successful results are cached; exceptions continue through Suspense's existing error fallback path. The factory must be a callable that returns an awaitable, not a pre-created coroutine, so warm cache hits do not allocate unused coroutine objects. `DeferredCache` does not create a browser-side store and does not push real-time updates.

When using `mount_pages`, Suspense receives the layout chain automatically. The first chunk is wrapped in your `_layout.html` shell (head, CSS, sidebar), and OOB swaps target block IDs inside the page. Fragment-only requests skip the layout (same as `Page`).

Alpine.js: Streaming responses are still HTML documents. When `AppConfig(alpine=True)` is set, `AlpineInject` rewrites the chunk stream so the Alpine bundle is inserted before `</body>` in the final output, following the same deduplication rules as buffered pages. Shell-first routes (Suspense, skeletons) therefore keep interactive components working without inlining scripts in layouts. If `use_chirp_ui(app)` is active, the shared `chirpui-alpine.js` runtime is also injected on full-page streaming HTML, so named chirp-ui controllers remain available there too.
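
The miss-versus-hit behaviour of `get_or_defer()` described earlier in this section can be illustrated with a small TTL cache. This is a minimal sketch, assuming a dictionary-backed store and a monotonic clock; `TinyDeferredCache` is a hypothetical name and is not Chirp's `DeferredCache` implementation.

```python
import asyncio
import inspect
import time
from typing import Any, Awaitable, Callable, Hashable


class TinyDeferredCache:
    """Illustrative TTL cache; NOT Chirp's DeferredCache."""

    def __init__(self, default_ttl: float) -> None:
        self.default_ttl = default_ttl
        # key -> (expiry timestamp, cached value)
        self._entries: dict[Hashable, tuple[float, Any]] = {}

    def get_or_defer(
        self, key: Hashable, factory: Callable[[], Awaitable[Any]]
    ) -> Any:
        entry = self._entries.get(key)
        if entry is not None and entry[0] > time.monotonic():
            return entry[1]  # warm hit: plain value, factory never called
        return self._fill(key, factory)  # miss: return an awaitable

    async def _fill(
        self, key: Hashable, factory: Callable[[], Awaitable[Any]]
    ) -> Any:
        # If the factory raises, nothing is stored, so only successes are cached.
        value = await factory()
        self._entries[key] = (time.monotonic() + self.default_ttl, value)
        return value


calls: list[str] = []


async def fetch_stars() -> str:  # hypothetical slow lookup
    calls.append("fetch")
    return "1.2k"


cache = TinyDeferredCache(default_ttl=300)

first = cache.get_or_defer("stars", fetch_stars)
first_was_awaitable = inspect.isawaitable(first)  # miss: deferred
resolved = asyncio.run(first)

second = cache.get_or_defer("stars", fetch_stars)  # hit: plain string
```

Because the factory is passed as a callable rather than as an already-created coroutine, the warm hit returns without ever invoking `fetch_stars`, which mirrors the reason the section gives for that requirement.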
## When to Use Each

Use `Suspense` when:

- A page has independent data sources with different load times
- You want instant first paint with skeleton/loading states
- Some sections load fast (navigation, layout) while others are slow (analytics, feeds)

Use `Stream` when:

- A page has multiple independent data sources with varying load times
- You want top-to-bottom progressive rendering
- Time-to-first-byte matters more than total render time

Use `Template` when:

- All data is available quickly
- The template is simple
- You need the complete response for caching or processing

## Next Steps

- Server-Sent Events -- Real-time push updates
- Return Values -- All return types
- Rendering -- Standard template rendering
