RAG Demo

Overview

The RAG demo is a documentation Q&A app: you type a question, it retrieves the relevant docs from SQLite, and it streams an AI answer with cited sources back over the wire — no React, no npm, around 50 lines of Python. Reach for it to see Server-Sent Events, fragments, and free-threaded dual-model streaming working together in one runnable app.

Location: examples/chirpui/rag_demo/

What It Demonstrates

Each row is a feature the demo exercises and the page that owns it:

| Feature | In the demo | Learn more |
|---|---|---|
| Fragments | `Fragment("ask.html", "answer", ...)` renders one named block per token. | Fragments |
| Server-Sent Events | `EventStream` yields fragments; htmx swaps them into `sse-swap` targets. | Server-Sent Events |
| Multi-swap SSE layout | Sources, answer, and share link are separate `sse-swap` targets in one stream. | SSE patterns |
| Dual streaming | Compare two models side by side; each streams independently across worker threads. | Free-threading and thread safety |
| Typed SQLite | `chirp.data.Database` returns frozen dataclasses for document storage. | Database |
| Event delegation | `AppConfig(delegation=True)` wires copy and compare controls on SSE-swapped content. | htmx patterns |
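The typed-SQLite idea can be illustrated with the stdlib alone. This is a minimal sketch, assuming a hypothetical `docs` table, a hypothetical `Doc` dataclass, and a naive keyword lookup standing in for the demo's `_retrieve_docs` (whose real logic is not shown on this page). It does not use or reproduce `chirp.data.Database`; it only shows the pattern of mapping rows onto frozen dataclasses.

```python
import re
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    """One documentation page (hypothetical schema)."""
    slug: str
    title: str
    body: str


def open_db() -> sqlite3.Connection:
    # In-memory database seeded with two sample pages for illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (slug TEXT PRIMARY KEY, title TEXT, body TEXT)")
    conn.executemany(
        "INSERT INTO docs VALUES (?, ?, ?)",
        [
            ("fragments", "Fragments", "Render one named block of a template."),
            ("sse", "Server-Sent Events", "Stream events to the browser over HTTP."),
        ],
    )
    return conn


def retrieve_docs(conn: sqlite3.Connection, question: str, limit: int = 3) -> list[Doc]:
    # Naive keyword retrieval: any question word of 3+ characters found in title or body.
    words = [w for w in re.findall(r"[a-z0-9]+", question.lower()) if len(w) > 2]
    if not words:
        return []
    clause = " OR ".join("lower(title) LIKE ? OR lower(body) LIKE ?" for _ in words)
    params: list[object] = []
    for w in words:
        params += [f"%{w}%", f"%{w}%"]
    params.append(limit)
    # Only placeholders are interpolated into the SQL; values are bound as parameters.
    rows = conn.execute(
        f"SELECT slug, title, body FROM docs WHERE {clause} ORDER BY slug LIMIT ?",
        params,
    ).fetchall()
    # Map each row tuple onto the frozen (immutable, hashable) dataclass.
    return [Doc(*row) for row in rows]
```

Frozen instances are immutable and hashable, which makes them safe to share between worker threads and easy to use as cache keys.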

Run It

Running the demo is an ordered procedure with one prerequisite: a local Ollama model, which must be pulled and served before the app can answer anything.

1. Install Chirp with the AI extras
The demo stores docs in SQLite, which `chirp.data` serves through the stdlib `sqlite3` module, so no database extra is needed.
```
pip install "chirp[ai,sessions,markdown]"
```
2. Pull the default Ollama model
The demo uses Ollama by default, so it needs no API key.
```
ollama pull llama3.2
```
3. Start Ollama in another terminal
```
ollama serve
```
4. Run the app
```
PYTHONPATH=src python examples/chirpui/rag_demo/app.py
```
It starts four worker threads when pounce is installed, and falls back to a single-worker dev server otherwise.
5. Open the browser
Open http://127.0.0.1:8000 and ask a question about the docs.
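The worker-count fallback in step 4 can be sketched with the stdlib's `importlib.util.find_spec`, which reports whether a top-level module is importable without importing it. This is a minimal sketch assuming a hypothetical `pick_worker_count` helper; how `app.py` actually detects pounce is not shown on this page.

```python
import importlib.util


def pick_worker_count(server_module: str = "pounce", preferred: int = 4) -> int:
    """Hypothetical helper: use `preferred` worker threads when the server
    package is importable, otherwise fall back to a single dev-server worker."""
    # find_spec returns None when a top-level module cannot be found.
    if importlib.util.find_spec(server_module) is not None:
        return preferred
    return 1
```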

To use a cloud model instead, set CHIRP_LLM (for example, CHIRP_LLM=anthropic:claude-sonnet-4-20250514) and the matching API key, such as ANTHROPIC_API_KEY.

Source: examples/chirpui/rag_demo/app.py.

The Streaming Endpoint

The SSE handler retrieves docs, builds a prompt, and streams the answer token by token. `stream_with_sources` re-renders the named blocks as the model emits text and yields one `Fragment` per chunk:

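```
from chirp import EventStream, Request, SSEEvent
from chirp.ai.streaming import stream_with_sources

# `app`, `llm`, `_retrieve_docs`, and `_db_var` are defined elsewhere in app.py.


@app.route("/ask/stream", referenced=True, template="ask.html")
async def ask_stream(request: Request) -> EventStream:
    async def generate():
        question = (request.query.get("question") or "").strip()
        # Look up matching docs through this worker's SQLite connection.
        sources = await _retrieve_docs(_db_var.get(), question)
        # (In the full app, `prompt` is built here from `question` and `sources`.)
        # Render the sources block and re-render the answer block as text arrives.
        async for frag in stream_with_sources(
            llm.stream(prompt),
            "ask.html",
            sources_block="sources",
            sources=sources,
            response_block="answer",
        ):
            yield frag
        # Signal the client that the answer is complete.
        yield SSEEvent(event="done", data="complete")

    return EventStream(generate())
```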

This is an excerpt: `prompt`, `_retrieve_docs`, and the per-worker `_db_var` are defined in the full app. `/ask/stream` and `/share/{slug}` are marked `referenced=True` so the route contract does not flag them as orphans; htmx connects to them rather than a browser navigating to them directly.
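To make the streaming contract concrete without Chirp installed, here is a minimal stdlib-only sketch of the behavior described above: render a sources fragment, re-render the answer block with the accumulated text for each chunk, and serialize every fragment as a Server-Sent Event. `Frag`, `fake_llm`, `stream_with_sources_sketch`, `to_sse`, and `collect_wire` are hypothetical stand-ins, not Chirp's API, and the sketch assumes the SSE event name equals the block name targeted by `sse-swap`; the real `stream_with_sources` may name or order things differently.

```python
import asyncio
import html
from collections.abc import AsyncIterator
from dataclasses import dataclass


@dataclass(frozen=True)
class Frag:
    """Stand-in for a rendered fragment: target block name plus its markup."""
    block: str
    markup: str


async def fake_llm(tokens: list[str]) -> AsyncIterator[str]:
    # Simulates llm.stream(prompt) by yielding text chunks one at a time.
    for tok in tokens:
        await asyncio.sleep(0)
        yield tok


async def stream_with_sources_sketch(
    chunks: AsyncIterator[str], sources: list[str]
) -> AsyncIterator[Frag]:
    # Render the sources block once.
    items = "".join(f"<li>{html.escape(s)}</li>" for s in sources)
    yield Frag("sources", f"<ul>{items}</ul>")
    # Re-render the whole answer block with the text accumulated so far.
    text = ""
    async for chunk in chunks:
        text += chunk
        yield Frag("answer", f"<p>{html.escape(text)}</p>")


def to_sse(event: str, data: str) -> str:
    # SSE wire format: an "event:" line, one "data:" line per payload line,
    # and a blank line that terminates the event.
    lines = [f"event: {event}"] + [f"data: {line}" for line in data.split("\n")]
    return "\n".join(lines) + "\n\n"


async def collect_wire() -> list[str]:
    out: list[str] = []
    llm_chunks = fake_llm(["Use ", "EventStream."])
    async for frag in stream_with_sources_sketch(llm_chunks, ["SSE docs"]):
        out.append(to_sse(frag.block, frag.markup))
    out.append(to_sse("done", "complete"))
    return out


if __name__ == "__main__":
    print("".join(asyncio.run(collect_wire())), end="")
```

One plausible reason to re-send the whole accumulated block on each chunk, rather than only the new token, is that every swap then replaces the target wholesale, so formatting that depends on later text (an unclosed list or code span, say) is re-rendered from scratch each time.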

Chirp Macros

Chirp ships a reusable answer macro so you don't hand-write the body, prose, and copy-button structure for the streamed answer:

Import `sse_answer` from `chirp/sse_answer.html` for the standard answer structure. It renders the `.answer-body` wrapper (with `data-copy-text`), the `.answer-content.prose` content, and a `.copy-btn`.

```
{% from "chirp/sse_answer.html" import sse_answer %}
{{ sse_answer(text, text | markdown | cite(sources) | safe(reason="patitas")) }}
```

`cite` is an app-local filter defined in this demo (`@app.template_filter("cite")`) that turns `[1]`, `[2]` references into links; it does not ship with Chirp. The macro suits the final answer; the RAG demo uses its own block for the in-progress streaming states.
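Because `cite` is app-local, its actual code is not shown on this page. Here is a minimal sketch of what such a filter could look like, assuming a hypothetical `Source` record with `title` and `url` fields and 1-based citation numbers:

```python
import re
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class Source:
    """Hypothetical source record: a title and a link target."""
    title: str
    url: str


_CITATION = re.compile(r"\[(\d+)\]")


def cite(text: str, sources: list[Source]) -> str:
    """Replace [n] with a link to the n-th (1-based) source.

    Numbers with no matching source are left unchanged.
    """
    def link(match: re.Match[str]) -> str:
        n = int(match.group(1))
        if 1 <= n <= len(sources):
            src = sources[n - 1]
            # Escape attribute values, since the result is later marked safe.
            return f'<a href="{escape(src.url)}" title="{escape(src.title)}">[{n}]</a>'
        return match.group(0)

    return _CITATION.sub(link, text)
```

In the pipeline above, `cite` runs after `markdown`, so it rewrites rendered HTML; escaping the attribute values matters because the combined output is then passed through `safe`.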

Next Steps

SSE patterns: multi-swap layout and `hx-target`
SSE example: the smaller, single-feature version
Database: typed SQLite queries
