RAG Demo

Streaming AI Q&A with cited sources — Chirp's broadest single example

Overview

The RAG demo is a documentation Q&A app: you type a question, it retrieves the relevant docs from SQLite, and it streams an AI answer with cited sources back over the wire — no React, no npm, around 50 lines of Python. Reach for it to see Server-Sent Events, fragments, and free-threaded dual-model streaming working together in one runnable app.

Location: examples/chirpui/rag_demo/

What It Demonstrates

Each row is a feature the demo exercises and the page that owns it:

Feature | In the demo | Learn more
Fragments | Fragment("ask.html", "answer", ...) renders one named block per token. | Fragments
Server-Sent Events | EventStream yields fragments; htmx swaps them into sse-swap targets. | Server-Sent Events
Multi-swap SSE layout | Sources, answer, and share link are separate sse-swap targets in one stream. | SSE patterns
Dual streaming | Compare two models side by side; each streams independently across worker threads. | Free-threading and thread safety
Typed SQLite | chirp.data.Database returns frozen dataclasses for document storage. | Database
Event delegation | AppConfig(delegation=True) wires copy and compare controls on SSE-swapped content. | htmx patterns
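
The "Typed SQLite" row can be pictured with the standard library alone. The sketch below does not use chirp.data.Database, whose API is not shown on this page; it calls sqlite3 directly, and the docs table, the Doc dataclass, and the retrieve_docs helper are all hypothetical names chosen for illustration. The keyword matching is deliberately naive and is not claimed to be the demo's retrieval strategy.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    """One stored document; frozen so rows cannot be mutated after loading."""
    id: int
    title: str
    body: str


def retrieve_docs(conn: sqlite3.Connection, question: str, limit: int = 3) -> list[Doc]:
    """Naive keyword retrieval (an assumption): match any question word with LIKE."""
    words = [w.strip("?.,!").lower() for w in question.split()]
    words = [w for w in words if len(w) > 2]
    if not words:
        return []
    where = " OR ".join("lower(body) LIKE ?" for _ in words)
    rows = conn.execute(
        f"SELECT id, title, body FROM docs WHERE {where} ORDER BY id LIMIT ?",
        [f"%{w}%" for w in words] + [limit],
    )
    # Map each raw tuple onto the typed, immutable dataclass.
    return [Doc(*row) for row in rows]


# Hypothetical in-memory corpus standing in for the demo's document store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO docs (title, body) VALUES (?, ?)",
    [
        ("Fragments", "Fragments render one named block of a template."),
        ("SSE", "EventStream pushes fragments over Server-Sent Events."),
        ("Database", "Database returns frozen dataclasses."),
    ],
)
hits = retrieve_docs(conn, "How do fragments work?")
```

Returning frozen dataclasses rather than raw tuples means a handler running on one worker thread cannot accidentally mutate a row that a template is rendering on another.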

Run It

Running the demo takes five steps and has one prerequisite: a local Ollama model must be available before the app can answer anything.

  1. Install Chirp with the AI extras

    The demo stores docs in SQLite, which chirp.data serves through the stdlib sqlite3 module, so no database extra is needed.

    pip install "chirp[ai,sessions,markdown]"
    
  2. Pull the default Ollama model

    The demo uses Ollama by default, so it needs no API key.

    ollama pull llama3.2
    
  3. Start Ollama in another terminal

    ollama serve
    
  4. Run the app

    PYTHONPATH=src python examples/chirpui/rag_demo/app.py
    

    It starts four worker threads when pounce is installed, and falls back to a single-worker dev server otherwise.

  5. Open the browser

    Open http://127.0.0.1:8000 and ask a question about the docs.

To use a cloud model instead, set CHIRP_LLM (for example CHIRP_LLM=anthropic:claude-sonnet-4-20250514) and the matching API key, such as ANTHROPIC_API_KEY.
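
A pre-flight check can catch a missing key before the first question is asked. The sketch below is hypothetical and is not part of Chirp: check_llm_env and REQUIRED_KEY are illustrative names, the provider:model split is inferred from the example value above, and the "ollama:llama3.2" fallback spelling is an assumption about how the default might be written.

```python
from collections.abc import Mapping

# Only the anthropic entry comes from the text above; other providers are unknown here.
REQUIRED_KEY = {"anthropic": "ANTHROPIC_API_KEY"}


def check_llm_env(env: Mapping[str, str]) -> tuple[str, str]:
    """Split CHIRP_LLM into (provider, model) and verify the matching API key is set."""
    spec = env.get("CHIRP_LLM", "ollama:llama3.2")  # assumed spelling of the default
    provider, sep, model = spec.partition(":")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider:model', got {spec!r}")
    key_name = REQUIRED_KEY.get(provider)
    if key_name and not env.get(key_name):
        raise RuntimeError(f"{provider} needs {key_name} to be set")
    return provider, model
```

Calling check_llm_env(os.environ) before starting the app would surface a missing ANTHROPIC_API_KEY as a startup error instead of a failed stream.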

Source: examples/chirpui/rag_demo/app.py.

The Streaming Endpoint

The SSE handler retrieves docs, builds a prompt, and streams the answer token by token. stream_with_sources re-renders the named blocks as the model emits text and yields one Fragment per chunk:

from chirp import EventStream, Request, SSEEvent
from chirp.ai.streaming import stream_with_sources


@app.route("/ask/stream", referenced=True, template="ask.html")
async def ask_stream(request: Request) -> EventStream:
    async def generate():
        question = (request.query.get("question") or "").strip()
        sources = await _retrieve_docs(_db_var.get(), question)
        async for frag in stream_with_sources(
            llm.stream(prompt),
            "ask.html",
            sources_block="sources",
            sources=sources,
            response_block="answer",
        ):
            yield frag
        yield SSEEvent(event="done", data="complete")

    return EventStream(generate())

This is an excerpt: prompt, _retrieve_docs, and the per-worker _db_var are defined in the full app. /ask/stream and /share/{slug} are marked referenced=True so the route contract does not flag them as orphans; htmx connects to them rather than a browser navigating directly.
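
To see what travels over the wire, the pattern can be imitated without Chirp. The sketch below is a stand-in, not the implementation of stream_with_sources: fake_tokens replaces llm.stream(prompt), the answer is re-rendered in full on every chunk (an assumption based on the description above), and the event name "answer" is assumed to match the sse-swap target. Only the event:/data: framing follows the Server-Sent Events format itself.

```python
import asyncio
from collections.abc import AsyncIterator
from html import escape


def sse_frame(event: str, data: str) -> str:
    """Format one SSE message; multi-line data gets one 'data:' line per line."""
    lines = [f"event: {event}"]
    lines += [f"data: {part}" for part in data.split("\n")]
    return "\n".join(lines) + "\n\n"  # a blank line terminates the message


async def fake_tokens() -> AsyncIterator[str]:
    """Hypothetical stand-in for llm.stream(prompt): yields text chunks."""
    for chunk in ["Chirp ", "streams ", "HTML."]:
        await asyncio.sleep(0)  # hand control back to the event loop, as a real stream would
        yield chunk


async def stream_answer(tokens: AsyncIterator[str]) -> AsyncIterator[str]:
    """Re-render the accumulated answer on every chunk, then signal completion."""
    text = ""
    async for chunk in tokens:
        text += chunk
        yield sse_frame("answer", f"<div>{escape(text)}</div>")
    yield sse_frame("done", "complete")


async def collect() -> list[str]:
    return [frame async for frame in stream_answer(fake_tokens())]


frames = asyncio.run(collect())
```

Each frame carries the whole answer so far, so the browser can swap the target's content wholesale instead of appending tokens client-side.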

Chirp Macros

Chirp ships a reusable answer macro so you don't hand-write the body, prose, and copy-button structure for the streamed answer.

Next Steps