People ask why I wrote my own template engine, my own markdown parser, my own syntax highlighter, my own ASGI server, my own web framework, and my own static site generator. There are existing options for every one of those.
The short answer is not "because I can." It is that the stack is the product, and the product only works if the pieces fit.
This post is the high-level version of that argument. If the b-stack looks excessive from the outside, this is why it does not feel excessive from the inside.
## The problem with assembled stacks
Most Python web projects are assembled from parts that do not know about each other. Flask does not know about Jinja2's internals. Jinja2 does not know about Pygments. Uvicorn does not know about your framework's concurrency model.
That usually works, but it works by treating integration friction as normal. You glue the pieces together with configuration, hope the abstractions do not leak too badly, and absorb the cost when they do.
The friction compounds. A template engine designed for single-threaded Django cannot give you lock-free rendering. A markdown parser that shells out to Pygments for highlighting cannot give you O(n) guarantees. An ASGI server that forks processes cannot share immutable config across workers.
Each layer's design constraints leak upward.
When you own the vertical stack, you don't inherit those constraints. You design each layer knowing exactly what the layers above and below need.
## The dependency graph
```
Purr ──────── content-reactive runtime
├── Bengal ──── static site generator
├── Chirp ───── web framework
├── Chirp UI ── component library
├── Kida ────── template engine
├── Patitas ─── markdown parser
│   └── Rosettes ── syntax highlighter
├── Pounce ───── ASGI server
└── Zoomies ──── QUIC / HTTP/3
```
The foundation layer, Kida through Zoomies, has almost no external dependencies. Kida and Rosettes are pure Python with zero third-party imports. Patitas depends only on PyYAML. Zoomies depends only on cryptography.
That means the entire foundation layer pulls in two external packages.
That's not accidental. External dependencies are design constraints you can't control. Minimizing them at the foundation means the stack's behavior is predictable from bottom to top.
## How the layers multiply value
The real payoff of vertical ownership is not that each library works well alone. It is that each layer unlocks capabilities in the layers above that would be difficult or impossible with unrelated parts.
### Rosettes → Patitas: O(n) highlighting inside the parser
Most Markdown parsers delegate syntax highlighting to Pygments, which uses regex-based lexers. Regex has familiar strengths, but in this context it also brings catastrophic backtracking, ReDoS risk, and unpredictable worst-case performance.
Rosettes uses hand-written state machines instead. Every lexer is O(n) — linear in the input, no backtracking, no ReDoS. Because Patitas owns the integration with Rosettes, highlighting happens inside the parse step:
````python
from patitas import render

html = render("""
```python
def fibonacci(n: int) -> int:
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
```
""")
````
One call: parse the document, identify the fenced code block, highlight it with Rosettes' O(n) lexer, and return HTML. No subprocess. No regex engine in the hot path. No second pass.
That only works this cleanly because Patitas knows Rosettes' API and Rosettes was built to be embedded this way.
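The single-pass idea behind Rosettes can be sketched with a toy hand-written lexer. This is illustrative stdlib-only code, not Rosettes' actual implementation: each character is examined exactly once, so the scan is linear in the input, with no regex engine and no backtracking.

```python
# Toy single-pass lexer for a tiny language: names, integers, punctuation.
# Not Rosettes' real code -- just the state-machine shape it uses.
# Every character is consumed exactly once, so runtime is O(n).
def lex(src: str) -> list[tuple[str, str]]:
    tokens = []
    i, n = 0, len(src)
    while i < n:
        ch = src[i]
        if ch.isspace():                      # state: skip whitespace
            i += 1
        elif ch.isdigit():                    # state: scan a number
            start = i
            while i < n and src[i].isdigit():
                i += 1
            tokens.append(("NUMBER", src[start:i]))
        elif ch.isalpha() or ch == "_":       # state: scan an identifier
            start = i
            while i < n and (src[i].isalnum() or src[i] == "_"):
                i += 1
            tokens.append(("NAME", src[start:i]))
        else:                                 # state: single-char operator
            tokens.append(("OP", ch))
            i += 1
    return tokens

tokens = lex("a, b = 0, 1")
```

Because the cursor `i` only ever moves forward, worst-case time is proportional to input length by construction, which is the property a regex-based lexer cannot promise.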
### Patitas → Bengal: immutable AST enables parallel parsing
Patitas returns an immutable document AST:
```python
@dataclass(frozen=True, slots=True)
class Document:
    children: tuple[Node, ...]
    metadata: Metadata
```
Because the AST is frozen, Bengal can parse hundreds of Markdown files in parallel on free-threaded Python. Each worker thread gets its own lexer and parser, with no shared mutable state between parses:
```python
with ThreadPoolExecutor(max_workers=workers) as pool:
    futures = {
        pool.submit(ctx.run, parse_page, page): page
        for page in pages
    }
```
If Patitas returned mutable trees, Bengal would need locks around every parse. The immutability decision in Patitas enables the parallelism in Bengal. That is not a feature you bolt on later. It has to be part of the design.
### Kida → Chirp: block rendering enables type-driven fragments
Kida compiles templates to Python AST and supports rendering individual named blocks — not just full templates. This is what makes Chirp's Fragment and Page return types possible:
```python
@app.route("/tasks", methods=["POST"])
def create_task(request: Request):
    task = store.add(request.form["title"])
    return Fragment("tasks.html", "task_item", task=task)
```
Under the hood, Chirp asks Kida to render just the `task_item` block from `tasks.html`. Jinja2 does not expose `render_block()` as a first-class API. Kida has it because Chirp needs it. Chirp has `Fragment` because Kida supports it.
The feature lives between the layers, not in either one alone.
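The underlying idea can be sketched without either library. In this stdlib-only toy (not Kida's real compiler), a "template" compiles to a dict of named block callables, so a fragment response invokes exactly one block instead of the whole page:

```python
# Toy sketch of per-block rendering. A real engine compiles template
# source; here the blocks are hand-written to show the shape of the API.
def compile_template() -> dict:
    def task_item(task: str) -> str:
        return f"<li>{task}</li>"

    def page(tasks: list[str]) -> str:
        items = "".join(task_item(t) for t in tasks)
        return f"<ul>{items}</ul>"

    return {"task_item": task_item, "page": page}


blocks = compile_template()
full = blocks["page"](["write docs"])        # full-page render
frag = blocks["task_item"]("ship v1.0")      # just the fragment a swap needs
```

Once blocks are addressable values rather than an internal detail of full-template rendering, "render one fragment" stops being a special case.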
### Kida → Bengal: static analysis for incremental builds
Kida can tell you what a template needs before you render it. Bengal uses that for incremental builds: cache site-scoped blocks once, then re-render only page-scoped blocks when content changes.
That is where the 40-60% rebuild reduction comes from. Without the analysis, you either re-render everything or guess.
Jinja2 can't do this. It's render-only. The capability is a direct consequence of compiling to AST.
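A minimal version of "know what a template needs before rendering it" can be sketched in a few lines. This is not Kida's analyzer, just the principle: extract the names a template reads, then re-render only when one of those inputs actually changed.

```python
# Toy dependency analysis for {{ name }}-style placeholders.
# A real engine would walk its compiled AST instead of scanning text.
import re


def referenced_vars(template: str) -> set[str]:
    return set(re.findall(r"\{\{\s*(\w+)\s*\}\}", template))


page_tpl = "<h1>{{ title }}</h1><p>{{ body }}</p><footer>{{ site_name }}</footer>"
deps = referenced_vars(page_tpl)       # what this page depends on

changed = {"body"}                     # what the latest edit touched
needs_rerender = bool(deps & changed)  # rebuild only on overlap
```

With per-template dependency sets in hand, an incremental builder can skip every page whose inputs are untouched instead of rebuilding the whole site.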
### Pounce → Chirp: thread-based workers with shared config
Pounce runs ASGI workers as threads, not processes. On free-threaded Python, this means real parallelism with shared memory — all workers see the same immutable application config without IPC:
```python
app = App(config=AppConfig(template_dir="templates"))

@app.route("/")
def index():
    return Page("index.html", "content", title="Home")

# After this, app is frozen. Pounce threads share it safely.
app.run()
```
Uvicorn would fork this into separate processes, each with its own copy of the app in memory. Pounce shares one copy across N threads. Memory savings scale with worker count, and config access needs no IPC.
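The memory claim is easy to check with a stdlib toy (not Pounce's actual worker model): every thread sees the same frozen config object, the same `id()`, with no copying and no IPC.

```python
# Toy demonstration of thread workers sharing one immutable config.
# Process-based workers would each hold their own copy; threads share one.
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class AppConfig:
    template_dir: str


config = AppConfig(template_dir="templates")


def worker(_: int) -> int:
    # Read-only access to shared state; frozen dataclass forbids writes.
    return id(config)


with ThreadPoolExecutor(max_workers=4) as pool:
    ids = set(pool.map(worker, range(16)))

# Every worker saw the identical object: one copy, regardless of count.
```

The freeze step matters: sharing is only safe because nothing can mutate the config after startup, so reads need no locks and no coordination.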
## The flywheel
Owning the stack creates a development cycle that does not exist when you are a consumer of other people's libraries. When I hit a limitation in any layer, I fix it at the source, usually in the same session.
No upstream PRs. No waiting for a maintainer who may not share your priorities. No workarounds.
This is not faster than using established libraries for the first feature. It is faster for the hundredth.
## The ecosystem is the test suite
Bengal's 250-page documentation site is a more rigorous test of Kida than any isolated unit test battery. It exercises template inheritance, block rendering, streaming, filters, and caching across hundreds of real pages with real content.
Chirp's 32 example applications — from a 31-line GET form to a 507-line kanban board — exercise content negotiation, SSE streaming, OOB swaps, file-based routing, form validation, and session auth.
Bugs found this way are real bugs, not synthetic edge cases. The workload finds the things the unit tests do not know to ask for.
> **Warning:** Vertical ownership is not the right choice for most projects. It trades breadth of community testing for depth of integrated testing. It makes sense when the layers need to share design assumptions, like free-threading safety, that upstream libraries don't provide. It doesn't make sense when existing tools genuinely solve the problem and the integration friction is low.
## The thesis
The web development ecosystem is bloated. A simple Python web application should not require a template engine that does not know about your concurrency model, a Markdown parser that shells out to a regex-based highlighter, and an ASGI server that forks processes because it was designed before nogil existed.
The b-stack thesis is that a vertically integrated, pure-Python stack, designed for free-threaded Python and with each layer aware of the layers above and below, can be leaner, clearer, and faster than an assembled stack of uncoordinated parts.
This blog is built with it. The docs sites are built with it. The proof isn't a benchmark. It's the work.
## Further reading
- Building for a World That Doesn't Exist Yet — the free-threading ecosystem gap, and why the timing matters
- Free-Threading in the Bengal Ecosystem — the full six-part series: each library, the threading patterns it uses, and benchmark data
## Related
- Best Python Static Site Generators for 2026 — how Bengal fits the landscape
- Bengal vs Pelican vs MkDocs — head-to-head comparison
- Why We Wrote Our Own Markdown Parser (And When You Shouldn't) — the Patitas story