Milo ships with testing utilities purpose-built for the Elm Architecture. Test reducers with action sequences, test views with snapshots, test sagas step-by-step, and record/replay entire sessions.
CLI command testing
Agent-facing CLIs should test the command contract before interactive behavior. Use four small layers:
- Schema — `function_to_schema(command)` matches the function signature.
- Direct dispatch — `cli.invoke([...])` parses argv and returns the expected `InvokeResult`.
- MCP dispatch — `_call_tool(cli, {...})` returns the same content and a structured `errorData` on malformed input.
- Verify — `milo.verify.verify("app.py")` passes the same import, discovery, schema, `tools/list`, and subprocess handshake checks as `milo verify`.
from pathlib import Path

from milo.mcp import _call_tool
from milo.schema import function_to_schema
from milo.verify import verify

from app import cli, greet


def test_schema_matches_signature():
    schema = function_to_schema(greet)
    assert schema["required"] == ["name"]
    assert schema["properties"]["name"]["type"] == "string"


def test_direct_dispatch():
    result = cli.invoke(["greet", "--name", "Alice"])
    assert result.exit_code == 0
    assert "Hello, Alice!" in result.output


def test_mcp_dispatch():
    result = _call_tool(cli, {"name": "greet", "arguments": {"name": "Agent"}})
    assert result["content"][0]["text"] == "Hello, Agent!"
    assert "isError" not in result


def test_milo_verify_passes():
    report = verify(str(Path(__file__).resolve().parents[1] / "app.py"))
    assert report.exit_code == 0, report.format()
Scaffolded projects from `milo new` include these layers in `tests/test_app.py`. Use the reducer, render, saga, and replay helpers below for interactive app behavior.
Docs and example drift
Milo's own docs use tagged Markdown fences for snippets that should keep working. Only fences marked with `milo-docs:*` are checked, so long-running MCP servers and external registration commands can stay documented without running in CI.
make docs-test
That target compiles built-in and example Kida templates, then runs `scripts/check_docs_snippets.py` against the repo docs. Use:
- `milo-docs:run` for deterministic shell snippets.
- `milo-docs:compile` for Python or Kida snippets that should parse.
- `milo-docs:skip reason=<why>` for commands that must remain visible but should not run automatically.
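For example, a docs page might tag its fences as shown below. This is a sketch: the exact tag placement on the fence's info string is assumed from the tag names above, not confirmed by the checker's docs.

````markdown
```bash milo-docs:run
milo verify app.py
```

```bash milo-docs:skip reason=interactive-replay
milo replay session.jsonl --step
```
````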
Testing strategies
Snapshot testing
Render state through a template and compare to a snapshot file:
from milo.testing import assert_renders
assert_renders(
    {"count": 5, "label": "Total"},
    "counter.kida",
    snapshot="tests/snapshots/counter_5.txt",
)
On first run, the snapshot file is created. On subsequent runs, the output is compared to the stored snapshot. ANSI codes are stripped by default.
MILO_UPDATE_SNAPSHOTS=1 pytest   # Rewrites stored snapshot files
pytest                           # Fails if snapshots don't match
Reducer testing
Feed an action sequence through a reducer and assert the final state:
from milo.testing import assert_state
from milo import Action
assert_state(
    reducer,
    None,  # initial state
    [Action("@@INIT"), Action("INCREMENT"), Action("INCREMENT")],
    {"count": 2},  # expected final state
)
Note
`assert_state` replays actions synchronously — no event loop, no rendering, no sagas. This isolates the reducer logic for fast, deterministic tests.
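For reference, a reducer that satisfies the sequence above could look like the following. This is a sketch: the `(state, action)` signature and the `action.type` attribute follow the usual Elm/Redux convention and are assumptions, not confirmed Milo API.

```python
def reducer(state, action):
    # @@INIT arrives first; establish the initial shape when state is None
    # (signature and action.type are assumed, per the Redux convention)
    if state is None:
        return {"count": 0}
    if action.type == "INCREMENT":
        return {**state, "count": state["count"] + 1}
    return state
```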
Saga testing
Step through a saga generator, asserting each yielded effect:
from milo.testing import assert_saga
from milo import Call, Put, Action
assert_saga(
    fetch_saga(),
    [
        (Call(fetch_json, ("https://api.example.com",), {}), {"data": 42}),
        (Put(Action("DATA_LOADED", payload={"data": 42})), None),
    ],
)
Each tuple is `(expected_effect, value_to_send_back)`. The test runner asserts the yielded effect matches, then sends the value back into the generator.
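Read alongside the example above, a `fetch_saga` that produces those two effects could be a plain generator. This is a sketch: `fetch_json` is the helper referenced in the effect tuple, and the `Call`/`Put` shapes mirror the tuples shown rather than a documented constructor signature.

```python
from milo import Action, Call, Put

def fetch_saga():
    # Yield a Call effect; the runner (or assert_saga) sends the result back in
    data = yield Call(fetch_json, ("https://api.example.com",), {})
    # Dispatch the fetched payload into the store as a DATA_LOADED action
    yield Put(Action("DATA_LOADED", payload=data))
```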
Session recording
Record every action dispatched during an interactive session:
# Record to the default path
app = App(template="app.kida", reducer=reducer,
          initial_state=None, record=True)
app.run()  # Writes to session.jsonl

# Record to a custom path
app = App(template="app.kida", reducer=reducer,
          initial_state=None, record="my_session.jsonl")
app.run()
The recording middleware captures each action with a state hash in JSONL format.
Session replay
Replay a recorded session for debugging or CI regression testing:
# Normal replay at 2x speed
milo replay session.jsonl --speed 2.0
# Show state diffs between actions
milo replay session.jsonl --diff
# Step-by-step interactive replay
milo replay session.jsonl --step
# CI mode: assert state hashes match
milo replay session.jsonl --assert --reducer myapp:reducer
Warning
The `--assert` flag compares state hashes at each step against the recorded values. If you change your reducer logic, recorded sessions will fail hash checks — re-record affected sessions after intentional changes.
Recording format
Sessions are stored as JSONL (one JSON object per line):
{"seq": 0, "type": "@@INIT", "payload": null, "hash": "a1b2c3d4"}
{"seq": 1, "type": "@@KEY", "payload": {"char": " ", "name": null, "ctrl": false}, "hash": "e5f6g7h8"}
{"seq": 2, "type": "@@KEY", "payload": {"char": "r", "name": null, "ctrl": false}, "hash": "i9j0k1l2"}
Each line includes a sequence number, the action, and a hash of the resulting state. This makes recordings both human-readable and machine-verifiable.
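Because the format is plain JSONL, recordings are easy to inspect with ordinary tooling. A minimal sketch, assuming only the documented fields (`seq`, `type`, `payload`, `hash`):

```python
import json
from pathlib import Path

# Print a quick summary of a recorded session, one event per line
for line in Path("session.jsonl").read_text().splitlines():
    event = json.loads(line)
    print(f'{event["seq"]:>4}  {event["type"]:<12} hash={event["hash"]}')
```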