Milo ships with testing utilities purpose-built for the Elm Architecture. Test reducers with action sequences, test views with snapshots, test sagas step-by-step, and record/replay entire sessions.
CLI command testing
Agent-facing CLIs should test the command contract before interactive behavior. Use four small layers:
- Schema — `function_to_schema(command)` matches the function signature.
- Direct dispatch — `cli.invoke([...])` parses argv and returns the expected `InvokeResult`.
- MCP dispatch — `_call_tool(cli, {...})` returns the same content and a structured `errorData` on malformed input.
- Verify — `milo.verify.verify("app.py")` passes the same import, discovery, schema, `tools/list`, and subprocess handshake checks as `milo verify`.
from pathlib import Path

from milo.mcp import _call_tool
from milo.schema import function_to_schema
from milo.verify import verify

from app import cli, greet


def test_schema_matches_signature():
    schema = function_to_schema(greet)
    assert schema["required"] == ["name"]
    assert schema["properties"]["name"]["type"] == "string"


def test_direct_dispatch():
    result = cli.invoke(["greet", "--name", "Alice"])
    assert result.exit_code == 0
    assert "Hello, Alice!" in result.output


def test_mcp_dispatch():
    result = _call_tool(cli, {"name": "greet", "arguments": {"name": "Agent"}})
    assert result["content"][0]["text"] == "Hello, Agent!"
    assert "isError" not in result


def test_milo_verify_passes():
    report = verify(str(Path(__file__).resolve().parents[1] / "app.py"))
    assert report.exit_code == 0, report.format()
Scaffolded projects from `milo new` include these layers in `tests/test_app.py`. Use the reducer, render, saga, and replay helpers below for interactive app behavior.
Docs and example drift
Milo's own docs use tagged Markdown fences for snippets that should keep working. Only fences marked with `milo-docs:*` are checked, so long-running MCP servers and external registration commands can stay documented without running in CI.
make docs-test
That target compiles built-in and example Kida templates, then runs `scripts/check_docs_snippets.py` against the repo docs. Use:
- `milo-docs:run` for deterministic shell snippets.
- `milo-docs:compile` for Python or Kida snippets that should parse.
- `milo-docs:skip reason=<why>` for commands that must remain visible but should not run automatically.
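For example, a docs page might tag its fences as shown below. This is a sketch: the exact tag placement on the fence's info string is assumed from the tag names above, not confirmed by the checker's docs.

````markdown
```bash milo-docs:run
milo verify app.py
```

```bash milo-docs:skip reason=interactive-replay
milo replay session.jsonl --step
```
````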
Testing strategies
Snapshot testing
Render state through a template and compare to a snapshot file:
from milo.testing import assert_renders
assert_renders(
    {"count": 5, "label": "Total"},
    "counter.kida",
    snapshot="tests/snapshots/counter_5.txt",
)
On first run, the snapshot file is created. On subsequent runs, the output is compared to the stored snapshot. ANSI codes are stripped by default.
MILO_UPDATE_SNAPSHOTS=1 pytest   # Rewrites stored snapshot files
pytest                           # Fails if snapshots don't match
Reducer testing
Feed an action sequence through a reducer and assert the final state:
from milo.testing import assert_state
from milo import Action
assert_state(
    reducer,
    None,  # initial state
    [Action("@@INIT"), Action("INCREMENT"), Action("INCREMENT")],
    {"count": 2},  # expected final state
)
Note
`assert_state` replays actions synchronously — no event loop, no rendering, no sagas. This isolates the reducer logic for fast, deterministic tests.
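For reference, a reducer that satisfies the sequence above could look like the following. This is a sketch: the `(state, action)` signature and the `action.type` attribute follow the usual Elm/Redux convention and are assumptions, not confirmed Milo API.

```python
def reducer(state, action):
    # @@INIT arrives first; establish the initial shape when state is None
    # (signature and action.type are assumed, per the Redux convention)
    if state is None:
        return {"count": 0}
    if action.type == "INCREMENT":
        return {**state, "count": state["count"] + 1}
    return state
```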
Saga testing
Step through a saga generator, asserting each yielded effect:
from milo.testing import assert_saga
from milo import Call, Put, Action
assert_saga(
    fetch_saga(),
    [
        (Call(fetch_json, ("https://api.example.com",), {}), {"data": 42}),
        (Put(Action("DATA_LOADED", payload={"data": 42})), None),
    ],
)
Each tuple is `(expected_effect, value_to_send_back)`. The test runner asserts the yielded effect matches, then sends the value back into the generator.
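Read alongside the example above, a `fetch_saga` that produces those two effects could be a plain generator. This is a sketch: `fetch_json` is the helper referenced in the effect tuple, and the `Call`/`Put` shapes mirror the tuples shown rather than a documented constructor signature.

```python
from milo import Action, Call, Put

def fetch_saga():
    # Yield a Call effect; the runner (or assert_saga) sends the result back in
    data = yield Call(fetch_json, ("https://api.example.com",), {})
    # Dispatch the fetched payload into the store as a DATA_LOADED action
    yield Put(Action("DATA_LOADED", payload=data))
```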
Session recording
Record every action dispatched during an interactive session:
# Record to the default path
app = App(template="app.kida", reducer=reducer,
          initial_state=None, record=True)
app.run()  # Writes to session.jsonl

# Record to a custom path
app = App(template="app.kida", reducer=reducer,
          initial_state=None, record="my_session.jsonl")
app.run()
The recording middleware captures each action with a state hash in JSONL format.
Session replay
Replay a recorded session for debugging or CI regression testing:
# Normal replay at 2x speed
milo replay session.jsonl --speed 2.0
# Show state diffs between actions
milo replay session.jsonl --diff
# Step-by-step interactive replay
milo replay session.jsonl --step
# CI mode: assert state hashes match
milo replay session.jsonl --assert --reducer myapp:reducer
Warning
The `--assert` flag compares state hashes at each step against the recorded values. If you change your reducer logic, recorded sessions will fail hash checks — re-record affected sessions after intentional changes.
Recording format
Sessions are stored as JSONL (one JSON object per line):
{"seq": 0, "type": "@@INIT", "payload": null, "hash": "a1b2c3d4"}
{"seq": 1, "type": "@@KEY", "payload": {"char": " ", "name": null, "ctrl": false}, "hash": "e5f6g7h8"}
{"seq": 2, "type": "@@KEY", "payload": {"char": "r", "name": null, "ctrl": false}, "hash": "i9j0k1l2"}
Each line includes a sequence number, the action, and a hash of the resulting state. This makes recordings both human-readable and machine-verifiable.
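Because the format is plain JSONL, recordings are easy to inspect with ordinary tooling. A minimal sketch, assuming only the documented fields (`seq`, `type`, `payload`, `hash`):

```python
import json
from pathlib import Path

# Print a quick summary of a recorded session, one event per line
for line in Path("session.jsonl").read_text().splitlines():
    event = json.loads(line)
    print(f'{event["seq"]:>4}  {event["type"]:<12} hash={event["hash"]}')
```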