Testing Strategy

Backend stack

pytest + pytest-asyncio (asyncio_mode = auto)
pytest-mock — mocker fixture
httpx — TestClient for FastAPI routes
asgi-lifespan — lifespan management in integration tests

Web client stack

Vitest + @vue/test-utils — Vue 3 component and composable testing
happy-dom — lightweight DOM environment
Pinia testing utils — @pinia/testing for store mocks

Directory layout

tests/                          # Backend (pytest)
├── conftest.py
├── conftest_factory.py
├── unit/
│   ├── api/
│   │   └── test_session_files.py # upload/download file endpoint logic
│   ├── core/
│   │   ├── test_events.py         # 17 tests
│   │   ├── test_context_builder.py
│   │   ├── test_compressor.py
│   │   ├── test_registry.py       # registries, backend discovery, context provider registry
│   │   ├── test_planning.py
│   │   ├── test_agent_context_size.py
│   │   └── test_agent_stream_guard.py
│   ├── llm/
│   │   └── test_ollama.py         # timeout/error classification + fallback timeout wiring
│   ├── memory/
│   │   ├── test_store.py
│   │   └── test_extractor.py
│   ├── tools/
│   │   ├── test_filesystem.py
│   │   ├── test_code_exec.py
│   │   ├── test_terminal.py
│   │   ├── test_share_file.py
│   │   └── test_content_publish.py
│   ├── profiles/
│   │   └── test_base.py
│   ├── config/
│       └── test_settings.py
│   ├── test_content_store.py
│   └── test_startup.py
├── integration/
│   ├── conftest.py
│   ├── test_api_routes.py
│   └── test_websocket.py
└── e2e/
    └── test_chat_flow.py

webclient/tests/                # Web client (Vitest)
├── unit/
│   ├── api/
│   │   └── index.test.js          # 8 tests — request helper, verbs, errors, FormData
│   ├── stores/
│   │   ├── chat.test.js           # 23 tests — buildMessageList, WS handlers, session load
│   │   ├── sessions.test.js       # 6 tests — fetch, create, delete, pin sorting
│   │   └── profiles.test.js       # 3 tests — fetch, selection, lookup
│   └── composables/
│       └── useWebSocket.test.js   # 7 tests — connect, dispatch, reconnect

Mock strategy

LLM — FakeLLMBackend

FakeLLMBackend cycles through a list of pre-defined responses and optionally emits ToolCallRequest objects. This lets us test the agent loop and planning without real Ollama.

from tests.conftest_factory import FakeLLMBackend

backend = FakeLLMBackend(
    responses=["Hello", "DIRECT"],
    tool_calls=[None, None],
    thinking=["Hmm", None],
)
resp = await backend.complete([])  # → LLMResponse(content="Hello")

PostgreSQL — FakePool / FakeConnection

Unit tests mock asyncpg.Pool via an in-memory FakePool/FakeConnection. Integration tests may use a real Postgres instance via TEST_DATABASE_URL.

from tests.conftest_factory import FakeConnection, FakeRecord, make_store_with_pool

conn = FakeConnection()
conn.enqueue(42)  # fetchval result
conn.enqueue([FakeRecord(id="1", key="name", value="Eugene")])  # fetch result
store = make_store_with_pool(conn)

Coverage status

Phase	Module	Tests	Status
1	`navi.core.events`	17	✅ Done
1	`navi.core.compressor`	14	✅ Done
1	`navi.core.registry` + `ContextProviderRegistry`	13	✅ Done
1	`navi.core.context_builder`	9	✅ Done
1	`navi.profiles.base`	9	✅ Done
2	`navi.memory.store`	18	✅ Done
2	`navi.memory.extractor`	11	✅ Done
3	`navi.api.routes`	19	✅ Done
3	`navi.api.routes.sessions` file endpoint logic	5	✅ Basic
3	`navi.api.websocket`	7	✅ Done
3	`navi.main` startup ordering	1	✅ Basic
4	`navi.core.agent`	9	✅ Done
4	`navi.core.planning`	5	✅ Done
5	`navi.tools.filesystem`	13	✅ Done
5	`navi.tools.code_exec`	5	✅ Done
5	`navi.tools.terminal`	4	✅ Done
5	`navi.tools.share_file`	5	✅ Basic
5	`navi.tools.content_publish`	4	✅ Basic
5	`navi.content_store`	5	✅ Basic
5	`navi.llm.ollama` + fallback timeout wiring	3	✅ Basic
6	`webclient/api`	8	✅ Done
6	`webclient/stores/chat`	23	✅ Done
6	`webclient/stores/sessions`	6	✅ Done
6	`webclient/stores/profiles`	3	✅ Done
6	`webclient/composables/useWebSocket`	7	✅ Done

Coverage roadmap

This is the living plan for what still needs tests. Keep it updated whenever a bug is fixed, a new module is added, or a planned area becomes covered.

Status meanings:

✅ Covered enough for current risk
🟨 Basic coverage exists, important edge cases remain
⬜ Not covered yet
🔴 Regression target from a real bug

Phase 7 — Recent Regression Coverage

Priority	Area	Target tests	Status
P0	`navi.content_store.ensure_tables()`	creates `session_content`, creates `idx_session_content_file`, is idempotent when index already exists	✅
P0	`navi.content_store.publish()`	repeated publish of same `(session_id, filename)` updates one row instead of creating duplicates	✅ 🔴
P0	`navi.main` startup	registries are initialized before `_check_embed()` so memory has an embedding backend	✅ 🔴
P0	`navi.core.registry._discover_backends()`	primary Ollama backend receives HTTP timeout >= `LLM_COMPLETE_TIMEOUT` and `LLM_STREAM_FIRST_CHUNK_TIMEOUT`	✅ 🔴
P0	`navi.llm.fallback.FallbackOllamaBackend`	per-server `OllamaBackend` clients receive the same expanded timeout	✅ 🔴
P1	`navi.tools.content_publish`	missing file, directory instead of file, successful publish metadata, filename path stripping	✅
P1	`navi.tools.share_file`	duplicate filename collision produces numbered output without overwrite	✅
P1	`navi.api.routes.sessions` file endpoints	upload duplicate naming, forbidden extension, download path traversal, content disposition	✅

Phase 8 — Agent Loop Behavior

Priority	Area	Target tests	Status
P0	`Agent.run_stream()` planning entry	first user message forces planning; later messages follow `planning_enabled`	⬜
P0	plan → todo bridge	numbered plan steps auto-populate todo exactly once per plan	⬜
P0	stop handling	stop during stream prefill yields `StreamStopped` and closes LLM generator	🟨
P1	subagent forwarding	parent forwards subagent tool events and counts subagent tokens/tool calls	⬜
P1	adaptive replan	newly failed todo step injects replan prompt on next iteration	⬜
P1	anti-stall	repeated tool calls or no todo progress inject warning after threshold	⬜
P1	workers	post-turn workers run after `StreamEnd`; worker failure is logged and non-fatal	⬜

Phase 9 — Memory And Embeddings

Priority	Area	Target tests	Status
P0	embedding backend wiring	`get_registries()` wires dedicated `EMBEDDING_OLLAMA_HOST` backend into `MemoryStore`	⬜
P0	pgvector detection	`_has_pgvector()` true/false paths and caching behavior	⬜
P1	embedding generation	invalid/empty/NaN vectors are skipped before PostgreSQL update	🟨
P1	backfill	`backfill_embeddings()` batches rows and only updates rows with valid vectors	🟨
P1	search	vector search falls back to ILIKE when vector search unavailable or empty	🟨

Phase 10 — WebSocket And API Lifecycles

Priority	Area	Target tests	Status
P0	active run guard	duplicate message while a run is active returns `run_already_active`	🟨
P0	reconnect replay	reconnect receives missed events and session sync after finished run	🟨
P1	stop endpoint	`POST /sessions/{id}/stop` sets stop event and is idempotent	🟨
P1	malformed input	oversize images, invalid file refs, and non-string payloads are rejected or sanitized	⬜
P1	startup cleanup	session file cleanup task is started once and deletes orphaned dirs	⬜

Phase 11 — Frontend

Priority	Area	Target tests	Status
P0	`ContentCard.vue`	renders download links and inline viewer links; handles encoded filenames	⬜
P0	streaming chat	auto-scroll and streaming message updates stay reactive	🟨
P1	session switching	concurrent session loads cannot overwrite active session with stale response	🟨
P1	error surfaces	API/store failures show recoverable UI state, no unhandled rejection	🟨
P1	file upload UI	upload success/failure, duplicate names, large-file errors	⬜

Running tests

# Backend tests
pytest                              # all backend tests
pytest tests/unit                   # unit only
pytest -v tests/unit/core           # verbose
pytest -v tests/unit/core/test_events.py::TestToolStarted::test_to_wire  # single test
TEST_DATABASE_URL=postgresql://... pytest tests/integration

# Web client tests (run from webclient/)
cd webclient && npm test             # all webclient tests
npx vitest run tests/unit/api       # single directory
npx vitest run -t "buildMessageList" # filter by test name

Adding a new test

Create file in the appropriate tests/unit/ or tests/integration/ directory.
Use async def for async tests — pytest-asyncio handles the rest.
Import helpers from tests.conftest_factory for fakes.
Mutations to navi.config.settings are reset automatically by the autouse fixture in conftest.py.

Guidelines

Mock at boundaries: LLM calls → FakeLLMBackend, DB → FakePool, filesystem → tmp_path.
Avoid real network: Never hit Ollama, OpenAI, or DuckDuckGo in unit tests.
Avoid real DB in unit tests: Use in-memory mocks; real Postgres only in tests/integration/.
Keep tests deterministic: No randomness, no time-dependent logic without monkeypatching datetime.now.