Testing Strategy
Backend stack
- pytest + pytest-asyncio (asyncio_mode = auto)
- pytest-mock: mocker fixture
- httpx: TestClient for FastAPI routes
- asgi-lifespan: lifespan management in integration tests
Web client stack
- Vitest + @vue/test-utils: Vue 3 component and composable testing
- happy-dom: lightweight DOM environment
- Pinia testing utils: @pinia/testing for store mocks
Directory layout
tests/  # Backend (pytest)
├── conftest.py
├── conftest_factory.py
├── unit/
│   ├── api/
│   │   └── test_session_files.py  # upload/download file endpoint logic
│   ├── core/
│   │   ├── test_events.py  # 17 tests
│   │   ├── test_context_builder.py
│   │   ├── test_compressor.py
│   │   ├── test_registry.py  # registries, backend discovery, context provider registry
│   │   ├── test_planning.py
│   │   ├── test_agent_context_size.py
│   │   └── test_agent_stream_guard.py
│   ├── llm/
│   │   └── test_ollama.py  # timeout/error classification + fallback timeout wiring
│   ├── memory/
│   │   ├── test_store.py
│   │   └── test_extractor.py
│   ├── tools/
│   │   ├── test_filesystem.py
│   │   ├── test_code_exec.py
│   │   ├── test_terminal.py
│   │   ├── test_share_file.py
│   │   └── test_content_publish.py
│   ├── profiles/
│   │   └── test_base.py
│   ├── config/
│   │   └── test_settings.py
│   ├── test_content_store.py
│   └── test_startup.py
├── integration/
│   ├── conftest.py
│   ├── test_api_routes.py
│   └── test_websocket.py
└── e2e/
    └── test_chat_flow.py
webclient/tests/  # Web client (Vitest)
├── unit/
│   ├── api/
│   │   └── index.test.js  # 8 tests: request helper, verbs, errors, FormData
│   ├── stores/
│   │   ├── chat.test.js  # 23 tests: buildMessageList, WS handlers, session load
│   │   ├── sessions.test.js  # 6 tests: fetch, create, delete, pin sorting
│   │   └── profiles.test.js  # 3 tests: fetch, selection, lookup
│   └── composables/
│       └── useWebSocket.test.js  # 7 tests: connect, dispatch, reconnect
Mock strategy
LLM — FakeLLMBackend
FakeLLMBackend cycles through a list of predefined responses and can optionally emit ToolCallRequest objects, so the agent loop and planning can be tested without a running Ollama server.
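from tests.conftest_factory import FakeLLMBackend

async def test_fake_backend_returns_first_response():
    # One entry per call: response text, optional tool call, optional thinking.
    backend = FakeLLMBackend(
        responses=["Hello", "DIRECT"],
        tool_calls=[None, None],
        thinking=["Hmm", None],
    )
    resp = await backend.complete([])  # → LLMResponse(content="Hello")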
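To make the cycling behavior concrete, here is a minimal sketch of how such a fake could be built. It is not the actual FakeLLMBackend from tests.conftest_factory: the names CyclingFakeBackend and FakeResponse, and the calls attribute, are hypothetical stand-ins chosen for illustration.

```python
import asyncio
from dataclasses import dataclass
from itertools import cycle
from typing import Any, Optional


@dataclass
class FakeResponse:
    """Illustrative stand-in for LLMResponse; the real class may differ."""
    content: str
    tool_call: Optional[Any] = None
    thinking: Optional[str] = None


class CyclingFakeBackend:
    """Hypothetical fake: returns scripted responses in order, wrapping around."""

    def __init__(self, responses, tool_calls=None, thinking=None):
        if not responses:
            raise ValueError("at least one scripted response is required")
        n = len(responses)
        tool_calls = tool_calls or [None] * n
        thinking = thinking or [None] * n
        # cycle() restarts from the first entry once the script is exhausted.
        self._script = cycle(list(zip(responses, tool_calls, thinking)))
        self.calls = []  # every messages argument seen, for later assertions

    async def complete(self, messages):
        self.calls.append(messages)
        content, tool_call, thought = next(self._script)
        return FakeResponse(content, tool_call, thought)


async def demo():
    backend = CyclingFakeBackend(["Hello", "DIRECT"], thinking=["Hmm", None])
    # Three calls against a two-item script: the third wraps to the first.
    return [(await backend.complete([])).content for _ in range(3)]
```

Recording each call in a list keeps the fake useful for asserting what the agent loop actually sent, not only what it received.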
PostgreSQL — FakePool / FakeConnection
Unit tests mock asyncpg.Pool via an in-memory FakePool/FakeConnection. Integration tests may use a real Postgres instance via TEST_DATABASE_URL.
from tests.conftest_factory import FakeConnection, FakeRecord, make_store_with_pool
conn = FakeConnection()
conn.enqueue(42) # fetchval result
conn.enqueue([FakeRecord(id="1", key="name", value="Eugene")]) # fetch result
store = make_store_with_pool(conn)
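The enqueue pattern above can be pictured as a first-in, first-out queue of canned results. The sketch below is an assumption about how such a stub might work, not the real FakeConnection: QueueConnection, its queries attribute, and the memories table name are all hypothetical.

```python
import asyncio
from collections import deque


class QueueConnection:
    """Hypothetical stub: each query pops the next queued result in FIFO order."""

    def __init__(self):
        self._results = deque()
        self.queries = []  # (sql, args) pairs, recorded for assertions

    def enqueue(self, result):
        self._results.append(result)

    def _next(self, query, args):
        self.queries.append((query, args))
        if not self._results:
            # Fail loudly when the code under test issues an unplanned query.
            raise AssertionError(f"unexpected query: {query}")
        return self._results.popleft()

    async def fetchval(self, query, *args):
        return self._next(query, args)

    async def fetch(self, query, *args):
        return self._next(query, args)


async def demo():
    conn = QueueConnection()
    conn.enqueue(42)  # consumed by the first query
    conn.enqueue([{"id": "1", "key": "name", "value": "Eugene"}])  # second query
    count = await conn.fetchval("SELECT count(*) FROM memories")
    rows = await conn.fetch("SELECT * FROM memories WHERE key = $1", "name")
    return count, rows, conn.queries
```

Because results are consumed strictly in order, a test written this way also documents the sequence of queries the code under test is expected to make.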
Coverage status
| Phase | Module | Tests | Status |
|---|---|---|---|
| 1 | navi.core.events | 17 | ✅ Done |
| 1 | navi.core.compressor | 14 | ✅ Done |
| 1 | navi.core.registry + ContextProviderRegistry | 13 | ✅ Done |
| 1 | navi.core.context_builder | 9 | ✅ Done |
| 1 | navi.profiles.base | 9 | ✅ Done |
| 2 | navi.memory.store | 18 | ✅ Done |
| 2 | navi.memory.extractor | 11 | ✅ Done |
| 3 | navi.api.routes | 19 | ✅ Done |
| 3 | navi.api.routes.sessions file endpoint logic | 5 | ✅ Basic |
| 3 | navi.api.websocket | 7 | ✅ Done |
| 3 | navi.main startup ordering | 1 | ✅ Basic |
| 4 | navi.core.agent | 9 | ✅ Done |
| 4 | navi.core.planning | 5 | ✅ Done |
| 5 | navi.tools.filesystem | 13 | ✅ Done |
| 5 | navi.tools.code_exec | 5 | ✅ Done |
| 5 | navi.tools.terminal | 4 | ✅ Done |
| 5 | navi.tools.share_file | 5 | ✅ Basic |
| 5 | navi.tools.content_publish | 4 | ✅ Basic |
| 5 | navi.content_store | 5 | ✅ Basic |
| 5 | navi.llm.ollama + fallback timeout wiring | 3 | ✅ Basic |
| 6 | webclient/api | 8 | ✅ Done |
| 6 | webclient/stores/chat | 23 | ✅ Done |
| 6 | webclient/stores/sessions | 6 | ✅ Done |
| 6 | webclient/stores/profiles | 3 | ✅ Done |
| 6 | webclient/composables/useWebSocket | 7 | ✅ Done |
Coverage roadmap
This is the living plan for what still needs tests. Keep it updated whenever a bug is fixed, a new module is added, or a planned area becomes covered.
Status meanings:
- ✅ Covered enough for current risk
- 🟨 Basic coverage exists, important edge cases remain
- ⬜ Not covered yet
- 🔴 Regression target from a real bug
Phase 7 — Recent Regression Coverage
| Priority | Area | Target tests | Status |
|---|---|---|---|
| P0 | navi.content_store.ensure_tables() | creates session_content, creates idx_session_content_file, is idempotent when index already exists | ✅ |
| P0 | navi.content_store.publish() | repeated publish of same (session_id, filename) updates one row instead of creating duplicates | ✅ 🔴 |
| P0 | navi.main startup | registries are initialized before _check_embed() so memory has an embedding backend | ✅ 🔴 |
| P0 | navi.core.registry._discover_backends() | primary Ollama backend receives HTTP timeout >= LLM_COMPLETE_TIMEOUT and LLM_STREAM_FIRST_CHUNK_TIMEOUT | ✅ 🔴 |
| P0 | navi.llm.fallback.FallbackOllamaBackend | per-server OllamaBackend clients receive the same expanded timeout | ✅ 🔴 |
| P1 | navi.tools.content_publish | missing file, directory instead of file, successful publish metadata, filename path stripping | ✅ |
| P1 | navi.tools.share_file | duplicate filename collision produces numbered output without overwrite | ✅ |
| P1 | navi.api.routes.sessions file endpoints | upload duplicate naming, forbidden extension, download path traversal, content disposition | ✅ |
Phase 8 — Agent Loop Behavior
| Priority | Area | Target tests | Status |
|---|---|---|---|
| P0 | Agent.run_stream() planning entry | first user message forces planning; later messages follow planning_enabled | ⬜ |
| P0 | plan → todo bridge | numbered plan steps auto-populate todo exactly once per plan | ⬜ |
| P0 | stop handling | stop during stream prefill yields StreamStopped and closes LLM generator | 🟨 |
| P1 | subagent forwarding | parent forwards subagent tool events and counts subagent tokens/tool calls | ⬜ |
| P1 | adaptive replan | newly failed todo step injects replan prompt on next iteration | ⬜ |
| P1 | anti-stall | repeated tool calls or no todo progress inject warning after threshold | ⬜ |
| P1 | workers | post-turn workers run after StreamEnd; worker failure is logged and non-fatal | ⬜ |
Phase 9 — Memory And Embeddings
| Priority | Area | Target tests | Status |
|---|---|---|---|
| P0 | embedding backend wiring | get_registries() wires dedicated EMBEDDING_OLLAMA_HOST backend into MemoryStore | ⬜ |
| P0 | pgvector detection | _has_pgvector() true/false paths and caching behavior | ⬜ |
| P1 | embedding generation | invalid/empty/NaN vectors are skipped before PostgreSQL update | 🟨 |
| P1 | backfill | backfill_embeddings() batches rows and only updates rows with valid vectors | 🟨 |
| P1 | search | vector search falls back to ILIKE when vector search unavailable or empty | 🟨 |
Phase 10 — WebSocket And API Lifecycles
| Priority | Area | Target tests | Status |
|---|---|---|---|
| P0 | active run guard | duplicate message while a run is active returns run_already_active | 🟨 |
| P0 | reconnect replay | reconnect receives missed events and session sync after finished run | 🟨 |
| P1 | stop endpoint | POST /sessions/{id}/stop sets stop event and is idempotent | 🟨 |
| P1 | malformed input | oversize images, invalid file refs, and non-string payloads are rejected or sanitized | ⬜ |
| P1 | startup cleanup | session file cleanup task is started once and deletes orphaned dirs | ⬜ |
Phase 11 — Frontend
| Priority | Area | Target tests | Status |
|---|---|---|---|
| P0 | ContentCard.vue | renders download links and inline viewer links; handles encoded filenames | ⬜ |
| P0 | streaming chat | auto-scroll and streaming message updates stay reactive | 🟨 |
| P1 | session switching | concurrent session loads cannot overwrite active session with stale response | 🟨 |
| P1 | error surfaces | API/store failures show recoverable UI state, no unhandled rejection | 🟨 |
| P1 | file upload UI | upload success/failure, duplicate names, large-file errors | ⬜ |
Running tests
# Backend tests
pytest # all backend tests
pytest tests/unit # unit only
pytest -v tests/unit/core # verbose
pytest -v tests/unit/core/test_events.py::TestToolStarted::test_to_wire # single test
TEST_DATABASE_URL=postgresql://... pytest tests/integration
# Web client tests (run from webclient/)
cd webclient && npm test # all webclient tests
npx vitest run tests/unit/api # single directory
npx vitest run -t "buildMessageList" # filter by test name
Adding a new test
- Create the file in the appropriate tests/unit/ or tests/integration/ directory.
- Use async def for async tests; with asyncio_mode = auto, pytest-asyncio runs them without an explicit marker.
- Import fakes from tests.conftest_factory.
- Mutations to navi.config.settings are reset automatically by the autouse fixture in conftest.py.
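The automatic reset in the last point can be pictured as a snapshot-and-restore generator. The sketch below is an assumption about the general shape of such a fixture, not the project's conftest.py: the settings object and its fields (llm_timeout, debug) are hypothetical stand-ins, and the pytest decorator is only described in a comment so the example runs with the standard library alone.

```python
import copy
from types import SimpleNamespace

# Hypothetical stand-in for navi.config.settings; field names are illustrative.
settings = SimpleNamespace(llm_timeout=30, debug=False)


def reset_settings():
    """Snapshot settings, hand control to the test, then restore them.

    In a conftest.py this generator would typically be decorated with
    @pytest.fixture(autouse=True) so that it wraps every test.
    """
    snapshot = copy.deepcopy(vars(settings))
    try:
        yield settings
    finally:
        # Drop attributes added by the test and restore the original values.
        vars(settings).clear()
        vars(settings).update(snapshot)


def run_with_reset(test_body):
    """Drive the generator by hand, roughly as pytest would around one test."""
    gen = reset_settings()
    test_body(next(gen))
    for _ in gen:  # resume past the yield so the finally block runs
        pass
```

Restoring in a finally block means the settings are reset even when the test body raises, which is the property that keeps one failing test from contaminating the next.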
Guidelines
- Mock at boundaries: LLM calls → FakeLLMBackend, DB → FakePool, filesystem → tmp_path.
- Avoid real network: never hit Ollama, OpenAI, or DuckDuckGo in unit tests.
- Avoid real DB in unit tests: use in-memory mocks; real Postgres only in tests/integration/.
- Keep tests deterministic: no randomness, and no time-dependent logic without monkeypatching datetime.now.
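As an illustration of the determinism guideline, the sketch below pins the clock by passing it in as a parameter, a different technique from monkeypatching that achieves the same result. The helper names utc_now and stamp_event are hypothetical and do not come from the navi codebase.

```python
from datetime import datetime, timezone
from typing import Callable


def utc_now() -> datetime:
    """Default clock used outside of tests."""
    return datetime.now(timezone.utc)


def stamp_event(name: str, clock: Callable[[], datetime] = utc_now) -> dict:
    """Hypothetical helper that attaches an ISO 8601 timestamp to an event."""
    return {"name": name, "at": clock().isoformat()}


def test_stamp_event_is_deterministic():
    # A fixed clock makes the output identical on every run.
    fixed = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    event = stamp_event("tool_started", clock=lambda: fixed)
    assert event == {"name": "tool_started", "at": "2024-01-01T12:00:00+00:00"}
```

With pytest's monkeypatch fixture, the same effect is usually obtained by replacing a wrapper such as utc_now on the module under test, since attributes of the built-in datetime.datetime type cannot be reassigned directly.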