navi-1 / docs / testing.md
Eugene Sukhodolskiy on 29 Apr, 10 KB: Complete phase 7 regression test coverage

Testing Strategy

Backend stack

  • pytest + pytest-asyncio (asyncio_mode = auto)
  • pytest-mock: mocker fixture
  • httpx: TestClient for FastAPI routes
  • asgi-lifespan: lifespan management in integration tests

Web client stack

  • Vitest + @vue/test-utils: Vue 3 component and composable testing
  • happy-dom: lightweight DOM environment
  • Pinia testing utils: @pinia/testing for store mocks

Directory layout

tests/                          # Backend (pytest)
├── conftest.py
├── conftest_factory.py
├── unit/
│   ├── api/
│   │   └── test_session_files.py # upload/download file endpoint logic
│   ├── core/
│   │   ├── test_events.py         # 17 tests
│   │   ├── test_context_builder.py
│   │   ├── test_compressor.py
│   │   ├── test_registry.py       # registries, backend discovery, context provider registry
│   │   ├── test_planning.py
│   │   ├── test_agent_context_size.py
│   │   └── test_agent_stream_guard.py
│   ├── llm/
│   │   └── test_ollama.py         # timeout/error classification + fallback timeout wiring
│   ├── memory/
│   │   ├── test_store.py
│   │   └── test_extractor.py
│   ├── tools/
│   │   ├── test_filesystem.py
│   │   ├── test_code_exec.py
│   │   ├── test_terminal.py
│   │   ├── test_share_file.py
│   │   └── test_content_publish.py
│   ├── profiles/
│   │   └── test_base.py
│   ├── config/
│   │   └── test_settings.py
│   ├── test_content_store.py
│   └── test_startup.py
├── integration/
│   ├── conftest.py
│   ├── test_api_routes.py
│   └── test_websocket.py
└── e2e/
    └── test_chat_flow.py

webclient/tests/                # Web client (Vitest)
└── unit/
    ├── api/
    │   └── index.test.js          # 8 tests: request helper, verbs, errors, FormData
    ├── stores/
    │   ├── chat.test.js           # 23 tests: buildMessageList, WS handlers, session load
    │   ├── sessions.test.js       # 6 tests: fetch, create, delete, pin sorting
    │   └── profiles.test.js       # 3 tests: fetch, selection, lookup
    └── composables/
        └── useWebSocket.test.js   # 7 tests: connect, dispatch, reconnect

Mock strategy

LLM — FakeLLMBackend

FakeLLMBackend cycles through a list of predefined responses and can optionally emit ToolCallRequest objects. This lets us test the agent loop and planning without a running Ollama server.

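from tests.conftest_factory import FakeLLMBackend


async def test_fake_backend_returns_first_response():
    backend = FakeLLMBackend(
        responses=["Hello", "DIRECT"],
        tool_calls=[None, None],
        thinking=["Hmm", None],
    )
    # The first call returns the first scripted response.
    resp = await backend.complete([])  # → LLMResponse(content="Hello")
    assert resp.content == "Hello"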
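
To make the cycling behaviour concrete, here is a minimal sketch of how a fake backend of this kind could be written. It is not the code in tests/conftest_factory.py: SketchFakeBackend and SketchLLMResponse are hypothetical names, and the wrap-around after the last response is an assumption about what "cycles" means.

```python
import asyncio
from dataclasses import dataclass
from itertools import cycle
from typing import Any, Optional


@dataclass
class SketchLLMResponse:
    """Hypothetical stand-in for the project's LLMResponse."""
    content: str
    tool_call: Optional[Any] = None
    thinking: Optional[str] = None


class SketchFakeBackend:
    """Returns scripted responses in order and wraps around when exhausted."""

    def __init__(self, responses, tool_calls=None, thinking=None):
        n = len(responses)
        tool_calls = tool_calls or [None] * n
        thinking = thinking or [None] * n
        self._script = cycle(list(zip(responses, tool_calls, thinking)))
        self.calls = []  # message lists seen so far, handy for assertions

    async def complete(self, messages):
        self.calls.append(messages)
        content, tool_call, thought = next(self._script)
        return SketchLLMResponse(content, tool_call, thought)


async def demo():
    backend = SketchFakeBackend(["Hello", "DIRECT"], thinking=["Hmm", None])
    # Three calls against two scripted responses: the third wraps around.
    return [(await backend.complete([])).content for _ in range(3)]
```

Recording every call in `calls` lets a test assert on what the agent loop actually sent, not only on what it received.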

PostgreSQL — FakePool / FakeConnection

Unit tests replace asyncpg.Pool with the in-memory FakePool/FakeConnection pair. Integration tests may use a real Postgres instance, selected via TEST_DATABASE_URL.

from tests.conftest_factory import FakeConnection, FakeRecord, make_store_with_pool

conn = FakeConnection()
conn.enqueue(42)  # fetchval result
conn.enqueue([FakeRecord(id="1", key="name", value="Eugene")])  # fetch result
store = make_store_with_pool(conn)
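
The enqueue pattern above can be sketched as a FIFO queue that each query method pops from. This is an illustration only, assuming results are consumed in the order they were queued; SketchFakeConnection and the `memory` table name are hypothetical and do not describe the real FakeConnection.

```python
import asyncio
from collections import deque


class SketchFakeConnection:
    """Hypothetical FIFO fake: each query method pops the next queued result."""

    def __init__(self):
        self._results = deque()
        self.queries = []  # (method, sql, args) tuples for later assertions

    def enqueue(self, result):
        self._results.append(result)

    def _next(self, method, sql, args):
        self.queries.append((method, sql, args))
        if not self._results:
            # Fail loudly when the test forgot to script a result.
            raise AssertionError(f"no result queued for {method}: {sql}")
        return self._results.popleft()

    async def fetchval(self, sql, *args):
        return self._next("fetchval", sql, args)

    async def fetch(self, sql, *args):
        return self._next("fetch", sql, args)


async def demo():
    conn = SketchFakeConnection()
    conn.enqueue(42)                            # consumed by fetchval
    conn.enqueue([{"id": "1", "key": "name"}])  # consumed by fetch
    count = await conn.fetchval("SELECT count(*) FROM memory")
    rows = await conn.fetch("SELECT * FROM memory WHERE key = $1", "name")
    return count, rows, [q[0] for q in conn.queries]
```

Raising on an empty queue keeps a test from passing by accident when the code under test issues more queries than expected.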

Coverage status

| Phase | Module | Tests | Status |
|-------|--------|-------|--------|
| 1 | navi.core.events | 17 | ✅ Done |
| 1 | navi.core.compressor | 14 | ✅ Done |
| 1 | navi.core.registry + ContextProviderRegistry | 13 | ✅ Done |
| 1 | navi.core.context_builder | 9 | ✅ Done |
| 1 | navi.profiles.base | 9 | ✅ Done |
| 2 | navi.memory.store | 18 | ✅ Done |
| 2 | navi.memory.extractor | 11 | ✅ Done |
| 3 | navi.api.routes | 19 | ✅ Done |
| 3 | navi.api.routes.sessions file endpoint logic | 5 | ✅ Basic |
| 3 | navi.api.websocket | 7 | ✅ Done |
| 3 | navi.main startup ordering | 1 | ✅ Basic |
| 4 | navi.core.agent | 9 | ✅ Done |
| 4 | navi.core.planning | 5 | ✅ Done |
| 5 | navi.tools.filesystem | 13 | ✅ Done |
| 5 | navi.tools.code_exec | 5 | ✅ Done |
| 5 | navi.tools.terminal | 4 | ✅ Done |
| 5 | navi.tools.share_file | 5 | ✅ Basic |
| 5 | navi.tools.content_publish | 4 | ✅ Basic |
| 5 | navi.content_store | 5 | ✅ Basic |
| 5 | navi.llm.ollama + fallback timeout wiring | 3 | ✅ Basic |
| 6 | webclient/api | 8 | ✅ Done |
| 6 | webclient/stores/chat | 23 | ✅ Done |
| 6 | webclient/stores/sessions | 6 | ✅ Done |
| 6 | webclient/stores/profiles | 3 | ✅ Done |
| 6 | webclient/composables/useWebSocket | 7 | ✅ Done |

Coverage roadmap

This is the living plan for what still needs tests. Keep it updated whenever a bug is fixed, a new module is added, or a planned area becomes covered.

Status meanings:

  • ✅ Covered enough for current risk
  • 🟨 Basic coverage exists, important edge cases remain
  • ⬜ Not covered yet
  • 🔴 Regression target from a real bug

Phase 7 — Recent Regression Coverage

| Priority | Area | Target tests | Status |
|----------|------|--------------|--------|
| P0 | navi.content_store.ensure_tables() | creates session_content, creates idx_session_content_file, is idempotent when index already exists | |
| P0 | navi.content_store.publish() | repeated publish of same (session_id, filename) updates one row instead of creating duplicates | ✅ 🔴 |
| P0 | navi.main startup | registries are initialized before _check_embed() so memory has an embedding backend | ✅ 🔴 |
| P0 | navi.core.registry._discover_backends() | primary Ollama backend receives HTTP timeout >= LLM_COMPLETE_TIMEOUT and LLM_STREAM_FIRST_CHUNK_TIMEOUT | ✅ 🔴 |
| P0 | navi.llm.fallback.FallbackOllamaBackend | per-server OllamaBackend clients receive the same expanded timeout | ✅ 🔴 |
| P1 | navi.tools.content_publish | missing file, directory instead of file, successful publish metadata, filename path stripping | |
| P1 | navi.tools.share_file | duplicate filename collision produces numbered output without overwrite | |
| P1 | navi.api.routes.sessions file endpoints | upload duplicate naming, forbidden extension, download path traversal, content disposition | |
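
As an illustration of the "updates one row instead of creating duplicates" target, the sketch below shows the shape such a regression test can take. It swaps in the standard-library sqlite3 module for PostgreSQL so that it runs anywhere; the column set, the `publish` helper and the unique constraint are assumptions, not the real navi.content_store code.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    # Hypothetical schema: the unique constraint is what makes an upsert possible.
    db.execute(
        "CREATE TABLE IF NOT EXISTS session_content ("
        " session_id TEXT NOT NULL, filename TEXT NOT NULL, title TEXT,"
        " UNIQUE (session_id, filename))"
    )
    return db


def publish(db: sqlite3.Connection, session_id: str, filename: str, title: str) -> None:
    # Insert, or update the existing row when (session_id, filename) already exists.
    db.execute(
        "INSERT INTO session_content (session_id, filename, title) VALUES (?, ?, ?)"
        " ON CONFLICT (session_id, filename) DO UPDATE SET title = excluded.title",
        (session_id, filename, title),
    )


def test_publish_is_idempotent():
    db = make_db()
    publish(db, "s1", "report.md", "Draft")
    publish(db, "s1", "report.md", "Final")
    rows = db.execute("SELECT title FROM session_content").fetchall()
    assert rows == [("Final",)]  # one row, carrying the latest value
```

The assertion checks both halves of the regression: the row count stays at one and the second publish wins.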

Phase 8 — Agent Loop Behavior

| Priority | Area | Target tests | Status |
|----------|------|--------------|--------|
| P0 | Agent.run_stream() planning entry | first user message forces planning; later messages follow planning_enabled | |
| P0 | plan → todo bridge | numbered plan steps auto-populate todo exactly once per plan | |
| P0 | stop handling | stop during stream prefill yields StreamStopped and closes LLM generator | 🟨 |
| P1 | subagent forwarding | parent forwards subagent tool events and counts subagent tokens/tool calls | |
| P1 | adaptive replan | newly failed todo step injects replan prompt on next iteration | |
| P1 | anti-stall | repeated tool calls or no todo progress inject warning after threshold | |
| P1 | workers | post-turn workers run after StreamEnd; worker failure is logged and non-fatal | |

Phase 9 — Memory And Embeddings

| Priority | Area | Target tests | Status |
|----------|------|--------------|--------|
| P0 | embedding backend wiring | get_registries() wires dedicated EMBEDDING_OLLAMA_HOST backend into MemoryStore | |
| P0 | pgvector detection | _has_pgvector() true/false paths and caching behavior | |
| P1 | embedding generation | invalid/empty/NaN vectors are skipped before PostgreSQL update | 🟨 |
| P1 | backfill | backfill_embeddings() batches rows and only updates rows with valid vectors | 🟨 |
| P1 | search | vector search falls back to ILIKE when vector search unavailable or empty | 🟨 |
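
The "invalid/empty/NaN vectors are skipped" target is easiest to test when the check lives in a small pure function. The sketch below shows one possible shape for such a check; `valid_embedding` and `rows_to_update` are hypothetical helpers and not part of navi.memory.

```python
import math
from typing import List, Optional, Sequence, Tuple


def valid_embedding(vec: Optional[Sequence[float]], dim: Optional[int] = None) -> bool:
    """Return True only for a non-empty vector of finite numbers."""
    if not vec:
        return False  # None or empty
    if dim is not None and len(vec) != dim:
        return False  # wrong dimensionality
    return all(isinstance(x, (int, float)) and math.isfinite(x) for x in vec)


def rows_to_update(
    candidates: Sequence[Tuple[str, Optional[Sequence[float]]]],
) -> List[Tuple[str, Sequence[float]]]:
    """Keep only (row_id, vector) pairs that are safe to write to the database."""
    return [(row_id, vec) for row_id, vec in candidates if valid_embedding(vec)]
```

A backfill test can then feed a mixed batch through `rows_to_update` and assert that only the rows with valid vectors reach the fake connection.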

Phase 10 — WebSocket And API Lifecycles

| Priority | Area | Target tests | Status |
|----------|------|--------------|--------|
| P0 | active run guard | duplicate message while a run is active returns run_already_active | 🟨 |
| P0 | reconnect replay | reconnect receives missed events and session sync after finished run | 🟨 |
| P1 | stop endpoint | POST /sessions/{id}/stop sets stop event and is idempotent | 🟨 |
| P1 | malformed input | oversize images, invalid file refs, and non-string payloads are rejected or sanitized | |
| P1 | startup cleanup | session file cleanup task is started once and deletes orphaned dirs | |

Phase 11 — Frontend

| Priority | Area | Target tests | Status |
|----------|------|--------------|--------|
| P0 | ContentCard.vue | renders download links and inline viewer links; handles encoded filenames | |
| P0 | streaming chat | auto-scroll and streaming message updates stay reactive | 🟨 |
| P1 | session switching | concurrent session loads cannot overwrite active session with stale response | 🟨 |
| P1 | error surfaces | API/store failures show recoverable UI state, no unhandled rejection | 🟨 |
| P1 | file upload UI | upload success/failure, duplicate names, large-file errors | |

Running tests

# Backend tests
pytest                              # all backend tests
pytest tests/unit                   # unit only
pytest -v tests/unit/core           # verbose
pytest -v tests/unit/core/test_events.py::TestToolStarted::test_to_wire  # single test
TEST_DATABASE_URL=postgresql://... pytest tests/integration

# Web client tests (run from webclient/)
cd webclient && npm test             # all webclient tests
npx vitest run tests/unit/api       # single directory
npx vitest run -t "buildMessageList" # filter by test name

Adding a new test

  1. Create a file in the appropriate tests/unit/ or tests/integration/ directory.
  2. Use async def for async tests; with asyncio_mode = auto, pytest-asyncio runs them without an explicit @pytest.mark.asyncio marker.
  3. Import helpers from tests.conftest_factory for fakes.
  4. Mutations to navi.config.settings are reset automatically by the autouse fixture in conftest.py.
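
For step 4, the snapshot-and-restore idea behind an autouse fixture can be sketched as a plain generator, so it runs without pytest. In conftest.py the same function would carry @pytest.fixture(autouse=True). The `settings` object below is a hypothetical stand-in for navi.config.settings, and the real fixture may work differently.

```python
import copy
from types import SimpleNamespace

# Hypothetical stand-in for navi.config.settings.
settings = SimpleNamespace(planning_enabled=True, llm_timeout=30)


def reset_settings():
    """Snapshot settings, hand control to the test, then restore on teardown."""
    snapshot = copy.deepcopy(vars(settings))
    try:
        yield settings
    finally:
        # Drop attributes the test added and put the original values back.
        vars(settings).clear()
        vars(settings).update(snapshot)


def demo():
    gen = reset_settings()
    cfg = next(gen)               # setup phase, as pytest does before a test
    cfg.planning_enabled = False  # mutation made by the test
    cfg.extra = "leak"            # attribute the test added
    gen.close()                   # teardown phase, runs the finally block
    return vars(settings)
```

Restoring in a finally block means the settings are reset even when the test body raises.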

Guidelines

  • Mock at boundaries: LLM calls → FakeLLMBackend, DB → FakePool, filesystem → tmp_path.
  • Avoid real network: Never hit Ollama, OpenAI, or DuckDuckGo in unit tests.
  • Avoid real DB in unit tests: Use in-memory mocks; real Postgres only in tests/integration/.
  • Keep tests deterministic: No randomness, no time-dependent logic without monkeypatching datetime.now.
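
A note on the last guideline: datetime.datetime is an immutable built-in type, so assigning to datetime.datetime.now directly raises TypeError. A common workaround, sketched below, is to patch the name that the code under test looks up with a subclass whose now() is fixed. `session_label` and `FrozenDateTime` are hypothetical; with pytest the same swap would be done through monkeypatch.setattr instead of unittest.mock.

```python
import datetime as dt
from unittest import mock


def session_label() -> str:
    """Hypothetical helper whose output depends on the current time."""
    return dt.datetime.now().strftime("session-%Y%m%d-%H%M")


class FrozenDateTime(dt.datetime):
    """datetime subclass whose now() always returns the same instant."""

    @classmethod
    def now(cls, tz=None):
        return cls(2024, 1, 2, 3, 4, 5, tzinfo=tz)


def label_at_frozen_time() -> str:
    # Swap the module attribute for the duration of the block only;
    # the original class is restored on exit.
    with mock.patch.object(dt, "datetime", FrozenDateTime):
        return session_label()
```

Because the patch is scoped to the with block, other tests still see the real clock.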