# Testing Strategy

## Backend stack
- **pytest** + **pytest-asyncio** (`asyncio_mode = auto`)
- **pytest-mock** — `mocker` fixture
- **httpx** (required by FastAPI's `TestClient`, which drives the route tests)
- **asgi-lifespan** — lifespan management in integration tests

## Web client stack
- **Vitest** + **@vue/test-utils** — Vue 3 component and composable testing
- **happy-dom** — lightweight DOM environment
- **Pinia testing utils** — `@pinia/testing` for store mocks

## Directory layout

```
tests/                          # Backend (pytest)
├── conftest.py
├── conftest_factory.py
├── unit/
│   ├── api/
│   │   └── test_session_files.py # upload/download file endpoint logic
│   ├── core/
│   │   ├── test_events.py         # 17 tests
│   │   ├── test_context_builder.py
│   │   ├── test_compressor.py
│   │   ├── test_registry.py       # registries, backend discovery, context provider registry
│   │   ├── test_planning.py
│   │   ├── test_agent_context_size.py
│   │   └── test_agent_stream_guard.py
│   ├── llm/
│   │   └── test_ollama.py         # timeout/error classification + fallback timeout wiring
│   ├── memory/
│   │   ├── test_store.py
│   │   └── test_extractor.py
│   ├── tools/
│   │   ├── test_filesystem.py
│   │   ├── test_code_exec.py
│   │   ├── test_terminal.py
│   │   ├── test_share_file.py
│   │   └── test_content_publish.py
│   ├── profiles/
│   │   └── test_base.py
│   ├── config/
│   │   └── test_settings.py
│   ├── test_content_store.py
│   └── test_startup.py
├── integration/
│   ├── conftest.py
│   ├── test_api_routes.py
│   └── test_websocket.py
└── e2e/
    └── test_chat_flow.py

webclient/tests/                # Web client (Vitest)
├── unit/
│   ├── api/
│   │   └── index.test.js          # 8 tests — request helper, verbs, errors, FormData
│   ├── stores/
│   │   ├── chat.test.js           # 23 tests — buildMessageList, WS handlers, session load
│   │   ├── sessions.test.js       # 6 tests — fetch, create, delete, pin sorting
│   │   └── profiles.test.js       # 3 tests — fetch, selection, lookup
│   └── composables/
│       └── useWebSocket.test.js   # 7 tests — connect, dispatch, reconnect
```

## Mock strategy

### LLM — FakeLLMBackend
`FakeLLMBackend` cycles through a list of pre-defined responses and optionally emits `ToolCallRequest` objects. This lets us test the agent loop and planning without a running Ollama server.

```python
from tests.conftest_factory import FakeLLMBackend


async def test_fake_backend_returns_first_response():
    backend = FakeLLMBackend(
        responses=["Hello", "DIRECT"],
        tool_calls=[None, None],
        thinking=["Hmm", None],
    )
    resp = await backend.complete([])  # → LLMResponse(content="Hello")
    assert resp.content == "Hello"
```
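
For readers new to the pattern, here is a minimal sketch of how a cycling fake backend can be built. It is not the actual `FakeLLMBackend` from `tests/conftest_factory.py`: the names `SketchResponse` and `CyclingFakeBackend`, the `calls` attribute, and the wrap-around indexing are assumptions made for illustration.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchResponse:
    """Hypothetical stand-in for the project's LLMResponse."""
    content: str
    thinking: Optional[str] = None


class CyclingFakeBackend:
    """Illustrative fake: returns canned responses in order, then wraps around."""

    def __init__(self, responses, thinking=None):
        self._responses = list(responses)
        self._thinking = list(thinking) if thinking else [None] * len(self._responses)
        self.calls = []  # every message list passed to complete(), for assertions

    async def complete(self, messages):
        index = len(self.calls) % len(self._responses)
        self.calls.append(list(messages))
        return SketchResponse(self._responses[index], self._thinking[index])


async def demo():
    backend = CyclingFakeBackend(["Hello", "DIRECT"], thinking=["Hmm", None])
    # Three calls against two canned responses: the third wraps to the first.
    return [(await backend.complete([])).content for _ in range(3)]


contents = asyncio.run(demo())
```

Recording each call on the fake keeps assertions about what the agent sent to the LLM in the same place as assertions about what it received.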

### PostgreSQL — FakePool / FakeConnection
Unit tests mock `asyncpg.Pool` via an in-memory `FakePool`/`FakeConnection`. Integration tests may use a real Postgres instance via `TEST_DATABASE_URL`.

```python
from tests.conftest_factory import FakeConnection, FakeRecord, make_store_with_pool

conn = FakeConnection()
conn.enqueue(42)  # fetchval result
conn.enqueue([FakeRecord(id="1", key="name", value="Eugene")])  # fetch result
store = make_store_with_pool(conn)
```
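
The queue-based idea behind `enqueue` can be sketched as follows. This is an assumption about the general shape, not the real `FakeConnection`: the class name `QueueConnection`, the `queries` log, and the placeholder table `t` are invented for the example.

```python
import asyncio
from collections import deque


class QueueConnection:
    """Illustrative fake: each query method pops the next enqueued result."""

    def __init__(self):
        self._results = deque()
        self.queries = []  # (method, sql, args) tuples, for assertions

    def enqueue(self, result):
        self._results.append(result)

    def _next(self, method, sql, args):
        self.queries.append((method, sql, args))
        if not self._results:
            raise AssertionError(f"no result enqueued for {method}: {sql}")
        return self._results.popleft()

    async def fetchval(self, sql, *args):
        return self._next("fetchval", sql, args)

    async def fetch(self, sql, *args):
        return self._next("fetch", sql, args)


async def demo():
    conn = QueueConnection()
    conn.enqueue(42)  # consumed by the first query
    conn.enqueue([{"id": "1", "key": "name", "value": "Eugene"}])  # second query
    # Table `t` is a placeholder; the fake never parses the SQL.
    new_id = await conn.fetchval("INSERT INTO t (k, v) VALUES ($1, $2) RETURNING id", "name", "Eugene")
    rows = await conn.fetch("SELECT * FROM t")
    return new_id, rows, conn.queries


new_id, rows, queries = asyncio.run(demo())
```

Failing loudly when the queue is empty makes an unexpected extra query show up as a test failure instead of a silent `None`.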

## Coverage status

| Phase | Module | Tests | Status |
|-------|--------|-------|--------|
| 1 | `navi.core.events` | 17 | ✅ Done |
| 1 | `navi.core.compressor` | 14 | ✅ Done |
| 1 | `navi.core.registry` + `ContextProviderRegistry` | 13 | ✅ Done |
| 1 | `navi.core.context_builder` | 9 | ✅ Done |
| 1 | `navi.profiles.base` | 9 | ✅ Done |
| 2 | `navi.memory.store` | 18 | ✅ Done |
| 2 | `navi.memory.extractor` | 11 | ✅ Done |
| 3 | `navi.api.routes` | 19 | ✅ Done |
| 3 | `navi.api.routes.sessions` file endpoint logic | 5 | ✅ Basic |
| 3 | `navi.api.websocket` | 7 | ✅ Done |
| 3 | `navi.main` startup ordering | 1 | ✅ Basic |
| 4 | `navi.core.agent` | 9 | ✅ Done |
| 4 | `navi.core.planning` | 5 | ✅ Done |
| 5 | `navi.tools.filesystem` | 13 | ✅ Done |
| 5 | `navi.tools.code_exec` | 5 | ✅ Done |
| 5 | `navi.tools.terminal` | 4 | ✅ Done |
| 5 | `navi.tools.share_file` | 5 | ✅ Basic |
| 5 | `navi.tools.content_publish` | 4 | ✅ Basic |
| 5 | `navi.content_store` | 5 | ✅ Basic |
| 5 | `navi.llm.ollama` + fallback timeout wiring | 3 | ✅ Basic |
| 6 | `webclient/api` | 8 | ✅ Done |
| 6 | `webclient/stores/chat` | 23 | ✅ Done |
| 6 | `webclient/stores/sessions` | 6 | ✅ Done |
| 6 | `webclient/stores/profiles` | 3 | ✅ Done |
| 6 | `webclient/composables/useWebSocket` | 7 | ✅ Done |

## Coverage roadmap

This is the living plan for what still needs tests. Keep it updated whenever a
bug is fixed, a new module is added, or a planned area becomes covered.

Status meanings:
- ✅ Covered enough for current risk
- 🟨 Basic coverage exists, important edge cases remain
- ⬜ Not covered yet
- 🔴 Regression target from a real bug

### Phase 7 — Recent Regression Coverage

| Priority | Area | Target tests | Status |
|---|---|---|---|
| P0 | `navi.content_store.ensure_tables()` | creates `session_content`, creates `idx_session_content_file`, is idempotent when index already exists | ✅ |
| P0 | `navi.content_store.publish()` | repeated publish of same `(session_id, filename)` updates one row instead of creating duplicates | ✅ 🔴 |
| P0 | `navi.main` startup | registries are initialized before `_check_embed()` so memory has an embedding backend | ✅ 🔴 |
| P0 | `navi.core.registry._discover_backends()` | primary Ollama backend receives HTTP timeout >= `LLM_COMPLETE_TIMEOUT` and `LLM_STREAM_FIRST_CHUNK_TIMEOUT` | ✅ 🔴 |
| P0 | `navi.llm.fallback.FallbackOllamaBackend` | per-server `OllamaBackend` clients receive the same expanded timeout | ✅ 🔴 |
| P1 | `navi.tools.content_publish` | missing file, directory instead of file, successful publish metadata, filename path stripping | ✅ |
| P1 | `navi.tools.share_file` | duplicate filename collision produces numbered output without overwrite | ✅ |
| P1 | `navi.api.routes.sessions` file endpoints | upload duplicate naming, forbidden extension, download path traversal, content disposition | ✅ |
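
A regression test for the filename-collision row could take roughly the following shape. `unique_path` is a hypothetical helper and the `name_1.ext` numbering scheme is an assumption; the real `share_file` tool may number files differently. Under pytest, `tmp_path` would replace the `TemporaryDirectory` block.

```python
import tempfile
from pathlib import Path


def unique_path(directory: Path, filename: str) -> Path:
    """Hypothetical helper: return a free path, adding _1, _2, ... on collision."""
    candidate = directory / filename
    stem, suffix = candidate.stem, candidate.suffix
    counter = 1
    while candidate.exists():
        candidate = directory / f"{stem}_{counter}{suffix}"
        counter += 1
    return candidate


def check_collision_does_not_overwrite(directory: Path) -> list[str]:
    first = unique_path(directory, "report.txt")
    first.write_text("original")
    second = unique_path(directory, "report.txt")
    second.write_text("copy")
    assert first.read_text() == "original"  # the first file is untouched
    return sorted(p.name for p in directory.iterdir())


with tempfile.TemporaryDirectory() as tmp:
    names = check_collision_does_not_overwrite(Path(tmp))
```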

### Phase 8 — Agent Loop Behavior

| Priority | Area | Target tests | Status |
|---|---|---|---|
| P0 | `Agent.run_stream()` planning entry | first user message forces planning; later messages follow `planning_enabled` | ⬜ |
| P0 | plan → todo bridge | numbered plan steps auto-populate todo exactly once per plan | ⬜ |
| P0 | stop handling | stop during stream prefill yields `StreamStopped` and closes LLM generator | 🟨 |
| P1 | subagent forwarding | parent forwards subagent tool events and counts subagent tokens/tool calls | ⬜ |
| P1 | adaptive replan | newly failed todo step injects replan prompt on next iteration | ⬜ |
| P1 | anti-stall | repeated tool calls or no todo progress inject warning after threshold | ⬜ |
| P1 | workers | post-turn workers run after `StreamEnd`; worker failure is logged and non-fatal | ⬜ |

### Phase 9 — Memory And Embeddings

| Priority | Area | Target tests | Status |
|---|---|---|---|
| P0 | embedding backend wiring | `get_registries()` wires dedicated `EMBEDDING_OLLAMA_HOST` backend into `MemoryStore` | ⬜ |
| P0 | pgvector detection | `_has_pgvector()` true/false paths and caching behavior | ⬜ |
| P1 | embedding generation | invalid/empty/NaN vectors are skipped before PostgreSQL update | 🟨 |
| P1 | backfill | `backfill_embeddings()` batches rows and only updates rows with valid vectors | 🟨 |
| P1 | search | vector search falls back to ILIKE when vector search unavailable or empty | 🟨 |
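
The "invalid/empty/NaN vectors are skipped" target can be illustrated with a small filter. The function names and the dimension check are assumptions about what "invalid" means here, not the behavior of `navi.memory.store`.

```python
import math
from typing import Optional, Sequence


def is_valid_vector(vector: Optional[Sequence[float]], expected_dim: int) -> bool:
    """Hypothetical check: non-empty, right length, every component finite."""
    if not vector or len(vector) != expected_dim:
        return False
    return all(math.isfinite(x) for x in vector)


def rows_to_update(rows, expected_dim):
    """Keep only (row_id, vector) pairs that are safe to write."""
    return [(row_id, vec) for row_id, vec in rows if is_valid_vector(vec, expected_dim)]


candidates = [
    (1, [0.1, 0.2, 0.3]),
    (2, []),                         # empty
    (3, [0.1, float("nan"), 0.3]),   # NaN component
    (4, [0.1, 0.2]),                 # wrong dimension
    (5, None),                       # backend returned nothing
]
kept = rows_to_update(candidates, expected_dim=3)
```

A test built this way can assert on the exact set of row ids that reach the database fake, which covers both the skip path and the happy path in one case.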

### Phase 10 — WebSocket And API Lifecycles

| Priority | Area | Target tests | Status |
|---|---|---|---|
| P0 | active run guard | duplicate message while a run is active returns `run_already_active` | 🟨 |
| P0 | reconnect replay | reconnect receives missed events and session sync after finished run | 🟨 |
| P1 | stop endpoint | `POST /sessions/{id}/stop` sets stop event and is idempotent | 🟨 |
| P1 | malformed input | oversize images, invalid file refs, and non-string payloads are rejected or sanitized | ⬜ |
| P1 | startup cleanup | session file cleanup task is started once and deletes orphaned dirs | ⬜ |

### Phase 11 — Frontend

| Priority | Area | Target tests | Status |
|---|---|---|---|
| P0 | `ContentCard.vue` | renders download links and inline viewer links; handles encoded filenames | ⬜ |
| P0 | streaming chat | auto-scroll and streaming message updates stay reactive | 🟨 |
| P1 | session switching | concurrent session loads cannot overwrite active session with stale response | 🟨 |
| P1 | error surfaces | API/store failures show recoverable UI state, no unhandled rejection | 🟨 |
| P1 | file upload UI | upload success/failure, duplicate names, large-file errors | ⬜ |

## Running tests

```bash
# Backend tests
pytest                              # all backend tests
pytest tests/unit                   # unit only
pytest -v tests/unit/core           # one directory, verbose output
pytest -v tests/unit/core/test_events.py::TestToolStarted::test_to_wire  # single test
TEST_DATABASE_URL=postgresql://... pytest tests/integration  # integration tests against a real Postgres

# Web client tests (run from webclient/)
cd webclient && npm test             # all webclient tests
npx vitest run tests/unit/api       # single directory
npx vitest run -t "buildMessageList" # filter by test name
```

## Adding a new test

1. Create a file in the appropriate `tests/unit/` or `tests/integration/` directory.
2. Use `async def` for async tests; with `asyncio_mode = auto`, `pytest-asyncio` runs them without a `@pytest.mark.asyncio` marker.
3. Import helpers from `tests.conftest_factory` for fakes.
4. Mutations to `navi.config.settings` are reset automatically by the autouse fixture in `conftest.py`.
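
The snapshot-and-restore idea behind step 4 can be sketched with the standard library alone. The `settings` object and its fields below are invented stand-ins, and the real fixture in `conftest.py` may work differently; the generator has the same shape as a pytest yield fixture.

```python
import copy
from types import SimpleNamespace

# Hypothetical stand-in for navi.config.settings; the field names are invented.
settings = SimpleNamespace(llm_timeout=30, planning_enabled=True)


def reset_settings():
    """Generator with the same shape as a pytest yield fixture.

    In conftest.py this body could be decorated with
    @pytest.fixture(autouse=True) so it wraps every test.
    """
    snapshot = copy.deepcopy(vars(settings))
    try:
        yield settings
    finally:
        # Restore in place so modules holding a reference see the old values.
        vars(settings).clear()
        vars(settings).update(snapshot)


# Drive the generator by hand to show the setup/teardown order.
fixture = reset_settings()
live = next(fixture)        # setup: snapshot taken
live.llm_timeout = 1        # a test mutates an existing setting
live.extra_flag = "temp"    # and adds a new attribute
fixture.close()             # teardown: the finally block restores the snapshot
```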

## Guidelines

- **Mock at boundaries**: LLM calls → `FakeLLMBackend`, DB → `FakePool`, filesystem → `tmp_path`.
- **Avoid real network**: Never hit Ollama, OpenAI, or DuckDuckGo in unit tests.
- **Avoid real DB in unit tests**: Use in-memory mocks; real Postgres only in `tests/integration/`.
- **Keep tests deterministic**: No randomness, no time-dependent logic without monkeypatching `datetime.now`.
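
One caveat on the last guideline: `datetime.datetime` is an immutable built-in type, so its `now` attribute cannot be replaced directly. A common workaround, sketched below, is to swap the `datetime` name that the code under test looks up for a subclass with a fixed `now`. `make_stamp` and `FrozenDateTime` are hypothetical names; under pytest, `monkeypatch.setattr` plays the role of `mock.patch.object`.

```python
import datetime as dt
from unittest import mock


def make_stamp() -> str:
    """Hypothetical code under test: depends on the current time."""
    return dt.datetime.now().isoformat()


class FrozenDateTime(dt.datetime):
    """datetime subclass whose now() always returns the same instant."""

    @classmethod
    def now(cls, tz=None):
        return cls(2024, 1, 2, 3, 4, 5, tzinfo=tz)


def stamp_with_frozen_clock() -> str:
    # Replace the name looked up by make_stamp() for the duration of the block.
    with mock.patch.object(dt, "datetime", FrozenDateTime):
        return make_stamp()


stamp = stamp_with_frozen_clock()
```

In a real suite the patch target would be the `datetime` name inside the module under test, not the global `datetime` module, so unrelated code keeps the real clock.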
