diff --git a/docs/testing.md b/docs/testing.md
index f8a2091..97181cd 100644
--- a/docs/testing.md
+++ b/docs/testing.md
@@ -22,17 +22,20 @@
 │ │ ├── test_events.py # 17 tests
 │ │ ├── test_context_builder.py
 │ │ ├── test_compressor.py
-│ │ ├── test_registry.py
+│ │ ├── test_registry.py # registries + context provider registry
 │ │ ├── test_planning.py
 │ │ ├── test_agent_context_size.py
 │ │ └── test_agent_stream_guard.py
+│ ├── llm/
+│ │ └── test_ollama.py # timeout/error classification
 │ ├── memory/
 │ │ ├── test_store.py
 │ │ └── test_extractor.py
 │ ├── tools/
 │ │ ├── test_filesystem.py
 │ │ ├── test_code_exec.py
-│ │ └── test_terminal.py
+│ │ ├── test_terminal.py
+│ │ └── test_share_file.py
 │ ├── profiles/
 │ │ └── test_base.py
 │ └── config/
@@ -90,7 +93,7 @@
 |-------|--------|-------|--------|
 | 1 | `navi.core.events` | 17 | ✅ Done |
 | 1 | `navi.core.compressor` | 14 | ✅ Done |
-| 1 | `navi.core.registry` | 10 | ✅ Done |
+| 1 | `navi.core.registry` + `ContextProviderRegistry` | 12 | ✅ Done |
 | 1 | `navi.core.context_builder` | 9 | ✅ Done |
 | 1 | `navi.profiles.base` | 9 | ✅ Done |
 | 2 | `navi.memory.store` | 18 | ✅ Done |
@@ -102,12 +105,82 @@
 | 5 | `navi.tools.filesystem` | 13 | ✅ Done |
 | 5 | `navi.tools.code_exec` | 5 | ✅ Done |
 | 5 | `navi.tools.terminal` | 4 | ✅ Done |
+| 5 | `navi.tools.share_file` | 4 | ✅ Basic |
+| 5 | `navi.tools.content_publish` | 0 | ⬜ Planned |
+| 5 | `navi.content_store` | 0 | ⬜ Planned |
+| 5 | `navi.llm.ollama` | 2 | ✅ Basic |
 | 6 | `webclient/api` | 8 | ✅ Done |
 | 6 | `webclient/stores/chat` | 23 | ✅ Done |
 | 6 | `webclient/stores/sessions` | 6 | ✅ Done |
 | 6 | `webclient/stores/profiles` | 3 | ✅ Done |
 | 6 | `webclient/composables/useWebSocket` | 7 | ✅ Done |
 
+## Coverage roadmap
+
+This is the living plan for what still needs tests. Keep it updated whenever a
+bug is fixed, a new module is added, or a planned area becomes covered.
+
+Status meanings:
+- ✅ Covered enough for current risk
+- 🟨 Basic coverage exists, important edge cases remain
+- ⬜ Not covered yet
+- 🔴 Regression target from a real bug
+
+### Phase 7 — Recent Regression Coverage
+
+| Priority | Area | Target tests | Status |
+|---|---|---|---|
+| P0 | `navi.content_store.ensure_tables()` | creates `session_content`, creates `idx_session_content_file`, is idempotent when index already exists | ⬜ |
+| P0 | `navi.content_store.publish()` | repeated publish of same `(session_id, filename)` updates one row instead of creating duplicates | ⬜ 🔴 |
+| P0 | `navi.main` startup | registries are initialized before `_check_embed()` so memory has an embedding backend | ⬜ 🔴 |
+| P0 | `navi.core.registry._discover_backends()` | primary Ollama backend receives HTTP timeout >= `LLM_COMPLETE_TIMEOUT` and `LLM_STREAM_FIRST_CHUNK_TIMEOUT` | ⬜ 🔴 |
+| P0 | `navi.llm.fallback.FallbackOllamaBackend` | per-server `OllamaBackend` clients receive the same expanded timeout | ⬜ 🔴 |
+| P1 | `navi.tools.content_publish` | missing file, directory instead of file, successful publish metadata, filename path stripping | ⬜ |
+| P1 | `navi.tools.share_file` | duplicate filename collision produces numbered output without overwrite | ⬜ |
+| P1 | `navi.api.routes.sessions` file endpoints | upload duplicate naming, forbidden extension, download path traversal, content disposition | 🟨 |
+
+### Phase 8 — Agent Loop Behavior
+
+| Priority | Area | Target tests | Status |
+|---|---|---|---|
+| P0 | `Agent.run_stream()` planning entry | first user message forces planning; later messages follow `planning_enabled` | ⬜ |
+| P0 | plan → todo bridge | numbered plan steps auto-populate todo exactly once per plan | ⬜ |
+| P0 | stop handling | stop during stream prefill yields `StreamStopped` and closes LLM generator | 🟨 |
+| P1 | subagent forwarding | parent forwards subagent tool events and counts subagent tokens/tool calls | ⬜ |
+| P1 | adaptive replan | newly failed todo step injects replan prompt on next iteration | ⬜ |
+| P1 | anti-stall | repeated tool calls or no todo progress inject warning after threshold | ⬜ |
+| P1 | workers | post-turn workers run after `StreamEnd`; worker failure is logged and non-fatal | ⬜ |
+
+### Phase 9 — Memory And Embeddings
+
+| Priority | Area | Target tests | Status |
+|---|---|---|---|
+| P0 | embedding backend wiring | `get_registries()` wires dedicated `EMBEDDING_OLLAMA_HOST` backend into `MemoryStore` | ⬜ |
+| P0 | pgvector detection | `_has_pgvector()` true/false paths and caching behavior | ⬜ |
+| P1 | embedding generation | invalid/empty/NaN vectors are skipped before PostgreSQL update | 🟨 |
+| P1 | backfill | `backfill_embeddings()` batches rows and only updates rows with valid vectors | 🟨 |
+| P1 | search | vector search falls back to ILIKE when vector search unavailable or empty | 🟨 |
+
+### Phase 10 — WebSocket And API Lifecycles
+
+| Priority | Area | Target tests | Status |
+|---|---|---|---|
+| P0 | active run guard | duplicate message while a run is active returns `run_already_active` | 🟨 |
+| P0 | reconnect replay | reconnect receives missed events and session sync after finished run | 🟨 |
+| P1 | stop endpoint | `POST /sessions/{id}/stop` sets stop event and is idempotent | 🟨 |
+| P1 | malformed input | oversize images, invalid file refs, and non-string payloads are rejected or sanitized | ⬜ |
+| P1 | startup cleanup | session file cleanup task is started once and deletes orphaned dirs | ⬜ |
+
+### Phase 11 — Frontend
+
+| Priority | Area | Target tests | Status |
+|---|---|---|---|
+| P0 | `ContentCard.vue` | renders download links and inline viewer links; handles encoded filenames | ⬜ |
+| P0 | streaming chat | auto-scroll and streaming message updates stay reactive | 🟨 |
+| P1 | session switching | concurrent session loads cannot overwrite active session with stale response | 🟨 |
+| P1 | error surfaces | API/store failures show recoverable UI state, no unhandled rejection | 🟨 |
+| P1 | file upload UI | upload success/failure, duplicate names, large-file errors | ⬜ |
+
 ## Running tests
 
 ```bash