Newer
Older
navi-1 / docs / tech_debt_review_2026-04-29.md
@Eugene Sukhodolskiy Eugene Sukhodolskiy on 29 Apr 16 KB Remove SQLite legacy support

Code Review — Technical Debt & Stability Issues

Date: 2026-04-29 Scope: Full backend + frontend review Total items: 53 issues

Security items are listed in a separate deferred section at the end of this document. They are not in scope for the current fix pass but must be addressed before any multi-user or public-facing deployment.


Critical — Stability / Data Loss / Crash

# Problem File Line Explanation
1 [FIXED] Race condition: concurrent runs overwrite each other navi/api/websocket.py 70, 328-331, 164 _runs[session_id] = run has no guard. Two tabs or rapid reconnects for the same session overwrite each other. The first run's finally block calls _runs.pop(), which accidentally removes the second run. Cooperative stop signals break; zombie tasks remain.
2 [FIXED] Resource leak: tool tasks never cancelled on generator teardown navi/core/agent.py 842-892 Each tool call runs in asyncio.create_task(). If the consumer (WebSocket handler) is cancelled, GeneratorExit breaks the while True loop at yield item but tool_task is never awaited or cancelled. It continues executing filesystem/SSH/shell commands in the background with no way to stop it.
3 [FIXED] Empty LLM stream silently kills fallback navi/llm/fallback.py 162-165, 208-210 stream() and stream_complete() catch StopAsyncIteration from an empty generator and do return, ending the entire fallback chain instead of trying the next server. User sees an empty response with no error.
4 [FIXED] Frontend race condition in session loading webclient/src/stores/chat.js 34-58 loadSession() guards against reloading the same ID, but has no guard against two different concurrent loads. If user switches A→B while loadSession(A) is in-flight, the late completion overwrites currentId back to A. UI snaps to the wrong session.
5 [FIXED] Frontend crash on non-string tool result webclient/src/components/messages/ContentCard.vue 100-111 metadata computed property calls .match() on props.tool.result ?? ''. If result is an object or null, .match() throws TypeError and crashes the component.
6 [FIXED] Frontend crash on malformed image data webclient/src/stores/chat.js 393 buildMessageList calls b.startsWith('data:') on every element of m.images. If backend sends null or a non-string, the history builder crashes.

High — Performance / Reliability / UX Breakage

# Problem File Line Explanation
7 [FIXED] Unbounded WebSocket replay buffer navi/api/websocket.py 44-45 _AgentRun.events is a plain list[dict] with no size limit. Large tool results (e.g. 5 MB terminal output) accumulate. Every reconnect replays the entire buffer, causing memory and bandwidth spikes.
8 [REMOVED] SQLite opens a new connection per operation navi/core/sqlite_session_store.py 68, 79, 89, 101, 111, 121, 129 SQLite support was removed entirely. PostgreSQL is now the only supported database. aiosqlite dependency removed, db_path config removed.
9 [FIXED] Memory extraction task storm navi/api/routes/sessions.py 50 Every POST /sessions spawns a fire-and-forget asyncio.create_task(_process_stale_sessions(...)). No deduplication or rate-limiting. Rapid session creation launches many concurrent LLM calls, overwhelming the backend.
10 [FIXED] Permanent blacklisting with no recovery navi/llm/fallback.py 38-39 _dead_servers and _dead_models are module-level sets. A transient network blip permanently blacklists the server until Python process restart.
11 [FIXED] Subagent exception isolation missing navi/core/agent.py 512 Inside run_ephemeral(), tools execute with bare await tool.execute(). No _run_with_sentinel wrapper. If a tool crashes, the subagent dies and all partial progress is lost.
12 [FIXED] No size validation on inline images navi/api/websocket.py 297-304 raw_images is taken directly from client JSON with no length or size checks. A client can send massive base64 payloads, causing memory exhaustion.
13 [FIXED] Orphaned files when DB insert fails navi/content_store.py 114-129 In publish(), shutil.copy2 runs first. If the subsequent DB INSERT fails (caught and logged), the copied file remains on disk with no metadata record, accumulating orphaned files in navi/content/.
14 [FIXED] Frontend: broken streaming auto-scroll webclient/src/components/chat/MessageList.vue 76-89 Watchers on chat.streamingMsg?.text never fire during streaming because streamingMsg is a shallowRef. Vue does not track deep mutations, so the chat list does not auto-scroll as new deltas arrive.
15 [FIXED] Frontend: mixed-content WebSocket on HTTPS webclient/src/composables/useWebSocket.js 4-6 WS URL is hardcoded to ws://${location.host}. When served over HTTPS, browsers block ws:// as mixed content. Must switch to wss:// when location.protocol === 'https:'.
16 [FIXED] Frontend: duplicate message sends from rapid clicks webclient/src/components/chat/InputBar.vue 141-160 handleSend has no debounce or in-flight guard. canSend only checks !chat.streaming, which is not set until stream_start. Rapid clicks can queue multiple identical messages.
17 [FIXED] Frontend: unhandled promise rejections across API layer webclient/src/api/index.js and callers request() and uploadFile() throw on errors, but almost no callers wrap their awaits in try/catch. Failures become unhandled rejections and leave the UI in a broken silent state.
18 SSH connections not closed on shutdown navi/tools/ssh_exec.py 54-90 _PoolEntry connections hang until TTL or process kill. No graceful cleanup on server shutdown.

Medium — Architectural / Edge Cases / Maintainability

# Problem File Line Explanation
19 N+1 UPDATE queries in embedding backfill navi/memory/store.py 170-176 backfill_embeddings() loops over rows, issuing one UPDATE per row. Should batch via executemany or multi-row UPDATE.
20 Unindexed ILIKE fallback search navi/memory/store.py 282-314 Text fallback constructs dynamic SQL with ILIKE on category, key, value. No indexes exist on these columns individually. Full table scan on large tables.
21 Context list rebuilt from scratch every iteration navi/core/agent.py ~683 _build_context() creates a brand-new list by filtering and prepending system messages on every tool-calling iteration. O(n) per iteration, generating significant garbage.
22 [FIXED] Unbounded planning_logs growth navi/core/agent.py 651 session.planning_logs.append(_ev.log) with no size limit or rotation. Long sessions with planning enabled produce huge JSON in DB, slowing save().
23 [FIXED] Background cleanup task unreferenced navi/main.py 72 asyncio.create_task(cleanup_loop(...)) — the returned Task is not stored. In Python < 3.11, unhandled exceptions may be silently discarded.
24 [FIXED] BaseException catches SystemExit navi/core/agent.py 847-848 _run_with_sentinel catches BaseException, which includes SystemExit. Should catch only Exception.
25 [FIXED] SystemExit swallowed in agent loop navi/core/agent.py 871 isinstance(r, BaseException) includes SystemExit. It should propagate to allow graceful server shutdown.
26 [FIXED] Hardcoded context output reserve navi/core/agent.py 1521 output_reserve = 2048 tokens is not configurable per model. May falsely reject valid contexts for small models.
27 Vector literal NaN/Infinity vulnerability navi/memory/store.py 73-75 _vector_to_str converts floats naively. A misbehaving embedding model returning NaN or Infinity creates invalid PostgreSQL vector syntax.
28 Tight coupling to todo module internals navi/core/agent.py 176-224 _todo_status_snapshot directly reads navi.tools.todo._plans, a module-level dict. Hard to test and refactor.
29 ContextVar values not reset in subagent path navi/core/agent.py 350-351 run_ephemeral() sets _sid_var and _model_var but never resets them. Background tasks spawned by the subagent may inherit stale values.
30 [FIXED] Replay/reattach may miss post-finish events navi/api/websocket.py 274-277 If a client reconnects after a run finishes but before disconnect is fully processed, it gets session_sync. No push event indicates new messages were persisted while disconnected.
31 [FIXED] Frontend: singleton confirm-dialog promise leak webclient/src/composables/useConfirm.js 5-30 Module-level _resolve. If confirm() is called twice before the first resolves, the second overwrites _resolve, causing the first promise to hang forever.
32 [FIXED] Frontend: body overflow leak on lightbox unmount webclient/src/components/ui/ImageLightbox.vue 26-28 watch(src) sets document.body.style.overflow = 'hidden'. If component unmounts while open, the side effect is never reverted. Page stays permanently unscrollable.
33 [FIXED] Frontend: DOM leak on fallback copy failure webclient/src/composables/useCopy.js 10-17 Fallback copy appends a <textarea> to body but lacks try/finally. If document.execCommand('copy') throws, the textarea remains in DOM.
34 [FIXED] Frontend: reconnect race in WebSocket send() webclient/src/composables/useWebSocket.js 52-64 Multiple send() calls while ws.readyState === CLOSING each trigger _connect(), creating redundant sockets that are immediately destroyed.

Low — Code Smell / Minor

# Problem File Line Explanation
35 No tests Entire project Neither backend nor frontend has unit or integration tests. Refactoring is risky.
36 [FIXED] Import-time side effects in deps navi/api/deps.py 37, 88, 89 _memory_store, _session_store, _workers instantiated at import time. SQLite table creation runs synchronously. Blocks health checks on lock failure.
37 Hardcoded old_webclient static mount navi/main.py 36-38 /static/ mounted from old_webclient. If renamed/removed, middleware fails.
38 [FIXED] __import__ anti-patterns navi/content_store.py:188, navi/main.py:16-19 188, 16-19 Inline __import__ instead of direct imports. Hard to read and static analysis misses them.
39 [FIXED] Token count 0 coerced to None navi/llm/ollama.py 137-138 or None converts legitimate 0 token count to None. Metrics lose precision.
40 TODO: semantic deduplication navi/memory/extractor.py 170 Memory facts accumulate duplicates. Vector search + similarity threshold needed before upsert.
41 Frontend: global marked singleton mutation webclient/src/composables/useMarkdown.js 5-16 marked.setOptions() and marked.use() mutate the global singleton. Conflicts with any other module wanting different options.
42 Frontend: profile selector does not change active session profile webclient/src/components/ui/AppSidebar.vue 16 @change updates profilesStore.selectedProfileId directly but does not change the active session's profile. User thinks they switched but they didn't.
43 [FIXED] Frontend: stale streamingMsg after reconnect reload webclient/src/stores/chat.js 74-86 reloadSession rebuilds messages.value but does not clear streamingMsg.value. Subsequent deltas mutate an orphaned object not in the rendered array.
44 [FIXED] Frontend: incomplete HTML escaping in user messages webclient/src/components/messages/UserMessage.vue 48-54 escapeHtml encodes & < > " but not '. Defense-in-depth gap if ever injected into single-quoted attributes.
45 Frontend: no message virtualization webclient/src/components/chat/MessageList.vue 7-13 All messages rendered in DOM. Long sessions degrade scroll/render performance. vue-virtual-scroller is already in the project but unused here.
46 Frontend: fragile content-type detection from URL webclient/src/components/messages/ContentCard.vue 120 url.split('.').pop() does not strip query strings, so image.png?size=large fails lookup.
47 Frontend: unused dependency webclient/package.json 12 @tanstack/vue-virtual listed but never imported; vue-virtual-scroller is used instead.
48 [FIXED] Frontend: findLast compatibility risk webclient/src/stores/chat.js 186-189, 257 Array.prototype.findLast requires ES2023. Fails on older browsers unless polyfilled.
49 [FIXED] Frontend: fragile table-wrapper regex webclient/src/composables/useMarkdown.js 121 html.replace(/<table>/g, ...) does not match <table class="foo">, so those tables miss the scrollable wrapper.
50 [FIXED] Frontend: inefficient copy-button re-attachment webclient/src/components/messages/AssistantMessage.vue 126 watch(renderedText) calls attachCopyButtons on every streaming delta, repeatedly querying DOM even though buttons haven't changed.
51 Frontend: imprecise relative-time updates webclient/src/composables/useTime.js 36 30-second interval means "just now" can be stale for up to 30s before jumping.
52 Frontend: stream message ID collision risk webclient/src/stores/chat.js 148 id: 'stream_${Date.now()}' could collide within the same millisecond on rapid reconnect.
53 [FIXED] Frontend: flawed meta-row visibility logic webclient/src/components/messages/AssistantMessage.vue 102-104 hasMeta uses ` props.msg.tool_call_count, so0` is falsy and may incorrectly hide the meta row.

Security (Deferred — address before multi-user/public deployment)

# Problem File Line Explanation
S1 Arbitrary code execution (code_exec) navi/tools/code_exec.py 54 asyncio.create_subprocess_exec(sys.executable, script_path) with no sandbox (no seccomp, chroot, separate user). LLM-generated code runs with full server privileges.
S2 Full shell access by default (terminal) navi/tools/terminal.py 66; navi/config.py TERMINAL_ALLOWED_COMMANDS=* default. create_subprocess_shell allows any command including rm -rf, `curl \ sh`.
S3 Self-modifying code (write_tool) navi/tools/write_tool.py 84 LLM writes arbitrary .py to tools/ and immediately imports it. Prompt injection → persistent backdoor without server restart.
S4 Path traversal in content_publish navi/content_store.py 97-99 dest = dest_dir / dest_name where dest_name is unsanitized user input. ../../etc/passwd escapes the content directory.
S5 XSS via v-html in markdown rendering webclient/src/composables/useMarkdown.js 119; AssistantMessage.vue renderMarkdown returns raw HTML from marked.parse(). v-html in AssistantMessage, SummaryCard, CompressionNotice, ToolCard renders it unsanitized.
S6 Iframe sandbox bypass webclient/src/components/messages/ContentCard.vue 26-41 sandbox="allow-scripts allow-same-origin" on same-origin iframe removes all restrictions. Framed content can access localStorage, cookies, window.parent.
S7 Symlink traversal in filesystem navi/tools/filesystem.py 71-88 Path.resolve() follows symlinks. If FS_ALLOWED_PATHS=/home/user and a symlink points to /etc, read accesses /etc/shadow.
S8 SSH MITM by default navi/tools/ssh_exec.py 254, 271 known_hosts="none" disables host key verification. Vulnerable to man-in-the-middle attacks.
S9 Weak content UUID (8 chars) navi/content_store.py 93 str(uuid.uuid4())[:8] — only 4 billion combinations. /content is open without auth. Brute-force guessable.
S10 Passwords in plaintext in SSH pool navi/tools/ssh_exec.py 48-51 _PoolEntry.connect_kwargs stores passwords/keys. No scrubbing in logs or memory.
S11 No image size validation on WS navi/api/websocket.py 297-304 raw_images taken directly from client JSON without length/size checks. Malicious client can OOM the server.
S12 Indirect recursion in spawn_agent navi/tools/spawn_agent.py 128 exclude_tools=["spawn_agent"] blocks direct call, but sub-agent can call write_tool → new tool → spawn_agent.
S13 No working_dir validation in terminal navi/tools/terminal.py 70 working_dir passed directly to cwd. Any directory accessible if unrestricted mode.