| 2026-05-25 |
Phase 5: Cleanup legacy JSON dual-write, remove duplicated class
...
- Removed JSON column writes from create() and save()
- Removed duplicated PgSessionStore class block that leaked in during Phase 4
- _serialize/_deserialize retained for boot migration helper only
- All session I/O now goes exclusively through session_messages table
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 25 May
|
Phase 4: Read list/search from session_messages, drop JSON fallback
...
- list_all / list_page / search_list now load messages via _build_sessions()
which batch-loads session_messages in a single query
- count_all search uses EXISTS (SELECT 1 FROM session_messages ...) instead
of ILIKE on the JSON text column
- Removed _row_to_session and all legacy JSON column reads from list paths
- get() continues to load session_messages directly (Phase 3)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 25 May
|
Phase 3: Build context from session_messages table
...
- get() always reads from session_messages, never falls back to JSON columns
- Load all rows in a single query, then split by is_display / is_context flags
- messages[] and context[] now share the same Python objects,
enabling reliable id() matching in the agent compression path
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 25 May
|
Phase 2: Dual-write with is_context/is_display flags on Message
...
- Message model gets is_context and is_display bools
- PgSessionStore.save() writes flags directly to session_messages
- Agent sets is_context=False on display-only messages, is_display=False on context-only
- Planning: plan context msg is_display=False, plan marker is_context=False
- Compression: summarized messages get is_context=False, summary added to messages with is_display=False
- Tests updated for extra user display+context messages per turn
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 25 May
|
Fix user tool execute() signature to accept ctx keyword argument
...
The tool_executor always passes ctx=ctx when calling tool.execute(),
but _try_module_level in loader.py created user tools with
_execute(self, params) only. This caused:
Error: _execute() got an unexpected keyword argument 'ctx'
when calling get_current_datetime and any other module-level user tool.
- Add ctx=None parameter to the generated _execute wrapper
- Preserves backward compatibility with existing user tools
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 25 May
|
Add subagent progress report on failure
...
When a subagent stops (timeout, max iterations, thinking stall, user stop),
it now returns a structured progress report built from its local message
context, so the parent agent knows what tools were called and what was
accomplished before the stop.
- Add _build_progress_report() to SubAgentRunner
- Report includes: turn number, assistant text, tool calls with results
- Prepended to result_text for every stop reason (completed also gets it)
- Updated test_run_ephemeral_complete to expect the report prefix
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 25 May
|
Add client-side image resize and server-side image token budgeting
...
Client (useFileUpload.js):
- Resize images to max 1024px on longest side via Canvas before upload
- JPEG quality 0.9, preserves EXIF orientation
- Reduces typical phone photos from 8 MB → ~400-900 KB
Server (websocket.py):
- Increase limit to 8 images, 50 MB total payload
Server (agent.py):
- Before adding user message to session.context, estimate how many images
fit in remaining context (500 tok/img, up to 80% of max_context_tokens)
- Images that don't fit are saved as resized JPEGs to session_files/
- References to saved images are appended to the user message text
so Navi can view them later via image_view tool
- session.messages keeps ALL images for full UI history
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 25 May
|
| 2026-05-24 |
Fix MCP tool spinner bug: match tool_started → tool_call by tool_call_id
...
- Add tool_call_id field to ToolStarted and ToolEvent dataclasses
- Pass tc.id as tool_call_id from agent.py, subagent_runner.py, and tool_executor.py
- Update frontend chat.js onToolStarted/onToolCall to match cards by toolCallId
with fallback to name-matching for backward compatibility
Closes spinner issue where LLM short name ("search_docs") didn't match
resolved MCP name ("mcp__gnexus_book__search_docs").
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 24 May
|
Fix MCP tool registration at startup
...
Root cause: _on_mcp_server_connected was defined and registered AFTER
mcp_manager.load_all(), so startup connections never registered their
tools into the ToolRegistry.
Fix: explicitly iterate connected clients after build_default_registries()
creates tool_registry, and register their MCP tools before starting the
health-check loop. The callback stays for future reconnects.
Also adds diagnostic warning in build_tool_list when tools are missing.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 24 May
|
Add debug logging to trace MCP tool registration / resolution
...
Logs added to:
- _on_mcp_server_connected: shows how many tools are registered per server
- build_tool_list: lists missing tool names and counts
- _resolve_tool: dumps tool_map keys when a lookup fails
This is diagnostic — will be removed once the root cause is found.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 24 May
|
Fix MCP tool availability after health-check disconnect/reconnect
...
- connect() now calls _cleanup() first to avoid stale transport leaks.
- mark_disconnected() clears _session so _ensure_connected() knows
the client truly needs a fresh session.
- register_mcp_tools() is no longer a no-op; it re-registers all
connected MCP tools so reload_tools correctly restores them.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 24 May
|
Fix MCP health-check spamming "connected" toasts every 30s
...
Only publish McpStatusUpdate "connected" when a server transitions
from disconnected -> connected, not on every routine poll.
Track last-known state in _connected_status dict.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 24 May
|
Suppress noisy MCP SDK health-check logs
...
mcp.server.lowlevel.server logs "Processing request of type ListToolsRequest"
on every health-check poll. Raise its threshold to WARNING so we only
see real errors.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 24 May
|
Add auth resilience: user cache, retry, and API token fallback
...
- 30-second in-memory _user_cache to avoid hammering gnexus-auth
- _fetch_user_with_retry: one retry after 1.5s sleep on transient failure
- API token fallback when OAuth cookie is present but refresh fails
- Clear cache/locks in test fixture to prevent cross-test pollution
- Fix registry timeout test after lowering default to 90s
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 24 May
|
Raise first-chunk timeout to 90s and retry same server+model before fallback
...
- config.py: llm_stream_first_chunk_timeout 180s → 90s
- fallback.py stream_complete: wrap gen.__anext__() in asyncio.wait_for()
with llm_stream_first_chunk_timeout; on TimeoutError or LLMConnectionError
sleep 2s and retry once on the same server+model before blacklisting/fallback
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 24 May
|
Add MCP health check loop, auto-reconnect, and system toast notifications
...
Backend:
- McpManager: keep all configured servers in the pool even if connect()
fails at startup; health-check loop (30s) tries to reconnect dead servers
and verifies live ones with list_tools()
- McpManager: set_on_server_connected callback re-registers tools when a
dead server comes back online
- McpClient: add mark_disconnected() for silent drop detection
- McpStatusTool: skip list_tools() for disconnected servers
- Orchestrator: broadcast mcp_status_update to all WebSocket sessions
- New event type McpStatusUpdate with server_name, status, tool_count
Webclient:
- useWebSocket: handle mcp_status_update → toast.success/toast.error
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 24 May
|
Fix MCP transport teardown race with anyio task groups
Eugene Sukhodolskiy
committed
on 24 May
|
Apply review fixes to API token auth system
...
Backend:
- navi/auth/deps.py: replace 3 DB round-trips with single JOIN query for
token resolution; update last_used_at still separate (best-effort)
- navi/api/routes/api_tokens.py: replace asyncpg-specific "UPDATE 1"
string check with RETURNING id fetchrow; increase token_prefix from
8 to 12 chars for better visual identification; add security notes
- tests/unit/auth/test_api_tokens.py: update tests for JOIN query and
RETURNING-based revoke
Frontend:
- webclient/src/components/settings/ShowTokenModal.vue: new modal that
shows the plain token in a readonly field with copy button and
explicit warning — replaces the transient toast notification
- webclient/src/components/settings/ApiKeysPanel.vue: use ShowTokenModal
- webclient/src/composables/useWebSocket.js: add security comment about
localStorage XSS risk and query param log exposure
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 24 May
|

Add API token auth system for headless/micro clients
...
Backend:
- navi/auth/_ddl.py: add api_tokens table with boot-time migration
- navi/auth/deps.py: _resolve_user now falls back to X-Api-Token header
and ?api_token query param for WebSocket auth
- navi/auth/__init__.py: add ApiToken pydantic model
- navi/api/routes/api_tokens.py: CRUD endpoints (POST/GET/DELETE)
- navi/main.py: wire api_tokens router
Frontend:
- webclient/src/App.vue: add #settings hash routing
- webclient/src/components/settings/: SettingsView, ApiKeysPanel,
CreateKeyModal with copy-to-clipboard flow
- webclient/src/api/index.js: token CRUD API functions
- webclient/src/stores/apiTokens.js: Pinia store
- webclient/src/components/sidebar/AppSidebar.vue: settings link
- webclient/src/composables/useWebSocket.js: append ?api_token= when
localStorage token is present
Tests:
- tests/unit/auth/test_api_tokens.py: 10 unit tests covering token
resolution (header + query param), revoke, missing/revoked tokens,
orphan users, and CRUD endpoints
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 24 May
|
Enable gnexus-creds MCP in profiles that already use gnexus-book
...
- discuss: add gnexus-creds read group to agent tools
- server_admin: add gnexus-creds read+write groups to agent and subagent tools
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 24 May
|
Add MCP streamable_http transport, integrate gnexus-creds, and document headless nodes
...
- navi/mcp/client.py: add streamable_http transport via httpx + mcp.client.streamable_http
- navi/mcp/config.py: add "streamable_http" to transport literal and is_streamable_http property
- mcp_servers.d/gnexus-creds.json: new MCP server config with overlay instructions for secret workflow
- docs/future_headless_nodes.md: architecture exploration for headless Navi node swarm
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 24 May
|
| 2026-05-23 |
Fix recall race, ContextVar leaks, dead code, and recall duplication
...
- run_recall: wrap busy check + create_run in session_lock to prevent
race between scheduler and websocket handler
- run_recall: save ContextVar tokens and reset in finally to avoid
leaking user context into subsequent background tasks
- websocket.py: reset user ContextVars in finally after run completes
- orchestrator.py: remove dead set_notify / _notify abstraction
- orchestrator.py: extract _finalize_recall to deduplicate success /
MaxIterationsReached / Exception finalization blocks
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 23 May
|
Unify in-memory session state in AgentSessionOrchestrator
...
Replace scattered _runs + _busy_sessions + _session_sockets with a
single _sessions: dict[str, SessionState] on the orchestrator.
- SessionState dataclass holds run, busy_event, and websockets
- _session_sockets module-level global removed from websocket.py;
socket tracking moved into orchestrator (add/remove_websocket)
- Event bus subscriber _on_recall_update moved into orchestrator
- Per-session asyncio.Lock added to protect concurrent-run guard
- _cleanup() auto-removes empty SessionState entries
Tests updated to reference _sessions instead of legacy _runs.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 23 May
|
Pass explicit ToolContext to tools instead of hidden ContextVars
...
Add ToolContext dataclass (session_id, event_sink, stop_event, model,
user_id, user_role, user_info) and thread it through the execution chain:
Agent._execute_tools_with_sink → ToolExecutor → tool.execute().
All ~25 tools updated to accept ctx parameter. Tools that previously
read ContextVar now prefer ctx when provided, falling back to
ContextVar for backward compatibility.
Tests updated to pass ToolContext explicitly — no more test fixtures
that set current_session_id / current_user_id ContextVars.
ContextVar setters remain as fallback for non-tool consumers
(ai_helper, context_builder, planning) and will be removed in a
follow-up refactor.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 23 May
|
Fix auth race condition causing frequent logouts
...
Add per-session-id asyncio.Lock around token refresh to prevent
parallel requests from simultaneously refreshing the same token.
Re-read the session inside the lock so a second request can use
the token already refreshed by the first one.
Stop deleting the auth session on refresh failure — transient
errors (network, race condition, expired refresh token) were
wiping the session and forcing a full re-login.
+ tests for both behaviours.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 23 May
|
| 2026-05-21 |
Prioritise gemma4 and add MCP tools to subagent scopes
...
- modeler_3d: gemma4:31b-cloud first (vision-capable), drop glm-5.1
- modeler_3d subagent: add navi-3d + navi-web MCP tools
- Adjust subagent tool scopes across profiles for explicit MCP access
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 21 May
|
Add structured logging for Ollama chat errors
...
Log model, message count, tools count, and raw error string whenever
self._client.chat() raises an exception. This makes it possible to
reconstruct the exact request payload that triggered a 500 from
Ollama Cloud — critical for diagnosing transient vs systemic failures.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 21 May
|

Refactor profile tool config to explicit agent/subagent structure
...
Replaces the confusing mix of enabled_tools + mcp_servers + subagent_tools
with a single explicit structure:
tools: {
agent: {native: [...], mcp: {server: [groups]}},
subagent:{native: [...], mcp: {server: [groups]}}
}
Why:
- Old fields mixed native and MCP names (mcp__server__tool) in one list,
making it impossible to tell at a glance what a subagent actually gets.
- subagent_runner.py had 25 lines of runtime MCP filtering logic that
was hard to follow and error-prone.
Changes:
- AgentProfile: add ToolConfig / ToolScopeConfig pydantic models,
keep old fields (enabled_tools, mcp_servers, subagent_tools) for
auto-migration via _migrate_tools validator.
- loader.py: read new "tools" key, auto-migrate legacy configs.
- agent.py: _tool_list now accepts ToolScopeConfig.
- subagent_runner.py: simplified — profile.get_subagent_tools() returns
the exact scope, no runtime filtering needed.
- context_builder.py, list_tools.py, spawn_agent.py: updated to use
profile.get_agent_tools() / get_subagent_tools().
- All 6 profile config.json files migrated to new schema.
- Secretary subagent now explicitly gets navi-web MCP tools for web search.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 21 May
|
Fix planning: change plan-follow-up role from system to user
...
After injecting the plan as an assistant message into session.context,
the previous code appended a system message saying "Plan is ready.
Execute it now..." Many instruct-tuned models treat their own
assistant message as a completed response, and a trailing system
instruction is easy to ignore.
Changing the follow-up to role="user" makes the model see:
assistant: plan
user: "Execute this plan..."
which obligates the model to produce a new assistant response —
the tool-calling execution phase.
The follow-up message is appended only to session.context (LLM
context) and never to session.messages, so it is invisible in the
chat UI.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 21 May
|
Fix session file download URL — reverse legacy redirect
...
The webclient requests files at /api/sessions/{id}/files/{name},
but the actual endpoint lives at /sessions/{id}/files/{name}.
The old legacy redirect pointed the wrong way (/sessions → /api),
which always 404'd because /api/sessions/... was never registered.
- Replace legacy redirect with /api/sessions/... → /sessions/... (307)
- 307 preserves the request method and cookies during the redirect
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 21 May
|