| 2026-05-24 |
Fix MCP transport teardown race with anyio task groups
Eugene Sukhodolskiy
committed
on 24 May
|
Fix settings route switching and document API token system
...
- App.vue: make route reactive via ref + getRouteFromHash() so
#settings toggles work without page reload
- docs/api_tokens.md: new comprehensive API token auth doc
- docs/api.md: add /api-tokens REST endpoints
- docs/auth.md: add token flow architecture diagram and resolution order
- docs/websocket.md: add auth section with cookie vs query-param token
- docs/architecture.md: update to AgentSessionOrchestrator + ToolContext
- docs/tools.md: add gnexus-creds MCP tools
- docs/index.md: link to api_tokens.md
- Rebuild webclient dist
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 24 May
|
Apply review fixes to API token auth system
...
Backend:
- navi/auth/deps.py: replace 3 DB round-trips with single JOIN query for
token resolution; update last_used_at still separate (best-effort)
- navi/api/routes/api_tokens.py: replace asyncpg-specific "UPDATE 1"
string check with RETURNING id fetchrow; increase token_prefix from
8 to 12 chars for better visual identification; add security notes
- tests/unit/auth/test_api_tokens.py: update tests for JOIN query and
RETURNING-based revoke
Frontend:
- webclient/src/components/settings/ShowTokenModal.vue: new modal that
shows the plain token in a readonly field with copy button and
explicit warning — replaces the transient toast notification
- webclient/src/components/settings/ApiKeysPanel.vue: use ShowTokenModal
- webclient/src/composables/useWebSocket.js: add security comment about
localStorage XSS risk and query param log exposure
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 24 May
|

Add API token auth system for headless/micro clients
...
Backend:
- navi/auth/_ddl.py: add api_tokens table with boot-time migration
- navi/auth/deps.py: _resolve_user now falls back to X-Api-Token header
and ?api_token query param for WebSocket auth
- navi/auth/__init__.py: add ApiToken pydantic model
- navi/api/routes/api_tokens.py: CRUD endpoints (POST/GET/DELETE)
- navi/main.py: wire api_tokens router
Frontend:
- webclient/src/App.vue: add #settings hash routing
- webclient/src/components/settings/: SettingsView, ApiKeysPanel,
CreateKeyModal with copy-to-clipboard flow
- webclient/src/api/index.js: token CRUD API functions
- webclient/src/stores/apiTokens.js: Pinia store
- webclient/src/components/sidebar/AppSidebar.vue: settings link
- webclient/src/composables/useWebSocket.js: append ?api_token= when
localStorage token is present
Tests:
- tests/unit/auth/test_api_tokens.py: 10 unit tests covering token
resolution (header + query param), revoke, missing/revoked tokens,
orphan users, and CRUD endpoints
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 24 May
|
Enable gnexus-creds MCP in profiles that already use gnexus-book
...
- discuss: add gnexus-creds read group to agent tools
- server_admin: add gnexus-creds read+write groups to agent and subagent tools
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 24 May
|
Add MCP streamable_http transport, integrate gnexus-creds, and document headless nodes
...
- navi/mcp/client.py: add streamable_http transport via httpx + mcp.client.streamable_http
- navi/mcp/config.py: add "streamable_http" to transport literal and is_streamable_http property
- mcp_servers.d/gnexus-creds.json: new MCP server config with overlay instructions for secret workflow
- docs/future_headless_nodes.md: architecture exploration for headless Navi node swarm
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 24 May
|
| 2026-05-23 |
Fix recall race, ContextVar leaks, dead code, and recall duplication
...
- run_recall: wrap busy check + create_run in session_lock to prevent
race between scheduler and websocket handler
- run_recall: save ContextVar tokens and reset in finally to avoid
leaking user context into subsequent background tasks
- websocket.py: reset user ContextVars in finally after run completes
- orchestrator.py: remove dead set_notify / _notify abstraction
- orchestrator.py: extract _finalize_recall to deduplicate success /
MaxIterationsReached / Exception finalization blocks
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 23 May
|
Unify in-memory session state in AgentSessionOrchestrator
...
Replace scattered _runs + _busy_sessions + _session_sockets with a
single _sessions: dict[str, SessionState] on the orchestrator.
- SessionState dataclass holds run, busy_event, and websockets
- _session_sockets module-level global removed from websocket.py;
socket tracking moved into orchestrator (add/remove_websocket)
- Event bus subscriber _on_recall_update moved into orchestrator
- Per-session asyncio.Lock added to protect concurrent-run guard
- _cleanup() auto-removes empty SessionState entries
Tests updated to reference _sessions instead of legacy _runs.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 23 May
|
Pass explicit ToolContext to tools instead of hidden ContextVars
...
Add ToolContext dataclass (session_id, event_sink, stop_event, model,
user_id, user_role, user_info) and thread it through the execution chain:
Agent._execute_tools_with_sink → ToolExecutor → tool.execute().
All ~25 tools updated to accept ctx parameter. Tools that previously
read ContextVar now prefer ctx when provided, falling back to
ContextVar for backward compatibility.
Tests updated to pass ToolContext explicitly — no more test fixtures
that set current_session_id / current_user_id ContextVars.
ContextVar setters remain as fallback for non-tool consumers
(ai_helper, context_builder, planning) and will be removed in a
follow-up refactor.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 23 May
|
Fix auth race condition causing frequent logouts
...
Add per-session-id asyncio.Lock around token refresh to prevent
parallel requests from simultaneously refreshing the same token.
Re-read the session inside the lock so a second request can use
the token already refreshed by the first one.
Stop deleting the auth session on refresh failure — transient
errors (network, race condition, expired refresh token) were
wiping the session and forcing a full re-login.
+ tests for both behaviours.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 23 May
|
| 2026-05-22 |
Fix link deduplication in ArtifactsPanel
...
Normalize URLs by pathname (lowercased, no query/hash, no trailing
slash) so the same file reached via different query params or
absolute/relative forms is counted once.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 22 May
|
| 2026-05-21 |
Prioritise gemma4 and add MCP tools to subagent scopes
...
- modeler_3d: gemma4:31b-cloud first (vision-capable), drop glm-5.1
- modeler_3d subagent: add navi-3d + navi-web MCP tools
- Adjust subagent tool scopes across profiles for explicit MCP access
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 21 May
|
Add structured logging for Ollama chat errors
...
Log model, message count, tools count, and raw error string whenever
self._client.chat() raises an exception. This makes it possible to
reconstruct the exact request payload that triggered a 500 from
Ollama Cloud — critical for diagnosing transient vs systemic failures.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 21 May
|

Refactor profile tool config to explicit agent/subagent structure
...
Replaces the confusing mix of enabled_tools + mcp_servers + subagent_tools
with a single explicit structure:
tools: {
agent: {native: [...], mcp: {server: [groups]}},
subagent:{native: [...], mcp: {server: [groups]}}
}
Why:
- Old fields mixed native and MCP names (mcp__server__tool) in one list,
making it impossible to tell at a glance what a subagent actually gets.
- subagent_runner.py had 25 lines of runtime MCP filtering logic that
was hard to follow and error-prone.
Changes:
- AgentProfile: add ToolConfig / ToolScopeConfig pydantic models,
keep old fields (enabled_tools, mcp_servers, subagent_tools) for
auto-migration via _migrate_tools validator.
- loader.py: read new "tools" key, auto-migrate legacy configs.
- agent.py: _tool_list now accepts ToolScopeConfig.
- subagent_runner.py: simplified — profile.get_subagent_tools() returns
the exact scope, no runtime filtering needed.
- context_builder.py, list_tools.py, spawn_agent.py: updated to use
profile.get_agent_tools() / get_subagent_tools().
- All 6 profile config.json files migrated to new schema.
- Secretary subagent now explicitly gets navi-web MCP tools for web search.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 21 May
|
Fix planning: change plan-follow-up role from system to user
...
After injecting the plan as an assistant message into session.context,
the previous code appended a system message saying "Plan is ready.
Execute it now..." Many instruct-tuned models treat their own
assistant message as a completed response, and a trailing system
instruction is easy to ignore.
Changing the follow-up to role="user" makes the model see:
assistant: plan
user: "Execute this plan..."
which obligates the model to produce a new assistant response —
the tool-calling execution phase.
The follow-up message is appended only to session.context (LLM
context) and never to session.messages, so it is invisible in the
chat UI.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 21 May
|
Fix session file download URL — reverse legacy redirect
...
The webclient requests files at /api/sessions/{id}/files/{name},
but the actual endpoint lives at /sessions/{id}/files/{name}.
The old legacy redirect pointed the wrong way (/sessions → /api),
which always 404'd because /api/sessions/... was never registered.
- Replace legacy redirect with /api/sessions/... → /sessions/... (307)
- 307 preserves the request method and cookies during the redirect
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 21 May
|
Fix stop button responsiveness and shutdown CancelledError
...
Agent loop (_execute_tools_with_sink):
- Poll stop_event every 1s while draining the event sink via asyncio.wait_for
- When stopped, cancel the tool task, yield a synthetic ToolEvent failure,
append a cancellation message to session, yield StreamStopped, and return
- Pass stop_event into _execute_tools_with_sink call site
Subagent runner:
- Check stop_event at the start of each tool in turn_tool_calls loop
- Returns early with ("", False) when stopped mid-batch
McpManager.disconnect_all():
- Disconnect clients sequentially instead of asyncio.gather
- Handle asyncio.CancelledError per-client to avoid shutdown traceback
AppContainer.shutdown():
- Catch BaseException instead of Exception for MCP and DB cleanup
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 21 May
|
Fix token counting: show only completion tokens, not cumulative prompt+completion
...
The token_count displayed next to assistant messages was summing
prompt_tokens + completion_tokens across ALL tool-calling iterations,
giving hundreds of thousands of tokens for multi-turn conversations.
Now:
- token_count (coins icon) = only completion tokens generated by the model
- context_tokens (ContextBar) = only prompt tokens (context size sent to LLM)
This gives users a realistic measure of how much the model actually generated.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 21 May
|

Migrate MCP tool naming from mcp:server:tool to mcp__server__tool
...
The colon separator (mcp:server:tool) confuses many LLMs during
tool-calling because colons appear in schemas and URLs. Switch to
double-underscore separator (mcp__server__tool) for robust parsing.
Key changes:
- navi/mcp/tools.py: add build_mcp_name(), parse_mcp_name(), is_mcp_tool()
- navi/core/tool_executor.py: update _resolve_tool() with new helpers
and legacy colon fallback for old sessions
- navi/core/tool_utils.py, subagent_runner.py: use build_mcp_name()
- navi/api/routes/{admin,agents}.py: prefix via build_mcp_name()
- navi/tools/{list_tools,reload_tools}.py: migrated
- All profile configs + system_prompt.txt: replace mcp: with mcp__
- manuals/{model_3d,lint_scad,render_3d,spawn_agent}.md: updated
- mcp_servers.d/gnexus-book.json: instructions updated
- docs/{api,profiles,tools,mechanics,visual.html}: updated
- tests: test_tool_executor.py and test_mcp.py aligned
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 21 May
|
docs/profiles: document subagent_tools MCP filtering behavior
...
Clarify that subagent_tools acts as a whitelist for MCP tools:
- Only mcp: entries explicitly listed in subagent_tools are exposed to sub-agents.
- If subagent_tools is non-empty and contains no mcp: entries, the sub-agent
receives no MCP tools at all.
Eugene Sukhodolskiy
committed
on 21 May
|
SubAgentRunner: filter mcp_servers against subagent_tools whitelist
...
When a profile defines subagent_tools (strict whitelist for sub-agents),
MCP servers were still expanded unconditionally, granting sub-agents access
to MCP tools not listed in the whitelist. Now:
- If subagent_tools contains mcp:xxx entries, only those specific MCP tools
are passed to build_tool_list.
- If subagent_tools is non-empty but contains no mcp: entries, mcp_servers
is set to None — sub-agents get no MCP tools at all.
- If subagent_tools is empty (fallback to enabled_tools), full mcp_servers
is kept for backward compatibility.
400 passed, 1 skipped
Eugene Sukhodolskiy
committed
on 21 May
|
ArtifactsPanel: two-pass link extraction — markdown links win over bare URLs
...
Markdown links carry a title/caption, so they should always take priority
over bare URLs during deduplication. Previously a bare URL from a newer
message could shadow an older markdown link. Now markdown links are
scanned in a separate first pass, so they always win the dedup race.
Eugene Sukhodolskiy
committed
on 21 May
|
ArtifactsPanel: strip trailing punctuation from URLs for dedup
...
Bare URLs sometimes pick up a trailing ) or . from surrounding markdown
(e.g. model writes a link inside parentheses). The normalizeUrlForDedup()
helper strips trailing punctuation before checking the seen Set so
https://a/b and https://a/b) are treated as the same link.
Eugene Sukhodolskiy
committed
on 21 May
|
FallbackOllamaBackend: do not blacklist single server, empty file fallback
...
- When only one Ollama server is configured, LLMConnectionError no longer
adds it to the dead-server blacklist. This fixes the bug where a
transient failure permanently blocked all requests until server restart.
- LLMModelNotFoundError on a single server is also not blacklisted.
- _discover_backends now falls back to settings.ollama_host when the
ollama_backends_file is empty, missing, or returns no valid servers.
- Added unit tests covering single-server no-blacklist, multi-server
blacklist, model-not-found no-blacklist, and empty-file fallback.
400 passed, 1 skipped
Eugene Sukhodolskiy
committed
on 21 May
|
McpTool: auto-inject session_id + normalize navi-3d paths
...
- McpTool.execute() now forces the real session_id from current_session_id
ContextVar, preventing LLM hallucinations of wrong UUIDs (ghost-session bug).
- For navi-3d MCP server, source_path/output_path are normalized to basename
to prevent double path nesting when the LLM passes full relative paths.
- Updated MCP tool descriptions to ask for filenames only.
- Added system prompt instructions in context_builder and subagent_runner
reminding the model to pass bare filenames to navi-3d tools.
396 passed, 1 skipped
Eugene Sukhodolskiy
committed
on 21 May
|
| 2026-05-20 |
Fix UnboundLocalError: create mcp_manager before build_default_registries
...
The previous commit passed mcp_manager to build_default_registries but
left the instantiation after the call, causing UnboundLocalError at
runtime. Move McpManager() creation before the registry build and
remove the now-obsolete post-hoc _mcp_manager patching loop.
392 passed, 1 skipped
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 20 May
|
| 2026-05-18 |
Mark architecture weak spot #10 (MCP caching/backoff) as resolved
...
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 18 May
|
MCP: cache config in McpManager, add exponential backoff to McpClient reconnect
...
McpManager:
- Cache loaded configs in self._configs (loaded once at load_all)
- resolve_group() and get_instructions() read from cache instead of disk
- reload_all() busts the cache before re-reading
- Fallback to disk when cache is empty (tests / first call without load_all)
McpClient:
- Exponential backoff on reconnect: base 1s, max 30s, ±20% jitter
- Backoff resets on successful connect, doubles on failure
- _ensure_connected() blocks reconnect if within backoff window
- Prevents thundering herd against a flapping MCP server
392 passed, 1 skipped
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 18 May
|
Mark architecture weak spot #7 (DRY tool_executor) as resolved
...
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 18 May
|
DRY: unify tool execution in ToolExecutor._execute_one()
...
Three methods (_run_single_tool, _execute_tool_calls,
_execute_tool_calls_streaming) duplicated identical logic:
resolve → middleware → execute → image extraction → build message.
- Extract canonical _execute_one(tc, tool_map) -> (ToolEvent, Message, image_msg)
- All three public methods now delegate to _execute_one
- Public signatures unchanged — no test or caller changes needed
392 passed, 1 skipped
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 18 May
|