| 2026-04-16 |
Add profile discoverability: list_profiles tool + system prompt injection
...
- AgentProfile: new short_description (1-line) and full_description (dict
with specialization / when_to_use / key_tools) fields
- All 3 profile configs: structured descriptions added; list_profiles added
to enabled_tools
- _build_system_prompt: now accepts full AgentProfile; injects compact
"Available profiles" block into every system prompt so Navi always knows
what other profiles exist and when to switch — dynamically, no hardcoding
- ListProfilesTool: new built-in; returns structured per-profile details
(specialization, when_to_use, key_tools); accepts optional profile_id
for single-profile lookup
- registry: register list_profiles_tool after profiles registry is built
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 16 Apr
|
Fix metrics: net token delta, subagent aggregation, ContextBar always visible
...
- run_stream: track _prev_tokens baseline before turn loop; compute net
token cost as (context_tokens - prev) + subagent_tokens for per-message cost
- run_stream: intercept SubagentComplete in sink drain loop to accumulate
subagent token and tool-call counts into the parent turn's totals
- run_ephemeral: already emitting SubagentComplete (from prior session)
- msg-meta-row: remove margin-left:auto from .msg-meta-time so time
groups inline with elapsed/tools/tokens instead of floating right
- ContextBar: remove v-if guard so bar is always visible (not only after
first LLM response with token data)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 16 Apr
|
Add response metrics: elapsed time, tool calls, token count
...
Server:
- Message model: elapsed_seconds, tool_call_count, token_count fields
(display-only, excluded from LLM context via exclude_none)
- StreamEnd event: carries same three fields
- agent.run_stream: tracks turn start time, counts ToolEvent completions,
writes metrics onto the final assistant Message before saving to DB
- WebSocket: forwards metrics in stream_end payload
Client:
- chat.onStreamEnd: attaches elapsed_seconds, tool_call_count, token_count
to the streaming message on completion
- buildMessageList: scans each assistant group for metrics from history
- AssistantMessage: renders .msg-meta-row below the response —
timer icon + Xs · wrench icon + N tools · coins icon + Nk tokens · time
(each item only shown if present; time pushed right via margin-left: auto)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 16 Apr
|
Persist thinking and plan cards across session reloads
...
- Message: add thinking and is_plan fields (display-only, not sent to LLM)
- Agent main loop: accumulate thinking per iteration, save with assistant message
- _run_planning: also append plan to session.messages with is_plan=True so UI
can render plan cards after page reload (context already had the plan)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 16 Apr
|
| 2026-04-15 |
Fix Ollama connection leak and empty message bug in agent
...
- _iter_stream_guarded: track chunk_task as nullable, cancel in finally
block to prevent zombie HTTP connections accumulating under load
- Final turn: use `content or None` so empty text isn't saved to DB
- client/index.html: point to new Vue webclient build
- profiles: add email_manager tool
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 15 Apr
|
| 2026-04-14 |
Improve filesystem, web search, context guard, and subagent narration
...
filesystem: add find (glob), info (stat), move, append actions; read now
supports offset/limit with hard 1MB guard; list shows sizes, dates,
optional recursion.
web_search: retry DDG across auto/html/lite backends; add optional Brave
Search API and SearXNG fallbacks configured via .env.
agent: fix ContextTooLargeError to surface as Navi response instead of
raw system error; fix _check_context_size to calculate from remaining
budget (window - output_reserve) rather than a fixed 92% threshold.
persona: add ReAct narration instruction to subagent briefing template.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 14 Apr
|

Fix LLM hang: stop button during prefill, context guard, timeouts
...
Root cause: during prefill (processing input tokens), Ollama emits no
HTTP chunks. The `async for chunk in stream_complete()` loop body never
executes, so stop_event is never checked — Stop button has no effect.
Same issue with complete() calls (planning, compression): blocking await
with no cancellation path.
Fixes:
_iter_stream_guarded() (agent.py, module-level):
Wraps any stream_complete() generator. Polls stop_event every 1s while
waiting for the next chunk using asyncio.wait() — so Stop works even
during multi-minute prefill. On stop or timeout, calls aclose() on the
generator which closes the HTTP connection to Ollama → generation halts
→ GPU drops to idle. Applied to both run_stream() and run_ephemeral().
_check_context_size() (Agent method):
Estimates context tokens (chars/4 + 500 per image) before every LLM
call. Raises ContextTooLargeError (new NaviError subclass) at 92% of
ollama_num_ctx — before Ollama ever receives the request.
_run_planning() timeouts:
Both complete() calls (phase 1 and 2) wrapped with asyncio.wait_for().
Timeout logged and planning skipped gracefully — execution continues.
New config (config.py):
llm_complete_timeout = 120s
llm_stream_first_chunk_timeout = 180s (prefill budget)
llm_stream_chunk_timeout = 60s (inter-token budget)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 14 Apr
|

Improve planning: two-phase pipeline and orchestrator discipline
...
agent.py:
- _run_planning() now runs two sequential LLM calls:
Phase 1 (analysis): reformulate task, identify subtasks and unknowns;
skip immediately if DIRECT.
Phase 2 (execution plan): assign each subtask an executor —
TOOL/AGENT/SELF — using a structured ## Plan format.
Phase 2 context = analysis (embedded in system prompt) + last user
message only; full history excluded to keep focus on plan structure.
- Warn in logs when plan lacks TOOL/AGENT/SELF executor assignments.
persona.txt:
- MANDATORY sequence: step 0 = scratchpad init before anything else;
todo tasks must mirror plan steps exactly (same order, same executors).
- PLAN → EXECUTION BINDING: explicit rule — never switch an AGENT step
to inline execution silently.
- SCRATCHPAD: initialize sections at task start, not after first tool call;
write context to scratchpad before briefing subagents.
- Fix typo in BRIEFING ("sub-lagent" → "sub-agent").
- Replace stale Knowledge Retrieval Protocol with accurate one-liner.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 14 Apr
|
| 2026-04-11 |

Strengthen Navi planning/delegation, unify toolsets, isolate subagent scratchpad
...
persona.txt:
- DELEGATION: 'default to spawning, not to doing inline' — stronger default,
clearer triggers, explicit when-not-to-spawn rules
- PLANNING: ties automatic planning phase to mandatory todo(op='set') as first
tool call; reconciles pre-loop plan with in-loop execution discipline
- SCRATCHPAD: new section — when to write, section naming conventions,
mandatory read before final answer
Profiles (secretary, server_admin, smart_home):
- All three now share the same 18-tool set (each file independent)
- planning_enabled=True on all three
- scratchpad and web_search added to smart_home
- System prompts updated with scratchpad/todo execution discipline sections
agent.py run_ephemeral:
- Each subagent gets a unique session ID (subagent_<uuid>) for scratchpad
isolation — parallel or sequential subagents no longer share working notes
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 11 Apr
|
Skip planning phase for simple/direct requests
...
The planning prompt now asks the model to respond with "DIRECT" if the
request doesn't need multiple steps. Added a regex fallback: if the
response has no numbered steps it's also discarded. This prevents plan
cards appearing for conversational replies that would just duplicate
the final message.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 11 Apr
|
Add planning phase and scratchpad tool for smarter task execution
...
- ScratchpadTool: session-scoped working notepad with named sections
(write/append/read/clear). Lets Navi capture intermediate findings
between tool calls instead of losing track of them.
- Planning phase: when profile.planning_enabled=True, a fast pre-loop
LLM call (think=False, no tools) outlines a numbered plan before
any actions are taken. The plan is injected into session context as
an assistant message so the model naturally continues from it.
- PlanReady event + plan_ready WebSocket message + plan card in UI
(green-tinted, collapsible, mirroring thinking card design).
- secretary and server_admin profiles: planning_enabled=True,
scratchpad added to enabled_tools, system prompts updated with
explicit execution discipline instructions.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 11 Apr
|
| 2026-04-10 |

Add stop button and fix context compression hang
...
Stop generation:
- Client: send button toggles to red ■ during streaming; sends {type:stop} via WS
- Server: _stream_recv concurrently reads incoming messages during streaming using
asyncio.wait — stop signal is handled immediately without polling
- Cooperative stop via asyncio.Event (current_stop_event ContextVar): agent breaks
out of LLM async-for cleanly so aclose() fires → Ollama stream closes gracefully,
model stays in VRAM. No task.cancel() which would eject the model.
- StreamStopped event propagates through run_stream/run_ephemeral; sub-agents stop
via the same shared stop_event inherited through task context
Context compression fix:
- compress_context passes think=False to llm.complete() — no extended reasoning
during summarization which caused GPU hang
- Input truncated to 12k chars before sending to summarizer
- LLMBackend.complete() / OllamaBackend.complete() accept think: bool | None override
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 10 Apr
|
Fix profile switch: reload tools/schema after switch_profile tool call
...
switch_profile updates profile_id in DB, but run_stream() held a stale
local session object — the final save would overwrite the change, and
subsequent LLM calls in the same turn still used the old tool schemas.
After each tool-call iteration, compare DB profile_id with the local
session object. On mismatch: update session.profile_id, reload profile,
tools, tool_schemas, and llm backend so the next LLM call gets the
correct schema and the final save preserves the new profile.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 10 Apr
|
Dynamic system prompt — inject per-call instead of storing in context
...
System prompt is no longer stored in session.context. Instead,
_build_context() prepends the current profile's system prompt fresh on
every LLM call. This means profile switches take effect immediately on
the next message — no stale prompt lingering in stored context.
Also strips any existing system messages from context for migration
safety (old sessions that have one stored will still work).
_with_memory() removed, replaced by _build_context(context, profile, mem).
run_ephemeral() context no longer includes system message either.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 10 Apr
|

Major feature batch: visibility, planning, file uploads, streaming
...
- stream_complete(): streaming with tools for all LLM turns — thinking
now streams as ThinkingDelta/ThinkingEnd in real-time during tool-
selection turns, not just on the final response
- todo built-in tool: session-scoped plan manager (set/view/update/clear);
persona + all profiles updated with mandatory planning instructions
- TurnThinking event: sub-agent thinking forwarded to parent sink as a
collapsible block in the spawn_agent card
- File uploads: non-image files uploaded via XHR, shown as badges in
message bubble; SVG treated as regular file (not base64 image)
- session_files: POST /sessions/{id}/files, TTL cleanup, forbidden exts
- WebSocket reconnect: _AgentRun broadcast pattern, re-attach mid-stream
- UI: favicon, sidebar logo, turn-thinking cards, subagent thinking blocks,
token counter, draft persistence, file progress bar
- Removed AgentNote (content is always None alongside tool_calls)
- Ollama stream_complete: tool_calls captured from non-final chunk (done=False)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 10 Apr
|
| 2026-04-09 |

Live tool visibility: pending cards, sub-agent step log
...
Backend:
- ToolStarted event: emitted before tool execution begins so client
can render a pending card with spinner immediately
- ToolEvent gains is_subagent flag; ToolStarted same
- current_event_sink ContextVar in tools/base.py — run_stream() sets it
to an asyncio.Queue before create_task(); run_ephemeral() reads it and
puts ToolStarted/ToolEvent into the queue as each sub-agent step runs
- run_stream() tool loop: sequential execution via create_task() +
polling drain loop (20ms sleep); yields ToolStarted → sub-agent events
from sink → ToolEvent (completed) for each tool call
- run_ephemeral() rewritten to inline sequential tool execution with
sink emission (replaces _execute_tool_calls gather)
- _run_single_tool() helper extracted for run_stream()
- websocket.py handles tool_started and adds is_subagent to tool_call
Frontend:
- appendPendingToolCard(): creates card with spinner; spawn_agent opens
body immediately to show sub-agent log as it fills
- finalizeToolCard(): fills result, removes spinner, adds toggle; strips
"[Sub-agent result — ...]" reminder prefix from displayed text
- appendSubagentStep() / finalizeSubagentStep(): live step log inside
spawn_agent card — each sub-agent tool call gets a ↳ row
- app.js: tool_started → pending card; tool_call → finalize card;
is_subagent routing to sub-step vs main card; abandonStream() resets
pendingToolCard/pendingSubStep
- CSS: .spinner-inline for card headers; .subagent-log / .subagent-step
for nested step display; .tool-body-open for always-open spawn_agent
body; .tool-card.pending suppresses chevron
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 9 Apr
|
Add spawn_agent: sub-agent delegation with isolated context
...
- Agent.run_ephemeral() — runs a sub-agent loop without a persistent
session; accepts exclude_tools to block recursion; logs start/complete
- session_store made Optional in Agent.__init__ (None for ephemeral runs)
- SpawnAgentTool (navi/tools/spawn_agent.py): spawns an isolated Agent
for a focused task; resolves profile from parent session via ContextVar;
blocks spawn_agent recursion via exclude_tools=["spawn_agent"]
- build_default_registries() accepts session_store param; registers
SpawnAgentTool after BackendRegistry is built (patches _backend_registry)
- deps.py passes _session_store to build_default_registries
- All profiles: spawn_agent added to enabled_tools, max_iterations 10→30
- persona.txt: DELEGATION section — when/how to use spawn_agent
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 9 Apr
|
SSH connection pooling: per-session, 20-minute TTL
...
- Pool keyed by session_id:host:port:username — parallel sessions share
no state even when targeting the same server
- Per-key asyncio.Lock prevents concurrent connection creation races
- TTL (20 min) and is_closing() checked on every access; expired/closed
connections are evicted and replaced transparently
- On disconnect error during command execution: evict + retry once with
fresh connection
- current_session_id ContextVar (set by Agent before tool calls) is read
in ssh_exec to build the pool key without changing tool signatures
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 9 Apr
|

Add long-term user memory system
...
Architecture:
- navi/memory/store.py: MemoryStore backed by SQLite (memory_facts,
memory_summary, session_memory_state tables in navi.db)
- navi/memory/extractor.py: LLM-based fact extraction from sessions +
summary regeneration (triggered after session goes idle >30 min)
- Fact upsert uses UNIQUE(category, key) — same key always overwrites,
no duplicates or stale contradictions
- Keyword search across category + key + value (LIKE-based, no extra deps)
Context injection:
- Memory summary injected as an ephemeral system message on every LLM call
via Agent._with_memory() — never persisted to session.context
Tools (all profiles):
- memory_search(query): keyword search against fact DB; persona instructs
model to call it at session start and before personal-context questions
- memory_forget(key, category?): delete a specific fact on user request
Extraction trigger:
- On new session creation, fire-and-forget background task checks all
sessions idle >30 min with unprocessed messages → runs extraction
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 9 Apr
|
Fix context loss: ensure system prompt is always present in LLM context
...
Replaced `if not session.context:` with a role-based check so the system
message is inserted whenever it is absent — not just for brand-new sessions.
Root cause: backward-compat sessions (context column was empty) had their
context initialised from session.messages, which never contains a system
message. The old check (`if not session.context:`) saw a non-empty list and
skipped the system prompt, so every subsequent request ran without it —
Navi had no persona and no profile instructions.
Also add context_token_count field to Session (follow-up for token counter
fix — persistence wiring comes in next commit).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 9 Apr
|
| 2026-04-08 |
Review fixes: events module, circular imports, deps, vision-aware compression
...
- Extract all AgentEvent dataclasses to navi/core/events.py; import from
there in agent.py and __init__.py — eliminates circular import between
workers and core
- workers/compressor.py: remove runtime import hack, use navi.core.events
- workers/base.py: WorkerResult.events typed as list[AgentEvent] (was Any)
- api/deps.py: replace @lru_cache on mutable list with module-level
singletons (_registries, _workers)
- core/compressor.py: _format_for_summary returns (text, images); images
passed to summarization LLM so vision models describe them in summary;
non-vision models silently ignore the images field; docstring updated
- client/js/app.js: add comment explaining is_summary backward compat branch
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 8 Apr
|

Separate display history from LLM context; formalize worker system
...
Architecture change:
- session.messages: full display history, never modified by compression
- session.context: what the LLM sees, may be compressed by workers
- System messages go only into context (not display history)
- Image injections (synthetic) go only into context
- User/assistant/tool messages go into both
SQLite: add context column with backward-compat migration
(empty context → initialized from messages on load)
Workers (navi/workers/):
- Worker ABC + WorkerContext + WorkerResult (base.py)
- CompressionWorker: compresses session.context when above threshold
- build_default_workers() returns [CompressionWorker()]
- Agent accepts workers list, runs them after StreamEnd
- Workers injected via deps.py get_workers() (lru_cached singleton)
- WebSocket agent construction also receives workers
Compressor: compress_context() now takes context[], not messages[]
Config: context_keep_recent 6 → 10
Agent: _run_workers() collects events from all workers and yields them
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 8 Apr
|

Add context compression: rolling summarization when context fills up
...
Mechanism:
- After streaming ends, if context_tokens >= threshold (80% of num_ctx),
compress old turns into a summary message using the same LLM
- Partition: keep system msg + last N turns verbatim (default 6);
everything older goes to the summarizer
- Tool call groups (assistant + tool results) never split across boundary
- Existing summary messages folded into new compression pass — no stack growth
- Summary stored as Message(role=user, is_summary=True) after system msg
- On failure: logged, session left unchanged (non-fatal)
New files:
- navi/core/compressor.py: should_compress, partition_messages,
compress_session (pure logic, testable without agent)
New config (navi/config.py):
- context_compression_enabled: bool = True
- context_compression_threshold: float = 0.80
- context_keep_recent: int = 6
- context_summary_temperature: float = 0.3
New agent event: ContextCompressed(messages_before, messages_after)
Message.is_summary: bool field marks compressed history blocks
Client:
- context_compressed WS event → subtle inline notice in message list
- loadHistory: is_summary messages rendered as collapsible summary cards
- style.css: .summary-card, .compression-notice
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 8 Apr
|
Add context token counter: 64k default, live UI display
...
- config: ollama_num_ctx default 8192 → 65536
- LLMChunk: add prompt_tokens / completion_tokens fields
- OllamaBackend.stream: populate token counts from final chunk
(prompt_eval_count + eval_count when chunk.done)
- StreamEnd: add context_tokens and max_context_tokens
- Agent.run_stream: capture token counts, pass to StreamEnd
- websocket: include context_tokens / max_context_tokens in stream_end
- index.html: split chat-header into title span + token-counter span
- sidebar.js: updateChatHeader targets #chat-header-title, not innerHTML
- app.js: updateTokenCounter() shows "X/Y (Z%) tokens", colors:
gray <50%, amber 50–79%, red ≥80%
- style.css: .token-counter, .warn, .danger styles
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 8 Apr
|
Server review fixes: profile model routing, sorting, datetime, cleanup
...
- LLMBackend.complete/stream: add model param; OllamaBackend uses it
over self.model, enabling per-profile model selection
- BackendRegistry.get(): remove unused model param
- Agent: pass profile.model to complete() and stream()
- Profiles: correct model to gemma4:e2b-it-q8_0 (was leftover e4b)
- InMemorySessionStore.list_all(): fix sort (pinned+newest first,
was pinned+oldest) — now consistent with SQLite ORDER BY
- session.py, sqlite_session_store.py: datetime.utcnow() →
datetime.now(timezone.utc) (deprecated since Python 3.12)
- _base_options(): accept temperature param, remove dead default
- deps.py: rename _registries → get_registries (public API)
- websocket.py: update import accordingly
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 8 Apr
|
Add thinking/reasoning streaming support
...
Enable Ollama think param and stream reasoning chunks to client.
New agent events: ThinkingDelta, ThinkingEnd. Config gains ollama_think
and ollama_num_ctx settings. WebSocket protocol updated accordingly.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 8 Apr
|

Add multimodal image support and client UX improvements
...
Server:
- Add ImageViewTool (load image from file/URL, returns base64)
- Add images field to Message model with created_at timestamp
- Agent run/run_stream accept images param; inject image messages after image_view tool calls
- WebSocket handler accepts images array from client, strips data URI prefix
- All profiles include image_view tool
- Fix tool call serialization (model_dump mode=json for datetime)
- Add no-store cache headers for static files
Client:
- Image attachment: file picker button + clipboard paste + preview strip with remove
- Images rendered in chat bubbles; loaded from history
- Tool cards rebuilt as div+CSS toggle (fixes details/overflow-hidden collapse bug)
- Tool cards appear before response bubble (lazy bubble creation on first stream_delta)
- Typing indicator persists through tool calls, removed only when text starts streaming
- Tool cards restored from history on page reload
- Message timestamps stored via created_at field, shown correctly in history
- Session ID reflected in URL hash for bookmarking; restored on page load
- Remove localStorage session tracking (server last_active used instead)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 8 Apr
|
Initial implementation of the agent system core
...
- FastAPI server with REST API and WebSocket streaming
- Modular LLM backend abstraction (Ollama implemented, OpenAI stub)
- Tool system: web_search (ddgs), filesystem, http_request, code_exec, terminal
- Agent profiles: smart_home, server_admin, secretary
- Tool-calling loop with concurrent tool execution
- In-memory session store with SessionStore ABC for future persistence
- Registry pattern for tools, profiles, and backends
- Orchestrator stub as foundation for multi-agent scenarios
Eugene Sukhodolskiy
committed
on 8 Apr
|