| 2026-04-14 |
Add memory_save tool for proactive fact persistence
...
Navi previously had no way to write to memory mid-conversation — she
could only search and forget. Facts were extracted automatically after
sessions went idle for 30+ min, so important context shared by the user
could be lost or delayed.
- New MemorySaveTool (navi/tools/memory_save.py): upsert a fact by
category/key/value; overwrites existing key so no separate forget needed
- Registered as builtin alongside memory_search/memory_forget
- Added to all three profiles (secretary, server_admin, smart_home)
- persona.txt: explicit "call memory_save immediately when..." guidance
so Navi saves stable facts as they arrive, not only post-session
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 14 Apr
|
Expose compression summary as collapsible debug card in chat UI
...
ContextCompressed event now carries the full summary text produced by the
LLM. Compression notice in chat becomes a <details> element showing
message count (before→after) with the summary expandable on click.
Rendered as markdown via marked.js.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 14 Apr
|
Add share_file tool and session-lifetime file storage
...
Session file directories now live until the session is deleted, not
24h TTL. Cleanup loop only removes orphaned dirs (session gone from DB).
New share_file tool: copies any file to the session directory and returns
a clickable download URL. Navi can call this after generating any file
the user will want to keep.
New GET /sessions/{id}/files/{filename} endpoint serves files with
correct Content-Disposition (inline for images/HTML/PDF, attachment
for everything else).
Added PUBLIC_URL config key for building correct download links behind
reverse proxies.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 14 Apr
|
Improve filesystem, web search, context guard, and subagent narration
...
filesystem: add find (glob), info (stat), move, append actions; read now
supports offset/limit with hard 1MB guard; list shows sizes, dates,
optional recursion.
web_search: retry DDG across auto/html/lite backends; add optional Brave
Search API and SearXNG fallbacks configured via .env.
agent: fix ContextTooLargeError to surface as Navi response instead of
raw system error; fix _check_context_size to calculate from remaining
budget (window - output_reserve) rather than a fixed 92% threshold.
persona: add ReAct narration instruction to subagent briefing template.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 14 Apr
|

Fix LLM hang: stop button during prefill, context guard, timeouts
...
Root cause: during prefill (processing input tokens), Ollama emits no
HTTP chunks. The `async for chunk in stream_complete()` loop body never
executes, so stop_event is never checked — Stop button has no effect.
Same issue with complete() calls (planning, compression): blocking await
with no cancellation path.
Fixes:
_iter_stream_guarded() (agent.py, module-level):
Wraps any stream_complete() generator. Polls stop_event every 1s while
waiting for the next chunk using asyncio.wait() — so Stop works even
during multi-minute prefill. On stop or timeout, calls aclose() on the
generator which closes the HTTP connection to Ollama → generation halts
→ GPU drops to idle. Applied to both run_stream() and run_ephemeral().
_check_context_size() (Agent method):
Estimates context tokens (chars/4 + 500 per image) before every LLM
call. Raises ContextTooLargeError (new NaviError subclass) at 92% of
ollama_num_ctx — before Ollama ever receives the request.
_run_planning() timeouts:
Both complete() calls (phase 1 and 2) wrapped with asyncio.wait_for().
Timeout logged and planning skipped gracefully — execution continues.
New config (config.py):
llm_complete_timeout = 120s
llm_stream_first_chunk_timeout = 180s (prefill budget)
llm_stream_chunk_timeout = 60s (inter-token budget)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 14 Apr
|

Improve planning: two-phase pipeline and orchestrator discipline
...
agent.py:
- _run_planning() now runs two sequential LLM calls:
Phase 1 (analysis): reformulate task, identify subtasks and unknowns;
skip immediately if DIRECT.
Phase 2 (execution plan): assign each subtask an executor —
TOOL/AGENT/SELF — using a structured ## Plan format.
Phase 2 context = analysis (embedded in system prompt) + last user
message only; full history excluded to keep focus on plan structure.
- Warn in logs when plan lacks TOOL/AGENT/SELF executor assignments.
persona.txt:
- MANDATORY sequence: step 0 = scratchpad init before anything else;
todo tasks must mirror plan steps exactly (same order, same executors).
- PLAN → EXECUTION BINDING: explicit rule — never switch an AGENT step
to inline execution silently.
- SCRATCHPAD: initialize sections at task start, not after first tool call;
write context to scratchpad before briefing subagents.
- Fix typo in BRIEFING ("sub-lagent" → "sub-agent").
- Replace stale Knowledge Retrieval Protocol with accurate one-liner.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 14 Apr
|
| 2026-04-11 |

Strengthen Navi planning/delegation, unify toolsets, isolate subagent scratchpad
...
persona.txt:
- DELEGATION: 'default to spawning, not to doing inline' — stronger default,
clearer triggers, explicit when-not-to-spawn rules
- PLANNING: ties automatic planning phase to mandatory todo(op='set') as first
tool call; reconciles pre-loop plan with in-loop execution discipline
- SCRATCHPAD: new section — when to write, section naming conventions,
mandatory read before final answer
Profiles (secretary, server_admin, smart_home):
- All three now share the same 18-tool set (each file independent)
- planning_enabled=True on all three
- scratchpad and web_search added to smart_home
- System prompts updated with scratchpad/todo execution discipline sections
agent.py run_ephemeral:
- Each subagent gets a unique session ID (subagent_<uuid>) for scratchpad
isolation — parallel or sequential subagents no longer share working notes
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 11 Apr
|
Skip planning phase for simple/direct requests
...
The planning prompt now asks the model to respond with "DIRECT" if the
request doesn't need multiple steps. Added a regex fallback: if the
response has no numbered steps it's also discarded. This prevents plan
cards appearing for conversational replies that would just duplicate
the final message.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 11 Apr
|
Add planning phase and scratchpad tool for smarter task execution
...
- ScratchpadTool: session-scoped working notepad with named sections
(write/append/read/clear). Lets Navi capture intermediate findings
between tool calls instead of losing track of them.
- Planning phase: when profile.planning_enabled=True, a fast pre-loop
LLM call (think=False, no tools) outlines a numbered plan before
any actions are taken. The plan is injected into session context as
an assistant message so the model naturally continues from it.
- PlanReady event + plan_ready WebSocket message + plan card in UI
(green-tinted, collapsible, mirroring thinking card design).
- secretary and server_admin profiles: planning_enabled=True,
scratchpad added to enabled_tools, system prompts updated with
explicit execution discipline instructions.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 11 Apr
|
| 2026-04-10 |

Add stop button and fix context compression hang
...
Stop generation:
- Client: send button toggles to red ■ during streaming; sends {type:stop} via WS
- Server: _stream_recv concurrently reads incoming messages during streaming using
asyncio.wait — stop signal is handled immediately without polling
- Cooperative stop via asyncio.Event (current_stop_event ContextVar): agent breaks
out of LLM async-for cleanly so aclose() fires → Ollama stream closes gracefully,
model stays in VRAM. No task.cancel() which would eject the model.
- StreamStopped event propagates through run_stream/run_ephemeral; sub-agents stop
via the same shared stop_event inherited through task context
Context compression fix:
- compress_context passes think=False to llm.complete() — no extended reasoning
during summarization which caused GPU hang
- Input truncated to 12k chars before sending to summarizer
- LLMBackend.complete() / OllamaBackend.complete() accept think: bool | None override
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 10 Apr
|
Fix save(): persist profile_id to DB
...
profile_id was never included in the UPDATE statement — only set on
initial INSERT. Profile switches appeared to work in-memory but reverted
on page reload or server restart.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 10 Apr
|
Fix profile switch: reload tools/schema after switch_profile tool call
...
switch_profile updates profile_id in DB, but run_stream() held a stale
local session object — the final save would overwrite the change, and
subsequent LLM calls in the same turn still used the old tool schemas.
After each tool-call iteration, compare DB profile_id with the local
session object. On mismatch: update session.profile_id, reload profile,
tools, tool_schemas, and llm backend so the next LLM call gets the
correct schema and the final save preserves the new profile.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 10 Apr
|
Dynamic system prompt — inject per-call instead of storing in context
...
System prompt is no longer stored in session.context. Instead,
_build_context() prepends the current profile's system prompt fresh on
every LLM call. This means profile switches take effect immediately on
the next message — no stale prompt lingering in stored context.
Also strips any existing system messages from context for migration
safety (old sessions that have one stored will still work).
_with_memory() removed, replaced by _build_context(context, profile, mem).
run_ephemeral() context no longer includes system message either.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 10 Apr
|
Profile switch: emit WS event so client updates UI immediately
...
ProfileSwitched event emitted by switch_profile tool via current_event_sink.
Client handles profile_switched: updates chat header, profile selector,
and local sessions[] — no page refresh needed.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 10 Apr
|
Add switch_profile tool for automatic profile switching
...
Navi can now switch her own profile mid-session when the task domain
changes. The new profile (tools + system prompt) takes effect from the
next user message. Injected with session_store + profile_registry like
SpawnAgentTool. Added to all profiles' enabled_tools and persona.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 10 Apr
|

Major feature batch: visibility, planning, file uploads, streaming
...
- stream_complete(): streaming with tools for all LLM turns — thinking
now streams as ThinkingDelta/ThinkingEnd in real-time during tool-
selection turns, not just on the final response
- todo built-in tool: session-scoped plan manager (set/view/update/clear);
persona + all profiles updated with mandatory planning instructions
- TurnThinking event: sub-agent thinking forwarded to parent sink as a
collapsible block in the spawn_agent card
- File uploads: non-image files uploaded via XHR, shown as badges in
message bubble; SVG treated as regular file (not base64 image)
- session_files: POST /sessions/{id}/files, TTL cleanup, forbidden exts
- WebSocket reconnect: _AgentRun broadcast pattern, re-attach mid-stream
- UI: favicon, sidebar logo, turn-thinking cards, subagent thinking blocks,
token counter, draft persistence, file progress bar
- Removed AgentNote (content is always None alongside tool_calls)
- Ollama stream_complete: tool_calls captured from non-final chunk (done=False)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 10 Apr
|
| 2026-04-09 |

Live tool visibility: pending cards, sub-agent step log
...
Backend:
- ToolStarted event: emitted before tool execution begins so client
can render a pending card with spinner immediately
- ToolEvent gains is_subagent flag; ToolStarted same
- current_event_sink ContextVar in tools/base.py — run_stream() sets it
to an asyncio.Queue before create_task(); run_ephemeral() reads it and
puts ToolStarted/ToolEvent into the queue as each sub-agent step runs
- run_stream() tool loop: sequential execution via create_task() +
polling drain loop (20ms sleep); yields ToolStarted → sub-agent events
from sink → ToolEvent (completed) for each tool call
- run_ephemeral() rewritten to inline sequential tool execution with
sink emission (replaces _execute_tool_calls gather)
- _run_single_tool() helper extracted for run_stream()
- websocket.py handles tool_started and adds is_subagent to tool_call
Frontend:
- appendPendingToolCard(): creates card with spinner; spawn_agent opens
body immediately to show sub-agent log as it fills
- finalizeToolCard(): fills result, removes spinner, adds toggle; strips
"[Sub-agent result — ...]" reminder prefix from displayed text
- appendSubagentStep() / finalizeSubagentStep(): live step log inside
spawn_agent card — each sub-agent tool call gets a ↳ row
- app.js: tool_started → pending card; tool_call → finalize card;
is_subagent routing to sub-step vs main card; abandonStream() resets
pendingToolCard/pendingSubStep
- CSS: .spinner-inline for card headers; .subagent-log / .subagent-step
for nested step display; .tool-body-open for always-open spawn_agent
body; .tool-card.pending suppresses chevron
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 9 Apr
|
Add spawn_agent: sub-agent delegation with isolated context
...
- Agent.run_ephemeral() — runs a sub-agent loop without a persistent
session; accepts exclude_tools to block recursion; logs start/complete
- session_store made Optional in Agent.__init__ (None for ephemeral runs)
- SpawnAgentTool (navi/tools/spawn_agent.py): spawns an isolated Agent
for a focused task; resolves profile from parent session via ContextVar;
blocks spawn_agent recursion via exclude_tools=["spawn_agent"]
- build_default_registries() accepts session_store param; registers
SpawnAgentTool after BackendRegistry is built (patches _backend_registry)
- deps.py passes _session_store to build_default_registries
- All profiles: spawn_agent added to enabled_tools, max_iterations 10→30
- persona.txt: DELEGATION section — when/how to use spawn_agent
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 9 Apr
|
SSH connection pooling: per-session, 20-minute TTL
...
- Pool keyed by session_id:host:port:username — parallel sessions share
no state even when targeting the same server
- Per-key asyncio.Lock prevents concurrent connection creation races
- TTL (20 min) and is_closing() checked on every access; expired/closed
connections are evicted and replaced transparently
- On disconnect error during command execution: evict + retry once with
fresh connection
- current_session_id ContextVar (set by Agent before tool calls) is read
in ssh_exec to build the pool key without changing tool signatures
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 9 Apr
|
Fix naive/aware datetime comparison in session store and memory extraction
...
Old sessions stored datetimes without timezone offset. _row_to_session now
always returns timezone-aware datetimes via _parse_dt() helper, fixing the
TypeError when comparing session.last_active against timezone.utc cutoffs.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 9 Apr
|

Add long-term user memory system
...
Architecture:
- navi/memory/store.py: MemoryStore backed by SQLite (memory_facts,
memory_summary, session_memory_state tables in navi.db)
- navi/memory/extractor.py: LLM-based fact extraction from sessions +
summary regeneration (triggered after session goes idle >30 min)
- Fact upsert uses UNIQUE(category, key) — same key always overwrites,
no duplicates or stale contradictions
- Keyword search across category + key + value (LIKE-based, no extra deps)
Context injection:
- Memory summary injected as an ephemeral system message on every LLM call
via Agent._with_memory() — never persisted to session.context
Tools (all profiles):
- memory_search(query): keyword search against fact DB; persona instructs
model to call it at session start and before personal-context questions
- memory_forget(key, category?): delete a specific fact on user request
Extraction trigger:
- On new session creation, fire-and-forget background task checks all
sessions idle >30 min with unprocessed messages → runs extraction
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 9 Apr
|
Add web_view tool: headless browser with text extraction and screenshot
...
- New built-in tool web_view: opens URL in headless Chromium via Playwright,
strips nav/footer/scripts, returns clean readable text (capped at 20k chars).
Optional screenshot=true returns a PNG injected into context as an image.
Handles JS-rendered pages and SPAs (waits for networkidle by default).
- http_request description updated: explicitly says to use web_view for human-
readable pages, http_request for APIs/JSON/custom auth.
- web_view added to all three profiles.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 9 Apr
|
Fix context loss: ensure system prompt is always present in LLM context
...
Replaced `if not session.context:` with a role-based check so the system
message is inserted whenever it is absent — not just for brand-new sessions.
Root cause: backward-compat sessions (context column was empty) had their
context initialised from session.messages, which never contains a system
message. The old check (`if not session.context:`) saw a non-empty list and
skipped the system prompt, so every subsequent request ran without it —
Navi had no persona and no profile instructions.
Also add context_token_count field to Session (follow-up for token counter
fix — persistence wiring comes in next commit).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 9 Apr
|
| 2026-04-08 |
Review fixes: events module, circular imports, deps, vision-aware compression
...
- Extract all AgentEvent dataclasses to navi/core/events.py; import from
there in agent.py and __init__.py — eliminates circular import between
workers and core
- workers/compressor.py: remove runtime import hack, use navi.core.events
- workers/base.py: WorkerResult.events typed as list[AgentEvent] (was Any)
- api/deps.py: replace @lru_cache on mutable list with module-level
singletons (_registries, _workers)
- core/compressor.py: _format_for_summary returns (text, images); images
passed to summarization LLM so vision models describe them in summary;
non-vision models silently ignore the images field; docstring updated
- client/js/app.js: add comment explaining is_summary backward compat branch
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 8 Apr
|

Separate display history from LLM context; formalize worker system
...
Architecture change:
- session.messages: full display history, never modified by compression
- session.context: what the LLM sees, may be compressed by workers
- System messages go only into context (not display history)
- Image injections (synthetic) go only into context
- User/assistant/tool messages go into both
SQLite: add context column with backward-compat migration
(empty context → initialized from messages on load)
Workers (navi/workers/):
- Worker ABC + WorkerContext + WorkerResult (base.py)
- CompressionWorker: compresses session.context when above threshold
- build_default_workers() returns [CompressionWorker()]
- Agent accepts workers list, runs them after StreamEnd
- Workers injected via deps.py get_workers() (lru_cached singleton)
- WebSocket agent construction also receives workers
Compressor: compress_context() now takes context[], not messages[]
Config: context_keep_recent 6 → 10
Agent: _run_workers() collects events from all workers and yields them
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 8 Apr
|

Add context compression: rolling summarization when context fills up
...
Mechanism:
- After streaming ends, if context_tokens >= threshold (80% of num_ctx),
compress old turns into a summary message using the same LLM
- Partition: keep system msg + last N turns verbatim (default 6);
everything older goes to the summarizer
- Tool call groups (assistant + tool results) never split across boundary
- Existing summary messages folded into new compression pass — no stack growth
- Summary stored as Message(role=user, is_summary=True) after system msg
- On failure: logged, session left unchanged (non-fatal)
New files:
- navi/core/compressor.py: should_compress, partition_messages,
compress_session (pure logic, testable without agent)
New config (navi/config.py):
- context_compression_enabled: bool = True
- context_compression_threshold: float = 0.80
- context_keep_recent: int = 6
- context_summary_temperature: float = 0.3
New agent event: ContextCompressed(messages_before, messages_after)
Message.is_summary: bool field marks compressed history blocks
Client:
- context_compressed WS event → subtle inline notice in message list
- loadHistory: is_summary messages rendered as collapsible summary cards
- style.css: .summary-card, .compression-notice
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 8 Apr
|
Add context token counter: 64k default, live UI display
...
- config: ollama_num_ctx default 8192 → 65536
- LLMChunk: add prompt_tokens / completion_tokens fields
- OllamaBackend.stream: populate token counts from final chunk
(prompt_eval_count + eval_count when chunk.done)
- StreamEnd: add context_tokens and max_context_tokens
- Agent.run_stream: capture token counts, pass to StreamEnd
- websocket: include context_tokens / max_context_tokens in stream_end
- index.html: split chat-header into title span + token-counter span
- sidebar.js: updateChatHeader targets #chat-header-title, not innerHTML
- app.js: updateTokenCounter() shows "X/Y (Z%) tokens", colors:
gray <50%, amber 50–79%, red ≥80%
- style.css: .token-counter, .warn, .danger styles
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 8 Apr
|
Server review fixes: profile model routing, sorting, datetime, cleanup
...
- LLMBackend.complete/stream: add model param; OllamaBackend uses it
over self.model, enabling per-profile model selection
- BackendRegistry.get(): remove unused model param
- Agent: pass profile.model to complete() and stream()
- Profiles: correct model to gemma4:e2b-it-q8_0 (was leftover e4b)
- InMemorySessionStore.list_all(): fix sort (pinned+newest first,
was pinned+oldest) — now consistent with SQLite ORDER BY
- session.py, sqlite_session_store.py: datetime.utcnow() →
datetime.now(timezone.utc) (deprecated since Python 3.12)
- _base_options(): accept temperature param, remove dead default
- deps.py: rename _registries → get_registries (public API)
- websocket.py: update import accordingly
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 8 Apr
|
Add self-extension tool system: write_tool, list_tools, tool_manual
...
- loader.py: module-level format (name/description/parameters/execute)
preferred, class-based as fallback; isolated errors per file
- write_tool: validates + writes tools/name.py, reloads registry,
adds to tools/enabled.json in one call
- list_tools: live tool list from registry (prevents hallucination)
- tool_manual: serves manuals/*.md or auto-generates from schema
- reload_tools: hot-reload without server restart
- registry: registry injection pattern for tools that need it;
_builtin_names set to guard against reload overwriting builtins
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 8 Apr
|
Add thinking/reasoning streaming support
...
Enable Ollama think param and stream reasoning chunks to client.
New agent events: ThinkingDelta, ThinkingEnd. Config gains ollama_think
and ollama_num_ctx settings. WebSocket protocol updated accordingly.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 8 Apr
|