| 2026-04-10 |

Add stop button and fix context compression hang
Stop generation:
- Client: send button toggles to red ■ during streaming; sends {type:stop} via WS
- Server: _stream_recv reads incoming messages concurrently during streaming using
asyncio.wait, so the stop signal is handled immediately without polling
- Cooperative stop via asyncio.Event (current_stop_event ContextVar): agent breaks
out of LLM async-for cleanly so aclose() fires → Ollama stream closes gracefully
and the model stays in VRAM. task.cancel() is avoided because it would eject
the model.
- StreamStopped event propagates through run_stream/run_ephemeral; sub-agents stop
via the same shared stop_event inherited through task context
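The cooperative stop described above could look roughly like the sketch below. It is a minimal illustration, not the project's code: fake_llm_stream stands in for the Ollama stream, and only the name current_stop_event comes from the commit message.

```python
import asyncio
from contextvars import ContextVar

# Name taken from the commit message; the real definition may differ.
current_stop_event = ContextVar("current_stop_event", default=None)

closed = []  # records that the fake stream was closed cleanly


async def fake_llm_stream():
    """Hypothetical stand-in for an LLM token stream."""
    try:
        for i in range(1000):
            await asyncio.sleep(0)
            yield f"tok{i}"
    finally:
        # Runs when the consumer closes the generator after breaking out.
        closed.append(True)


async def run_stream():
    stop = current_stop_event.get()
    out = []
    stream = fake_llm_stream()
    try:
        async for tok in stream:
            if stop is not None and stop.is_set():
                break  # cooperative exit, no task.cancel()
            out.append(tok)
    finally:
        await stream.aclose()  # close the stream explicitly
    return out


async def main():
    stop = asyncio.Event()
    current_stop_event.set(stop)
    # create_task copies the current context, so the task (and any tasks it
    # spawns) sees the same stop event.
    task = asyncio.create_task(run_stream())
    for _ in range(5):
        await asyncio.sleep(0)  # let a few tokens through
    stop.set()
    return await task


tokens = asyncio.run(main())
```

Because the event lives in a ContextVar, tasks created from the streaming task inherit it without passing it through every call signature.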
Context compression fix:
- compress_context passes think=False to llm.complete(), disabling extended
reasoning during summarization, which had caused the GPU hang
- Input truncated to 12k chars before sending to summarizer
- LLMBackend.complete() / OllamaBackend.complete() accept think: bool | None override
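A minimal sketch of the two fixes above, assuming that a think value of None means "use the configured default" and that truncation keeps the head of the input. Both helper names are hypothetical, and "12k" is read here as 12,000 characters.

```python
from __future__ import annotations

MAX_SUMMARY_INPUT_CHARS = 12_000  # assumed reading of "12k chars"


def resolve_think(default_think: bool, think: bool | None = None) -> bool:
    """A per-call override wins; None falls back to the configured default."""
    return default_think if think is None else think


def truncate_for_summary(text: str, limit: int = MAX_SUMMARY_INPUT_CHARS) -> str:
    """Cap summarizer input (assumption: the head of the text is kept)."""
    return text[:limit]
```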
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy committed on 10 Apr
|

Major feature batch: visibility, planning, file uploads, streaming
- stream_complete(): streaming with tools for all LLM turns; thinking now
streams as ThinkingDelta/ThinkingEnd in real time during tool-selection
turns, not just on the final response
- todo built-in tool: session-scoped plan manager (set/view/update/clear);
persona + all profiles updated with mandatory planning instructions
- TurnThinking event: sub-agent thinking forwarded to parent sink as a
collapsible block in the spawn_agent card
- File uploads: non-image files uploaded via XHR, shown as badges in
message bubble; SVG treated as regular file (not base64 image)
- session_files: POST /sessions/{id}/files, TTL cleanup, forbidden exts
- WebSocket reconnect: _AgentRun broadcast pattern, re-attach mid-stream
- UI: favicon, sidebar logo, turn-thinking cards, subagent thinking blocks,
token counter, draft persistence, file progress bar
- Removed AgentNote (content is always None alongside tool_calls)
- Ollama stream_complete: tool_calls captured from non-final chunk (done=False)
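The last bullet can be illustrated as follows. This is a sketch under the assumption that each streamed chunk is a dict shaped like an Ollama chat response; collect_stream is a hypothetical helper, not the project's stream_complete.

```python
def collect_stream(chunks):
    """Accumulate text and tool calls from a sequence of streamed chunks."""
    text_parts, tool_calls = [], []
    for chunk in chunks:
        msg = chunk.get("message", {})
        text_parts.append(msg.get("content") or "")
        # Tool calls may arrive on a chunk with done=False, so collect them
        # from every chunk instead of inspecting only the final one.
        tool_calls.extend(msg.get("tool_calls") or [])
        if chunk.get("done"):
            break
    return "".join(text_parts), tool_calls


example_chunks = [
    {"message": {"content": "", "tool_calls": [
        {"function": {"name": "todo", "arguments": {"action": "view"}}}]},
     "done": False},
    {"message": {"content": ""}, "done": True},
]
text, calls = collect_stream(example_chunks)
```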
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy committed on 10 Apr
|
| 2026-04-08 |

Add context compression: rolling summarization when context fills up
Mechanism:
- After streaming ends, if context_tokens >= threshold (80% of num_ctx),
compress old turns into a summary message using the same LLM
- Partition: keep system msg + last N turns verbatim (default 6);
everything older goes to the summarizer
- Tool call groups (assistant + tool results) never split across boundary
- Existing summary messages folded into new compression pass — no stack growth
- Summary stored as Message(role=user, is_summary=True) after system msg
- On failure: logged, session left unchanged (non-fatal)
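The partition step might be sketched like this. The Msg type and the partition function are hypothetical, and for simplicity the sketch counts keep_recent in messages rather than turns, which the real partition_messages may not do.

```python
from dataclasses import dataclass


@dataclass
class Msg:
    role: str  # "system", "user", "assistant", or "tool"
    content: str = ""
    has_tool_calls: bool = False
    is_summary: bool = False


def partition(messages, keep_recent=6):
    """Split into (system, old, recent) without cutting a tool-call group."""
    system = [m for m in messages if m.role == "system"]
    rest = [m for m in messages if m.role != "system"]
    cut = max(len(rest) - keep_recent, 0)
    # If the boundary lands on a tool result, walk back to the assistant
    # message that issued the calls so the whole group stays in `recent`.
    while 0 < cut < len(rest) and rest[cut].role == "tool":
        cut -= 1
    # Earlier summaries are not system messages, so they fall into `old`
    # and get folded into the next compression pass.
    return system, rest[:cut], rest[cut:]


history = [
    Msg("system"), Msg("user"), Msg("assistant"), Msg("user"),
    Msg("assistant", has_tool_calls=True), Msg("tool"), Msg("tool"),
    Msg("assistant"), Msg("user"),
]
system, old, recent = partition(history, keep_recent=4)
```

In the example the naive boundary would fall on the first tool result, so the cut moves back by one and the assistant message with its two tool results stays together in recent.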
New files:
- navi/core/compressor.py: should_compress, partition_messages,
compress_session (pure logic, testable without agent)
New config (navi/config.py):
- context_compression_enabled: bool = True
- context_compression_threshold: float = 0.80
- context_keep_recent: int = 6
- context_summary_temperature: float = 0.3
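As a rough illustration of how these settings could drive the threshold check: the field names and defaults below come from the list above, while the CompressionConfig container and the should_compress signature are assumptions.

```python
from dataclasses import dataclass


@dataclass
class CompressionConfig:
    # Field names and defaults as listed in the commit message.
    context_compression_enabled: bool = True
    context_compression_threshold: float = 0.80
    context_keep_recent: int = 6
    context_summary_temperature: float = 0.3


def should_compress(context_tokens: int, num_ctx: int, cfg: CompressionConfig) -> bool:
    """True once the context has reached the configured share of num_ctx."""
    if not cfg.context_compression_enabled:
        return False
    return context_tokens >= cfg.context_compression_threshold * num_ctx


cfg = CompressionConfig()
```

With num_ctx at 65536 the default threshold works out to 52428.8 tokens, so compression would start at 52429.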
New agent event: ContextCompressed(messages_before, messages_after)
Message.is_summary: bool field marks compressed history blocks
Client:
- context_compressed WS event → subtle inline notice in message list
- loadHistory: is_summary messages rendered as collapsible summary cards
- style.css: .summary-card, .compression-notice
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy committed on 8 Apr
|
Add context token counter: 64k default, live UI display
- config: ollama_num_ctx default 8192 → 65536
- LLMChunk: add prompt_tokens / completion_tokens fields
- OllamaBackend.stream: populate token counts from final chunk
(prompt_eval_count + eval_count when chunk.done)
- StreamEnd: add context_tokens and max_context_tokens
- Agent.run_stream: capture token counts, pass to StreamEnd
- websocket: include context_tokens / max_context_tokens in stream_end
- index.html: split chat-header into title span + token-counter span
- sidebar.js: updateChatHeader targets #chat-header-title, not innerHTML
- app.js: updateTokenCounter() shows "X/Y (Z%) tokens", colors:
gray <50%, amber 50–79%, red ≥80%
- style.css: .token-counter, .warn, .danger styles
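The counting and display logic can be sketched as below. The client code is JavaScript; this Python version only illustrates the arithmetic. Both function names are hypothetical, and mapping gray to "no extra class" and flooring the percentage are assumptions.

```python
from __future__ import annotations


def context_tokens_from_final_chunk(chunk: dict) -> int | None:
    """Sum prompt and completion token counts once the stream is done."""
    if not chunk.get("done"):
        return None
    return chunk.get("prompt_eval_count", 0) + chunk.get("eval_count", 0)


def format_token_counter(used: int, limit: int) -> tuple[str, str]:
    """Return the counter label and a CSS class for its color."""
    frac = used / limit
    if frac >= 0.80:
        level = "danger"  # red
    elif frac >= 0.50:
        level = "warn"    # amber
    else:
        level = ""        # gray, assumed to be the base .token-counter style
    return f"{used}/{limit} ({int(frac * 100)}%) tokens", level
```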
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy committed on 8 Apr
|
Server review fixes: profile model routing, sorting, datetime, cleanup
- LLMBackend.complete/stream: add model param; OllamaBackend uses it
over self.model, enabling per-profile model selection
- BackendRegistry.get(): remove unused model param
- Agent: pass profile.model to complete() and stream()
- Profiles: correct model to gemma4:e2b-it-q8_0 (was leftover e4b)
- InMemorySessionStore.list_all(): fix sort (pinned+newest first,
was pinned+oldest) — now consistent with SQLite ORDER BY
- session.py, sqlite_session_store.py: replace datetime.utcnow() (deprecated
since Python 3.12) with datetime.now(timezone.utc)
- _base_options(): accept temperature param, remove dead default
- deps.py: rename _registries → get_registries (public API)
- websocket.py: update import accordingly
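The sort and datetime fixes might look like the following sketch. The Session type is hypothetical, and using last_active as the "newest" key is an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


def utc_now() -> datetime:
    # Timezone-aware replacement for the deprecated datetime.utcnow().
    return datetime.now(timezone.utc)


@dataclass
class Session:
    id: str
    pinned: bool = False
    last_active: datetime = field(default_factory=utc_now)


def list_all(sessions):
    """Pinned sessions first, then most recently active first."""
    return sorted(
        sessions,
        key=lambda s: (not s.pinned, -s.last_active.timestamp()),
    )


now = utc_now()
sessions = [
    Session("a", last_active=now - timedelta(hours=2)),
    Session("b", last_active=now),
    Session("c", pinned=True, last_active=now - timedelta(days=1)),
]
ordered_ids = [s.id for s in list_all(sessions)]
```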
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy committed on 8 Apr
|
Add thinking/reasoning streaming support
Enable Ollama think param and stream reasoning chunks to client.
New agent events: ThinkingDelta, ThinkingEnd. Config gains ollama_think
and ollama_num_ctx settings. WebSocket protocol updated accordingly.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy committed on 8 Apr
|

Add multimodal image support and client UX improvements
Server:
- Add ImageViewTool (load image from file/URL, returns base64)
- Add images field to Message model with created_at timestamp
- Agent run/run_stream accept images param; inject image messages after image_view tool calls
- WebSocket handler accepts images array from client, strips data URI prefix
- All profiles include image_view tool
- Fix tool call serialization (model_dump mode=json for datetime)
- Add no-store cache headers for static files
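Stripping the data URI prefix from client-supplied images could be done along these lines. The helper name strip_data_uri is hypothetical and the actual handler may differ.

```python
import base64


def strip_data_uri(image: str) -> str:
    """Return the bare base64 payload from a data URI, or the input unchanged."""
    if image.startswith("data:") and "," in image:
        # "data:image/png;base64,AAAA..." -> "AAAA..."
        return image.split(",", 1)[1]
    return image


payload = base64.b64encode(b"abc").decode()
stripped = strip_data_uri("data:image/png;base64," + payload)
```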
Client:
- Image attachment: file picker button + clipboard paste + preview strip with remove
- Images rendered in chat bubbles; loaded from history
- Tool cards rebuilt as div+CSS toggle (fixes details/overflow-hidden collapse bug)
- Tool cards appear before response bubble (lazy bubble creation on first stream_delta)
- Typing indicator persists through tool calls, removed only when text starts streaming
- Tool cards restored from history on page reload
- Message timestamps stored via created_at field, shown correctly in history
- Session ID reflected in URL hash for bookmarking; restored on page load
- Remove localStorage session tracking (server last_active used instead)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy committed on 8 Apr
|
Initial implementation of the agent system core
- FastAPI server with REST API and WebSocket streaming
- Modular LLM backend abstraction (Ollama implemented, OpenAI stub)
- Tool system: web_search (ddgs), filesystem, http_request, code_exec, terminal
- Agent profiles: smart_home, server_admin, secretary
- Tool-calling loop with concurrent tool execution
- In-memory session store with SessionStore ABC for future persistence
- Registry pattern for tools, profiles, and backends
- Orchestrator stub as foundation for multi-agent scenarios
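A tool-calling loop with concurrent execution and a registry lookup could be sketched as follows. The tools here are hypothetical stand-ins that only borrow names from the list above, and the registry is a plain dict rather than the project's registry classes.

```python
import asyncio


async def web_search(query: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for network I/O
    return f"results for {query}"


async def terminal(command: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for running a command
    return f"ran {command}"


# Minimal registry: tool name -> coroutine function.
TOOLS = {"web_search": web_search, "terminal": terminal}


async def run_tool_calls(calls):
    """Run all requested tool calls concurrently, preserving call order."""

    async def run_one(call):
        tool = TOOLS.get(call["name"])
        if tool is None:
            return {"name": call["name"], "error": "unknown tool"}
        try:
            result = await tool(**call["arguments"])
            return {"name": call["name"], "result": result}
        except Exception as exc:  # report failures back instead of raising
            return {"name": call["name"], "error": str(exc)}

    return await asyncio.gather(*(run_one(c) for c in calls))


results = asyncio.run(run_tool_calls([
    {"name": "web_search", "arguments": {"query": "ollama"}},
    {"name": "terminal", "arguments": {"command": "uptime"}},
    {"name": "missing", "arguments": {}},
]))
```

asyncio.gather returns results in the order the calls were passed, so each result can be matched back to its tool call by position.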
Eugene Sukhodolskiy committed on 8 Apr
|