| 2026-05-25 |
Add session message archive table + global sequence_number tracking
...
Schema:
- session_messages_archive — identical structure, stores old messages.
- sessions.next_sequence — monotonic seq counter per session.
- sessions.archive_threshold — split point between hot and archive.
Behaviour:
- get() / _build_sessions() load only seq >= archive_threshold (hot).
- save() UPDATEs existing rows (seq >= 0) and INSERTs new ones (seq = -1)
with auto-assigned sequence_number = next_sequence, next_sequence+1, ...
- archive_old_messages() moves a batch to archive and bumps the threshold.
This keeps the hot table bounded so list/get RAM stays flat regardless
of total session history size.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 25 May
|
Phase 2: Dual-write with is_context/is_display flags on Message
...
- Message model gets is_context and is_display bools
- PgSessionStore.save() writes flags directly to session_messages
- Agent sets is_context=False on display-only messages, is_display=False on context-only
- Planning: plan context msg is_display=False, plan marker is_context=False
- Compression: summarized messages get is_context=False, summary added to messages with is_display=False
- Tests updated for extra user display+context messages per turn
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 25 May
|
| 2026-05-24 |
Raise first-chunk timeout to 90s and retry same server+model before fallback
...
- config.py: llm_stream_first_chunk_timeout 180s → 90s
- fallback.py stream_complete: wrap gen.__anext__() in asyncio.wait_for()
with llm_stream_first_chunk_timeout; on TimeoutError or LLMConnectionError
sleep 2s and retry once on the same server+model before blacklisting/fallback
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 24 May
|
| 2026-05-21 |
Add structured logging for Ollama chat errors
...
Log model, message count, tools count, and raw error string whenever
self._client.chat() raises an exception. This makes it possible to
reconstruct the exact request payload that triggered a 500 from
Ollama Cloud — critical for diagnosing transient vs systemic failures.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 21 May
|
FallbackOllamaBackend: do not blacklist single server, empty file fallback
...
- When only one Ollama server is configured, LLMConnectionError no longer
adds it to the dead-server blacklist. This fixes the bug where a
transient failure permanently blocked all requests until server restart.
- LLMModelNotFoundError on a single server is also not blacklisted.
- _discover_backends now falls back to settings.ollama_host when the
ollama_backends_file is empty, missing, or returns no valid servers.
- Added unit tests covering single-server no-blacklist, multi-server
blacklist, model-not-found no-blacklist, and empty-file fallback.
400 passed, 1 skipped
Eugene Sukhodolskiy
committed
on 21 May
|
| 2026-05-15 |

fix(recall): stabilize scheduled callback system and improve UX
...
Backend fixes:
- stop_session now stops headless recall runs via _busy_sessions dict
- _fire_recall sets user ContextVars so tools work correctly
- MaxIterationsReached treated as success, not failure
- skip_next_recall uses GREATEST(trigger_at, now) for overdue recalls
- schedule_recall rejects past trigger times
- timezone offset double-adjustment fixed for aware datetimes
- _fire_recall registers _AgentRun for reconnect/replay support
- session_sync race with stream_start fixed
Frontend improvements:
- Recall banner moved to ChatHeader with live Cancel/Skip buttons
- Recall messages styled with is_recall flag and badge
- Real-time recall updates via WebSocket (recall_update events)
- Recall filter moved to sessions-header as toggle button
- Session list shows clock icon for sessions with pending recall
- Empty state messages for empty/filtered session lists
- Fixed missing api import in ChatHeader.vue
Tests:
- Updated scheduler_loop tests for _busy_sessions dict change
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 15 May
|
| 2026-05-13 |
Persist uploaded files in messages, live file tree updates, and UI polish
...
Backend:
- Add `files` field to `Message` model so uploaded file metadata survives page refresh
- Pass `files` through websocket handler → `agent.run_stream` / `agent.run`
- `list_tools`: make `profile_id` required; return error instead of all-tools fallback
Webclient:
- Call `fetchFiles()` after successful file upload for immediate Files tab update
- Live refresh file tree on filesystem (write/edit/append/mkdir/rm/cp/mv), terminal, and code_exec tool calls
- Add manual refresh button (desktop) and pull-to-refresh (mobile) to Files tab
- Fix live link updates: move regex creation inside per-message loop to avoid lastIndex state leak
- Restore full profile name text next to avatar in ChatHeader; hide avatar in header
- Fix mobile ArtifactsPanel: collapse tab text labels so close button fits with 3 tabs
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 13 May
|
Defensive image cleanup in Ollama backend to prevent 'unknown format' errors
...
- _clean_base64_image() strips data URI prefix and validates non-empty result
- _to_ollama_messages() filters out invalid/empty images before sending to Ollama
- Prevents 500 errors when models receive malformed or unsupported image data
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 13 May
|
| 2026-05-12 |
Remove dead LLMBackend.stream() method
...
The method was defined on all backends (Ollama, FallbackOllama, OpenAI)
and in the base LLMBackend interface, but was never called by agent.py
or messages.py. stream_complete() covers all streaming use cases.
- navi/llm/base.py: remove abstract stream() method
- navi/llm/ollama.py: remove OllamaBackend.stream()
- navi/llm/fallback.py: remove FallbackOllamaBackend.stream()
- navi/llm/openai_backend.py: remove OpenAIBackend.stream()
- docs/tech_debt_review: mark item 54 as fixed
236 tests passing.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 12 May
|
| 2026-05-11 |
Fix ollama_backends / FallbackOllamaBackend issues
...
- registry.py: always use FallbackOllamaBackend (unified backend).
Enables model priority lists in all deployments, not just multi-server.
- agent.py: add missing think=profile.think_enabled to run() (REST endpoint).
- compressor.py: fix model param type (str → list[str] | str | None).
- fallback.py: harden load_servers_from_file against missing/bad JSON files
and entries without host. Add clear_blacklists() for manual reset.
- admin.py: add POST /admin/ollama/clear-blacklists endpoint.
- tech_debt_review: document dead stream() methods.
- tests: add tests for single-server fallback, bad file handling,
missing host skipping, and blacklist clearing.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 11 May
|
| 2026-04-30 |
Improve content publishing UX
Eugene Sukhodolskiy
committed
on 30 Apr
|
| 2026-04-29 |
Align Ollama HTTP timeout with LLM timeouts
Eugene Sukhodolskiy
committed
on 29 Apr
|

Stability fixes batch — tech debt review 2026-04-29
...
Critical:
- Concurrent WS run race guard (#1)
- Tool task cancellation on generator teardown (#2)
- StopAsyncIteration kills fallback chain (#3)
- Session loading race with _lastLoadId guard (#4)
- ContentCard .match() crash on non-string result (#5)
- Image data type guard in buildMessageList (#6)
High:
- Cap WS replay buffer at 500 events (#7)
- Deduplicate memory extraction task with asyncio.Lock (#9)
- TTL-based fallback blacklisting (5 min) (#10)
- Subagent tool exception isolation (#11)
- Inline image size/count validation on WS (#12)
- Clean up orphaned file on DB insert failure (#13)
- Deep watch streamingMsg for auto-scroll (#14)
- WS_SCHEME wss:// support for HTTPS (#15)
- Sending guard against duplicate message sends (#16)
- Global unhandledrejection listener in API layer (#17)
Medium:
- Cap planning_logs at 20 entries (#22)
- Store cleanup_loop task reference (#23)
- BaseException → Exception in _run_with_sentinel (#24)
- Propagate SystemExit in agent loop (#25)
- Configurable output_reserve_tokens (#26)
- Always reloadSession on session_sync (#30)
- FIFO queue for confirm dialogs (#31)
- Reset body.overflow on ImageLightbox unmount (#32)
- try/finally in fallback copy (#33)
- _isConnecting guard in WS send() (#34)
Low:
- Lazy-init deps.py singletons (#36)
- Replace __import__ with direct imports (#38)
- Preserve token count 0 in ollama.py (#39)
- Clear orphaned streamingMsg on reconnect reload (#43)
- Escape single quote in UserMessage (#44)
- Polyfill-free findLast replacement (#48)
- Match <table> tags with attributes in markdown (#49)
- Attach copy buttons only when msg.done (#50)
- Fix hasMeta falsy-0 bug (#53)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 29 Apr
|
| 2026-04-28 |
Wire pgvector semantic search into memory system
...
- Add vector(768) column + HNSW index to memory_facts
- Add LLMBackend.embed() with Ollama + fallback implementation
- MemoryStore: cosine-distance search with ILIKE fallback
- New memory tool params: source, confidence, expires_days, source_context
- Update extractor, sqlite_store, deps wiring
- Add pgvector to requirements
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 28 Apr
|
| 2026-04-26 |
changed llm & new ollama param
Eugene Sukhodolskiy
committed
on 26 Apr
|
| 2026-04-25 |
Tune profile sampling configs
Eugene Sukhodolskiy
committed
on 25 Apr
|
| 2026-04-24 |
Set temperature=1.0, top_k=64, top_p=0.95 for all profiles (Google recommended for gemma4)
...
Also fixes discuss profile memory tools: use combined "memory" tool name, not nonexistent split variants.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 24 Apr
|

Add Ollama multi-server fallback with in-memory blacklisting
...
- New FallbackOllamaBackend (navi/llm/fallback.py): tries servers and
models in priority order; on LLMConnectionError blacklists the server
for the process lifetime, on LLMModelNotFoundError blacklists the
(server, model) pair — eliminates latency from repeated failed probes
- OllamaBackend now raises typed LLMConnectionError / LLMModelNotFoundError
instead of bare LLMBackendError; accepts list[str] | str | None for model
- AgentProfile.model changed from str to list[str] (str auto-normalised);
all profiles updated to ["gemma4:31b-cloud", "gemma4:26b-a4b-it-q4_K_M"]
- New config field OLLAMA_BACKENDS_FILE: path to [{host, api_key?}] JSON;
when set, registry creates FallbackOllamaBackend instead of OllamaBackend
- ollama_backends.json template added (gitignored — contains API key)
- current_model ContextVar type widened to list[str] | str | None
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 24 Apr
|
| 2026-04-22 |
Support Ollama Cloud API key
Eugene Sukhodolskiy
committed
on 22 Apr
|
| 2026-04-20 |

Autonomous reasoning improvements: budget, anchoring, anti-stall, validation
...
- AgentProfile: per-profile thinking mechanics flags (think_enabled,
iteration_budget_enabled, goal_anchoring, anti_stall, step_validation,
planning_reflect, adaptive_replan) — all profiles updated in config.json
- Iteration budget: inject remaining iterations into context so model knows
when to wrap up; urgency levels at ≤7 and ≤3 remaining
- Goal anchoring: inject original goal + todo state every N iterations to
prevent drift on long tasks
- Anti-stall: two signals — no todo progress for N iterations, or identical
tool calls repeated N times; warning injected into context
- Todo step validation: marking done requires a validation field describing
how result was verified; failed gets a soft nudge with tip for re-planning
- stream_complete: add think param to base class, ollama and openai backends
- Summarizer: raise max_tokens 1024→3000, expand system prompt with
user-preferences section and verbatim-value instructions
- Compression card: persist to session.messages (is_compression flag on
Message), show expandable summary in webclient with markdown body
- ToolResult.to_message_content: always include output on failure so
tracebacks and error details reach the model (fixes silent Error: None)
- Developer profile: fix subagent profile secretary→developer, add write_tool
to subagent_tools, clarify write_tool vs filesystem in system prompt
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 20 Apr
|
| 2026-04-16 |
Count AIHelper tokens in session metrics
...
Adds prompt/completion token fields to LLMResponse, populated by
OllamaBackend.complete(). AIHelper emits AIHelperTokensUsed into the
current event sink after each LLM call; run_stream drains it into
_subagent_tokens so AIHelper usage is reflected in the turn token delta.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 16 Apr
|
Add response metrics: elapsed time, tool calls, token count
...
Server:
- Message model: elapsed_seconds, tool_call_count, token_count fields
(display-only, excluded from LLM context via exclude_none)
- StreamEnd event: carries same three fields
- agent.run_stream: tracks turn start time, counts ToolEvent completions,
writes metrics onto the final assistant Message before saving to DB
- WebSocket: forwards metrics in stream_end payload
Client:
- chat.onStreamEnd: attaches elapsed_seconds, tool_call_count, token_count
to the streaming message on completion
- buildMessageList: scans each assistant group for metrics from history
- AssistantMessage: renders .msg-meta-row below the response —
timer icon + Xs · wrench icon + N tools · coins icon + Nk tokens · time
(each item only shown if present; time pushed right via margin-left: auto)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 16 Apr
|
Persist thinking and plan cards across session reloads
...
- Message: add thinking and is_plan fields (display-only, not sent to LLM)
- Agent main loop: accumulate thinking per iteration, save with assistant message
- _run_planning: also append plan to session.messages with is_plan=True so UI
can render plan cards after page reload (context already had the plan)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 16 Apr
|
| 2026-04-15 |
Add explicit output token budget for summarizer (context_summary_max_tokens)
...
Previously there was no num_predict set for the summarization LLM call,
so Ollama used its server default (often 128 tokens — very short summaries).
- Add max_tokens param to LLMBackend.complete() and OllamaBackend (→ num_predict)
- Add context_summary_max_tokens: int = 1024 to config
- Thread it through compress_context() and CompressionWorker
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 15 Apr
|
| 2026-04-10 |

Add stop button and fix context compression hang
...
Stop generation:
- Client: send button toggles to red ■ during streaming; sends {type:stop} via WS
- Server: _stream_recv concurrently reads incoming messages during streaming using
asyncio.wait — stop signal is handled immediately without polling
- Cooperative stop via asyncio.Event (current_stop_event ContextVar): agent breaks
out of LLM async-for cleanly so aclose() fires → Ollama stream closes gracefully,
model stays in VRAM. No task.cancel() which would eject the model.
- StreamStopped event propagates through run_stream/run_ephemeral; sub-agents stop
via the same shared stop_event inherited through task context
Context compression fix:
- compress_context passes think=False to llm.complete() — no extended reasoning
during summarization which caused GPU hang
- Input truncated to 12k chars before sending to summarizer
- LLMBackend.complete() / OllamaBackend.complete() accept think: bool | None override
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 10 Apr
|

Major feature batch: visibility, planning, file uploads, streaming
...
- stream_complete(): streaming with tools for all LLM turns — thinking
now streams as ThinkingDelta/ThinkingEnd in real-time during tool-
selection turns, not just on the final response
- todo built-in tool: session-scoped plan manager (set/view/update/clear);
persona + all profiles updated with mandatory planning instructions
- TurnThinking event: sub-agent thinking forwarded to parent sink as a
collapsible block in the spawn_agent card
- File uploads: non-image files uploaded via XHR, shown as badges in
message bubble; SVG treated as regular file (not base64 image)
- session_files: POST /sessions/{id}/files, TTL cleanup, forbidden exts
- WebSocket reconnect: _AgentRun broadcast pattern, re-attach mid-stream
- UI: favicon, sidebar logo, turn-thinking cards, subagent thinking blocks,
token counter, draft persistence, file progress bar
- Removed AgentNote (content is always None alongside tool_calls)
- Ollama stream_complete: tool_calls captured from non-final chunk (done=False)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 10 Apr
|
| 2026-04-08 |

Add context compression: rolling summarization when context fills up
...
Mechanism:
- After streaming ends, if context_tokens >= threshold (80% of num_ctx),
compress old turns into a summary message using the same LLM
- Partition: keep system msg + last N turns verbatim (default 6);
everything older goes to the summarizer
- Tool call groups (assistant + tool results) never split across boundary
- Existing summary messages folded into new compression pass — no stack growth
- Summary stored as Message(role=user, is_summary=True) after system msg
- On failure: logged, session left unchanged (non-fatal)
New files:
- navi/core/compressor.py: should_compress, partition_messages,
compress_session (pure logic, testable without agent)
New config (navi/config.py):
- context_compression_enabled: bool = True
- context_compression_threshold: float = 0.80
- context_keep_recent: int = 6
- context_summary_temperature: float = 0.3
New agent event: ContextCompressed(messages_before, messages_after)
Message.is_summary: bool field marks compressed history blocks
Client:
- context_compressed WS event → subtle inline notice in message list
- loadHistory: is_summary messages rendered as collapsible summary cards
- style.css: .summary-card, .compression-notice
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 8 Apr
|
Add context token counter: 64k default, live UI display
...
- config: ollama_num_ctx default 8192 → 65536
- LLMChunk: add prompt_tokens / completion_tokens fields
- OllamaBackend.stream: populate token counts from final chunk
(prompt_eval_count + eval_count when chunk.done)
- StreamEnd: add context_tokens and max_context_tokens
- Agent.run_stream: capture token counts, pass to StreamEnd
- websocket: include context_tokens / max_context_tokens in stream_end
- index.html: split chat-header into title span + token-counter span
- sidebar.js: updateChatHeader targets #chat-header-title, not innerHTML
- app.js: updateTokenCounter() shows "X/Y (Z%) tokens", colors:
gray <50%, amber 50–79%, red ≥80%
- style.css: .token-counter, .warn, .danger styles
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 8 Apr
|
Server review fixes: profile model routing, sorting, datetime, cleanup
...
- LLMBackend.complete/stream: add model param; OllamaBackend uses it
over self.model, enabling per-profile model selection
- BackendRegistry.get(): remove unused model param
- Agent: pass profile.model to complete() and stream()
- Profiles: correct model to gemma4:e2b-it-q8_0 (was leftover e4b)
- InMemorySessionStore.list_all(): fix sort (pinned+newest first,
was pinned+oldest) — now consistent with SQLite ORDER BY
- session.py, sqlite_session_store.py: datetime.utcnow() →
datetime.now(timezone.utc) (deprecated since Python 3.12)
- _base_options(): accept temperature param, remove dead default
- deps.py: rename _registries → get_registries (public API)
- websocket.py: update import accordingly
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 8 Apr
|
Add thinking/reasoning streaming support
...
Enable Ollama think param and stream reasoning chunks to client.
New agent events: ThinkingDelta, ThinkingEnd. Config gains ollama_think
and ollama_num_ctx settings. WebSocket protocol updated accordingly.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 8 Apr
|