root/navi-1

History for navi-1 / navi / llm

2026-07-10	2c85e90 tui: show the currently-served model in the status panel
    The status panel's Model line was fed the global ollama_default_model, not the session/profile model, and the server never told the client which model actually served a call. Now:
    - Backends stamp the resolved model onto LLMChunk (first chunk) / LLMResponse. The fallback backend reports the model that survived its server+model priority list (may differ from the profile's first choice).
    - New ModelInfo event ({"type":"model_info","model":...}) emitted once per turn from agent._consume_stream, re-emitted only when the model changes across iterations. Additive WS event — old clients ignore it.
    - TUI: attach_session/switch fetch the profile's configured model (first of profile.model) via api.get_profile_model so the panel shows a value before the first request; model_info then refines it to the actually-served model. Not forwarded to the chat panel. Raw CLI prints "[model] ...".
    Co-Authored-By: Claude <noreply@anthropic.com>
    Eugene Sukhodolskiy committed 5 days ago
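The "emit once, re-emit only on change" rule for the model_info event can be illustrated with a short sketch. This is not the code from agent._consume_stream; the function name and the input shape (one served-model string per iteration) are assumptions made for illustration.

```python
from typing import Dict, Iterable, Iterator, Optional


def model_info_events(served_models: Iterable[Optional[str]]) -> Iterator[Dict[str, str]]:
    """Yield a model_info payload for the first model seen and on every change.

    `served_models` is a hypothetical per-iteration sequence of the model name
    that the backend reported; None means the backend reported nothing.
    """
    last = None
    for model in served_models:
        if model and model != last:
            # Payload shape taken from the commit message.
            yield {"type": "model_info", "model": model}
            last = model
```

Repeated iterations served by the same model produce a single event, so a client only sees a new event when a fallback actually switched models.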
2026-06-26	4519268 compressor: structured summaries, profile-aware compression, adaptive keep_recent
    - Replace free-form summary with strict Markdown template (Goal, Active Files, Decisions, Completed Work, Pending Work/Todo, Errors, Key Values).
    - Keep filesystem/code_exec/terminal tool results and messages with is_compression_critical=True verbatim during compression instead of 300-char truncation.
    - Make compression profile-aware: AgentProfile gains compression_keep_recent, compression_max_tokens, compression_prompt_file. navi_code uses dedicated compression prompt and larger keep_recent/max_tokens.
    - Adaptive partition_messages(): important turns (user corrections, errors, critical tools) survive longer; filler/social turns compress sooner.
    - Increase default context_summary_max_tokens from 3000 to 4000.
    - Propagate active profile changes to ContextCompressor and SubAgentRunner.
    Co-Authored-By: Claude <noreply@anthropic.com>
    Eugene Sukhodolskiy committed 19 days ago
2026-05-25	c7c0479 Add session message archive table + global sequence_number tracking
    Schema:
    - session_messages_archive — identical structure, stores old messages.
    - sessions.next_sequence — monotonic seq counter per session.
    - sessions.archive_threshold — split point between hot and archive.
    Behaviour:
    - get() / _build_sessions() load only seq >= archive_threshold (hot).
    - save() UPDATEs existing rows (seq >= 0) and INSERTs new ones (seq = -1) with auto-assigned sequence_number = next_sequence, next_sequence+1, ...
    - archive_old_messages() moves a batch to archive and bumps the threshold.
    This keeps the hot table bounded so list/get RAM stays flat regardless of total session history size.
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    Eugene Sukhodolskiy committed on 25 May
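The save/load rules above can be sketched in memory, without a database. The helper names and the dict-based message shape are hypothetical; only the seq >= 0 / seq = -1 convention and the archive_threshold filter come from the commit message.

```python
from typing import Dict, List, Tuple

Message = Dict[str, object]


def split_for_save(
    messages: List[Message], next_sequence: int
) -> Tuple[List[Message], List[Message], int]:
    """Split messages into rows to UPDATE and rows to INSERT.

    Messages with seq >= 0 already exist and are updated in place.
    Messages with seq == -1 are new and get consecutive sequence numbers
    starting at next_sequence. Returns the advanced counter as well.
    """
    updates: List[Message] = []
    inserts: List[Message] = []
    for msg in messages:
        if msg["seq"] >= 0:
            updates.append(msg)
        else:
            inserts.append({**msg, "seq": next_sequence})
            next_sequence += 1
    return updates, inserts, next_sequence


def hot_messages(messages: List[Message], archive_threshold: int) -> List[Message]:
    """Keep only messages at or above the hot/archive split point."""
    return [m for m in messages if m["seq"] >= archive_threshold]
```

Because the counter only moves forward, sequence numbers stay unique per session even after older rows have been moved to the archive table.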
2026-05-25	2d4109a Phase 2: Dual-write with is_context/is_display flags on Message
    - Message model gets is_context and is_display bools
    - PgSessionStore.save() writes flags directly to session_messages
    - Agent sets is_context=False on display-only messages, is_display=False on context-only
    - Planning: plan context msg is_display=False, plan marker is_context=False
    - Compression: summarized messages get is_context=False, summary added to messages with is_display=False
    - Tests updated for extra user display+context messages per turn
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    Eugene Sukhodolskiy committed on 25 May
2026-05-24	4f78099 Raise first-chunk timeout to 90s and retry same server+model before fallback
    - config.py: llm_stream_first_chunk_timeout 180s → 90s
    - fallback.py stream_complete: wrap gen.__anext__() in asyncio.wait_for() with llm_stream_first_chunk_timeout; on TimeoutError or LLMConnectionError sleep 2s and retry once on the same server+model before blacklisting/fallback
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    Eugene Sukhodolskiy committed on 24 May
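A minimal sketch of the first-chunk timeout with a single same-target retry, assuming the stream is an async generator. The function name, the `make_stream` factory, and the stand-in exception class are hypothetical; the real logic lives in fallback.py and may differ.

```python
import asyncio
from typing import Any, AsyncGenerator, Callable, Tuple


class LLMConnectionError(Exception):
    """Stand-in for the project's typed connection error."""


async def first_chunk_with_retry(
    make_stream: Callable[[], AsyncGenerator[Any, None]],
    first_chunk_timeout: float,
    retry_delay: float = 2.0,
) -> Tuple[Any, AsyncGenerator[Any, None]]:
    """Wait for the first chunk; on timeout or connection error retry once.

    Returns the first chunk and the still-open generator so the caller can
    keep iterating. After the second failure the last error is re-raised,
    which is where a caller would blacklist the target and fall back.
    """
    last_error: Exception = LLMConnectionError("stream never started")
    for attempt in range(2):
        gen = make_stream()
        try:
            first = await asyncio.wait_for(gen.__anext__(), timeout=first_chunk_timeout)
            return first, gen
        except (asyncio.TimeoutError, LLMConnectionError) as exc:
            last_error = exc
            await gen.aclose()  # make sure the failed stream is released
            if attempt == 0:
                await asyncio.sleep(retry_delay)
    raise last_error
```

Only the first chunk is guarded by the timeout; once the stream has started, later chunks are read by the caller without this wrapper.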
2026-05-21	d4e2722 Add structured logging for Ollama chat errors
    Log model, message count, tools count, and raw error string whenever self._client.chat() raises an exception. This makes it possible to reconstruct the exact request payload that triggered a 500 from Ollama Cloud — critical for diagnosing transient vs systemic failures.
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    Eugene Sukhodolskiy committed on 21 May
2026-05-21	b8acc87 FallbackOllamaBackend: do not blacklist single server, empty file fallback
    - When only one Ollama server is configured, LLMConnectionError no longer adds it to the dead-server blacklist. This fixes the bug where a transient failure permanently blocked all requests until server restart.
    - LLMModelNotFoundError on a single server is also not blacklisted.
    - _discover_backends now falls back to settings.ollama_host when the ollama_backends_file is empty, missing, or returns no valid servers.
    - Added unit tests covering single-server no-blacklist, multi-server blacklist, model-not-found no-blacklist, and empty-file fallback.
    400 passed, 1 skipped
    Eugene Sukhodolskiy committed on 21 May
2026-05-15	e4984fa fix(recall): stabilize scheduled callback system and improve UX
    Backend fixes:
    - stop_session now stops headless recall runs via _busy_sessions dict
    - _fire_recall sets user ContextVars so tools work correctly
    - MaxIterationsReached treated as success, not failure
    - skip_next_recall uses GREATEST(trigger_at, now) for overdue recalls
    - schedule_recall rejects past trigger times
    - timezone offset double-adjustment fixed for aware datetimes
    - _fire_recall registers _AgentRun for reconnect/replay support
    - session_sync race with stream_start fixed
    Frontend improvements:
    - Recall banner moved to ChatHeader with live Cancel/Skip buttons
    - Recall messages styled with is_recall flag and badge
    - Real-time recall updates via WebSocket (recall_update events)
    - Recall filter moved to sessions-header as toggle button
    - Session list shows clock icon for sessions with pending recall
    - Empty state messages for empty/filtered session lists
    - Fixed missing api import in ChatHeader.vue
    Tests:
    - Updated scheduler_loop tests for _busy_sessions dict change
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    Eugene Sukhodolskiy committed on 15 May
2026-05-13	1f6f538 Persist uploaded files in messages, live file tree updates, and UI polish
    Backend:
    - Add `files` field to `Message` model so uploaded file metadata survives page refresh
    - Pass `files` through websocket handler → `agent.run_stream` / `agent.run`
    - `list_tools`: make `profile_id` required; return error instead of all-tools fallback
    Webclient:
    - Call `fetchFiles()` after successful file upload for immediate Files tab update
    - Live refresh file tree on filesystem (write/edit/append/mkdir/rm/cp/mv), terminal, and code_exec tool calls
    - Add manual refresh button (desktop) and pull-to-refresh (mobile) to Files tab
    - Fix live link updates: move regex creation inside per-message loop to avoid lastIndex state leak
    - Restore full profile name text next to avatar in ChatHeader; hide avatar in header
    - Fix mobile ArtifactsPanel: collapse tab text labels so close button fits with 3 tabs
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    Eugene Sukhodolskiy committed on 13 May
2026-05-13	c8dffaa Defensive image cleanup in Ollama backend to prevent 'unknown format' errors
    - _clean_base64_image() strips data URI prefix and validates non-empty result
    - _to_ollama_messages() filters out invalid/empty images before sending to Ollama
    - Prevents 500 errors when models receive malformed or unsupported image data
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    Eugene Sukhodolskiy committed on 13 May
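One way the cleanup step could look, as a sketch rather than the actual _clean_base64_image(). The function names here are hypothetical, and the strict base64 decode check is an extra assumption: the commit only mentions stripping the data URI prefix and rejecting empty results.

```python
import base64
import binascii
from typing import Iterable, List, Optional


def clean_base64_image(data: object) -> Optional[str]:
    """Return the bare base64 payload, or None if the input is unusable."""
    if not isinstance(data, str):
        return None
    payload = data.strip()
    if payload.startswith("data:"):
        # A data URI looks like data:<mediatype>;base64,<payload>.
        _, sep, payload = payload.partition(",")
        if not sep:
            return None
        payload = payload.strip()
    if not payload:
        return None
    try:
        # Assumption beyond the commit message: reject non-base64 payloads too.
        base64.b64decode(payload, validate=True)
    except (binascii.Error, ValueError):
        return None
    return payload


def filter_images(images: Iterable[object]) -> List[str]:
    """Drop every image that does not survive cleaning."""
    cleaned = (clean_base64_image(img) for img in images)
    return [img for img in cleaned if img is not None]
```

Filtering before the request is sent means a single bad attachment is dropped silently instead of failing the whole chat call.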
2026-05-12	ac44c84 Remove dead LLMBackend.stream() method
    The method was defined on all backends (Ollama, FallbackOllama, OpenAI) and in the base LLMBackend interface, but was never called by agent.py or messages.py. stream_complete() covers all streaming use cases.
    - navi/llm/base.py: remove abstract stream() method
    - navi/llm/ollama.py: remove OllamaBackend.stream()
    - navi/llm/fallback.py: remove FallbackOllamaBackend.stream()
    - navi/llm/openai_backend.py: remove OpenAIBackend.stream()
    - docs/tech_debt_review: mark item 54 as fixed
    236 tests passing.
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    Eugene Sukhodolskiy committed on 12 May
2026-05-11	cebc073 Fix ollama_backends / FallbackOllamaBackend issues
    - registry.py: always use FallbackOllamaBackend (unified backend). Enables model priority lists in all deployments, not just multi-server.
    - agent.py: add missing think=profile.think_enabled to run() (REST endpoint).
    - compressor.py: fix model param type (str → list[str] | str | None).
    - fallback.py: harden load_servers_from_file against missing/bad JSON files and entries without host. Add clear_blacklists() for manual reset.
    - admin.py: add POST /admin/ollama/clear-blacklists endpoint.
    - tech_debt_review: document dead stream() methods.
    - tests: add tests for single-server fallback, bad file handling, missing host skipping, and blacklist clearing.
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    Eugene Sukhodolskiy committed on 11 May
2026-04-30	dce281e Improve content publishing UX Eugene Sukhodolskiy committed on 30 Apr
2026-04-29	30dd183 Align Ollama HTTP timeout with LLM timeouts Eugene Sukhodolskiy committed on 29 Apr
2026-04-29	098401a Stability fixes batch — tech debt review 2026-04-29
    Critical:
    - Concurrent WS run race guard (#1)
    - Tool task cancellation on generator teardown (#2)
    - StopAsyncIteration kills fallback chain (#3)
    - Session loading race with _lastLoadId guard (#4)
    - ContentCard .match() crash on non-string result (#5)
    - Image data type guard in buildMessageList (#6)
    High:
    - Cap WS replay buffer at 500 events (#7)
    - Deduplicate memory extraction task with asyncio.Lock (#9)
    - TTL-based fallback blacklisting (5 min) (#10)
    - Subagent tool exception isolation (#11)
    - Inline image size/count validation on WS (#12)
    - Clean up orphaned file on DB insert failure (#13)
    - Deep watch streamingMsg for auto-scroll (#14)
    - WS_SCHEME wss:// support for HTTPS (#15)
    - Sending guard against duplicate message sends (#16)
    - Global unhandledrejection listener in API layer (#17)
    Medium:
    - Cap planning_logs at 20 entries (#22)
    - Store cleanup_loop task reference (#23)
    - BaseException → Exception in _run_with_sentinel (#24)
    - Propagate SystemExit in agent loop (#25)
    - Configurable output_reserve_tokens (#26)
    - Always reloadSession on session_sync (#30)
    - FIFO queue for confirm dialogs (#31)
    - Reset body.overflow on ImageLightbox unmount (#32)
    - try/finally in fallback copy (#33)
    - _isConnecting guard in WS send() (#34)
    Low:
    - Lazy-init deps.py singletons (#36)
    - Replace __import__ with direct imports (#38)
    - Preserve token count 0 in ollama.py (#39)
    - Clear orphaned streamingMsg on reconnect reload (#43)
    - Escape single quote in UserMessage (#44)
    - Polyfill-free findLast replacement (#48)
    - Match <table> tags with attributes in markdown (#49)
    - Attach copy buttons only when msg.done (#50)
    - Fix hasMeta falsy-0 bug (#53)
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    Eugene Sukhodolskiy committed on 29 Apr
2026-04-28	c874cbe Wire pgvector semantic search into memory system
    - Add vector(768) column + HNSW index to memory_facts
    - Add LLMBackend.embed() with Ollama + fallback implementation
    - MemoryStore: cosine-distance search with ILIKE fallback
    - New memory tool params: source, confidence, expires_days, source_context
    - Update extractor, sqlite_store, deps wiring
    - Add pgvector to requirements
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    Eugene Sukhodolskiy committed on 28 Apr
2026-04-26	b5b11be changed llm & new ollama param Eugene Sukhodolskiy committed on 26 Apr
2026-04-25	52b4069 Tune profile sampling configs Eugene Sukhodolskiy committed on 25 Apr
2026-04-24	b1a5f44 Set temperature=1.0, top_k=64, top_p=0.95 for all profiles (Google recommended for gemma4)
    Also fixes discuss profile memory tools: use combined "memory" tool name, not nonexistent split variants.
    Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
    Eugene Sukhodolskiy committed on 24 Apr
2026-04-24	511dc46 Add Ollama multi-server fallback with in-memory blacklisting
    - New FallbackOllamaBackend (navi/llm/fallback.py): tries servers and models in priority order; on LLMConnectionError blacklists the server for the process lifetime, on LLMModelNotFoundError blacklists the (server, model) pair — eliminates latency from repeated failed probes
    - OllamaBackend now raises typed LLMConnectionError / LLMModelNotFoundError instead of bare LLMBackendError; accepts list[str] | str | None for model
    - AgentProfile.model changed from str to list[str] (str auto-normalised); all profiles updated to ["gemma4:31b-cloud", "gemma4:26b-a4b-it-q4_K_M"]
    - New config field OLLAMA_BACKENDS_FILE: path to [{host, api_key?}] JSON; when set, registry creates FallbackOllamaBackend instead of OllamaBackend
    - ollama_backends.json template added (gitignored — contains API key)
    - current_model ContextVar type widened to list[str] | str | None
    Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
    Eugene Sukhodolskiy committed on 24 Apr
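The priority-order walk with two blacklists, as described in this commit, might be sketched as follows. The class name, the synchronous `call` hook, and the stand-in exception classes are assumptions for illustration. The later changes in this history (single-server exemption, 5-minute TTL) are deliberately left out.

```python
from typing import Callable, List, Optional, Sequence


class LLMConnectionError(Exception):
    """Stand-in for the project's typed connection error."""


class LLMModelNotFoundError(Exception):
    """Stand-in for the project's typed model-not-found error."""


class FallbackSketch:
    """Try servers, then models, in priority order with in-memory blacklists."""

    def __init__(self, servers: Sequence[str], call: Callable[[str, str, str], str]):
        self.servers: List[str] = list(servers)
        self.call = call  # hypothetical hook: call(server, model, prompt) -> text
        self.dead_servers = set()    # servers that refused the connection
        self.missing_models = set()  # (server, model) pairs known to be absent

    def complete(self, models: Sequence[str], prompt: str) -> str:
        last_error: Optional[Exception] = None
        for server in self.servers:
            if server in self.dead_servers:
                continue
            for model in models:
                if (server, model) in self.missing_models:
                    continue
                try:
                    return self.call(server, model, prompt)
                except LLMConnectionError as exc:
                    self.dead_servers.add(server)
                    last_error = exc
                    break  # the whole server is unreachable, move on
                except LLMModelNotFoundError as exc:
                    self.missing_models.add((server, model))
                    last_error = exc
        raise last_error or LLMConnectionError("no usable server/model pair")
```

After the first request has paid for the failed probes, later requests skip the blacklisted entries and go straight to the pair that worked.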
2026-04-22	7eea278 Support Ollama Cloud API key Eugene Sukhodolskiy committed on 22 Apr
2026-04-20	9704a92 Autonomous reasoning improvements: budget, anchoring, anti-stall, validation
    - AgentProfile: per-profile thinking mechanics flags (think_enabled, iteration_budget_enabled, goal_anchoring, anti_stall, step_validation, planning_reflect, adaptive_replan) — all profiles updated in config.json
    - Iteration budget: inject remaining iterations into context so model knows when to wrap up; urgency levels at ≤7 and ≤3 remaining
    - Goal anchoring: inject original goal + todo state every N iterations to prevent drift on long tasks
    - Anti-stall: two signals — no todo progress for N iterations, or identical tool calls repeated N times; warning injected into context
    - Todo step validation: marking done requires a validation field describing how result was verified; failed gets a soft nudge with tip for re-planning
    - stream_complete: add think param to base class, ollama and openai backends
    - Summarizer: raise max_tokens 1024→3000, expand system prompt with user-preferences section and verbatim-value instructions
    - Compression card: persist to session.messages (is_compression flag on Message), show expandable summary in webclient with markdown body
    - ToolResult.to_message_content: always include output on failure so tracebacks and error details reach the model (fixes silent Error: None)
    - Developer profile: fix subagent profile secretary→developer, add write_tool to subagent_tools, clarify write_tool vs filesystem in system prompt
    Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
    Eugene Sukhodolskiy committed on 20 Apr
2026-04-16	b1dd9ca Count AIHelper tokens in session metrics
    Adds prompt/completion token fields to LLMResponse, populated by OllamaBackend.complete(). AIHelper emits AIHelperTokensUsed into the current event sink after each LLM call; run_stream drains it into _subagent_tokens so AIHelper usage is reflected in the turn token delta.
    Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
    Eugene Sukhodolskiy committed on 16 Apr
2026-04-16	a338f8b Add response metrics: elapsed time, tool calls, token count
    Server:
    - Message model: elapsed_seconds, tool_call_count, token_count fields (display-only, excluded from LLM context via exclude_none)
    - StreamEnd event: carries same three fields
    - agent.run_stream: tracks turn start time, counts ToolEvent completions, writes metrics onto the final assistant Message before saving to DB
    - WebSocket: forwards metrics in stream_end payload
    Client:
    - chat.onStreamEnd: attaches elapsed_seconds, tool_call_count, token_count to the streaming message on completion
    - buildMessageList: scans each assistant group for metrics from history
    - AssistantMessage: renders .msg-meta-row below the response — timer icon + Xs · wrench icon + N tools · coins icon + Nk tokens · time (each item only shown if present; time pushed right via margin-left: auto)
    Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
    Eugene Sukhodolskiy committed on 16 Apr
2026-04-16	ea5766e Persist thinking and plan cards across session reloads
    - Message: add thinking and is_plan fields (display-only, not sent to LLM)
    - Agent main loop: accumulate thinking per iteration, save with assistant message
    - _run_planning: also append plan to session.messages with is_plan=True so UI can render plan cards after page reload (context already had the plan)
    Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
    Eugene Sukhodolskiy committed on 16 Apr
2026-04-15	4b64763 Add explicit output token budget for summarizer (context_summary_max_tokens)
    Previously there was no num_predict set for the summarization LLM call, so Ollama used its server default (often 128 tokens — very short summaries).
    - Add max_tokens param to LLMBackend.complete() and OllamaBackend (→ num_predict)
    - Add context_summary_max_tokens: int = 1024 to config
    - Thread it through compress_context() and CompressionWorker
    Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
    Eugene Sukhodolskiy committed on 15 Apr
2026-04-10	86402e0 Add stop button and fix context compression hang
    Stop generation:
    - Client: send button toggles to red ■ during streaming; sends {type:stop} via WS
    - Server: _stream_recv concurrently reads incoming messages during streaming using asyncio.wait — stop signal is handled immediately without polling
    - Cooperative stop via asyncio.Event (current_stop_event ContextVar): agent breaks out of LLM async-for cleanly so aclose() fires → Ollama stream closes gracefully, model stays in VRAM. No task.cancel() which would eject the model.
    - StreamStopped event propagates through run_stream/run_ephemeral; sub-agents stop via the same shared stop_event inherited through task context
    Context compression fix:
    - compress_context passes think=False to llm.complete() — no extended reasoning during summarization which caused GPU hang
    - Input truncated to 12k chars before sending to summarizer
    - LLMBackend.complete() / OllamaBackend.complete() accept think: bool | None override
    Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
    Eugene Sukhodolskiy committed on 10 Apr
2026-04-10	1e8b65e Major feature batch: visibility, planning, file uploads, streaming
    - stream_complete(): streaming with tools for all LLM turns — thinking now streams as ThinkingDelta/ThinkingEnd in real-time during tool-selection turns, not just on the final response
    - todo built-in tool: session-scoped plan manager (set/view/update/clear); persona + all profiles updated with mandatory planning instructions
    - TurnThinking event: sub-agent thinking forwarded to parent sink as a collapsible block in the spawn_agent card
    - File uploads: non-image files uploaded via XHR, shown as badges in message bubble; SVG treated as regular file (not base64 image)
    - session_files: POST /sessions/{id}/files, TTL cleanup, forbidden exts
    - WebSocket reconnect: _AgentRun broadcast pattern, re-attach mid-stream
    - UI: favicon, sidebar logo, turn-thinking cards, subagent thinking blocks, token counter, draft persistence, file progress bar
    - Removed AgentNote (content is always None alongside tool_calls)
    - Ollama stream_complete: tool_calls captured from non-final chunk (done=False)
    Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
    Eugene Sukhodolskiy committed on 10 Apr
2026-04-08	802c186 Add context compression: rolling summarization when context fills up
    Mechanism:
    - After streaming ends, if context_tokens >= threshold (80% of num_ctx), compress old turns into a summary message using the same LLM
    - Partition: keep system msg + last N turns verbatim (default 6); everything older goes to the summarizer
    - Tool call groups (assistant + tool results) never split across boundary
    - Existing summary messages folded into new compression pass — no stack growth
    - Summary stored as Message(role=user, is_summary=True) after system msg
    - On failure: logged, session left unchanged (non-fatal)
    New files:
    - navi/core/compressor.py: should_compress, partition_messages, compress_session (pure logic, testable without agent)
    New config (navi/config.py):
    - context_compression_enabled: bool = True
    - context_compression_threshold: float = 0.80
    - context_keep_recent: int = 6
    - context_summary_temperature: float = 0.3
    New agent event: ContextCompressed(messages_before, messages_after)
    Message.is_summary: bool field marks compressed history blocks
    Client:
    - context_compressed WS event → subtle inline notice in message list
    - loadHistory: is_summary messages rendered as collapsible summary cards
    - style.css: .summary-card, .compression-notice
    Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
    Eugene Sukhodolskiy committed on 8 Apr
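The threshold check and the partition rule can be sketched as two pure functions. The names match those listed for navi/core/compressor.py, but the signatures and bodies are guesses. In particular, this sketch counts keep_recent in messages rather than turns, and represents messages as plain dicts with a "role" key.

```python
from typing import Dict, List, Tuple

Message = Dict[str, str]


def should_compress(context_tokens: int, num_ctx: int, threshold: float = 0.80) -> bool:
    """True once the context has reached the configured share of num_ctx."""
    return context_tokens >= threshold * num_ctx


def partition_messages(
    messages: List[Message], keep_recent: int = 6
) -> Tuple[List[Message], List[Message], List[Message]]:
    """Split history into (system, to_summarize, to_keep).

    The kept tail never starts on a tool result: the boundary is walked back
    until it reaches a non-tool message, so a tool call group (assistant
    message plus its tool results) stays on one side of the split.
    """
    system = [m for m in messages[:1] if m["role"] == "system"]
    body = messages[len(system):]
    cut = max(len(body) - keep_recent, 0)
    while 0 < cut < len(body) and body[cut]["role"] == "tool":
        cut -= 1
    return system, body[:cut], body[cut:]
```

Walking the boundary backwards can keep slightly more than keep_recent messages, which errs on the side of preserving a complete tool exchange.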
2026-04-08	9c0c6b3 Add context token counter: 64k default, live UI display
    - config: ollama_num_ctx default 8192 → 65536
    - LLMChunk: add prompt_tokens / completion_tokens fields
    - OllamaBackend.stream: populate token counts from final chunk (prompt_eval_count + eval_count when chunk.done)
    - StreamEnd: add context_tokens and max_context_tokens
    - Agent.run_stream: capture token counts, pass to StreamEnd
    - websocket: include context_tokens / max_context_tokens in stream_end
    - index.html: split chat-header into title span + token-counter span
    - sidebar.js: updateChatHeader targets #chat-header-title, not innerHTML
    - app.js: updateTokenCounter() shows "X/Y (Z%) tokens", colors: gray <50%, amber 50–79%, red ≥80%
    - style.css: .token-counter, .warn, .danger styles
    Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
    Eugene Sukhodolskiy committed on 8 Apr