root/navi-1

History for navi-1 / navi / llm / ollama.py

2026-05-13	c8dffaa Defensive image cleanup in Ollama backend to prevent 'unknown format' errors
	- _clean_base64_image() strips data URI prefix and validates non-empty result
	- _to_ollama_messages() filters out invalid/empty images before sending to Ollama
	- Prevents 500 errors when models receive malformed or unsupported image data
	Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
	Eugene Sukhodolskiy committed on 13 May
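The cleanup described in c8dffaa might look roughly like the sketch below. The commit message only states that the data URI prefix is stripped and that an empty result is rejected; the function names here (without the leading underscore), the `None` return convention, and the extra base64 decode check are assumptions for illustration, not the code in navi/llm/ollama.py.

```python
import base64
import binascii


def clean_base64_image(raw):
    """Return a bare base64 payload, or None if nothing usable remains."""
    if not isinstance(raw, str):
        return None
    data = raw.strip()
    # A data URI looks like "data:image/png;base64,<payload>"; keep only the payload.
    if data.startswith("data:"):
        _, sep, data = data.partition(",")
        if not sep:
            return None
        data = data.strip()
    if not data:
        return None
    # Assumption: also reject payloads that are not valid base64.
    try:
        base64.b64decode(data, validate=True)
    except (binascii.Error, ValueError):
        return None
    return data


def filter_images(images):
    """Drop every image that does not survive cleanup, keeping the original order."""
    cleaned = (clean_base64_image(img) for img in images)
    return [img for img in cleaned if img is not None]
```

Filtering before the request is sent means one bad attachment drops out of the message instead of failing the whole call.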
2026-05-12	ac44c84 Remove dead LLMBackend.stream() method
	The method was defined on all backends (Ollama, FallbackOllama, OpenAI) and in the base LLMBackend interface, but was never called by agent.py or messages.py. stream_complete() covers all streaming use cases.
	- navi/llm/base.py: remove abstract stream() method
	- navi/llm/ollama.py: remove OllamaBackend.stream()
	- navi/llm/fallback.py: remove FallbackOllamaBackend.stream()
	- navi/llm/openai_backend.py: remove OpenAIBackend.stream()
	- docs/tech_debt_review: mark item 54 as fixed
	236 tests passing.
	Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
	Eugene Sukhodolskiy committed on 12 May
2026-04-30	dce281e Improve content publishing UX
	Eugene Sukhodolskiy committed on 30 Apr
2026-04-29	30dd183 Align Ollama HTTP timeout with LLM timeouts
	Eugene Sukhodolskiy committed on 29 Apr
2026-04-29	098401a Stability fixes batch — tech debt review 2026-04-29
	Critical:
	- Concurrent WS run race guard (#1)
	- Tool task cancellation on generator teardown (#2)
	- StopAsyncIteration kills fallback chain (#3)
	- Session loading race with _lastLoadId guard (#4)
	- ContentCard .match() crash on non-string result (#5)
	- Image data type guard in buildMessageList (#6)
	High:
	- Cap WS replay buffer at 500 events (#7)
	- Deduplicate memory extraction task with asyncio.Lock (#9)
	- TTL-based fallback blacklisting (5 min) (#10)
	- Subagent tool exception isolation (#11)
	- Inline image size/count validation on WS (#12)
	- Clean up orphaned file on DB insert failure (#13)
	- Deep watch streamingMsg for auto-scroll (#14)
	- WS_SCHEME wss:// support for HTTPS (#15)
	- Sending guard against duplicate message sends (#16)
	- Global unhandledrejection listener in API layer (#17)
	Medium:
	- Cap planning_logs at 20 entries (#22)
	- Store cleanup_loop task reference (#23)
	- BaseException → Exception in _run_with_sentinel (#24)
	- Propagate SystemExit in agent loop (#25)
	- Configurable output_reserve_tokens (#26)
	- Always reloadSession on session_sync (#30)
	- FIFO queue for confirm dialogs (#31)
	- Reset body.overflow on ImageLightbox unmount (#32)
	- try/finally in fallback copy (#33)
	- _isConnecting guard in WS send() (#34)
	Low:
	- Lazy-init deps.py singletons (#36)
	- Replace __import__ with direct imports (#38)
	- Preserve token count 0 in ollama.py (#39)
	- Clear orphaned streamingMsg on reconnect reload (#43)
	- Escape single quote in UserMessage (#44)
	- Polyfill-free findLast replacement (#48)
	- Match <table> tags with attributes in markdown (#49)
	- Attach copy buttons only when msg.done (#50)
	- Fix hasMeta falsy-0 bug (#53)
	Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
	Eugene Sukhodolskiy committed on 29 Apr
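Item #39 in 098401a ("Preserve token count 0 in ollama.py") does not say what the bug looked like. One common way a legitimate zero gets lost is a truthiness check, so the sketch below contrasts that pattern with an explicit `None` check. Both helper names are hypothetical and this is only a guess at the class of bug, not the actual change.

```python
def pick_truthy(new_value, previous):
    # Pattern that loses zeros: 0 is falsy, so a real count of 0 is replaced.
    return new_value or previous


def pick_not_none(new_value, previous):
    # Only fall back when the value is actually missing.
    return new_value if new_value is not None else previous
```

With a reported count of 0 and a stale previous value of 17, the first helper returns the stale value and the second keeps the 0.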
2026-04-28	c874cbe Wire pgvector semantic search into memory system
	- Add vector(768) column + HNSW index to memory_facts
	- Add LLMBackend.embed() with Ollama + fallback implementation
	- MemoryStore: cosine-distance search with ILIKE fallback
	- New memory tool params: source, confidence, expires_days, source_context
	- Update extractor, sqlite_store, deps wiring
	- Add pgvector to requirements
	Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
	Eugene Sukhodolskiy committed on 28 Apr
2026-04-26	b5b11be changed llm & new ollama param
	Eugene Sukhodolskiy committed on 26 Apr
2026-04-24	b1a5f44 Set temperature=1.0, top_k=64, top_p=0.95 for all profiles (Google recommended for gemma4)
	Also fixes discuss profile memory tools: use combined "memory" tool name, not nonexistent split variants.
	Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
	Eugene Sukhodolskiy committed on 24 Apr
2026-04-24	511dc46 Add Ollama multi-server fallback with in-memory blacklisting
	- New FallbackOllamaBackend (navi/llm/fallback.py): tries servers and models in priority order; on LLMConnectionError blacklists the server for the process lifetime, on LLMModelNotFoundError blacklists the (server, model) pair — eliminates latency from repeated failed probes
	- OllamaBackend now raises typed LLMConnectionError / LLMModelNotFoundError instead of bare LLMBackendError; accepts list[str] | str | None for model
	- AgentProfile.model changed from str to list[str] (str auto-normalised); all profiles updated to ["gemma4:31b-cloud", "gemma4:26b-a4b-it-q4_K_M"]
	- New config field OLLAMA_BACKENDS_FILE: path to [{host, api_key?}] JSON; when set, registry creates FallbackOllamaBackend instead of OllamaBackend
	- ollama_backends.json template added (gitignored — contains API key)
	- current_model ContextVar type widened to list[str] | str | None
	Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
	Eugene Sukhodolskiy committed on 24 Apr
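The fallback behaviour described in 511dc46 can be illustrated with a small, self-contained sketch. The exception names come from the commit message, but they are redefined locally here. The `FallbackChain` class, its `call` hook, and the server-first iteration order are assumptions, since the commit only says "priority order". This is not the FallbackOllamaBackend implementation.

```python
class LLMBackendError(Exception):
    """Base error, redefined locally for the sketch."""


class LLMConnectionError(LLMBackendError):
    """The server could not be reached."""


class LLMModelNotFoundError(LLMBackendError):
    """The server is up but does not have the requested model."""


class FallbackChain:
    def __init__(self, servers, call):
        self.servers = list(servers)   # priority order
        self.call = call               # hypothetical hook: call(server, model, prompt) -> str
        self.dead_servers = set()      # blacklisted for the process lifetime
        self.missing_models = set()    # blacklisted (server, model) pairs

    def complete(self, models, prompt):
        if isinstance(models, str):    # a bare str is normalised to a list
            models = [models]
        last_error = None
        for server in self.servers:
            if server in self.dead_servers:
                continue
            for model in models:
                if (server, model) in self.missing_models:
                    continue
                try:
                    return self.call(server, model, prompt)
                except LLMConnectionError as exc:
                    self.dead_servers.add(server)
                    last_error = exc
                    break              # no point trying other models on a dead server
                except LLMModelNotFoundError as exc:
                    self.missing_models.add((server, model))
                    last_error = exc
        raise LLMBackendError("no server/model combination available") from last_error
```

Once a server or pair is blacklisted, later calls skip it without probing. That skipped probe is the latency saving the commit message mentions.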
2026-04-22	7eea278 Support Ollama Cloud API key
	Eugene Sukhodolskiy committed on 22 Apr
2026-04-20	9704a92 Autonomous reasoning improvements: budget, anchoring, anti-stall, validation
	- AgentProfile: per-profile thinking mechanics flags (think_enabled, iteration_budget_enabled, goal_anchoring, anti_stall, step_validation, planning_reflect, adaptive_replan) — all profiles updated in config.json
	- Iteration budget: inject remaining iterations into context so model knows when to wrap up; urgency levels at ≤7 and ≤3 remaining
	- Goal anchoring: inject original goal + todo state every N iterations to prevent drift on long tasks
	- Anti-stall: two signals — no todo progress for N iterations, or identical tool calls repeated N times; warning injected into context
	- Todo step validation: marking done requires a validation field describing how result was verified; failed gets a soft nudge with tip for re-planning
	- stream_complete: add think param to base class, ollama and openai backends
	- Summarizer: raise max_tokens 1024→3000, expand system prompt with user-preferences section and verbatim-value instructions
	- Compression card: persist to session.messages (is_compression flag on Message), show expandable summary in webclient with markdown body
	- ToolResult.to_message_content: always include output on failure so tracebacks and error details reach the model (fixes silent Error: None)
	- Developer profile: fix subagent profile secretary→developer, add write_tool to subagent_tools, clarify write_tool vs filesystem in system prompt
	Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
	Eugene Sukhodolskiy committed on 20 Apr
2026-04-16	b1dd9ca Count AIHelper tokens in session metrics
	Adds prompt/completion token fields to LLMResponse, populated by OllamaBackend.complete(). AIHelper emits AIHelperTokensUsed into the current event sink after each LLM call; run_stream drains it into _subagent_tokens so AIHelper usage is reflected in the turn token delta.
	Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
	Eugene Sukhodolskiy committed on 16 Apr
2026-04-15	4b64763 Add explicit output token budget for summarizer (context_summary_max_tokens)
	Previously there was no num_predict set for the summarization LLM call, so Ollama used its server default (often 128 tokens — very short summaries).
	- Add max_tokens param to LLMBackend.complete() and OllamaBackend (→ num_predict)
	- Add context_summary_max_tokens: int = 1024 to config
	- Thread it through compress_context() and CompressionWorker
	Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
	Eugene Sukhodolskiy committed on 15 Apr
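The max_tokens to num_predict mapping from 4b64763 could be sketched as follows. `num_predict`, `num_ctx`, and `temperature` are real Ollama option names. The `build_options` helper and its signature are hypothetical, and whether the actual backend omits the key when no limit is given is an assumption.

```python
def build_options(temperature, num_ctx, max_tokens=None):
    """Build an Ollama `options` dict; the helper itself is hypothetical."""
    options = {"temperature": temperature, "num_ctx": num_ctx}
    # Only send num_predict when the caller asked for an explicit output budget;
    # otherwise the server's own default applies.
    if max_tokens is not None:
        options["num_predict"] = max_tokens
    return options
```

A summarizer call would then pass the configured context_summary_max_tokens value (1024 in this commit) as `max_tokens`.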
2026-04-10	86402e0 Add stop button and fix context compression hang
	Stop generation:
	- Client: send button toggles to red ■ during streaming; sends {type:stop} via WS
	- Server: _stream_recv concurrently reads incoming messages during streaming using asyncio.wait — stop signal is handled immediately without polling
	- Cooperative stop via asyncio.Event (current_stop_event ContextVar): agent breaks out of LLM async-for cleanly so aclose() fires → Ollama stream closes gracefully, model stays in VRAM. No task.cancel() which would eject the model.
	- StreamStopped event propagates through run_stream/run_ephemeral; sub-agents stop via the same shared stop_event inherited through task context
	Context compression fix:
	- compress_context passes think=False to llm.complete() — no extended reasoning during summarization which caused GPU hang
	- Input truncated to 12k chars before sending to summarizer
	- LLMBackend.complete() / OllamaBackend.complete() accept think: bool | None override
	Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
	Eugene Sukhodolskiy committed on 10 Apr
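The cooperative stop from 86402e0 can be illustrated with a toy async generator standing in for the Ollama stream. Only `current_stop_event` is a name taken from the commit message. `fake_llm_stream`, `read_until_stopped`, and `demo` are hypothetical, and the explicit `aclose()` in a `finally` block is this sketch's way of guaranteeing cleanup rather than a confirmed detail of the agent loop.

```python
import asyncio
import contextvars

# Shared stop flag, looked up through a ContextVar as the commit describes.
current_stop_event = contextvars.ContextVar("current_stop_event", default=None)


async def fake_llm_stream(cleanup_log, total=100):
    """Stand-in for a streaming LLM response."""
    try:
        for i in range(total):
            await asyncio.sleep(0)
            yield i
    finally:
        # Runs on normal exhaustion and when the consumer calls aclose().
        cleanup_log.append("closed")


async def read_until_stopped(stream, on_token=None):
    stop_event = current_stop_event.get()
    received = []
    try:
        async for token in stream:
            received.append(token)
            if on_token is not None:
                on_token(token)
            if stop_event is not None and stop_event.is_set():
                break  # leave the loop cooperatively, no task.cancel()
    finally:
        await stream.aclose()  # close the stream deterministically
    return received


async def demo():
    stop_event = asyncio.Event()
    current_stop_event.set(stop_event)
    cleanup_log = []

    def on_token(token):
        if token == 2:  # simulate the user pressing stop after the third token
            stop_event.set()

    received = await read_until_stopped(fake_llm_stream(cleanup_log), on_token)
    return received, cleanup_log
```

Running `asyncio.run(demo())` stops after the third token and still runs the generator's cleanup, which matches the graceful stream close the commit message describes.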
2026-04-10	1e8b65e Major feature batch: visibility, planning, file uploads, streaming
	- stream_complete(): streaming with tools for all LLM turns — thinking now streams as ThinkingDelta/ThinkingEnd in real-time during tool-selection turns, not just on the final response
	- todo built-in tool: session-scoped plan manager (set/view/update/clear); persona + all profiles updated with mandatory planning instructions
	- TurnThinking event: sub-agent thinking forwarded to parent sink as a collapsible block in the spawn_agent card
	- File uploads: non-image files uploaded via XHR, shown as badges in message bubble; SVG treated as regular file (not base64 image)
	- session_files: POST /sessions/{id}/files, TTL cleanup, forbidden exts
	- WebSocket reconnect: _AgentRun broadcast pattern, re-attach mid-stream
	- UI: favicon, sidebar logo, turn-thinking cards, subagent thinking blocks, token counter, draft persistence, file progress bar
	- Removed AgentNote (content is always None alongside tool_calls)
	- Ollama stream_complete: tool_calls captured from non-final chunk (done=False)
	Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
	Eugene Sukhodolskiy committed on 10 Apr
2026-04-08	9c0c6b3 Add context token counter: 64k default, live UI display
	- config: ollama_num_ctx default 8192 → 65536
	- LLMChunk: add prompt_tokens / completion_tokens fields
	- OllamaBackend.stream: populate token counts from final chunk (prompt_eval_count + eval_count when chunk.done)
	- StreamEnd: add context_tokens and max_context_tokens
	- Agent.run_stream: capture token counts, pass to StreamEnd
	- websocket: include context_tokens / max_context_tokens in stream_end
	- index.html: split chat-header into title span + token-counter span
	- sidebar.js: updateChatHeader targets #chat-header-title, not innerHTML
	- app.js: updateTokenCounter() shows "X/Y (Z%) tokens", colors: gray <50%, amber 50–79%, red ≥80%
	- style.css: .token-counter, .warn, .danger styles
	Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
	Eugene Sukhodolskiy committed on 8 Apr
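The token counting described in 9c0c6b3 relies on the final streamed chunk (the one with `done` set) carrying `prompt_eval_count` and `eval_count`. A minimal sketch over plain dicts follows. `summarize_stream` is a hypothetical helper, and treating the context size as the sum of the two counts is an assumption about how context_tokens is derived.

```python
def summarize_stream(chunks, max_context_tokens):
    """Collect streamed text and read token counts from the final chunk."""
    parts = []
    prompt_tokens = None
    completion_tokens = None
    for chunk in chunks:
        parts.append(chunk.get("message", {}).get("content") or "")
        if chunk.get("done"):
            # Counts are only present on the last chunk of the stream.
            prompt_tokens = chunk.get("prompt_eval_count")
            completion_tokens = chunk.get("eval_count")
    context_tokens = None
    if prompt_tokens is not None and completion_tokens is not None:
        context_tokens = prompt_tokens + completion_tokens  # assumed definition
    return {
        "text": "".join(parts),
        "context_tokens": context_tokens,
        "max_context_tokens": max_context_tokens,
    }
```

The explicit `is not None` checks keep a legitimate count of 0 from being treated as missing.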
2026-04-08	f5f8d90 Server review fixes: profile model routing, sorting, datetime, cleanup
	- LLMBackend.complete/stream: add model param; OllamaBackend uses it over self.model, enabling per-profile model selection
	- BackendRegistry.get(): remove unused model param
	- Agent: pass profile.model to complete() and stream()
	- Profiles: correct model to gemma4:e2b-it-q8_0 (was leftover e4b)
	- InMemorySessionStore.list_all(): fix sort (pinned+newest first, was pinned+oldest) — now consistent with SQLite ORDER BY
	- session.py, sqlite_session_store.py: datetime.utcnow() → datetime.now(timezone.utc) (deprecated since Python 3.12)
	- _base_options(): accept temperature param, remove dead default
	- deps.py: rename _registries → get_registries (public API)
	- websocket.py: update import accordingly
	Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
	Eugene Sukhodolskiy committed on 8 Apr
2026-04-08	c9ee0ec Add thinking/reasoning streaming support
	Enable Ollama think param and stream reasoning chunks to client. New agent events: ThinkingDelta, ThinkingEnd. Config gains ollama_think and ollama_num_ctx settings. WebSocket protocol updated accordingly.
	Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
	Eugene Sukhodolskiy committed on 8 Apr
2026-04-08	9a8056d Add multimodal image support and client UX improvements
	Server:
	- Add ImageViewTool (load image from file/URL, returns base64)
	- Add images field to Message model with created_at timestamp
	- Agent run/run_stream accept images param; inject image messages after image_view tool calls
	- WebSocket handler accepts images array from client, strips data URI prefix
	- All profiles include image_view tool
	- Fix tool call serialization (model_dump mode=json for datetime)
	- Add no-store cache headers for static files
	Client:
	- Image attachment: file picker button + clipboard paste + preview strip with remove
	- Images rendered in chat bubbles; loaded from history
	- Tool cards rebuilt as div+CSS toggle (fixes details/overflow-hidden collapse bug)
	- Tool cards appear before response bubble (lazy bubble creation on first stream_delta)
	- Typing indicator persists through tool calls, removed only when text starts streaming
	- Tool cards restored from history on page reload
	- Message timestamps stored via created_at field, shown correctly in history
	- Session ID reflected in URL hash for bookmarking; restored on page load
	- Remove localStorage session tracking (server last_active used instead)
	Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
	Eugene Sukhodolskiy committed on 8 Apr
2026-04-08	41cdab1 Initial implementation of the agent system core
	- FastAPI server with REST API and WebSocket streaming
	- Modular LLM backend abstraction (Ollama implemented, OpenAI stub)
	- Tool system: web_search (ddgs), filesystem, http_request, code_exec, terminal
	- Agent profiles: smart_home, server_admin, secretary
	- Tool-calling loop with concurrent tool execution
	- In-memory session store with SessionStore ABC for future persistence
	- Registry pattern for tools, profiles, and backends
	- Orchestrator stub as foundation for multi-agent scenarios
	Eugene Sukhodolskiy committed on 8 Apr