| 2026-04-25 |
Fix websocket.py: unpack 4-tuple from get_registries(), pass cp_registry to Agent
...
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 25 Apr
|
Add context providers: dynamic system message injection per LLM call
...
- navi/context_providers/ registry + built-in public_url provider (global, always injected)
- context_providers/ user directory, hot-reloaded via reload_tools
- AgentProfile.context_providers field for per-profile opt-in providers
- Agent._collect_context_injections() called before every tool-calling loop
- reload_tools now reloads both user tools and user context providers
- manuals/write_context_provider.md for Navi, docs/context_providers.md reference
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 25 Apr
|
| 2026-04-24 |
Make server I/O non-blocking; update docs
...
- Wrap all heavy filesystem ops in asyncio.to_thread: filesystem tool
(read/write/append/list/find/info/move/delete/query/smart_edit),
image_view (read_bytes), share_file (shutil.copy2), write_tool
(write_text, _register_user_tool), session_files (shutil.rmtree,
iterdir), sessions upload endpoint (sync open/write → to_thread)
- Make delete_session_dir async; update its caller in sessions.py
- docs/config.md: fix wrong defaults (threshold 0.70, keep_recent 8),
remove phantom SESSION_FILES_TTL_HOURS, add LLM timeouts, DATABASE_URL,
PUBLIC_URL, Gmail, CONTEXT_SUMMARY_MAX_TOKENS sections
- docs/profiles.md: add missing tool_developer profile to table
- android-client: add WebView remote debugging; remove unused toolbar menu
- Remove stale helper scripts and test files
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 24 Apr
|
| 2026-04-21 |
WebSocket event replay buffer for disconnect resilience
...
On reconnect to an active agent run the server now replays all events
emitted since the turn started, then switches to live forwarding.
This eliminates the gap where tool cards, thinking blocks and stream
deltas were permanently lost after a network blip.
Server (_AgentRun):
- events: list[dict] buffers every serialised agent event
- broadcast() serialises and appends before putting in subscriber queues
- reconnect flow: subscribe → replay_count snapshot → stream_start →
replay events[0:replay_count] → live _stream_to_client
Client:
- onStreamStart() removes the frozen ghost message instead of marking
done=true, so replay cleanly rebuilds the message from scratch
- replayMode flag suppresses animations during replay
- onReplayStart/onReplayEnd handlers set/clear the flag and restore
animate on the message once live events resume
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 21 Apr
|
| 2026-04-20 |

Planning debug panel, todo auto-populate, scratchpad/persona improvements
...
- Planning debug panel: new Planning tab in debug/index.html shows raw
phase 1/2 outputs and token counts per planning run, stored in
session.planning_logs (new column in both SQLite and PostgreSQL)
- New GET /sessions/{id}/planning API endpoint
- PlanningDebugData internal event wires _run_planning() output into
session storage; never forwarded to WebSocket clients
- Phase 3 (plan critic) disabled — to be reworked with reflect integration
- Todo tool: auto-populated from plan steps after phase 2; model only
needs to call update/view, not set
- Scratchpad: clarified description and persona instructions; removed
context_transfer from user-facing docs (internal mechanism only)
- web_search: switched to ddgs package, SearXNG as primary backend,
DDG html-only fallback; added find_up action to filesystem tool
- Persona: added SCRATCHPAD and TODO sections with clear usage rules;
added NAVI.md project context instructions
- chat.js: fixed subagent planning event fallthrough into parent UI;
statusLabel cleared on first stream delta
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 20 Apr
|
| 2026-04-17 |

Route subagent planning events into spawn_agent card in the UI
...
Previously PlanningStatus/PlanReady had no is_subagent flag, so subagent
planning spinners and plan cards rendered as top-level Navi planning UI.
Backend:
- Add is_subagent field to PlanningStatus and PlanReady events
- _run_planning accepts is_subagent param, passes it through all yields
- run_ephemeral calls _run_planning with is_subagent=True
- websocket.py forwards is_subagent in planning_status and plan_ready messages
Frontend (chat.js):
- onPlanningStatus: if is_subagent, set planningLabel on the last spawn_agent
card instead of msg.statusLabel
- onPlanReady: if is_subagent, push plan into spawn card steps and clear
planningLabel; otherwise behave as before
Frontend (ToolCard.vue):
- Render subagent-planning-indicator (spinner + label) when planningLabel set
- Render plan cards inside subagent steps using the same plan-card pattern
Also includes leftover session changes: spawn_agent default 40 in description
and manual, updated manual content.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 17 Apr
|

Planning phases, context compression, and tool improvements
...
Agent:
- Planning now a 3-phase async generator: Analysis → Execution plan → AIHelper critic
- Yield PlanningStatus events before each phase (UI progress labels)
- Phase 1 runs with think=True for deeper analysis
- Phase 2 includes available tool list so executor assignments are accurate
- Phase 3: independent critic pass validates and corrects TOOL: names against real tool list
- Planning converted from list return to async generator (fixes token accounting)
Backend:
- Context compression threshold: 80% → 70% to trigger earlier
- Compressor summary prompt: structured sections (goal, work state, key facts, outputs, errors)
- Terminal output capped at 5000 chars to prevent context flooding
- Web search: region=wt-wt for DDG, country=ALL for Brave, language=all for SearxNG
- Scratchpad: mandate writing a 'goal' section at start of multi-step tasks
- secretary max_iterations: 40→25, temperature: 0.7→0.5
- server_admin max_iterations: 40→20
Webclient:
- ThinkingCard strips <thought> XML tags leaked by Ollama
- planning_status WS event wired to chat.onPlanningStatus()
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 17 Apr
|
Add Prompts and Tools tabs to debug page
...
Backend:
- GET /agents/prompts — returns full built system prompt for every
profile, broken into sections (persona / profile / profiles block)
with char/token counts; mirrors Agent._build_system_prompt() exactly
- GET /agents/tools — now includes parameters schema alongside name
and description
Debug page:
- Tab bar: Context / Prompts / Tools
- Prompts tab: profile sidebar + collapsible sections per prompt part
(persona, profile prompt, profiles block), togglable tools list
- Tools tab: searchable list of all tools with description and
parameter table (name, type, description, required marker)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 17 Apr
|
| 2026-04-16 |
Fix WS disconnect and missed stream on reconnect
...
Two related problems:
- During long AIHelper calls (non-streaming LLM), no data flows to the
WebSocket and browsers drop the connection after ~30-60s of inactivity.
Fixed with a 20s heartbeat: _stream_to_client now uses asyncio.wait_for
and sends {"type":"heartbeat"} on timeout to keep the connection alive.
- After reconnect, if the agent finished while the client was offline,
_runs no longer holds the session and no stream_start is sent. Client
would reconnect silently with no response shown. Fixed by sending
{"type":"session_sync"} on every new WS connection (after reattach
completes or immediately when no run is active) so the client knows
to reload session history.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 16 Apr
|
Persist context token count: return from API, restore on session load
...
- GET /sessions/{id} now returns context_token_count and max_context_tokens
(max pulled from settings.ollama_num_ctx)
- loadSession() in chat store sets contextTokens/maxContextTokens from the
response so ContextBar shows the last known fill level immediately on load,
not only after the first new message
- Restore v-if guard on ContextBar (hides for brand-new sessions with 0 tokens)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 16 Apr
|
Add response metrics: elapsed time, tool calls, token count
...
Server:
- Message model: elapsed_seconds, tool_call_count, token_count fields
(display-only, excluded from LLM context via exclude_none)
- StreamEnd event: carries same three fields
- agent.run_stream: tracks turn start time, counts ToolEvent completions,
writes metrics onto the final assistant Message before saving to DB
- WebSocket: forwards metrics in stream_end payload
Client:
- chat.onStreamEnd: attaches elapsed_seconds, tool_call_count, token_count
to the streaming message on completion
- buildMessageList: scans each assistant group for metrics from history
- AssistantMessage: renders .msg-meta-row below the response —
timer icon + Xs · wrench icon + N tools · coins icon + Nk tokens · time
(each item only shown if present; time pushed right via margin-left: auto)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 16 Apr
|

Add session name generation via LLM
...
Backend:
- Session model gets name: str | None field
- SQLite migration: ADD COLUMN name TEXT
- PostgreSQL: ADD COLUMN IF NOT EXISTS name TEXT (applied on pool init)
- SessionStore: add set_name() abstract method, implemented in all stores
- navi/core/name_generator.py: LLM worker that reads user messages and
returns a 3–6 word title or None if content isn't substantial yet
- POST /sessions/{id}/generate-name endpoint: fires LLM, saves and
returns name; skips if session already named or has no user messages
- GET /sessions and GET /sessions/{id} now include name field
Client:
- api.generateSessionName(id) — calls the new endpoint
- sessions store: updateName(id, name) mutation
- chat store: after stream_end, _tryGenerateName() runs fire-and-forget;
skips silently if session already has a name or if request fails
- SessionItem already displays session.name (falls back to id prefix)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 16 Apr
|
| 2026-04-15 |
Migrate storage to PostgreSQL with SQLite fallback; misc fixes
...
- Add PgSessionStore (asyncpg pool) and PgMemoryStore replacing aiosqlite
- Keep SqliteSessionStore + SqliteMemoryStore for zero-dependency quick start
- Selection logic in deps.py: DATABASE_URL set → PG, else → SQLite
- Add asyncpg>=0.29 to dependencies; add DATABASE_URL / DB_PATH to config
- Add RESPONSE HYGIENE rule to persona: never echo tool output or plan state
- Add developer profile user tools: weather, internal_monitor
- Update README: developer profile, DB section, current tool/profile state
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 15 Apr
|
| 2026-04-14 |
Expose compression summary as collapsible debug card in chat UI
...
ContextCompressed event now carries the full summary text produced by the
LLM. Compression notice in chat becomes a <details> element showing
message count (before→after) with the summary expandable on click.
Rendered as markdown via marked.js.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 14 Apr
|
Add share_file tool and session-lifetime file storage
...
Session file directories now live until the session is deleted, not
24h TTL. Cleanup loop only removes orphaned dirs (session gone from DB).
New share_file tool: copies any file to the session directory and returns
a clickable download URL. Navi can call this after generating any file
the user will want to keep.
New GET /sessions/{id}/files/{filename} endpoint serves files with
correct Content-Disposition (inline for images/HTML/PDF, attachment
for everything else).
Added PUBLIC_URL config key for building correct download links behind
reverse proxies.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 14 Apr
|
| 2026-04-11 |
Add planning phase and scratchpad tool for smarter task execution
...
- ScratchpadTool: session-scoped working notepad with named sections
(write/append/read/clear). Lets Navi capture intermediate findings
between tool calls instead of losing track of them.
- Planning phase: when profile.planning_enabled=True, a fast pre-loop
LLM call (think=False, no tools) outlines a numbered plan before
any actions are taken. The plan is injected into session context as
an assistant message so the model naturally continues from it.
- PlanReady event + plan_ready WebSocket message + plan card in UI
(green-tinted, collapsible, mirroring thinking card design).
- secretary and server_admin profiles: planning_enabled=True,
scratchpad added to enabled_tools, system prompts updated with
explicit execution discipline instructions.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 11 Apr
|
Fix WebSocket state corruption preventing messages after first reply
...
Replace concurrent WS reads (_stream_recv + recv_task.cancel()) with
HTTP stop endpoint (POST /sessions/{id}/stop). Cancelling a background
receive_text() task corrupted Starlette's WS state, breaking all
subsequent receives. Now the WS has a single reader at all times.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 11 Apr
|
| 2026-04-10 |

Add stop button and fix context compression hang
...
Stop generation:
- Client: send button toggles to red ■ during streaming; sends {type:stop} via WS
- Server: _stream_recv concurrently reads incoming messages during streaming using
asyncio.wait — stop signal is handled immediately without polling
- Cooperative stop via asyncio.Event (current_stop_event ContextVar): agent breaks
out of LLM async-for cleanly so aclose() fires → Ollama stream closes gracefully,
model stays in VRAM. No task.cancel() which would eject the model.
- StreamStopped event propagates through run_stream/run_ephemeral; sub-agents stop
via the same shared stop_event inherited through task context
Context compression fix:
- compress_context passes think=False to llm.complete() — no extended reasoning
during summarization which caused GPU hang
- Input truncated to 12k chars before sending to summarizer
- LLMBackend.complete() / OllamaBackend.complete() accept think: bool | None override
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 10 Apr
|
Profile switch: emit WS event so client updates UI immediately
...
ProfileSwitched event emitted by switch_profile tool via current_event_sink.
Client handles profile_switched: updates chat header, profile selector,
and local sessions[] — no page refresh needed.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 10 Apr
|

Major feature batch: visibility, planning, file uploads, streaming
...
- stream_complete(): streaming with tools for all LLM turns — thinking
now streams as ThinkingDelta/ThinkingEnd in real-time during tool-
selection turns, not just on the final response
- todo built-in tool: session-scoped plan manager (set/view/update/clear);
persona + all profiles updated with mandatory planning instructions
- TurnThinking event: sub-agent thinking forwarded to parent sink as a
collapsible block in the spawn_agent card
- File uploads: non-image files uploaded via XHR, shown as badges in
message bubble; SVG treated as regular file (not base64 image)
- session_files: POST /sessions/{id}/files, TTL cleanup, forbidden exts
- WebSocket reconnect: _AgentRun broadcast pattern, re-attach mid-stream
- UI: favicon, sidebar logo, turn-thinking cards, subagent thinking blocks,
token counter, draft persistence, file progress bar
- Removed AgentNote (content is always None alongside tool_calls)
- Ollama stream_complete: tool_calls captured from non-final chunk (done=False)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 10 Apr
|
| 2026-04-09 |

Live tool visibility: pending cards, sub-agent step log
...
Backend:
- ToolStarted event: emitted before tool execution begins so client
can render a pending card with spinner immediately
- ToolEvent gains is_subagent flag; ToolStarted same
- current_event_sink ContextVar in tools/base.py — run_stream() sets it
to an asyncio.Queue before create_task(); run_ephemeral() reads it and
puts ToolStarted/ToolEvent into the queue as each sub-agent step runs
- run_stream() tool loop: sequential execution via create_task() +
polling drain loop (20ms sleep); yields ToolStarted → sub-agent events
from sink → ToolEvent (completed) for each tool call
- run_ephemeral() rewritten to inline sequential tool execution with
sink emission (replaces _execute_tool_calls gather)
- _run_single_tool() helper extracted for run_stream()
- websocket.py handles tool_started and adds is_subagent to tool_call
Frontend:
- appendPendingToolCard(): creates card with spinner; spawn_agent opens
body immediately to show sub-agent log as it fills
- finalizeToolCard(): fills result, removes spinner, adds toggle; strips
"[Sub-agent result — ...]" reminder prefix from displayed text
- appendSubagentStep() / finalizeSubagentStep(): live step log inside
spawn_agent card — each sub-agent tool call gets a ↳ row
- app.js: tool_started → pending card; tool_call → finalize card;
is_subagent routing to sub-step vs main card; abandonStream() resets
pendingToolCard/pendingSubStep
- CSS: .spinner-inline for card headers; .subagent-log / .subagent-step
for nested step display; .tool-body-open for always-open spawn_agent
body; .tool-card.pending suppresses chevron
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 9 Apr
|
Add spawn_agent: sub-agent delegation with isolated context
...
- Agent.run_ephemeral() — runs a sub-agent loop without a persistent
session; accepts exclude_tools to block recursion; logs start/complete
- session_store made Optional in Agent.__init__ (None for ephemeral runs)
- SpawnAgentTool (navi/tools/spawn_agent.py): spawns an isolated Agent
for a focused task; resolves profile from parent session via ContextVar;
blocks spawn_agent recursion via exclude_tools=["spawn_agent"]
- build_default_registries() accepts session_store param; registers
SpawnAgentTool after BackendRegistry is built (patches _backend_registry)
- deps.py passes _session_store to build_default_registries
- All profiles: spawn_agent added to enabled_tools, max_iterations 10→30
- persona.txt: DELEGATION section — when/how to use spawn_agent
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 9 Apr
|

Add long-term user memory system
...
Architecture:
- navi/memory/store.py: MemoryStore backed by SQLite (memory_facts,
memory_summary, session_memory_state tables in navi.db)
- navi/memory/extractor.py: LLM-based fact extraction from sessions +
summary regeneration (triggered after session goes idle >30 min)
- Fact upsert uses UNIQUE(category, key) — same key always overwrites,
no duplicates or stale contradictions
- Keyword search across category + key + value (LIKE-based, no extra deps)
Context injection:
- Memory summary injected as an ephemeral system message on every LLM call
via Agent._with_memory() — never persisted to session.context
Tools (all profiles):
- memory_search(query): keyword search against fact DB; persona instructs
model to call it at session start and before personal-context questions
- memory_forget(key, category?): delete a specific fact on user request
Extraction trigger:
- On new session creation, fire-and-forget background task checks all
sessions idle >30 min with unprocessed messages → runs extraction
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 9 Apr
|
Add /debug page: inspect LLM context by session ID
...
- GET /sessions/{id}/context — returns session.context (what the model sees)
with message count and total char count
- /debug — standalone HTML page, session ID from URL hash
Shows each context message: role badge, full content, tool calls as JSON,
images as thumbnails, per-message and total char/token estimates
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 9 Apr
|
| 2026-04-08 |
Review fixes: events module, circular imports, deps, vision-aware compression
...
- Extract all AgentEvent dataclasses to navi/core/events.py; import from
there in agent.py and __init__.py — eliminates circular import between
workers and core
- workers/compressor.py: remove runtime import hack, use navi.core.events
- workers/base.py: WorkerResult.events typed as list[AgentEvent] (was Any)
- api/deps.py: replace @lru_cache on mutable list with module-level
singletons (_registries, _workers)
- core/compressor.py: _format_for_summary returns (text, images); images
passed to summarization LLM so vision models describe them in summary;
non-vision models silently ignore the images field; docstring updated
- client/js/app.js: add comment explaining is_summary backward compat branch
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 8 Apr
|

Separate display history from LLM context; formalize worker system
...
Architecture change:
- session.messages: full display history, never modified by compression
- session.context: what the LLM sees, may be compressed by workers
- System messages go only into context (not display history)
- Image injections (synthetic) go only into context
- User/assistant/tool messages go into both
SQLite: add context column with backward-compat migration
(empty context → initialized from messages on load)
Workers (navi/workers/):
- Worker ABC + WorkerContext + WorkerResult (base.py)
- CompressionWorker: compresses session.context when above threshold
- build_default_workers() returns [CompressionWorker()]
- Agent accepts workers list, runs them after StreamEnd
- Workers injected via deps.py get_workers() (lru_cached singleton)
- WebSocket agent construction also receives workers
Compressor: compress_context() now takes context[], not messages[]
Config: context_keep_recent 6 → 10
Agent: _run_workers() collects events from all workers and yields them
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 8 Apr
|

Add context compression: rolling summarization when context fills up
...
Mechanism:
- After streaming ends, if context_tokens >= threshold (80% of num_ctx),
compress old turns into a summary message using the same LLM
- Partition: keep system msg + last N turns verbatim (default 6);
everything older goes to the summarizer
- Tool call groups (assistant + tool results) never split across boundary
- Existing summary messages folded into new compression pass — no stack growth
- Summary stored as Message(role=user, is_summary=True) after system msg
- On failure: logged, session left unchanged (non-fatal)
New files:
- navi/core/compressor.py: should_compress, partition_messages,
compress_session (pure logic, testable without agent)
New config (navi/config.py):
- context_compression_enabled: bool = True
- context_compression_threshold: float = 0.80
- context_keep_recent: int = 6
- context_summary_temperature: float = 0.3
New agent event: ContextCompressed(messages_before, messages_after)
Message.is_summary: bool field marks compressed history blocks
Client:
- context_compressed WS event → subtle inline notice in message list
- loadHistory: is_summary messages rendered as collapsible summary cards
- style.css: .summary-card, .compression-notice
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 8 Apr
|
Add context token counter: 64k default, live UI display
...
- config: ollama_num_ctx default 8192 → 65536
- LLMChunk: add prompt_tokens / completion_tokens fields
- OllamaBackend.stream: populate token counts from final chunk
(prompt_eval_count + eval_count when chunk.done)
- StreamEnd: add context_tokens and max_context_tokens
- Agent.run_stream: capture token counts, pass to StreamEnd
- websocket: include context_tokens / max_context_tokens in stream_end
- index.html: split chat-header into title span + token-counter span
- sidebar.js: updateChatHeader targets #chat-header-title, not innerHTML
- app.js: updateTokenCounter() shows "X/Y (Z%) tokens", colors:
gray <50%, amber 50–79%, red ≥80%
- style.css: .token-counter, .warn, .danger styles
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 8 Apr
|
Server review fixes: profile model routing, sorting, datetime, cleanup
...
- LLMBackend.complete/stream: add model param; OllamaBackend uses it
over self.model, enabling per-profile model selection
- BackendRegistry.get(): remove unused model param
- Agent: pass profile.model to complete() and stream()
- Profiles: correct model to gemma4:e2b-it-q8_0 (was leftover e4b)
- InMemorySessionStore.list_all(): fix sort (pinned+newest first,
was pinned+oldest) — now consistent with SQLite ORDER BY
- session.py, sqlite_session_store.py: datetime.utcnow() →
datetime.now(timezone.utc) (deprecated since Python 3.12)
- _base_options(): accept temperature param, remove dead default
- deps.py: rename _registries → get_registries (public API)
- websocket.py: update import accordingly
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 8 Apr
|
Add thinking/reasoning streaming support
...
Enable Ollama think param and stream reasoning chunks to client.
New agent events: ThinkingDelta, ThinkingEnd. Config gains ollama_think
and ollama_num_ctx settings. WebSocket protocol updated accordingly.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 8 Apr
|