root/navi-1

History for navi-1 / navi / workers

2026-07-12	1b91fcf compression: profile-aware worker + real-token baseline estimator
    Item 2: thread the active profile into CompressionWorker so compress_context applies per-profile overrides (compression_keep_recent, compression_max_tokens, compression_prompt_file). navi_code now compresses with keep_recent=12 instead of the global 8.
    Item 1: estimate the next LLM call's context from the real prompt_tokens of the previous call (bulk) plus a heuristic delta for messages appended since, replacing the chars//3 estimate that undercounts code-heavy tool output and fired midturn compression too late (Navi kept working until the window was exhausted). Baseline is recorded after each stream and cleared after compression; check_context_size and the midturn gate use it, with a heuristic fallback when no baseline exists or the context shrank.
    Co-Authored-By: Claude <noreply@anthropic.com>
    Eugene Sukhodolskiy committed 3 days ago
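The baseline estimator described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: `Baseline`, `heuristic_tokens`, and `estimate_context_tokens` are hypothetical names, and messages are simplified to plain strings. Only the idea (real prompt_tokens from the previous call plus a chars//3 delta for newer messages, with a heuristic fallback) comes from the commit message.

```python
from dataclasses import dataclass
from typing import Optional

CHARS_PER_TOKEN = 3  # the chars//3 heuristic mentioned in the commit message


def heuristic_tokens(messages: list[str]) -> int:
    """Rough token count: characters divided by 3, per message."""
    return sum(len(m) // CHARS_PER_TOKEN for m in messages)


@dataclass
class Baseline:
    """Hypothetical record taken after each LLM stream."""
    prompt_tokens: int   # real prompt_tokens reported by the previous call
    message_count: int   # number of context messages that call covered


def estimate_context_tokens(messages: list[str], baseline: Optional[Baseline]) -> int:
    """Estimate the size of the next LLM call's context."""
    # Fall back to the pure heuristic when there is no baseline or the
    # context shrank (for example after compression).
    if baseline is None or len(messages) < baseline.message_count:
        return heuristic_tokens(messages)
    # Bulk comes from the real count; only the appended tail is guessed.
    appended = messages[baseline.message_count:]
    return baseline.prompt_tokens + heuristic_tokens(appended)
```

With this shape, an undercount in the heuristic only affects messages appended since the last call, so the error no longer grows with the whole context.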
2026-07-12	3d9c3da compression: fix auto-compress no-op on few-huge-messages + honest status
    Root cause: the compression gate (should_compress) measures tokens, but the partition measures message/turn count, and CompressionStarted was emitted before the attempt. For navi_code's "few very large messages" shape (one big file read = 1 user + assistant + 1 huge tool result, 66k tokens in 3 messages) the gate fired, the UI showed "compression", but partition returned to_summarize=[] -> compress_context None -> nothing shrank. The agent kept going until the window overflowed. It wasn't running "during" compression; there was no compression, just a no-op the user mistook for one.
    A. Per-message head/tail truncation in context_builder.build(): oversized tool/assistant messages (over context_message_token_budget, 0=num_ctx//6) are capped head+marker+tail in the LLM view only (model_copy; stored history and reloads are never affected). A single huge tool result can no longer alone blow the window; user/system messages are never truncated.
    B. Token-budget hard-truncate fallback in compress_session: when partition no-ops but tokens exceed the threshold, drop oldest turns to num_ctx*0.5. _hard_truncate is now token-aware (was a fixed message-count floor that no-oped on <=6 messages even when huge). New would_compress() predicts compress_session's real outcome with no LLM call.
    C. Honest CompressionStarted: _compression_events_midturn/_preturn emit it only after would_compress() confirms the partition (or token-budget fallback) can actually shrink the stored context; no more "compression" status with no ContextCompressed to follow.
    Bonus: post-turn CompressionWorker now passes keep_recent_messages=max(12, context_keep_recent*2), matching the midturn path, so a single long autonomous turn compresses post-turn too (was always a no-op).
    Tests (+14): would_compress agreement, token-budget fallback, token-aware hard_truncate, build() truncation (preserves user, no mutation, head+tail), agent no-CompressionStarted-when-nothing-to-compress, worker single-long-turn.
    Co-Authored-By: Claude <noreply@anthropic.com>
    Eugene Sukhodolskiy committed 3 days ago
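Item A of the commit above (head+marker+tail capping applied to the LLM view only) might look roughly like this. It is a sketch under stated assumptions: `Msg`, `MARKER`, `resolve_budget`, `cap_head_tail`, and `build_view` are hypothetical names, tokens are approximated as chars//3, and `dataclasses.replace` stands in for the Pydantic `model_copy` the commit mentions.

```python
from dataclasses import dataclass, replace

MARKER = "\n[... truncated ...]\n"  # hypothetical marker text


@dataclass(frozen=True)
class Msg:
    role: str
    content: str


def resolve_budget(configured: int, num_ctx: int) -> int:
    """A configured budget of 0 means 'use num_ctx // 6', as in the commit."""
    return configured if configured > 0 else num_ctx // 6


def cap_head_tail(text: str, budget_tokens: int, chars_per_token: int = 3) -> str:
    """Keep the start and end of an oversized text, joined by a marker."""
    max_chars = budget_tokens * chars_per_token
    if len(text) <= max_chars:
        return text
    keep = max(max_chars - len(MARKER), 0)
    tail = keep // 2
    head = keep - tail
    # text[-0:] would return the whole string, so guard the empty tail.
    return text[:head] + MARKER + (text[-tail:] if tail else "")


def build_view(history: list[Msg], budget_tokens: int) -> list[Msg]:
    """Return the LLM-facing copy; the stored history is never mutated."""
    view = []
    for m in history:
        if m.role in ("tool", "assistant"):
            view.append(replace(m, content=cap_head_tail(m.content, budget_tokens)))
        else:
            view.append(m)  # user/system messages are never truncated
    return view
```

Because the cap is applied to copies, reloading the session still yields the full tool output; only the prompt sent to the model is shortened.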
2026-06-01	b7697c8 Fix CompressionWorker: sync context compression with session messages
    - Mark removed messages as is_context=False so DB reload doesn't resurrect them
    - Persist summary_msg in messages (is_display=False) so summary survives reload
    - Add is_compression marker with is_context=False
    - Set context_token_count to real estimate instead of zero
    Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
    Eugene Sukhodolskiy committed on 1 Jun
2026-05-02	8b3ebca Refine 3D modeler workflow
    Eugene Sukhodolskiy committed on 2 May
2026-04-29	8f68841 Architecture extensibility: event bus, middleware, auto-discovery, Pydantic profiles
    - EventBus: async pub/sub for AgentEvents, WebSocket subscribes instead of direct yield
    - Declarative serialization: AgentEvent.to_wire() on all event types
    - Auto-discovery for LLM backends (_discover_backends) and workers (scan navi/workers/*.py)
    - AgentProfile: Pydantic BaseModel with extra='allow', @field_validator for model coercion
    - Tool middleware chain: pre/post execute hooks via ToolRegistry.add_middleware()
    - LoggingMiddleware: built-in, logs every tool call
    - Fix pg_trgm DDL: conditional GIN indexes via DO $$ block, no CREATE EXTENSION
    - New files: event_bus.py, middleware.py, logging_middleware.py
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    Eugene Sukhodolskiy committed on 29 Apr
2026-04-20	9704a92 Autonomous reasoning improvements: budget, anchoring, anti-stall, validation
    - AgentProfile: per-profile thinking mechanics flags (think_enabled, iteration_budget_enabled, goal_anchoring, anti_stall, step_validation, planning_reflect, adaptive_replan); all profiles updated in config.json
    - Iteration budget: inject remaining iterations into context so model knows when to wrap up; urgency levels at ≤7 and ≤3 remaining
    - Goal anchoring: inject original goal + todo state every N iterations to prevent drift on long tasks
    - Anti-stall: two signals: no todo progress for N iterations, or identical tool calls repeated N times; warning injected into context
    - Todo step validation: marking done requires a validation field describing how result was verified; failed gets a soft nudge with tip for re-planning
    - stream_complete: add think param to base class, ollama and openai backends
    - Summarizer: raise max_tokens 1024→3000, expand system prompt with user-preferences section and verbatim-value instructions
    - Compression card: persist to session.messages (is_compression flag on Message), show expandable summary in webclient with markdown body
    - ToolResult.to_message_content: always include output on failure so tracebacks and error details reach the model (fixes silent Error: None)
    - Developer profile: fix subagent profile secretary→developer, add write_tool to subagent_tools, clarify write_tool vs filesystem in system prompt
    Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
    Eugene Sukhodolskiy committed on 20 Apr
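The two anti-stall signals named in the commit above could be detected along these lines. This is only an illustrative sketch: the function names, the `(tool_name, serialized_args)` tuple shape, and the per-iteration list of completed-todo counts are assumptions, not the repository's data model.

```python
def repeated_calls(tool_calls: list[tuple[str, str]], n: int) -> bool:
    """True when the last n tool calls are identical.

    Each call is assumed to be a hashable (tool_name, serialized_args) pair.
    """
    return n > 0 and len(tool_calls) >= n and len(set(tool_calls[-n:])) == 1


def no_todo_progress(done_counts: list[int], n: int) -> bool:
    """True when the number of completed todo items has not changed
    over the last n iterations (needs n + 1 samples to compare)."""
    return n > 0 and len(done_counts) > n and len(set(done_counts[-(n + 1):])) == 1


def is_stalled(tool_calls: list[tuple[str, str]], done_counts: list[int], n: int) -> bool:
    """Either signal is enough to inject a warning into the context."""
    return repeated_calls(tool_calls, n) or no_todo_progress(done_counts, n)
```

Checking both signals separately means a model that varies its tool arguments slightly but never completes a todo item is still caught by the second check.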
2026-04-15	4b64763 Add explicit output token budget for summarizer (context_summary_max_tokens)
    Previously there was no num_predict set for the summarization LLM call, so Ollama used its server default (often 128 tokens, giving very short summaries).
    - Add max_tokens param to LLMBackend.complete() and OllamaBackend (→ num_predict)
    - Add context_summary_max_tokens: int = 1024 to config
    - Thread it through compress_context() and CompressionWorker
    Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
    Eugene Sukhodolskiy committed on 15 Apr
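The max_tokens → num_predict mapping from the commit above can be illustrated with a request-body builder for Ollama's chat endpoint, where `num_predict` lives under `options`. The helper name `build_ollama_payload` and the model name in the test are hypothetical; no network call is made.

```python
from typing import Optional


def build_ollama_payload(
    model: str,
    messages: list[dict],
    max_tokens: Optional[int] = None,
) -> dict:
    """Build a JSON body for Ollama's /api/chat endpoint.

    When max_tokens is None the 'options' key is omitted, so the server
    falls back to its own default output limit.
    """
    payload: dict = {"model": model, "messages": messages, "stream": False}
    if max_tokens is not None:
        # Ollama's name for the output token limit is num_predict.
        payload["options"] = {"num_predict": max_tokens}
    return payload
```

A summarizer call would then pass the configured context_summary_max_tokens as `max_tokens`, so the summary length no longer depends on the server default.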
2026-04-14	bbc93b5 Expose compression summary as collapsible debug card in chat UI
    ContextCompressed event now carries the full summary text produced by the LLM. Compression notice in chat becomes a <details> element showing message count (before→after) with the summary expandable on click. Rendered as markdown via marked.js.
    Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
    Eugene Sukhodolskiy committed on 14 Apr
2026-04-10	1e8b65e Major feature batch: visibility, planning, file uploads, streaming
    - stream_complete(): streaming with tools for all LLM turns; thinking now streams as ThinkingDelta/ThinkingEnd in real-time during tool-selection turns, not just on the final response
    - todo built-in tool: session-scoped plan manager (set/view/update/clear); persona + all profiles updated with mandatory planning instructions
    - TurnThinking event: sub-agent thinking forwarded to parent sink as a collapsible block in the spawn_agent card
    - File uploads: non-image files uploaded via XHR, shown as badges in message bubble; SVG treated as regular file (not base64 image)
    - session_files: POST /sessions/{id}/files, TTL cleanup, forbidden exts
    - WebSocket reconnect: _AgentRun broadcast pattern, re-attach mid-stream
    - UI: favicon, sidebar logo, turn-thinking cards, subagent thinking blocks, token counter, draft persistence, file progress bar
    - Removed AgentNote (content is always None alongside tool_calls)
    - Ollama stream_complete: tool_calls captured from non-final chunk (done=False)
    Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
    Eugene Sukhodolskiy committed on 10 Apr
2026-04-08	bdd3786 Review fixes: events module, circular imports, deps, vision-aware compression
    - Extract all AgentEvent dataclasses to navi/core/events.py; import from there in agent.py and __init__.py, which eliminates the circular import between workers and core
    - workers/compressor.py: remove runtime import hack, use navi.core.events
    - workers/base.py: WorkerResult.events typed as list[AgentEvent] (was Any)
    - api/deps.py: replace @lru_cache on mutable list with module-level singletons (_registries, _workers)
    - core/compressor.py: _format_for_summary returns (text, images); images passed to summarization LLM so vision models describe them in summary; non-vision models silently ignore the images field; docstring updated
    - client/js/app.js: add comment explaining is_summary backward compat branch
    Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
    Eugene Sukhodolskiy committed on 8 Apr
2026-04-08	261459a Separate display history from LLM context; formalize worker system
    Architecture change:
    - session.messages: full display history, never modified by compression
    - session.context: what the LLM sees, may be compressed by workers
    - System messages go only into context (not display history)
    - Image injections (synthetic) go only into context
    - User/assistant/tool messages go into both
    SQLite: add context column with backward-compat migration (empty context → initialized from messages on load)
    Workers (navi/workers/):
    - Worker ABC + WorkerContext + WorkerResult (base.py)
    - CompressionWorker: compresses session.context when above threshold
    - build_default_workers() returns [CompressionWorker()]
    - Agent accepts workers list, runs them after StreamEnd
    - Workers injected via deps.py get_workers() (lru_cached singleton)
    - WebSocket agent construction also receives workers
    Compressor: compress_context() now takes context[], not messages[]
    Config: context_keep_recent 6 → 10
    Agent: _run_workers() collects events from all workers and yields them
    Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
    Eugene Sukhodolskiy committed on 8 Apr
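The worker system introduced in the commit above might be shaped roughly like this. The class names Worker, WorkerContext, WorkerResult, and CompressionWorker come from the commit message; every field, the synchronous `run` signature, the string events, the chars//3 token estimate, and the placeholder summary are assumptions made for illustration (the real worker presumably calls an LLM to summarize).

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class WorkerContext:
    context: list[str]        # what the LLM sees; workers may rewrite it
    threshold_tokens: int     # hypothetical compression threshold


@dataclass
class WorkerResult:
    events: list[str] = field(default_factory=list)


class Worker(ABC):
    @abstractmethod
    def run(self, ctx: WorkerContext) -> WorkerResult:
        """Inspect (and possibly modify) the context after a turn."""


class CompressionWorker(Worker):
    def __init__(self, keep_recent: int = 10) -> None:
        self.keep_recent = keep_recent

    def run(self, ctx: WorkerContext) -> WorkerResult:
        tokens = sum(len(m) // 3 for m in ctx.context)
        cut = len(ctx.context) - self.keep_recent
        if tokens <= ctx.threshold_tokens or cut <= 0:
            return WorkerResult()  # nothing to compress
        old, recent = ctx.context[:cut], ctx.context[cut:]
        # Placeholder for the LLM-produced summary of the older messages.
        summary = f"[summary of {len(old)} earlier messages]"
        ctx.context[:] = [summary] + recent
        return WorkerResult(events=["ContextCompressed"])


def run_workers(workers: list[Worker], ctx: WorkerContext) -> list[str]:
    """Run every worker in order and collect the events they emit."""
    events: list[str] = []
    for worker in workers:
        events.extend(worker.run(ctx).events)
    return events
```

Keeping the display history out of `WorkerContext` mirrors the split the commit describes: workers only ever touch the LLM-facing context.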