| 2026-05-16 |
Extract ContextCompressor, fix STL viewer, expand test suite, add architecture audit docs
...
- Extract ContextCompressor from agent.py (Step 1 of god-object refactor)
- Add retry + hard-truncate fallback logic to ContextCompressor
- Add unit tests: agent loop (14), compressor (18), KV store (8),
auth encrypt (3), auth deps (13), todo/scratchpad/image_view/memory
- Fix WebGL STL viewer: allow-same-origin sandbox + graceful fallback
- Add CompressionStarted event and client-side compression notice
- Add docs/architecture_weak_spots.md and plan_01_god_object_agent.md
- Update test_events.py and test_agent_context_size.py for new logic
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 16 May
|
| 2026-05-11 |
Fix ollama_backends / FallbackOllamaBackend issues
...
- registry.py: always use FallbackOllamaBackend (unified backend).
Enables model priority lists in all deployments, not just multi-server.
- agent.py: add missing think=profile.think_enabled to run() (REST endpoint).
- compressor.py: fix model param type (str → list[str] | str | None).
- fallback.py: harden load_servers_from_file against missing/bad JSON files
and entries without host. Add clear_blacklists() for manual reset.
- admin.py: add POST /admin/ollama/clear-blacklists endpoint.
- tech_debt_review: document dead stream() methods.
- tests: add tests for single-server fallback, bad file handling,
missing host skipping, and blacklist clearing.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 11 May
|
| 2026-05-02 |
Refine 3D modeler workflow
Eugene Sukhodolskiy
committed
on 2 May
|
| 2026-04-25 |
Improve compression and memory prompts
Eugene Sukhodolskiy
committed
on 25 Apr
|
| 2026-04-20 |

Autonomous reasoning improvements: budget, anchoring, anti-stall, validation
...
- AgentProfile: per-profile thinking mechanics flags (think_enabled,
iteration_budget_enabled, goal_anchoring, anti_stall, step_validation,
planning_reflect, adaptive_replan) — all profiles updated in config.json
- Iteration budget: inject remaining iterations into context so model knows
when to wrap up; urgency levels at ≤7 and ≤3 remaining
- Goal anchoring: inject original goal + todo state every N iterations to
prevent drift on long tasks
- Anti-stall: two signals — no todo progress for N iterations, or identical
tool calls repeated N times; warning injected into context
- Todo step validation: marking done requires a validation field describing
how result was verified; failed gets a soft nudge with tip for re-planning
- stream_complete: add think param to base class, ollama and openai backends
- Summarizer: raise max_tokens 1024→3000, expand system prompt with
user-preferences section and verbatim-value instructions
- Compression card: persist to session.messages (is_compression flag on
Message), show expandable summary in webclient with markdown body
- ToolResult.to_message_content: always include output on failure so
tracebacks and error details reach the model (fixes silent Error: None)
- Developer profile: fix subagent profile secretary→developer, add write_tool
to subagent_tools, clarify write_tool vs filesystem in system prompt
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 20 Apr
|
| 2026-04-17 |

Planning phases, context compression, and tool improvements
...
Agent:
- Planning now a 3-phase async generator: Analysis → Execution plan → AIHelper critic
- Yield PlanningStatus events before each phase (UI progress labels)
- Phase 1 runs with think=True for deeper analysis
- Phase 2 includes available tool list so executor assignments are accurate
- Phase 3: independent critic pass validates and corrects TOOL: names against real tool list
- Planning converted from list return to async generator (fixes token accounting)
Backend:
- Context compression threshold: 80% → 70% to trigger earlier
- Compressor summary prompt: structured sections (goal, work state, key facts, outputs, errors)
- Terminal output capped at 5000 chars to prevent context flooding
- Web search: region=wt-wt for DDG, country=ALL for Brave, language=all for SearxNG
- Scratchpad: mandate writing a 'goal' section at start of multi-step tasks
- secretary max_iterations: 40→25, temperature: 0.7→0.5
- server_admin max_iterations: 40→20
Webclient:
- ThinkingCard strips <thought> XML tags leaked by Ollama
- planning_status WS event wired to chat.onPlanningStatus()
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 17 Apr
|
| 2026-04-15 |
Add explicit output token budget for summarizer (context_summary_max_tokens)
...
Previously there was no num_predict set for the summarization LLM call,
so Ollama used its server default (often 128 tokens — very short summaries).
- Add max_tokens param to LLMBackend.complete() and OllamaBackend (→ num_predict)
- Add context_summary_max_tokens: int = 1024 to config
- Thread it through compress_context() and CompressionWorker
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 15 Apr
|
Expand summarization budget for better context quality
...
- _MAX_SUMMARY_INPUT_CHARS: 12k → 24k chars (2x input fed to summarizer)
- context_keep_recent: 10 → 8 turns (2 more turns go into each summary batch)
- Summarizer prompt: replace "Be brief" with "Be thorough" — capture code/config
snippets and enough detail to continue the conversation without original messages
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 15 Apr
|
| 2026-04-14 |
Expose compression summary as collapsible debug card in chat UI
...
ContextCompressed event now carries the full summary text produced by the
LLM. Compression notice in chat becomes a <details> element showing
message count (before→after) with the summary expandable on click.
Rendered as markdown via marked.js.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 14 Apr
|
| 2026-04-10 |

Add stop button and fix context compression hang
...
Stop generation:
- Client: send button toggles to red ■ during streaming; sends {type:stop} via WS
- Server: _stream_recv concurrently reads incoming messages during streaming using
asyncio.wait — stop signal is handled immediately without polling
- Cooperative stop via asyncio.Event (current_stop_event ContextVar): agent breaks
out of LLM async-for cleanly so aclose() fires → Ollama stream closes gracefully,
model stays in VRAM. No task.cancel() which would eject the model.
- StreamStopped event propagates through run_stream/run_ephemeral; sub-agents stop
via the same shared stop_event inherited through task context
Context compression fix:
- compress_context passes think=False to llm.complete() — no extended reasoning
during summarization which caused GPU hang
- Input truncated to 12k chars before sending to summarizer
- LLMBackend.complete() / OllamaBackend.complete() accept think: bool | None override
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 10 Apr
|
| 2026-04-08 |
Review fixes: events module, circular imports, deps, vision-aware compression
...
- Extract all AgentEvent dataclasses to navi/core/events.py; import from
there in agent.py and __init__.py — eliminates circular import between
workers and core
- workers/compressor.py: remove runtime import hack, use navi.core.events
- workers/base.py: WorkerResult.events typed as list[AgentEvent] (was Any)
- api/deps.py: replace @lru_cache on mutable list with module-level
singletons (_registries, _workers)
- core/compressor.py: _format_for_summary returns (text, images); images
passed to summarization LLM so vision models describe them in summary;
non-vision models silently ignore the images field; docstring updated
- client/js/app.js: add comment explaining is_summary backward compat branch
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 8 Apr
|

Separate display history from LLM context; formalize worker system
...
Architecture change:
- session.messages: full display history, never modified by compression
- session.context: what the LLM sees, may be compressed by workers
- System messages go only into context (not display history)
- Image injections (synthetic) go only into context
- User/assistant/tool messages go into both
SQLite: add context column with backward-compat migration
(empty context → initialized from messages on load)
Workers (navi/workers/):
- Worker ABC + WorkerContext + WorkerResult (base.py)
- CompressionWorker: compresses session.context when above threshold
- build_default_workers() returns [CompressionWorker()]
- Agent accepts workers list, runs them after StreamEnd
- Workers injected via deps.py get_workers() (lru_cached singleton)
- WebSocket agent construction also receives workers
Compressor: compress_context() now takes context[], not messages[]
Config: context_keep_recent 6 → 10
Agent: _run_workers() collects events from all workers and yields them
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 8 Apr
|

Add context compression: rolling summarization when context fills up
...
Mechanism:
- After streaming ends, if context_tokens >= threshold (80% of num_ctx),
compress old turns into a summary message using the same LLM
- Partition: keep system msg + last N turns verbatim (default 6);
everything older goes to the summarizer
- Tool call groups (assistant + tool results) never split across boundary
- Existing summary messages folded into new compression pass — no stack growth
- Summary stored as Message(role=user, is_summary=True) after system msg
- On failure: logged, session left unchanged (non-fatal)
New files:
- navi/core/compressor.py: should_compress, partition_messages,
compress_session (pure logic, testable without agent)
New config (navi/config.py):
- context_compression_enabled: bool = True
- context_compression_threshold: float = 0.80
- context_keep_recent: int = 6
- context_summary_temperature: float = 0.3
New agent event: ContextCompressed(messages_before, messages_after)
Message.is_summary: bool field marks compressed history blocks
Client:
- context_compressed WS event → subtle inline notice in message list
- loadHistory: is_summary messages rendered as collapsible summary cards
- style.css: .summary-card, .compression-notice
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
on 8 Apr
|