Add stop button and fix context compression hang
Stop generation:
- Client: while a response is streaming, the send button turns into a red ■ stop
  button; clicking it sends {type:stop} over the WebSocket
- Server: _stream_recv reads incoming WebSocket messages concurrently with the
  stream using asyncio.wait, so a stop signal is handled immediately, without polling
- Cooperative stop via an asyncio.Event (the current_stop_event ContextVar): the agent
  breaks out of the LLM async-for loop cleanly, so aclose() fires and the Ollama stream
  closes gracefully, keeping the model loaded in VRAM. task.cancel() is not used because
  it would eject the model.
- A StreamStopped event propagates through run_stream/run_ephemeral; sub-agents stop
  via the same shared stop_event, which they inherit through the task context

Context compression fix:
- compress_context passes think=False to llm.complete(), disabling extended reasoning
  during summarization; that reasoning is what had caused the GPU hang
- Summarizer input is truncated to 12k characters before it is sent
- LLMBackend.complete() and OllamaBackend.complete() accept a think: bool | None override

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
commit 86402e0e4ed879ff2fa96c6ad7ac7e34f6a4565b (parent b3fb945)
Authored by Eugene Sukhodolskiy on 10 Apr

Changed files (15):
- client/js/app.js
- client/js/ws.js
- client/style.css
- navi/api/websocket.py
- navi/core/agent.py
- navi/core/compressor.py
- navi/core/events.py
- navi/llm/base.py
- navi/llm/ollama.py
- navi/llm/openai_backend.py
- navi/profiles/secretary.py
- navi/profiles/server_admin.py
- navi/profiles/smart_home.py
- navi/tools/base.py
- navi/tools/terminal.py