Add stop button and fix context compression hang

Stop generation:
- Client: send button toggles to red ■ during streaming; sends {type:stop} via WS
- Server: _stream_recv reads incoming messages concurrently with streaming via
  asyncio.wait, so a stop signal is handled immediately, without polling
- Cooperative stop via asyncio.Event (current_stop_event ContextVar): the agent
  breaks out of the LLM async-for cleanly, so aclose() fires → the Ollama stream
  closes gracefully and the model stays in VRAM. task.cancel() is not used
  because it would eject the model.
- StreamStopped event propagates through run_stream/run_ephemeral; sub-agents stop
  via the same shared stop_event inherited through task context
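
The stop path described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `fake_llm_stream`, `run_agent`, `serve`, and the in-memory queue standing in for the WebSocket are all hypothetical, and only `current_stop_event`, the `{type:stop}` message, `asyncio.wait`, and `asyncio.Event` come from the message itself. The sketch also calls `aclose()` explicitly, which is an assumption about how the real agent guarantees the stream is closed.

```python
import asyncio
import contextvars

# Hypothetical stand-in for the commit's current_stop_event ContextVar.
current_stop_event = contextvars.ContextVar("current_stop_event", default=None)
cleanup_log = []  # records that the stream's cleanup code actually ran


async def fake_llm_stream():
    """Hypothetical stand-in for a streaming LLM response."""
    try:
        for i in range(1000):
            await asyncio.sleep(0.01)
            yield f"tok{i}"
    finally:
        # Runs when the generator is closed, e.g. via aclose().
        cleanup_log.append("stream closed")


async def run_agent():
    """Consume the stream, leaving the loop cooperatively once stop is set."""
    stop_event = current_stop_event.get()
    tokens = []
    stream = fake_llm_stream()
    try:
        async for token in stream:
            tokens.append(token)
            if stop_event is not None and stop_event.is_set():
                break  # cooperative exit; the agent task is never cancelled
    finally:
        await stream.aclose()  # close the stream deterministically
    return tokens


async def serve(incoming):
    """Wait on the agent and on incoming messages at the same time."""
    stop_event = asyncio.Event()
    current_stop_event.set(stop_event)
    # Tasks created here copy the current context, so they see stop_event.
    agent_task = asyncio.create_task(run_agent())
    recv_task = asyncio.create_task(incoming.get())
    while not agent_task.done():
        done, _ = await asyncio.wait(
            {agent_task, recv_task}, return_when=asyncio.FIRST_COMPLETED
        )
        if recv_task in done:
            message = recv_task.result()
            if message.get("type") == "stop":
                stop_event.set()
            recv_task = asyncio.create_task(incoming.get())
    recv_task.cancel()  # only the idle receiver is cancelled
    return await agent_task


async def main():
    incoming = asyncio.Queue()  # stands in for the WebSocket
    server = asyncio.create_task(serve(incoming))
    await asyncio.sleep(0.05)
    await incoming.put({"type": "stop"})
    return await server
```

Running `asyncio.run(main())` returns only the tokens produced before the stop message arrived, and `cleanup_log` shows that the stream's `finally` block ran. Because child tasks copy the context they are created in, any sub-agent started from `serve` would see the same event without it being passed explicitly.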

Context compression fix:
- compress_context passes think=False to llm.complete(): extended reasoning
  during summarization was causing a GPU hang
- Input truncated to 12k chars before sending to summarizer
- LLMBackend.complete() / OllamaBackend.complete() accept think: bool | None override
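
A rough sketch of the compression change, under stated assumptions: the real `complete()` is likely async and takes more arguments, `RecordingBackend` is a hypothetical test double, "12k" is read as 12,000 characters, and keeping the head of the input (rather than the tail) is a guess. Only the `think: bool | None` override, `think=False`, and the truncation step come from the message.

```python
from typing import Optional

# Assumption: "12k chars" means 12,000 characters.
MAX_SUMMARY_INPUT_CHARS = 12_000


class RecordingBackend:
    """Hypothetical backend that records how complete() was called."""

    def __init__(self):
        self.calls = []

    def complete(self, prompt: str, think: Optional[bool] = None) -> str:
        # think=None would mean "use the backend default";
        # True or False overrides it for this one call.
        self.calls.append({"chars": len(prompt), "think": think})
        return "summary"


def compress_context(llm, history: str) -> str:
    """Summarize history with bounded input and reasoning switched off."""
    # Assumption: the head of the input is kept; the commit does not say.
    truncated = history[:MAX_SUMMARY_INPUT_CHARS]
    return llm.complete(truncated, think=False)
```

With a 50,000-character history, the backend receives a 12,000-character prompt and `think=False`, so summarization cannot trigger a long reasoning pass.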

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Branches: feature/navi-code, master, vmkdemo

1 parent b3fb945 commit 86402e0e4ed879ff2fa96c6ad7ac7e34f6a4565b

Eugene Sukhodolskiy authored on 10 Apr

Showing 15 changed files

- client/js/app.js
- client/js/ws.js
- client/style.css
- navi/api/websocket.py
- navi/core/agent.py
- navi/core/compressor.py
- navi/core/events.py
- navi/llm/base.py
- navi/llm/ollama.py
- navi/llm/openai_backend.py
- navi/profiles/secretary.py
- navi/profiles/server_admin.py
- navi/profiles/smart_home.py
- navi/tools/base.py
- navi/tools/terminal.py
