Fix LLM hang: stop button during prefill, context guard, timeouts

Root cause: during prefill (processing input tokens), Ollama emits no
HTTP chunks. The body of the `async for chunk in stream_complete()` loop
never executes, so stop_event is never checked and the Stop button has
no effect. The complete() calls (planning, compression) have the same
problem: a blocking await with no cancellation path.
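
A minimal reproduction of the root cause, as a sketch only: fake_stream()
is a hypothetical stand-in for stream_complete() that sleeps to imitate
prefill, and the durations are illustrative. The stop flag is set almost
immediately, yet it is first examined only after the first chunk arrives.

```python
import asyncio


async def fake_stream(prefill_seconds: float):
    # Hypothetical stand-in for stream_complete(): nothing is yielded
    # while the "prefill" sleep is in progress.
    await asyncio.sleep(prefill_seconds)
    yield "first-token"


async def naive_consume(prefill_seconds: float = 0.2):
    stop_event = asyncio.Event()
    loop = asyncio.get_running_loop()
    loop.call_later(0.01, stop_event.set)  # user presses Stop right away
    started = loop.time()
    checks = 0
    async for _chunk in fake_stream(prefill_seconds):
        checks += 1  # the only place stop_event is ever examined
        if stop_event.is_set():
            break
    # Returns how often the flag was checked and how long the loop took.
    return checks, loop.time() - started
```

Calling asyncio.run(naive_consume()) shows the loop waiting out the whole
simulated prefill before the single stop check happens.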

Fixes:

_iter_stream_guarded() (agent.py, module-level):
  Wraps any stream_complete() generator. Polls stop_event every 1s while
  waiting for the next chunk using asyncio.wait() — so Stop works even
  during multi-minute prefill. On stop or timeout, calls aclose() on the
  generator which closes the HTTP connection to Ollama → generation halts
  → GPU drops to idle. Applied to both run_stream() and run_ephemeral().
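
The guard described above could look roughly like the sketch below. This
is not the code from agent.py: the function name, signature, and the
choice to end iteration silently on Stop are assumptions; only the
polling via asyncio.wait(), the two timeout budgets, and the final
aclose() come from the description.

```python
import asyncio


async def iter_stream_guarded(
    gen,
    stop_event: asyncio.Event,
    first_chunk_timeout: float = 180.0,  # prefill budget
    chunk_timeout: float = 60.0,         # inter-token budget
    poll_interval: float = 1.0,
):
    """Yield chunks from gen while polling stop_event between chunks."""
    loop = asyncio.get_running_loop()
    budget = first_chunk_timeout
    next_task = None
    try:
        while True:
            # ensure_future accepts the awaitable returned by __anext__().
            next_task = asyncio.ensure_future(gen.__anext__())
            deadline = loop.time() + budget
            while True:
                done, _ = await asyncio.wait({next_task}, timeout=poll_interval)
                if next_task in done:
                    break
                if stop_event.is_set():
                    return  # assumption: Stop simply ends the iteration
                if loop.time() >= deadline:
                    raise asyncio.TimeoutError("no chunk within budget")
            try:
                chunk = next_task.result()
            except StopAsyncIteration:
                return  # the wrapped stream finished normally
            yield chunk
            budget = chunk_timeout
    finally:
        # Cancel a pending read first: aclose() fails on a generator that
        # is still running inside another task.
        if next_task is not None and not next_task.done():
            next_task.cancel()
            await asyncio.wait({next_task})
        await gen.aclose()
```

A caller would iterate with `async for chunk in iter_stream_guarded(...)`
in place of iterating the raw stream. Whether closing the generator also
closes the HTTP connection depends on how stream_complete() is written.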

_check_context_size() (Agent method):
  Estimates context tokens (chars/4 + 500 per image) before every LLM
  call. Raises ContextTooLargeError (new NaviError subclass) at 92% of
  ollama_num_ctx — before Ollama ever receives the request.
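
The estimate can be sketched as follows. The class bodies and function
signatures are hypothetical stand-ins (the real check is an Agent
method), and treating "at 92%" as a strict greater-than comparison is an
assumption; the chars/4 + 500-per-image heuristic and the 92% threshold
come from the description above.

```python
class NaviError(Exception):
    """Stand-in for the project's base exception."""


class ContextTooLargeError(NaviError):
    """Raised before the request is sent when the estimate is too large."""


def estimate_tokens(texts, image_count=0):
    # Heuristic from the description: ~4 characters per token,
    # plus a flat 500 tokens per attached image.
    return sum(len(t) for t in texts) // 4 + 500 * image_count


def check_context_size(texts, image_count, num_ctx, threshold=0.92):
    estimated = estimate_tokens(texts, image_count)
    if estimated > num_ctx * threshold:  # comparison operator is an assumption
        raise ContextTooLargeError(
            f"estimated {estimated} tokens exceeds "
            f"{threshold:.0%} of num_ctx={num_ctx}"
        )
    return estimated
```

With num_ctx=8192 the cutoff is about 7,537 estimated tokens, so a
40,000-character prompt (about 10,000 tokens) is rejected locally.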

_run_planning() timeouts:
  Both complete() calls (phase 1 and 2) are wrapped with
  asyncio.wait_for(). On timeout, the event is logged and planning is
  skipped gracefully; execution continues.
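
A sketch of the timeout wrapper, assuming a hypothetical helper named
complete_or_skip() and a module-level logger; the actual code in
_run_planning() may be structured differently.

```python
import asyncio
import logging

log = logging.getLogger("navi.sketch")  # hypothetical logger name


async def complete_or_skip(coro, timeout: float, label: str):
    """Await coro with a deadline; return None instead of raising on timeout."""
    try:
        return await asyncio.wait_for(coro, timeout=timeout)
    except asyncio.TimeoutError:
        # wait_for() has already cancelled the inner coroutine here.
        log.warning("%s timed out after %.0fs; skipping planning", label, timeout)
        return None
```

The caller treats None as "no plan" and proceeds to execution, for
example `plan = await complete_or_skip(llm.complete(...), 120, "phase 1")`,
where `llm` is a placeholder name.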

New config (config.py):
  llm_complete_timeout = 120s
  llm_stream_first_chunk_timeout = 180s  (prefill budget)
  llm_stream_chunk_timeout = 60s         (inter-token budget)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Commit 8c88f4987c923620aae607ba382c9b0c12183b15 (1 parent: 8b09439)
Eugene Sukhodolskiy authored on 14 Apr
Showing 3 changed files
navi/config.py
navi/core/agent.py
navi/exceptions.py