Fix LLM hang: stop button during prefill, context guard, timeouts

Root cause: during prefill (processing input tokens), Ollama emits no HTTP
chunks. The body of the `async for chunk in stream_complete()` loop never
executes, so stop_event is never checked and the Stop button has no effect.
Same issue with complete() calls (planning, compression): a blocking await
with no cancellation path.

Fixes:

_iter_stream_guarded() (agent.py, module-level):
- Wraps any stream_complete() generator.
- While waiting for the next chunk, polls stop_event every 1s using
  asyncio.wait(), so Stop works even during a multi-minute prefill.
- On stop or timeout, calls aclose() on the generator, which closes the HTTP
  connection to Ollama; generation halts and the GPU drops to idle.
- Applied to both run_stream() and run_ephemeral().

_check_context_size() (Agent method):
- Estimates context tokens (chars/4 + 500 per image) before every LLM call.
- Raises ContextTooLargeError (new NaviError subclass) at 92% of
  ollama_num_ctx, before Ollama ever receives the request.

_run_planning() timeouts:
- Both complete() calls (phase 1 and phase 2) are wrapped with
  asyncio.wait_for().
- A timeout is logged and planning is skipped gracefully; execution continues.

New config (config.py):
- llm_complete_timeout = 120s
- llm_stream_first_chunk_timeout = 180s (prefill budget)
- llm_stream_chunk_timeout = 60s (inter-token budget)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
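The root cause can be reproduced without Ollama. This is a minimal sketch, not code from this commit: `fake_prefill_stream()` is a hypothetical stand-in for stream_complete() that stays silent for `prefill_s` seconds, and `naive_consume()` is a hypothetical consumer that, like the pre-fix loop, only checks the stop event inside the loop body.

```python
import asyncio
import time
from typing import AsyncIterator, List, Tuple


async def fake_prefill_stream(prefill_s: float) -> AsyncIterator[str]:
    """Hypothetical stand-in for stream_complete(): silent during 'prefill'."""
    await asyncio.sleep(prefill_s)  # no chunks are produced while this runs
    for token in ("Hello", ",", " world"):
        yield token


async def naive_consume(stream: AsyncIterator[str], stop_event: asyncio.Event) -> List[str]:
    """Pre-fix pattern: stop_event is only checked once a chunk has arrived."""
    chunks: List[str] = []
    async for chunk in stream:
        if stop_event.is_set():
            break
        chunks.append(chunk)
    return chunks


async def demo(prefill_s: float = 0.3, stop_after_s: float = 0.05) -> Tuple[List[str], float]:
    """Request a stop early, then measure how long the naive loop takes to notice."""
    stop_event = asyncio.Event()
    asyncio.get_running_loop().call_later(stop_after_s, stop_event.set)
    start = time.monotonic()
    chunks = await naive_consume(fake_prefill_stream(prefill_s), stop_event)
    return chunks, time.monotonic() - start


if __name__ == "__main__":
    chunks, elapsed = asyncio.run(demo())
    print(chunks, f"{elapsed:.2f}s")
```

With the defaults, the stop is requested at 0.05s, but `naive_consume()` only returns once the first chunk arrives at about 0.3s. With a real multi-minute prefill, that delay is what made the Stop button appear dead.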
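The polling approach behind _iter_stream_guarded() can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the code in agent.py: `iter_stream_guarded()` and `StreamTimeoutError` are hypothetical names, the stream is assumed to be a plain async generator, and the two timeout defaults only mirror llm_stream_first_chunk_timeout and llm_stream_chunk_timeout. The next chunk is fetched in a separate task, and asyncio.wait() waits on it in slices of at most `poll_interval` seconds, checking the stop event between slices.

```python
import asyncio
from typing import AsyncGenerator, AsyncIterator, Optional, TypeVar

T = TypeVar("T")


class StreamTimeoutError(Exception):
    """Hypothetical: no chunk arrived within the current time budget."""


async def iter_stream_guarded(
    gen: AsyncGenerator[T, None],
    stop_event: asyncio.Event,
    first_chunk_timeout: float = 180.0,  # cf. llm_stream_first_chunk_timeout
    chunk_timeout: float = 60.0,  # cf. llm_stream_chunk_timeout
    poll_interval: float = 1.0,
) -> AsyncIterator[T]:
    """Yield chunks from gen, noticing stop_event even while no chunk arrives."""
    loop = asyncio.get_running_loop()
    budget = first_chunk_timeout  # prefill budget until the first chunk
    pending: Optional[asyncio.Future] = None
    try:
        while True:
            # Fetch the next chunk in its own task so we can wait on it in slices.
            pending = asyncio.ensure_future(gen.__anext__())
            deadline = loop.time() + budget
            while not pending.done():
                if stop_event.is_set():
                    return
                remaining = deadline - loop.time()
                if remaining <= 0:
                    raise StreamTimeoutError(f"no chunk within {budget:g}s")
                # asyncio.wait() does not cancel the task when its timeout expires.
                await asyncio.wait({pending}, timeout=min(poll_interval, remaining))
            try:
                chunk = pending.result()
            except StopAsyncIteration:
                return  # the underlying stream finished normally
            pending = None
            budget = chunk_timeout  # inter-token budget from now on
            yield chunk
    finally:
        if pending is not None:
            # Cancel a still-running fetch and wait for it to settle, so that
            # aclose() below does not hit a generator that is still running.
            pending.cancel()
            await asyncio.gather(pending, return_exceptions=True)
        # Closing the generator is what lets the HTTP client drop the connection.
        await gen.aclose()
```

A caller would write `async for chunk in iter_stream_guarded(stream_complete(...), stop_event): ...`. Awaiting the cancelled task before aclose() matters: on current CPython versions, calling aclose() on an async generator that is still suspended inside `__anext__()` raises RuntimeError.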
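The context guard is simple arithmetic: estimate = chars/4 + 500 per image, refused once it reaches 92% of num_ctx. A minimal sketch follows, assuming Ollama-style chat messages (dicts with `content` and an optional `images` list); `estimate_tokens()`, `check_context_size()` and the stand-in `NaviError` are hypothetical, and integer division and the `>=` comparison are assumptions about how "chars/4" and "at 92%" are applied.

```python
from typing import Any, Dict, List


class NaviError(Exception):
    """Stand-in for the project's base exception class."""


class ContextTooLargeError(NaviError):
    """Raised before sending a request whose prompt would overflow num_ctx."""

    def __init__(self, estimated: int, limit: int, num_ctx: int) -> None:
        super().__init__(f"estimated {estimated} tokens >= {limit} (92% of num_ctx={num_ctx})")
        self.estimated = estimated
        self.limit = limit
        self.num_ctx = num_ctx


CHARS_PER_TOKEN = 4  # rough chars-per-token heuristic
TOKENS_PER_IMAGE = 500  # flat allowance per attached image
GUARD_RATIO = 0.92  # refuse at 92% of the context window


def estimate_tokens(messages: List[Dict[str, Any]]) -> int:
    """chars/4 over all message text, plus 500 per attached image."""
    chars = sum(len(m.get("content") or "") for m in messages)
    images = sum(len(m.get("images") or []) for m in messages)
    return chars // CHARS_PER_TOKEN + images * TOKENS_PER_IMAGE


def check_context_size(messages: List[Dict[str, Any]], num_ctx: int) -> int:
    """Return the estimate, or raise before the request ever reaches Ollama."""
    estimated = estimate_tokens(messages)
    limit = int(num_ctx * GUARD_RATIO)
    if estimated >= limit:
        raise ContextTooLargeError(estimated, limit, num_ctx)
    return estimated
```

For example, a 400-character message with one image estimates to 400/4 + 500 = 600 tokens, well under the 7536-token limit for num_ctx = 8192. Failing fast here is cheaper than letting Ollama spend a long prefill on a prompt it will have to truncate.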
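The planning timeouts follow the standard asyncio.wait_for() pattern. The sketch below assumes complete() is an async callable taking a prompt string and returning text; `run_planning()`, both prompts, and the None-means-skip convention are hypothetical illustrations of "timeout logged and planning skipped", not the actual _run_planning().

```python
import asyncio
import logging
from typing import Awaitable, Callable, Optional

logger = logging.getLogger(__name__)

Complete = Callable[[str], Awaitable[str]]  # hypothetical signature of complete()


async def run_planning(complete: Complete, task: str, timeout: float = 120.0) -> Optional[str]:
    """Two-phase planning where each LLM call gets its own timeout.

    Returns None when either phase times out, so the caller can skip
    planning and continue straight to execution.
    """
    try:
        # Phase 1 (hypothetical prompt): draft an outline.
        outline = await asyncio.wait_for(complete(f"Outline the steps for: {task}"), timeout)
        # Phase 2 (hypothetical prompt): turn the outline into a plan.
        return await asyncio.wait_for(complete(f"Refine into a plan:\n{outline}"), timeout)
    except asyncio.TimeoutError:
        # wait_for() has already cancelled the pending complete() call.
        logger.warning("planning timed out after %gs; continuing without a plan", timeout)
        return None
```

Because wait_for() cancels the awaited call on timeout, an HTTP-backed complete() gets the same chance to abort its request that the streaming path gets from aclose().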
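The three new settings split the streaming deadline in two: a long budget until the first chunk (covering prefill, which can take minutes for a large prompt) and a short budget between later chunks. A minimal sketch, assuming a frozen dataclass; the field names and defaults come from this commit, while `LLMTimeoutSettings` and `stream_budget()` are hypothetical, and the real config.py may use a different settings mechanism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMTimeoutSettings:
    """Hypothetical grouping of the three new settings (all in seconds)."""

    llm_complete_timeout: float = 120.0  # whole non-streaming complete() call
    llm_stream_first_chunk_timeout: float = 180.0  # prefill: request -> first chunk
    llm_stream_chunk_timeout: float = 60.0  # gap between consecutive chunks

    def stream_budget(self, chunks_received: int) -> float:
        """Time allowed to wait for the next chunk of a streaming response."""
        if chunks_received == 0:
            return self.llm_stream_first_chunk_timeout
        return self.llm_stream_chunk_timeout
```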

Files changed:
- navi/config.py
- navi/core/agent.py
- navi/exceptions.py