Agent Loop

Core execution engine. File: navi/core/agent.py.

Entry points

`run_stream(session_id, user_message)` → `AsyncGenerator[AgentEvent]`

Streaming. Yields AgentEvent objects in real time. Used by the WebSocket handler. Runs the planning phase if profile.planning_enabled = True.

`run(session_id, user_message)` → `str`

Non-streaming. Full tool-calling loop, returns final text. No planning phase.

`run_ephemeral(user_message, profile_id)` → `tuple[str, bool]`

Non-persistent subagent. Temporary in-memory context. Called by SpawnAgentTool.

Returns (result_text, completed_normally). completed_normally is False if the subagent hit the iteration limit or timed out.

spawn_agent.profile_id is optional. If omitted, SpawnAgentTool resolves the parent session's current profile. If provided, the subagent uses the selected profile's model, subagent_system_prompt, planning flags, and tool set. Its tools come from that profile's subagent_tools, falling back to enabled_tools when subagent_tools is empty.

When spawned from a persistent parent session, session-aware tools run under the parent session id so file tools resolve the user's session directory rather than a subagent_* directory.

run_ephemeral reads the parent session from the DB when parent_session_id is provided, so session-aware tools (filesystem, todo, scratchpad) operate on the parent's data.

ContextVar restoration

run_ephemeral saves the parent's current_session_id, current_model, current_user_id, current_user_role, and current_user_info before starting and restores them in a finally block. This prevents background tasks or the next parent iteration from inheriting stale subagent IDs.

Planning phase (`_run_planning`)

Runs before the tool loop when profile.planning_enabled = True.

Phase 1 — Analysis

LLM receives the user request with a classification prompt. Outputs:

DIRECT → skip planning entirely (simple request).
A structured analysis + REFLECT: yes/no → continue to Phase 2 or 3.

Phase 2 — Structured review (conditional)

Runs only when planning_phase2_enabled = True AND Phase 1 outputs REFLECT: yes. One LLM call reviews the Phase 1 analysis and returns four sections:

Critic — wrong assumptions, risks, contradictions, facts to verify
Pragmatist — simpler path, unnecessary steps, better executor choices
Detailer — missing requirements, source files/docs/tools to inspect, validation gaps
Plan Adjustments — concrete changes Phase 3 must apply

The review is embedded into the Phase 3 prompt.

Phase 3 — Execution plan

LLM produces milestones plus a numbered step list. Each step is assigned an executor:

TOOL: tool_name — single tool call
AGENT: profile_id — bounded 3+ tool-call subtask delegated to a subagent via spawn_agent
SELF — handled inline (synthesis, context-dependent action)

Plan depth is adaptive:

simple: 1-3 steps
medium: 5-9 steps
complex or autonomous: 8-15 steps
hard maximum: 15 steps

Comma test (enforced in prompt): if a step description lists multiple things with "and" or commas, each item must be a separate step.

The plan is injected into session.context as an assistant message and saved to session.messages with is_plan=True for UI rendering. The todo list is auto-populated from the plan steps.

Thinking mechanics

All flags live on AgentProfile and can be set per-profile in config.json.

Flag	Default	What it does
`think_enabled`	`true`	Passes `think=True` to LLM on every main-loop call (extended reasoning)
`iteration_budget_enabled`	`true`	Injects remaining iteration count into context so model wraps up in time
`planning_phase2_enabled`	`false`	Enables Phase 2 structured review (one extra LLM call when Phase 1 outputs `REFLECT: yes`)
`goal_anchoring_enabled`	`true`	Injects goal-reminder system message every N iterations
`goal_anchoring_interval`	`5`	N for goal anchoring
`anti_stall_enabled`	`true`	Detects looping without todo progress and injects a warning
`anti_stall_threshold`	`8`	Consecutive iterations without progress before warning fires
`step_validation_enabled`	`false`	Blocks marking a todo step `done` without a `validation` field
`adaptive_replan_enabled`	`false`	When a step is marked `failed`, queues a re-plan prompt for the next iteration
`subagent_planning_enabled`	`false`	Subagents run their own planning phase

Tool-calling loop

Runs up to profile.max_iterations times.

Each iteration:
  1. Check stop_event → yield StreamStopped if set
  2. Build context: _build_context() injects iteration budget and goal anchor (if due)
  3. Check anti-stall: if stalled, append warning message to context
  4. Inject queued adaptive re-plan message (if a step failed last iteration)
  5. llm.stream_complete(context, tool_schemas)
     → ThinkingDelta/ThinkingEnd events during reasoning
     → TextDelta events during text generation
  6a. No tool calls → save session, yield StreamEnd, run workers, return
  6b. Tool calls → execute each, yield ToolEvent, append results to context
  7. Update anti-stall counters, detect newly-failed todo steps
  8. Check if profile switched → reload profile + tools

Sub-agent event forwarding

When spawn_agent runs a subagent, its events arrive through current_event_sink. The parent drains the queue in real time, yielding subagent events marked with is_subagent=True.

Cooperative stop

Stop is signalled via current_stop_event (an asyncio.Event). Checked before each LLM call, during streaming, and after tool execution. Never use task.cancel() — it corrupts WebSocket state.

Streaming guard wrapper

run_stream() wraps the LLM generator with _iter_stream_guarded(), which provides two safety layers:

Stop-event polling during prefill. Ollama emits no chunks during prefill, so a plain await on the first token can block for minutes. The wrapper polls stop_event every second so the user's Stop button works even during silent prefill.
Hard timeouts. first_chunk_timeout (default 120 s) caps prefill wait time. chunk_timeout (default 60 s) caps gaps between subsequent tokens. On timeout the generator is closed, terminating the HTTP connection to Ollama so GPU load drops to idle.

Env var	Default	Purpose
`LLM_STREAM_FIRST_CHUNK_TIMEOUT`	`120`	Max seconds to wait for the first token
`LLM_STREAM_CHUNK_TIMEOUT`	`60`	Max seconds between tokens after the first

Workers

Run sequentially after StreamEnd. Currently: CompressionWorker.

Pre-turn compression also runs at the start of run_stream() if session.context_token_count exceeds the threshold. See sessions.md.

System prompt construction (`_build_context`)

Every LLM call receives:

System message: persona + "---" + profile.system_prompt (injected fresh, never stored).
Optional memory message: "## What I remember about the user\n...".
session.context messages (system messages stripped to avoid duplication).

Profile switches and persona changes take effect immediately.

System prompt caching

The built system prompt string is cached per profile ID in ContextBuilder to avoid rebuilding on every turn. The cache is invalidated when the profile is reloaded (e.g. after switch_profile or hot-reload). This saves ~1–2 ms per turn for profiles with large system prompts.