# Agent Loop

The agent loop is the core execution engine. File: `navi/core/agent.py`.

## Three entry points

### `run(session_id, user_message)` → `str`
Non-streaming. Runs the full tool-calling loop and returns the final text. Used for REST endpoints or background tasks where streaming is not needed. No planning phase.

### `run_stream(session_id, user_message)` → `AsyncGenerator[AgentEvent]`
Streaming. Yields `AgentEvent` objects in real time. Used by the WebSocket handler. Includes planning phase.

### `run_ephemeral(user_message, profile_id)` → `str`
Non-persistent subagent. No DB reads/writes. Uses a temporary in-memory context. Called by `SpawnAgentTool`. Assigns a unique session ID (`subagent_<uuid12>`) to isolate its scratchpad from the parent and from other subagents.

---

## Planning phase (`_run_planning`)

Runs only when `profile.planning_enabled = True`, before the tool-calling loop.

**What it does:**
1. Sends the user request to the LLM with a special system prompt: "decide if this needs a plan".
2. LLM either responds `DIRECT` (skip planning) or produces a numbered step list.
3. If a real plan is returned, it's injected into `session.context` as an assistant message — the model then sees it as its own prior statement and naturally continues from it.
4. Yields `PlanReady(plan)` event → rendered as a collapsible card in the UI.

**Detection logic:**
- Response starts with `DIRECT` → skip (no plan needed).
- No numbered steps found (regex `^\s*\d+[\.\)]`) → skip (malformed response).
- Otherwise → inject plan, emit `PlanReady`.

**Parameters:** `think=False`, `temperature=0.3`, no tools → fast and structured.

---

## Tool-calling loop

Runs up to `profile.max_iterations` times.

```
iteration:
  1. Check stop_event → yield StreamStopped and return if set
  2. Call llm.stream_complete(context, tool_schemas)
     - Yields ThinkingDelta events during reasoning
     - Yields TextDelta events during text generation
     - Final chunk carries tool_calls or finish_reason="stop"
  3a. finish_reason == "stop" (no tool calls):
       → Save session, yield StreamEnd
       → Run post-turn workers (e.g. context compression)
       → Return
  3b. tool_calls present:
       For each tool call:
         - yield ToolStarted (pending card in UI)
         - Create asyncio.Task for tool execution
         - Set current_event_sink to a fresh Queue
         - Drain the queue (receives subagent events in real time)
         - yield ToolEvent (completed card in UI)
         - Append tool result to session.context
       Check if profile switched → reload profile + tools
       Continue to next iteration
```

### Sub-agent event forwarding

When a tool (e.g. `spawn_agent`) runs a subagent internally, subagent events arrive through `current_event_sink`. The parent agent drains that queue while the tool task runs, yielding subagent `ToolStarted`/`ToolEvent` events marked with `is_subagent=True`.

### Cooperative stop

Stop is signalled via `current_stop_event` (an `asyncio.Event`). The agent checks it:
- Before each LLM call
- During streaming (breaks out of the stream loop → calls `aclose()` on generator → Ollama closes gracefully, model stays in VRAM)
- After tool execution

**Never use `task.cancel()`** for stopping — it corrupts Starlette's WebSocket state.

---

## Workers (`_run_workers`)

Workers run sequentially after `StreamEnd`. Each receives a `WorkerContext` with session state, token counts, and LLM access.

Currently registered worker: `CompressionWorker` (`navi/workers/compressor.py`).

Worker result: `WorkerResult.events` — list of `AgentEvent` objects that are yielded after `StreamEnd`.

Pre-turn compression also exists: before calling the LLM, `run_stream()` checks if `session.context_token_count` is over the threshold and compresses proactively.

See [`sessions.md`](sessions.md) for compression details.

---

## System prompt construction

Each LLM call uses `_build_context()`, which injects:
1. System message: `persona + "---" + profile.system_prompt` (built fresh every call, never stored in session.context).
2. Optional memory message: `"## What I remember about the user\n\n{summary}"`.
3. Conversation messages from `session.context` (system messages stripped to avoid duplication).

This means profile switches and persona changes take effect immediately without modifying stored history.

---

## Context vars set by Agent

Before each `run_stream()` call: `current_session_id.set(session_id)`.  
Before each tool task: `current_event_sink.set(sink_queue)`.  
`run_ephemeral()` sets `current_session_id` to a unique subagent ID.

See [`architecture.md`](architecture.md) for the full ContextVar table.