The agent loop is the core execution engine. File: navi/core/agent.py.
run(session_id, user_message) → strNon-streaming. Runs the full tool-calling loop and returns the final text. Used for REST endpoints or background tasks where streaming is not needed. No planning phase.
run_stream(session_id, user_message) → AsyncGenerator[AgentEvent]Streaming. Yields AgentEvent objects in real time. Used by the WebSocket handler. Includes planning phase.
run_ephemeral(user_message, profile_id) → strNon-persistent subagent. No DB reads/writes. Uses a temporary in-memory context. Called by SpawnAgentTool. Assigns a unique session ID (subagent_<uuid12>) to isolate its scratchpad from the parent and from other subagents.
_run_planning)Runs only when profile.planning_enabled = True, before the tool-calling loop.
What it does:
DIRECT (skip planning) or produces a numbered step list.session.context as an assistant message — the model then sees it as its own prior statement and naturally continues from it.PlanReady(plan) event → rendered as a collapsible card in the UI.Detection logic:
DIRECT → skip (no plan needed).^\s*\d+[\.\)]) → skip (malformed response).PlanReady.Parameters: think=False, temperature=0.3, no tools → fast and structured.
Runs up to profile.max_iterations times.
iteration:
1. Check stop_event → yield StreamStopped and return if set
2. Call llm.stream_complete(context, tool_schemas)
- Yields ThinkingDelta events during reasoning
- Yields TextDelta events during text generation
- Final chunk carries tool_calls or finish_reason="stop"
3a. finish_reason == "stop" (no tool calls):
→ Save session, yield StreamEnd
→ Run post-turn workers (e.g. context compression)
→ Return
3b. tool_calls present:
For each tool call:
- yield ToolStarted (pending card in UI)
- Create asyncio.Task for tool execution
- Set current_event_sink to a fresh Queue
- Drain the queue (receives subagent events in real time)
- yield ToolEvent (completed card in UI)
- Append tool result to session.context
Check if profile switched → reload profile + tools
Continue to next iteration
When a tool (e.g. spawn_agent) runs a subagent internally, subagent events arrive through current_event_sink. The parent agent drains that queue while the tool task runs, yielding subagent ToolStarted/ToolEvent events marked with is_subagent=True.
Stop is signalled via current_stop_event (an asyncio.Event). The agent checks it:
aclose() on generator → Ollama closes gracefully, model stays in VRAM)Never use task.cancel() for stopping — it corrupts Starlette's WebSocket state.
_run_workers)Workers run sequentially after StreamEnd. Each receives a WorkerContext with session state, token counts, and LLM access.
Currently registered worker: CompressionWorker (navi/workers/compressor.py).
Worker result: WorkerResult.events — list of AgentEvent objects that are yielded after StreamEnd.
Pre-turn compression also exists: before calling the LLM, run_stream() checks if session.context_token_count is over the threshold and compresses proactively.
See sessions.md for compression details.
Each LLM call uses _build_context(), which injects:
persona + "---" + profile.system_prompt (built fresh every call, never stored in session.context)."## What I remember about the user\n\n{summary}".session.context (system messages stripped to avoid duplication).This means profile switches and persona changes take effect immediately without modifying stored history.
Before each run_stream() call: current_session_id.set(session_id).
Before each tool task: current_event_sink.set(sink_queue).run_ephemeral() sets current_session_id to a unique subagent ID.
See architecture.md for the full ContextVar table.