Newer
Older
navi-1 / docs / agent.md
@Eugene Sukhodolskiy Eugene Sukhodolskiy on 14 Apr 4 KB Add backend documentation

Agent Loop

The agent loop is the core execution engine. File: navi/core/agent.py.

Three entry points

run(session_id, user_message)str

Non-streaming. Runs the full tool-calling loop and returns the final text. Used for REST endpoints or background tasks where streaming is not needed. No planning phase.

run_stream(session_id, user_message)AsyncGenerator[AgentEvent]

Streaming. Yields AgentEvent objects in real time. Used by the WebSocket handler. Includes planning phase.

run_ephemeral(user_message, profile_id)str

Non-persistent subagent. No DB reads/writes. Uses a temporary in-memory context. Called by SpawnAgentTool. Assigns a unique session ID (subagent_<uuid12>) to isolate its scratchpad from the parent and from other subagents.


Planning phase (_run_planning)

Runs only when profile.planning_enabled = True, before the tool-calling loop.

What it does:

  1. Sends the user request to the LLM with a special system prompt: "decide if this needs a plan".
  2. LLM either responds DIRECT (skip planning) or produces a numbered step list.
  3. If a real plan is returned, it's injected into session.context as an assistant message — the model then sees it as its own prior statement and naturally continues from it.
  4. Yields PlanReady(plan) event → rendered as a collapsible card in the UI.

Detection logic:

  • Response starts with DIRECT → skip (no plan needed).
  • No numbered steps found (regex ^\s*\d+[\.\)]) → skip (malformed response).
  • Otherwise → inject plan, emit PlanReady.

Parameters: think=False, temperature=0.3, no tools → fast and structured.


Tool-calling loop

Runs up to profile.max_iterations times.

iteration:
  1. Check stop_event → yield StreamStopped and return if set
  2. Call llm.stream_complete(context, tool_schemas)
     - Yields ThinkingDelta events during reasoning
     - Yields TextDelta events during text generation
     - Final chunk carries tool_calls or finish_reason="stop"
  3a. finish_reason == "stop" (no tool calls):
       → Save session, yield StreamEnd
       → Run post-turn workers (e.g. context compression)
       → Return
  3b. tool_calls present:
       For each tool call:
         - yield ToolStarted (pending card in UI)
         - Create asyncio.Task for tool execution
         - Set current_event_sink to a fresh Queue
         - Drain the queue (receives subagent events in real time)
         - yield ToolEvent (completed card in UI)
         - Append tool result to session.context
       Check if profile switched → reload profile + tools
       Continue to next iteration

Sub-agent event forwarding

When a tool (e.g. spawn_agent) runs a subagent internally, subagent events arrive through current_event_sink. The parent agent drains that queue while the tool task runs, yielding subagent ToolStarted/ToolEvent events marked with is_subagent=True.

Cooperative stop

Stop is signalled via current_stop_event (an asyncio.Event). The agent checks it:

  • Before each LLM call
  • During streaming (breaks out of the stream loop → calls aclose() on generator → Ollama closes gracefully, model stays in VRAM)
  • After tool execution

Never use task.cancel() for stopping — it corrupts Starlette's WebSocket state.


Workers (_run_workers)

Workers run sequentially after StreamEnd. Each receives a WorkerContext with session state, token counts, and LLM access.

Currently registered worker: CompressionWorker (navi/workers/compressor.py).

Worker result: WorkerResult.events — list of AgentEvent objects that are yielded after StreamEnd.

Pre-turn compression also exists: before calling the LLM, run_stream() checks if session.context_token_count is over the threshold and compresses proactively.

See sessions.md for compression details.


System prompt construction

Each LLM call uses _build_context(), which injects:

  1. System message: persona + "---" + profile.system_prompt (built fresh every call, never stored in session.context).
  2. Optional memory message: "## What I remember about the user\n\n{summary}".
  3. Conversation messages from session.context (system messages stripped to avoid duplication).

This means profile switches and persona changes take effect immediately without modifying stored history.


Context vars set by Agent

Before each run_stream() call: current_session_id.set(session_id).
Before each tool task: current_event_sink.set(sink_queue).
run_ephemeral() sets current_session_id to a unique subagent ID.

See architecture.md for the full ContextVar table.