Core execution engine. File: navi/core/agent.py.
run_stream(session_id, user_message) → AsyncGenerator[AgentEvent]Streaming. Yields AgentEvent objects in real time. Used by the WebSocket handler. Runs the planning phase if profile.planning_enabled = True.
run(session_id, user_message) → strNon-streaming. Full tool-calling loop, returns final text. No planning phase.
run_ephemeral(user_message, profile_id) → tuple[str, bool]Non-persistent subagent. Temporary in-memory context. Called by SpawnAgentTool.
Returns (result_text, completed_normally). completed_normally is False if the subagent hit the iteration limit or timed out.
spawn_agent.profile_id is optional. If omitted, SpawnAgentTool resolves the parent session's current profile. If provided, the subagent uses the selected profile's model, subagent_system_prompt, planning flags, and tool set. Its tools come from that profile's subagent_tools, falling back to enabled_tools when subagent_tools is empty.
When spawned from a persistent parent session, session-aware tools run under the parent session id so file tools resolve the user's session directory rather than a subagent_* directory.
run_ephemeral reads the parent session from the DB when parent_session_id is provided, so session-aware tools (filesystem, todo, scratchpad) operate on the parent's data.
run_ephemeral saves the parent's current_session_id, current_model, current_user_id, current_user_role, and current_user_info before starting and restores them in a finally block. This prevents background tasks or the next parent iteration from inheriting stale subagent IDs.
_run_planning)Runs before the tool loop when profile.planning_enabled = True.
LLM receives the user request with a classification prompt. Outputs:
DIRECT → skip planning entirely (simple request).REFLECT: yes/no → continue to Phase 2 or 3.Runs only when planning_phase2_enabled = True AND Phase 1 outputs REFLECT: yes. One LLM call reviews the Phase 1 analysis and returns four sections:
The review is embedded into the Phase 3 prompt.
LLM produces milestones plus a numbered step list. Each step is assigned an executor:
TOOL: tool_name — single tool callAGENT: profile_id — bounded 3+ tool-call subtask delegated to a subagent via spawn_agentSELF — handled inline (synthesis, context-dependent action)Plan depth is adaptive:
Comma test (enforced in prompt): if a step description lists multiple things with "and" or commas, each item must be a separate step.
The plan is injected into session.context as an assistant message and saved to session.messages with is_plan=True for UI rendering. The todo list is auto-populated from the plan steps.
All flags live on AgentProfile and can be set per-profile in config.json.
| Flag | Default | What it does |
|---|---|---|
think_enabled |
true |
Passes think=True to LLM on every main-loop call (extended reasoning) |
iteration_budget_enabled |
true |
Injects remaining iteration count into context so model wraps up in time |
planning_phase2_enabled |
false |
Enables Phase 2 structured review (one extra LLM call when Phase 1 outputs REFLECT: yes) |
goal_anchoring_enabled |
true |
Injects goal-reminder system message every N iterations |
goal_anchoring_interval |
5 |
N for goal anchoring |
anti_stall_enabled |
true |
Detects looping without todo progress and injects a warning |
anti_stall_threshold |
8 |
Consecutive iterations without progress before warning fires |
step_validation_enabled |
false |
Blocks marking a todo step done without a validation field |
adaptive_replan_enabled |
false |
When a step is marked failed, queues a re-plan prompt for the next iteration |
subagent_planning_enabled |
false |
Subagents run their own planning phase |
Runs up to profile.max_iterations times.
Each iteration:
1. Check stop_event → yield StreamStopped if set
2. Build context: _build_context() injects iteration budget and goal anchor (if due)
3. Check anti-stall: if stalled, append warning message to context
4. Inject queued adaptive re-plan message (if a step failed last iteration)
5. llm.stream_complete(context, tool_schemas)
→ ThinkingDelta/ThinkingEnd events during reasoning
→ TextDelta events during text generation
6a. No tool calls → save session, yield StreamEnd, run workers, return
6b. Tool calls → execute each, yield ToolEvent, append results to context
7. Update anti-stall counters, detect newly-failed todo steps
8. Check if profile switched → reload profile + tools
When spawn_agent runs a subagent, its events arrive through current_event_sink. The parent drains the queue in real time, yielding subagent events marked with is_subagent=True.
Stop is signalled via current_stop_event (an asyncio.Event). Checked before each LLM call, during streaming, and after tool execution. Never use task.cancel() — it corrupts WebSocket state.
run_stream() wraps the LLM generator with _iter_stream_guarded(), which provides two safety layers:
await on the first token can block for minutes. The wrapper polls stop_event every second so the user's Stop button works even during silent prefill.first_chunk_timeout (default 120 s) caps prefill wait time. chunk_timeout (default 60 s) caps gaps between subsequent tokens. On timeout the generator is closed, terminating the HTTP connection to Ollama so GPU load drops to idle.| Env var | Default | Purpose |
|---|---|---|
LLM_STREAM_FIRST_CHUNK_TIMEOUT |
120 |
Max seconds to wait for the first token |
LLM_STREAM_CHUNK_TIMEOUT |
60 |
Max seconds between tokens after the first |
Run sequentially after StreamEnd. Currently: CompressionWorker.
Pre-turn compression also runs at the start of run_stream() if session.context_token_count exceeds the threshold. See sessions.md.
_build_context)Every LLM call receives:
persona + "---" + profile.system_prompt (injected fresh, never stored)."## What I remember about the user\n...".session.context messages (system messages stripped to avoid duplication).Profile switches and persona changes take effect immediately.
The built system prompt string is cached per profile ID in ContextBuilder to avoid rebuilding on every turn. The cache is invalidated when the profile is reloaded (e.g. after switch_profile or hot-reload). This saves ~1–2 ms per turn for profiles with large system prompts.