High-level overview of the Navi backend for client developers. You don't need to modify the server, but understanding it helps build a better client.
gemma4:26b-a4b-it-q4_K_M (26B params, 4-bit quant)OLLAMA_THINK=true)Browser
│
├── WebSocket /ws/sessions/{id} ← streaming agent output
└── REST /sessions/* /agents/* ← session management
FastAPI (navi/main.py)
│
├── websocket.py (_AgentRun, subscriber queues, stop endpoint)
└── routes/ (sessions, agents, messages, health)
│
└── Agent (navi/core/agent.py)
│
├── Planning phase (one non-streaming LLM call before the tool loop)
├── Tool-calling loop (stream_complete, up to 40 iterations)
│ └── Tool execution (built-ins + user tools)
└── Workers (post-response: context compression, memory extraction)
│
├── LLM backend (Ollama)
├── ToolRegistry (built-ins + user tools from tools/)
├── ProfileRegistry (loaded from navi/profiles/*/config.json)
└── SessionStore (PostgreSQL or SQLite)
└── MemoryStore (long-term user facts, same DB)
{type: "message", content: "..."} over WebSocket._AgentRun, launches the agent task, subscribes a queue.thinking_delta, then tool calls or text.tool_started → tool_call.finish_reason == stop: emits stream_end, runs post-turn workers.When profile.planning_enabled = true (all current profiles), the agent makes an extra non-streaming LLM call before entering the tool loop. It produces a structured step-by-step plan, injects it as an assistant message in context, and emits plan_ready. Simple/direct questions are detected and skip this phase.
Sessions have two separate message lists:
messages — full display history, never compressed. This is what GET /sessions/{id} returns and what the client shows to the user.context — what the LLM actually sees. When context reaches ~80% of the window, a summarization worker compresses older messages, replacing them with a summary. This does NOT affect messages.The client should always render from messages (via REST), not try to track context.
The spawn_agent tool creates a nested agent run. It is synchronous and blocking — by the time tool_call for spawn_agent arrives, the sub-agent has fully completed. Sub-agent tool events are forwarded to the parent's WebSocket stream with is_subagent: true.
Profiles live in navi/profiles/<name>/:
config.json — model, temperature, enabled tools list, planning flagsystem_prompt.txt — the domain-specific system promptThe global personality (persona.txt) is prepended to every profile's system prompt. Profile switches take effect on the next LLM call within the same run, and fully on the next user message.
The agent has a persistent memory system (user facts stored in the database). The memory summary is injected as a system message at the start of each run. This is transparent to the client — no special handling needed.
Fires automatically post-response when context_token_count / ollama_num_ctx ≥ 0.80. Emits context_compressed event. The client only needs to display it as an informational notice if desired.