Navi — Architecture & Reference

+ + +

🧭 Project Overview

Navi is a personal modular AI agent system. FastAPI backend + vanilla JS client. The agent is named Navi — female personal assistant. Runs locally via Ollama.

+ +

Entry point

navi/main.py

FastAPI app

Run command

uvicorn navi.main:app

--reload --port 8000

Default model

gemma4:e2b-it-q8_0

Ollama, 2B active params

Context window

65 536 tokens

OLLAMA_NUM_CTX

Database

SQLite

navi.db via aiosqlite

Thinking

Enabled

OLLAMA_THINK=true

+ + +

📦 Stack

+ + + + + + + + + + +

Layer	Technology	Notes
Web framework	`FastAPI` + `uvicorn`	ASGI, async throughout
LLM backend (primary)	`Ollama`	Local, `OllamaBackend` in `navi/llm/ollama.py`
LLM backend (alt)	OpenAI-compatible	`navi/llm/openai_backend.py`
Database	`aiosqlite`	Sessions + memory facts in `navi.db`
Config	`pydantic-settings`	Reads `.env`, typed `Settings` object
Logging	`structlog`	Structured JSON-friendly logs
Client	Vanilla JS ES modules	marked.js + highlight.js via esm.sh CDN
Markdown rendering	`marked.js`	In browser, assistant messages

+ + +

🗂️ Component Map

+ +

Client (browser)

+ WebSocket /ws/sessions/{id} + REST /sessions/* + REST /agents/* +

↓

FastAPI — navi/main.py

+ api/websocket.py · _AgentRun · stop endpoint + routes/sessions.py + routes/agents.py + routes/messages.py +

↓

Agent — navi/core/agent.py

+ run_stream() → AsyncGenerator[AgentEvent] + run() → str + run_ephemeral() → str (subagent) + _run_planning() + _run_workers() +

↓

Registries — navi/core/registry.py · build_default_registries()

+ ToolRegistry + ProfileRegistry + BackendRegistry +

↓

LLM Backend

+ OllamaBackend + complete() + stream_complete() +

SessionStore (SQLite)

+ messages[] + context[] +

MemoryStore (SQLite)

+ memory_facts + summary +

+ + +

🔄 Request Lifecycle

Streaming flow from WebSocket message to final response.

+ Client sends message + {type:"message", content:"...", images:[...]} over WebSocket +

+ websocket_session() creates _AgentRun + Subscribes a queue, launches _run_agent() as asyncio task, sends stream_start +

+ Pre-turn compression check + If context_token_count ≥ num_ctx × threshold → compress context before LLM call +

+ Planning phase + If profile.planning_enabled: fast non-streaming LLM call → yields plan_ready event if plan generated +

+ Tool-calling loop (max_iterations) + Calls llm.stream_complete() → yields thinking/text/tool events. Loops until finish_reason=stop +

+ StreamEnd + workers + Saves session to DB. Runs post-turn workers (compression). Yields context_compressed if triggered +

✓

+ Done + Events broadcast from _AgentRun to all subscriber queues → sent as JSON to WebSocket +

+ + +

🔗 Context Vars

Thread-safe async-safe state shared between Agent and tools. Defined in navi/tools/base.py.

+ + + + + + + + + + + + + + + + + + + + +

ContextVar	Type	Set by	Used by
current_session_id	`str \| None`	Agent before each run	SSH pool, scratchpad, todo — per-session state
current_event_sink	`Queue \| None`	run_stream() per tool task	run_ephemeral() forwards sub-agent events to parent stream
current_stop_event	`Event \| None`	_run_agent() before run_stream()	Agent loop checks before each LLM call and mid-stream

+ Never use task.cancel() for stopping generation. It corrupts Starlette's WebSocket receive state. Use current_stop_event.set() via POST /sessions/{id}/stop. +

+ + +

⚙️ Agent Loop

Three entry points in navi/core/agent.py:

+ + + + + + + + + + + + + + + + + + + + +

Method	Returns	Persistence	Planning
`run(session_id, msg)`	`str`	SQLite session	No
`run_stream(session_id, msg)`	`AsyncGenerator[AgentEvent]`	SQLite session	Yes (if profile.planning_enabled)
`run_ephemeral(msg, profile_id)`	`str`	In-memory only	No

+ +

System prompt construction

Built fresh on every LLM call — never stored in session.context.

NAVI_PERSONA (global personality)
+───────────────────────────────────────
+profile.system_prompt (domain rules)
+───────────────────────────────────────
+[memory injection: "## What I remember about the user"]
+───────────────────────────────────────
+session.context messages (history, no system msgs)

+ +

Sub-agent isolation

run_ephemeral() sets current_session_id = "subagent_<uuid12>" so each subagent has its own isolated scratchpad and SSH connection pool entry.

+ + +

🗺️ Planning Phase

Runs before the tool-calling loop when profile.planning_enabled = true.

+ +

+ LLM call: decide or plan + Fast non-streaming call: think=False, temperature=0.3, no tools +

+ Response classification + Starts with DIRECT → skip planning. No numbered steps found → skip. Otherwise → real plan. +

+ Plan injection + Appended to session.context as assistant message — model continues from it naturally +

+ PlanReady event emitted + Rendered as collapsible 🗺️ card in UI before execution begins +

+ + +

💾 Sessions

+ +

Session model (`navi/core/session.py`)

+ + + + + + + + +

Field	Type	Description
`id`	UUID str	Unique session identifier
`profile_id`	str	Active profile
`messages`	list[Message]	Full history Never compressed. Used for UI display.
`context`	list[Message]	LLM context May be replaced by compression summary.
`context_token_count`	int	Accumulated tokens; reset to 0 after compression
`pinned`	bool	Pinned sessions appear first in sidebar

+ +

Dual-buffer design

+ Key invariant: session.messages is the full, unmodified conversation history — always available for display. session.context is what the LLM actually sees — may contain a compression summary instead of old messages. +

+ +

Message format

+ + + + + + + + + + +

Field	Present on	Type
`role`	all	`user \| assistant \| tool \| system`
`content`	most	`str \| None`
`images`	user, assistant	`list[str]` — base64
`tool_calls`	assistant (when calling tools)	`list[ToolCallRequest]`
`tool_call_id`	tool results	`str`
`name`	tool results	tool name
`is_summary`	compressed blocks	`bool`
`created_at`	user/assistant	ISO 8601 datetime

+ + +

🗜️ Context Compression

Keeps the LLM context within the token budget. Only session.context is modified — session.messages is never touched.

+ +

Trigger points

Pre-turn

Before LLM call in run_stream()

Checks context_token_count against threshold

Post-turn (worker)

After StreamEnd via CompressionWorker

Re-checks and compresses if still needed

+ +

Algorithm

+ Partition into turns + Keep last context_keep_recent turns verbatim. Tool call groups never split. +

+ Format old turns as text + Tool args truncated to 120 chars, results to 300 chars. Total input capped at 12 000 chars. +

+ Summarize with LLM + think=False, bullet-point output. Same model — no model swap or extra loading. +

+ Replace with summary message + role=user, is_summary=True. Result: system_msgs + [summary] + recent_turns +

+ +

Config

+ + + + + + +

Setting	Default	Description
`CONTEXT_COMPRESSION_ENABLED`	`true`	Enable/disable
`CONTEXT_COMPRESSION_THRESHOLD`	`0.80`	Trigger at 80% of context window
`CONTEXT_KEEP_RECENT`	`10`	Turns kept verbatim
`CONTEXT_SUMMARY_TEMPERATURE`	`0.3`	Summarization temperature

+ + +

🔧 Built-in Tools

Registered in build_default_registries() as builtins. Never removed on hot-reload.

+ + + + + + + + + + + + + + + + + + + + +

Name	Class	Description
`web_search`	WebSearchTool	DuckDuckGo web search
`web_view`	WebViewTool	Fetch and render a URL as text
`filesystem`	FilesystemTool	Read/write/list local files (path allowlist via config)
`http_request`	HttpRequestTool	Generic HTTP client — GET/POST/PUT/etc.
`code_exec`	CodeExecTool	Execute Python in a subprocess sandbox
`terminal`	TerminalTool	Run shell commands (command allowlist via config)
`ssh_exec`	SshExecTool	SSH into remote hosts; connection pool keyed by session ID
`image_view`	ImageViewTool	Load image from path/URL → base64 for multimodal LLM
`todo`	TodoTool	Per-session task checklist (set/update/read)
`scratchpad`	ScratchpadTool	Per-session named working notes (write/append/read/clear)
`reload_tools`	ReloadToolsTool	Hot-reload user tools without server restart
`write_tool`	WriteToolTool	Write a new user tool file and reload immediately
`list_tools`	ListToolsTool	Return the live tool list from registry
`tool_manual`	ToolManualTool	Return manuals/{name}.md or auto-generate from schema
`memory_search`	MemorySearchTool	Search long-term memory facts by keyword
`memory_forget`	MemoryForgetTool	Delete a fact from long-term memory
`spawn_agent`	SpawnAgentTool	Spawn an isolated subagent (blocking, synchronous)
`switch_profile`	SwitchProfileTool	Switch the active profile for the session

+ + +

🔌 User Tools

+ +

Discovery

Loaded from tools/*.py at startup
Files starting with _ are ignored
tools/enabled.json — names to include in all profiles
Errors are isolated per file (one bad file ≠ failure)
Hot-reload via reload_tools or after write_tool

Current user tools

get_current_datetime

Returns current date/time

user_notes

Persistent personal notes store

+ +

Image tool → multimodal injection

When image_view succeeds, it returns metadata={is_image: true, base64: "..."}. The agent appends a synthetic user message with the image to session.context (not messages) — making it visible to the next LLM call without polluting display history.

+ + +

📝 Tool Format

+ +

Module-level format (preferred for user tools)

name = "my_tool"
+description = "What it does and when to use it — be specific."
+parameters = {
+    "type": "object",
+    "properties": {
+        "param": {"type": "string", "description": "..."}
+    },
+    "required": ["param"]
+}
+
+async def execute(params: dict) -> str:
+    # Return a plain string on success.
+    # Raise an exception to signal failure.
+    return "result"

No classes, no module-level print(). The loader wraps execute in a Tool subclass automatically.

+ +

ToolResult (class-based format)

+ + + + + + +

Field	Type	Description
`success`	bool	Whether the tool succeeded
`output`	str	Always a string — LLM sees this
`error`	str \| None	Included in LLM output on failure
`metadata`	dict	Internal hints, e.g. `is_image: True`

+ +

Self-extension via write_tool

The agent can install new tools permanently at runtime. WriteToolTool validates, writes to tools/{name}.py, adds to tools/enabled.json, then hot-reloads. New tool is available from the next user message.

+ + +

📡 WebSocket Protocol

+ +

Endpoint: ws://host/ws/sessions/{session_id}
+ Closes with code 4004 if session not found.

+ +

Client → Server

{
+  "type": "message",         // required, always "message"
+  "content": "user text",    // required, non-empty
+  "images": ["base64..."],   // optional; data: URI prefix stripped server-side
+  "files": [                 // optional; from POST /sessions/{id}/files
+    {"name": "file.pdf", "path": "/abs/path/..."}
+  ]
+}

+ + +

📬 Events Reference

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

Type	Direction	Fields	Description
stream_start	S→C	—	Agent processing began. Block user input.
thinking_delta	S→C	`delta`	Reasoning chunk (streaming). Accumulate until `thinking_end`.
thinking_end	S→C	—	Reasoning phase complete. Auto-collapsed in UI.
turn_thinking	S→C	`thinking`, `is_subagent`	Full reasoning block from tool-calling turn (non-streaming).
plan_ready	S→C	`plan`	Step-by-step plan before execution. Rendered as 🗺️ card.
tool_started	S→C	`tool`, `args`, `is_subagent`	Tool call began. Shows pending spinner in UI immediately.
tool_call	S→C	`tool`, `args`, `result`, `success`, `is_subagent`	Tool finished. Pairs with preceding `tool_started`.
stream_delta	S→C	`delta`	Final response text chunk. Accumulate to build full content.
stream_end	S→C	`content`, `context_tokens`, `max_context_tokens`	Final response complete. Unlock user input.
stream_stopped	S→C	—	User stopped generation via POST /sessions/{id}/stop.
context_compressed	S→C	`messages_before`, `messages_after`	Context compression ran after this turn.
profile_switched	S→C	`profile_id`, `profile_name`	Active profile changed mid-stream by switch_profile tool.
error	S→C	`message`	Unhandled error. Some are recoverable, some terminate the stream.

+ + +

🎬 Typical Event Sequences

+ +

Simple question (no tools)

stream_start

thinking_delta × N // if model reasons

thinking_end

stream_delta × N

stream_end

+ +

With planning + tools

stream_start

plan_ready // if planning_enabled

turn_thinking // reasoning before tool selection

tool_started

tool_call

tool_started

tool_call

thinking_delta × N

thinking_end

stream_delta × N

stream_end

context_compressed // optional, if threshold hit

+ +

Subagent (spawn_agent)

stream_start

tool_started spawn_agent is_subagent=false

turn_thinking is_subagent=true

tool_started web_search is_subagent=true

tool_call web_search is_subagent=true

tool_started filesystem is_subagent=true

tool_call filesystem is_subagent=true

tool_call spawn_agent is_subagent=false

stream_delta × N

stream_end

+ +

Profile switch

stream_start

tool_started switch_profile

profile_switched // update UI here

tool_call switch_profile

stream_delta × N

stream_end

+ + +

🌐 REST API

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

Method	Path	Description
GET	`/health`	Health check → `{"status":"ok"}`
GET	`/agents/profiles`	List all available profiles
GET	`/agents/tools`	List all registered tools (builtin + user)
POST	`/sessions`	Create session → `{session_id, profile_id, created_at}`
GET	`/sessions`	List all sessions (sorted by pinned+last_active)
GET	`/sessions/{id}`	Full session with message history (display buffer)
GET	`/sessions/{id}/context`	LLM context (may differ from messages — for debugging)
PATCH	`/sessions/{id}/pin`	Pin or unpin a session
DEL	`/sessions/{id}`	Delete session and its uploaded files
POST	`/sessions/{id}/files`	Upload file (multipart/form-data). Max 200 MB. TTL 24h.
POST	`/sessions/{id}/messages`	Send message, wait for full response (non-streaming)
POST	`/sessions/{id}/stop`	Signal cooperative stop for running agent
WS	`/ws/sessions/{id}`	Streaming agent interface

+ + +

👤 Profiles

Profiles define tools, system prompt, model, and behaviour per domain. Defined in navi/profiles/.

+ +

+ + + + + + + + + + + + + + + + + + + + +

Profile ID	Name	Model	Temp	Planning
`secretary`	Personal Secretary	`gemma4:26b-a4b-it-q4_K_M`	0.7	Yes
`server_admin`	Server Administrator	`gemma4:26b-a4b-it-q4_K_M`	0.2	Yes
`smart_home`	Smart Home Assistant	`gemma4:26b-a4b-it-q4_K_M`	0.3	Yes

+ +

Per-profile scratchpad sections

+ + + + + +

Profile	Sections	Domain focus
`secretary`	`findings`, `sources`, `drafts`	Research, writing, analysis
`server_admin`	`status`, `logs`, `errors`, `plan`	Remote ops, monitoring
`smart_home`	`state`, `config`, `errors`	Home Assistant, IoT, automations

+ +

AgentProfile fields

+ + + + + + + + + + + +

Field	Type	Description
`id`	str	Unique identifier used in API and sessions
`name`	str	Human-readable name for UI
`system_prompt`	str	Domain-specific instructions (appended after persona)
`enabled_tools`	list[str]	Tool names available to this profile
`model`	str	Ollama model override (falls back to settings default)
`temperature`	float	LLM temperature
`max_iterations`	int	Tool-calling loop limit (default 50)
`planning_enabled`	bool	Run planning phase before tool loop
`llm_backend`	str	Backend key in BackendRegistry (default "ollama")

+ + +

🧠 Memory System

Long-term user memory: facts extracted from conversations, stored in SQLite, injected into every session.

+ +

Database schema

+ + + + + + + + + + + + + + + + + +

Table	Key columns	Purpose
`memory_facts`	`(category, key)` unique	Individual facts about the user — preferences, projects, environment
`memory_summary`	Single row (id=1)	Narrative summary generated from all facts; injected into every session
`session_memory_state`	`session_id, extracted_at`	Tracks which sessions have been processed for extraction

+ +

Automatic extraction trigger

POST /sessions (create new session) fires _process_stale_sessions() as a background task. Processes sessions idle > 30 minutes that haven't been extracted yet.

+ +

Memory injection

On every run_stream() / run() call, _memory_msg() fetches the summary and returns a system message: "## What I remember about the user\n\n{summary}". Injected after main system prompt, before conversation history.

+ +

Memory tools usage rules

+ Call memory_search when the user mentions something personal or before making assumptions about their environment. Do not call at session start reflexively — only when context warrants it. Call memory_forget only when explicitly asked. +

+ + +

⚙️ Configuration

All settings read from .env via pydantic-settings. Imported as from navi.config import settings.

+ +

LLM

+ + + + + + +

Variable	Default	Description
`OLLAMA_HOST`	`http://localhost:11434`	Ollama server URL
`OLLAMA_DEFAULT_MODEL`	`gemma4:e2b-it-q8_0`	Default model (overridable per profile)
`OLLAMA_NUM_CTX`	`65536`	Context window size in tokens
`OLLAMA_THINK`	`true`	Enable extended reasoning

+ +

Security / Sandboxing

+ + + + + +

Variable	Default	Description
`FS_ALLOWED_PATHS`	`*`	Comma-separated paths filesystem tool can access. `*` = no limit
`TERMINAL_ALLOWED_COMMANDS`	`*`	Comma-separated allowed executables. `*` = allow all
`SSH_HOSTS_FILE`	`ssh_hosts.json`	Named SSH connections config

+ +

Persona

+ + + + +

Variable	Description
`NAVI_PERSONA`	Inline global personality prompt
`NAVI_PERSONA_FILE`	Path to .txt file with persona (recommended — inline doesn't parse multiline well)

+ +

Other

+ + + + + + + + +

Variable	Default	Description
`DB_PATH`	`navi.db`	SQLite file path
`LOG_LEVEL`	`INFO`	DEBUG / INFO / WARNING / ERROR
`TOOLS_DIR`	`tools`	User tools directory
`SESSION_FILES_DIR`	`session_files`	Uploaded files directory
`SESSION_FILES_MAX_SIZE_MB`	`200`	Max upload size per file
`SESSION_FILES_TTL_HOURS`	`24`	File retention hours

+ +

🧭 Project Overview

📦 Stack

🗂️ Component Map

🔄 Request Lifecycle

🔗 Context Vars

⚙️ Agent Loop

System prompt construction

Sub-agent isolation

🗺️ Planning Phase

💾 Sessions

Session model (navi/core/session.py)

Dual-buffer design

Message format

🗜️ Context Compression

Trigger points

Algorithm

Config

🔧 Built-in Tools

🔌 User Tools

Discovery

Current user tools

Image tool → multimodal injection

📝 Tool Format

Module-level format (preferred for user tools)

ToolResult (class-based format)

Self-extension via write_tool

📡 WebSocket Protocol

Client → Server

📬 Events Reference

🎬 Typical Event Sequences

Simple question (no tools)

With planning + tools

Subagent (spawn_agent)

Profile switch

🌐 REST API

👤 Profiles

Per-profile scratchpad sections

AgentProfile fields

🧠 Memory System

Database schema

Automatic extraction trigger

Memory injection

Memory tools usage rules

⚙️ Configuration

LLM

Security / Sandboxing

Persona

Other

Session model (`navi/core/session.py`)