🧭 Project Overview

Navi is a personal modular AI agent system. FastAPI backend + vanilla JS client. The agent is named Navi — female personal assistant. Runs locally via Ollama.

Entry point
navi/main.py
FastAPI app
Run command
uvicorn navi.main:app
--reload --port 8000
Default model
gemma4:31b-cloud
Ollama, 2B active params
Context window
65 536 tokens
OLLAMA_NUM_CTX
Database
SQLite
navi.db via aiosqlite
Thinking
Enabled
OLLAMA_THINK=true

📦 Stack

LayerTechnologyNotes
Web frameworkFastAPI + uvicornASGI, async throughout
LLM backend (primary)OllamaLocal, OllamaBackend in navi/llm/ollama.py
LLM backend (alt)OpenAI-compatiblenavi/llm/openai_backend.py
DatabaseaiosqliteSessions + memory facts in navi.db
Configpydantic-settingsReads .env, typed Settings object
LoggingstructlogStructured JSON-friendly logs
ClientVanilla JS ES modulesmarked.js + highlight.js via esm.sh CDN
Markdown renderingmarked.jsIn browser, assistant messages

🗂️ Component Map

Client (browser)
WebSocket /ws/sessions/{id} REST /sessions/* REST /agents/*
FastAPI — navi/main.py
api/websocket.py · _AgentRun · stop endpoint routes/sessions.py routes/agents.py routes/messages.py
Agent — navi/core/agent.py
run_stream() → AsyncGenerator[AgentEvent] run() → str run_ephemeral() → str (subagent) _run_planning() _run_workers()
Registries — navi/core/registry.py · build_default_registries()
ToolRegistry ProfileRegistry BackendRegistry
LLM Backend
OllamaBackend complete() stream_complete()
SessionStore (SQLite)
messages[] context[]
MemoryStore (SQLite)
memory_facts summary

🔄 Request Lifecycle

Streaming flow from WebSocket message to final response.

1
Client sends message {type:"message", content:"...", images:[...]} over WebSocket
2
websocket_session() creates _AgentRun Subscribes a queue, launches _run_agent() as asyncio task, sends stream_start
3
Pre-turn compression check If context_token_count ≥ num_ctx × threshold → compress context before LLM call
4
Planning phase If profile.planning_enabled: fast non-streaming LLM call → yields plan_ready event if plan generated
5
Tool-calling loop (max_iterations) Calls llm.stream_complete() → yields thinking/text/tool events. Loops until finish_reason=stop
6
StreamEnd + workers Saves session to DB. Runs post-turn workers (compression). Yields context_compressed if triggered
Done Events broadcast from _AgentRun to all subscriber queues → sent as JSON to WebSocket

🔗 Context Vars

Thread-safe async-safe state shared between Agent and tools. Defined in navi/tools/base.py.

ContextVarTypeSet byUsed by
current_session_id str | None Agent before each run SSH pool, scratchpad, todo — per-session state
current_event_sink Queue | None run_stream() per tool task run_ephemeral() forwards sub-agent events to parent stream
current_stop_event Event | None _run_agent() before run_stream() Agent loop checks before each LLM call and mid-stream
Never use task.cancel() for stopping generation. It corrupts Starlette's WebSocket receive state. Use current_stop_event.set() via POST /sessions/{id}/stop.

⚙️ Agent Loop

Three entry points in navi/core/agent.py:

MethodReturnsPersistencePlanning
run(session_id, msg) str SQLite session No
run_stream(session_id, msg) AsyncGenerator[AgentEvent] SQLite session Yes (if profile.planning_enabled)
run_ephemeral(msg, profile_id) str In-memory only No

System prompt construction

Built fresh on every LLM call — never stored in session.context.

NAVI_PERSONA (global personality)
───────────────────────────────────────
profile.system_prompt (domain rules)
───────────────────────────────────────
[memory injection: "## What I remember about the user"]
───────────────────────────────────────
session.context messages (history, no system msgs)

Sub-agent isolation

run_ephemeral() sets current_session_id = "subagent_<uuid12>" so each subagent has its own isolated scratchpad and SSH connection pool entry.

🗺️ Planning Phase

Runs before the tool-calling loop when profile.planning_enabled = true.

1
LLM call: decide or plan Fast non-streaming call: think=False, temperature=0.3, no tools
2
Response classification Starts with DIRECT → skip planning. No numbered steps found → skip. Otherwise → real plan.
3
Plan injection Appended to session.context as assistant message — model continues from it naturally
4
PlanReady event emitted Rendered as collapsible 🗺️ card in UI before execution begins

💾 Sessions

Session model (navi/core/session.py)

FieldTypeDescription
idUUID strUnique session identifier
profile_idstrActive profile
messageslist[Message]Full history Never compressed. Used for UI display.
contextlist[Message]LLM context May be replaced by compression summary.
context_token_countintAccumulated tokens; reset to 0 after compression
pinnedboolPinned sessions appear first in sidebar

Dual-buffer design

Key invariant: session.messages is the full, unmodified conversation history — always available for display. session.context is what the LLM actually sees — may contain a compression summary instead of old messages.

Message format

FieldPresent onType
rolealluser | assistant | tool | system
contentmoststr | None
imagesuser, assistantlist[str] — base64
tool_callsassistant (when calling tools)list[ToolCallRequest]
tool_call_idtool resultsstr
nametool resultstool name
is_summarycompressed blocksbool
created_atuser/assistantISO 8601 datetime

🗜️ Context Compression

Keeps the LLM context within the token budget. Only session.context is modified — session.messages is never touched.

Trigger points

Pre-turn
Before LLM call in run_stream()
Checks context_token_count against threshold
Post-turn (worker)
After StreamEnd via CompressionWorker
Re-checks and compresses if still needed

Algorithm

1
Partition into turns Keep last context_keep_recent turns verbatim. Tool call groups never split.
2
Format old turns as text Tool args truncated to 120 chars, results to 300 chars. Total input capped at 12 000 chars.
3
Summarize with LLM think=False, bullet-point output. Same model — no model swap or extra loading.
4
Replace with summary message role=user, is_summary=True. Result: system_msgs + [summary] + recent_turns

Config

SettingDefaultDescription
CONTEXT_COMPRESSION_ENABLEDtrueEnable/disable
CONTEXT_COMPRESSION_THRESHOLD0.80Trigger at 80% of context window
CONTEXT_KEEP_RECENT10Turns kept verbatim
CONTEXT_SUMMARY_TEMPERATURE0.3Summarization temperature

🔧 Built-in Tools

Registered in build_default_registries() as builtins. Never removed on hot-reload.

NameClassDescription
web_searchWebSearchToolDuckDuckGo web search
web_viewWebViewToolFetch and render a URL as text
filesystemFilesystemToolRead/write/list local files (path allowlist via config)
http_requestHttpRequestToolGeneric HTTP client — GET/POST/PUT/etc.
code_execCodeExecToolExecute Python in a subprocess sandbox
terminalTerminalToolRun shell commands (command allowlist via config)
ssh_execSshExecToolSSH into remote hosts; connection pool keyed by session ID
image_viewImageViewToolLoad image from path/URL → base64 for multimodal LLM
todoTodoToolPer-session task checklist (set/update/read)
scratchpadScratchpadToolPer-session named working notes (write/append/read/clear)
reload_toolsReloadToolsToolHot-reload user tools without server restart
write_toolWriteToolToolWrite a new user tool file and reload immediately
list_toolsListToolsToolReturn the live tool list from registry
tool_manualToolManualToolReturn manuals/{name}.md or auto-generate from schema
memory_searchMemorySearchToolSearch long-term memory facts by keyword
memory_forgetMemoryForgetToolDelete a fact from long-term memory
spawn_agentSpawnAgentToolSpawn an isolated subagent (blocking, synchronous)
switch_profileSwitchProfileToolSwitch the active profile for the session

🔌 User Tools

Discovery

  • Loaded from tools/*.py at startup
  • Files starting with _ are ignored
  • tools/enabled.json — names to include in all profiles
  • Errors are isolated per file (one bad file ≠ failure)
  • Hot-reload via reload_tools or after write_tool

Current user tools

get_current_datetime
Returns current date/time
user_notes
Persistent personal notes store

Image tool → multimodal injection

When image_view succeeds, it returns metadata={is_image: true, base64: "..."}. The agent appends a synthetic user message with the image to session.context (not messages) — making it visible to the next LLM call without polluting display history.

📝 Tool Format

Module-level format (preferred for user tools)

name = "my_tool"
description = "What it does and when to use it — be specific."
parameters = {
    "type": "object",
    "properties": {
        "param": {"type": "string", "description": "..."}
    },
    "required": ["param"]
}

async def execute(params: dict) -> str:
    # Return a plain string on success.
    # Raise an exception to signal failure.
    return "result"
No classes, no module-level print(). The loader wraps execute in a Tool subclass automatically.

ToolResult (class-based format)

FieldTypeDescription
successboolWhether the tool succeeded
outputstrAlways a string — LLM sees this
errorstr | NoneIncluded in LLM output on failure
metadatadictInternal hints, e.g. is_image: True

Self-extension via write_tool

The agent can install new tools permanently at runtime. WriteToolTool validates, writes to tools/{name}.py, adds to tools/enabled.json, then hot-reloads. New tool is available from the next user message.

📡 WebSocket Protocol

Endpoint: ws://host/ws/sessions/{session_id}
Closes with code 4004 if session not found.

Client → Server

{
  "type": "message",         // required, always "message"
  "content": "user text",    // required, non-empty
  "images": ["base64..."],   // optional; data: URI prefix stripped server-side
  "files": [                 // optional; from POST /sessions/{id}/files
    {"name": "file.pdf", "path": "/abs/path/..."}
  ]
}

📬 Events Reference

TypeDirectionFieldsDescription
stream_start S→C Agent processing began. Block user input.
thinking_delta S→Cdelta Reasoning chunk (streaming). Accumulate until thinking_end.
thinking_end S→C Reasoning phase complete. Auto-collapsed in UI.
turn_thinking S→Cthinking, is_subagent Full reasoning block from tool-calling turn (non-streaming).
plan_ready S→Cplan Step-by-step plan before execution. Rendered as 🗺️ card.
tool_started S→Ctool, args, is_subagent Tool call began. Shows pending spinner in UI immediately.
tool_call S→Ctool, args, result, success, is_subagent Tool finished. Pairs with preceding tool_started.
stream_delta S→Cdelta Final response text chunk. Accumulate to build full content.
stream_end S→Ccontent, context_tokens, max_context_tokens Final response complete. Unlock user input.
stream_stopped S→C User stopped generation via POST /sessions/{id}/stop.
context_compressed S→Cmessages_before, messages_after Context compression ran after this turn.
profile_switched S→Cprofile_id, profile_name Active profile changed mid-stream by switch_profile tool.
error S→Cmessage Unhandled error. Some are recoverable, some terminate the stream.

🎬 Typical Event Sequences

Simple question (no tools)

stream_start
thinking_delta × N // if model reasons
thinking_end
stream_delta × N
stream_end

With planning + tools

stream_start
plan_ready // if planning_enabled
turn_thinking // reasoning before tool selection
tool_started
tool_call
tool_started
tool_call
thinking_delta × N
thinking_end
stream_delta × N
stream_end
context_compressed // optional, if threshold hit

Subagent (spawn_agent)

stream_start
tool_started spawn_agent is_subagent=false
turn_thinking is_subagent=true
tool_started web_search is_subagent=true
tool_call web_search is_subagent=true
tool_started filesystem is_subagent=true
tool_call filesystem is_subagent=true
tool_call spawn_agent is_subagent=false
stream_delta × N
stream_end

Profile switch

stream_start
tool_started switch_profile
profile_switched // update UI here
tool_call switch_profile
stream_delta × N
stream_end

🌐 REST API

MethodPathDescription
GET /health Health check → {"status":"ok"}
GET /agents/profiles List all available profiles
GET /agents/tools List all registered tools (builtin + user)
POST /sessions Create session → {session_id, profile_id, created_at}
GET /sessions List all sessions (sorted by pinned+last_active)
GET /sessions/{id} Full session with message history (display buffer)
GET /sessions/{id}/context LLM context (may differ from messages — for debugging)
PATCH /sessions/{id}/pin Pin or unpin a session
DEL /sessions/{id} Delete session and its uploaded files
POST /sessions/{id}/files Upload file (multipart/form-data). Max 200 MB. TTL 24h.
POST /sessions/{id}/messages Send message, wait for full response (non-streaming)
POST /sessions/{id}/stop Signal cooperative stop for running agent
WS /ws/sessions/{id} Streaming agent interface

👤 Profiles

Profiles define tools, system prompt, model, and behaviour per domain. Defined in navi/profiles/.

Profile IDNameModelTempPlanning
secretaryPersonal Secretary gemma4:31b-cloud 0.7 Yes
server_adminServer Administrator gemma4:31b-cloud 0.2 Yes
smart_homeSmart Home Assistant gemma4:31b-cloud 0.3 Yes

Per-profile scratchpad sections

ProfileSectionsDomain focus
secretaryfindings, sources, draftsResearch, writing, analysis
server_adminstatus, logs, errors, planRemote ops, monitoring
smart_homestate, config, errorsHome Assistant, IoT, automations

AgentProfile fields

FieldTypeDescription
idstrUnique identifier used in API and sessions
namestrHuman-readable name for UI
system_promptstrDomain-specific instructions (appended after persona)
enabled_toolslist[str]Tool names available to this profile
modelstrOllama model override (falls back to settings default)
temperaturefloatLLM temperature
max_iterationsintTool-calling loop limit (default 50)
planning_enabledboolRun planning phase before tool loop
llm_backendstrBackend key in BackendRegistry (default "ollama")

🧠 Memory System

Long-term user memory: facts extracted from conversations, stored in SQLite, injected into every session.

Database schema

TableKey columnsPurpose
memory_facts (category, key) unique Individual facts about the user — preferences, projects, environment
memory_summary Single row (id=1) Narrative summary generated from all facts; injected into every session
session_memory_state session_id, extracted_at Tracks which sessions have been processed for extraction

Automatic extraction trigger

POST /sessions (create new session) fires _process_stale_sessions() as a background task. Processes sessions idle > 30 minutes that haven't been extracted yet.

Memory injection

On every run_stream() / run() call, _memory_msg() fetches the summary and returns a system message: "## What I remember about the user\n\n{summary}". Injected after main system prompt, before conversation history.

Memory tools usage rules

Call memory_search when the user mentions something personal or before making assumptions about their environment. Do not call at session start reflexively — only when context warrants it. Call memory_forget only when explicitly asked.

⚙️ Configuration

All settings read from .env via pydantic-settings. Imported as from navi.config import settings.

LLM

VariableDefaultDescription
OLLAMA_HOSThttp://localhost:11434Ollama server URL
OLLAMA_DEFAULT_MODELgemma4:31b-cloudDefault model (overridable per profile)
OLLAMA_NUM_CTX65536Context window size in tokens
OLLAMA_THINKtrueEnable extended reasoning

Security / Sandboxing

VariableDefaultDescription
FS_ALLOWED_PATHS*Comma-separated paths filesystem tool can access. * = no limit
TERMINAL_ALLOWED_COMMANDS*Comma-separated allowed executables. * = allow all
SSH_HOSTS_FILEssh_hosts.jsonNamed SSH connections config

Persona

VariableDescription
NAVI_PERSONAInline global personality prompt
NAVI_PERSONA_FILEPath to .txt file with persona (recommended — inline doesn't parse multiline well)

Other

VariableDefaultDescription
DB_PATHnavi.dbSQLite file path
LOG_LEVELINFODEBUG / INFO / WARNING / ERROR
TOOLS_DIRtoolsUser tools directory
SESSION_FILES_DIRsession_filesUploaded files directory
SESSION_FILES_MAX_SIZE_MB200Max upload size per file
SESSION_FILES_TTL_HOURS24File retention hours