diff --git a/docs/visual.html b/docs/visual.html new file mode 100644 index 0000000..ce00e17 --- /dev/null +++ b/docs/visual.html @@ -0,0 +1,1362 @@ + + + + + +Navi — Architecture & Reference + + + + + + + + +
+ + +
+

🧭 Project Overview

+

Navi is a personal modular AI agent system. FastAPI backend + vanilla JS client. The agent is named Navi — female personal assistant. Runs locally via Ollama.

+ +
+
+
Entry point
+
navi/main.py
+
FastAPI app
+
+
+
Run command
+
uvicorn navi.main:app
+
--reload --port 8000
+
+
+
Default model
+
gemma4:e2b-it-q8_0
+
Ollama, 2B active params
+
+
+
Context window
+
65 536 tokens
+
OLLAMA_NUM_CTX
+
+
+
Database
+
SQLite
+
navi.db via aiosqlite
+
+
+
Thinking
+
Enabled
+
OLLAMA_THINK=true
+
+
+
+ + +
+

📦 Stack

+
+ + + + + + + + + + +
LayerTechnologyNotes
Web frameworkFastAPI + uvicornASGI, async throughout
LLM backend (primary)OllamaLocal, OllamaBackend in navi/llm/ollama.py
LLM backend (alt)OpenAI-compatiblenavi/llm/openai_backend.py
DatabaseaiosqliteSessions + memory facts in navi.db
Configpydantic-settingsReads .env, typed Settings object
LoggingstructlogStructured JSON-friendly logs
ClientVanilla JS ES modulesmarked.js + highlight.js via esm.sh CDN
Markdown renderingmarked.jsIn browser, assistant messages
+
+
+ + +
+

🗂️ Component Map

+ +
+
+
+
Client (browser)
+
+ WebSocket /ws/sessions/{id} + REST /sessions/* + REST /agents/* +
+
+
+
+
+
+
FastAPI — navi/main.py
+
+ api/websocket.py · _AgentRun · stop endpoint + routes/sessions.py + routes/agents.py + routes/messages.py +
+
+
+
+
+
+
Agent — navi/core/agent.py
+
+ run_stream() → AsyncGenerator[AgentEvent] + run() → str + run_ephemeral() → str (subagent) + _run_planning() + _run_workers() +
+
+
+
+
+
+
Registries — navi/core/registry.py · build_default_registries()
+
+ ToolRegistry + ProfileRegistry + BackendRegistry +
+
+
+
+
+
+
+
LLM Backend
+
+ OllamaBackend + complete() + stream_complete() +
+
+
+
SessionStore (SQLite)
+
+ messages[] + context[] +
+
+
+
MemoryStore (SQLite)
+
+ memory_facts + summary +
+
+
+
+
+
+ + +
+

🔄 Request Lifecycle

+

Streaming flow from WebSocket message to final response.

+
+
+
1
+
+ Client sends message + {type:"message", content:"...", images:[...]} over WebSocket +
+
+
+
2
+
+ websocket_session() creates _AgentRun + Subscribes a queue, launches _run_agent() as asyncio task, sends stream_start +
+
+
+
3
+
+ Pre-turn compression check + If context_token_count ≥ num_ctx × threshold → compress context before LLM call +
+
+
+
4
+
+ Planning phase + If profile.planning_enabled: fast non-streaming LLM call → yields plan_ready event if plan generated +
+
+
+
5
+
+ Tool-calling loop (max_iterations) + Calls llm.stream_complete() → yields thinking/text/tool events. Loops until finish_reason=stop +
+
+
+
6
+
+ StreamEnd + workers + Saves session to DB. Runs post-turn workers (compression). Yields context_compressed if triggered +
+
+
+
+
+ Done + Events broadcast from _AgentRun to all subscriber queues → sent as JSON to WebSocket +
+
+
+
+ + +
+

🔗 Context Vars

+

Thread-safe async-safe state shared between Agent and tools. Defined in navi/tools/base.py.

+
+ + + + + + + + + + + + + + + + + + + + +
ContextVarTypeSet byUsed by
current_session_idstr | NoneAgent before each runSSH pool, scratchpad, todo — per-session state
current_event_sinkQueue | Nonerun_stream() per tool taskrun_ephemeral() forwards sub-agent events to parent stream
current_stop_eventEvent | None_run_agent() before run_stream()Agent loop checks before each LLM call and mid-stream
+
+
+ Never use task.cancel() for stopping generation. It corrupts Starlette's WebSocket receive state. Use current_stop_event.set() via POST /sessions/{id}/stop. +
+
+ + +
+

⚙️ Agent Loop

+

Three entry points in navi/core/agent.py:

+
+ + + + + + + + + + + + + + + + + + + + +
MethodReturnsPersistencePlanning
run(session_id, msg)strSQLite sessionNo
run_stream(session_id, msg)AsyncGenerator[AgentEvent]SQLite sessionYes (if profile.planning_enabled)
run_ephemeral(msg, profile_id)strIn-memory onlyNo
+
+ +

System prompt construction

+

Built fresh on every LLM call — never stored in session.context.

+
NAVI_PERSONA (global personality)
+───────────────────────────────────────
+profile.system_prompt (domain rules)
+───────────────────────────────────────
+[memory injection: "## What I remember about the user"]
+───────────────────────────────────────
+session.context messages (history, no system msgs)
+ +

Sub-agent isolation

+

run_ephemeral() sets current_session_id = "subagent_<uuid12>" so each subagent has its own isolated scratchpad and SSH connection pool entry.

+
+ + +
+

🗺️ Planning Phase

+

Runs before the tool-calling loop when profile.planning_enabled = true.

+ +
+
+
1
+
+ LLM call: decide or plan + Fast non-streaming call: think=False, temperature=0.3, no tools +
+
+
+
2
+
+ Response classification + Starts with DIRECT → skip planning. No numbered steps found → skip. Otherwise → real plan. +
+
+
+
3
+
+ Plan injection + Appended to session.context as assistant message — model continues from it naturally +
+
+
+
4
+
+ PlanReady event emitted + Rendered as collapsible 🗺️ card in UI before execution begins +
+
+
+
+ + +
+

💾 Sessions

+ +

Session model (navi/core/session.py)

+
+ + + + + + + + +
FieldTypeDescription
idUUID strUnique session identifier
profile_idstrActive profile
messageslist[Message]Full history Never compressed. Used for UI display.
contextlist[Message]LLM context May be replaced by compression summary.
context_token_countintAccumulated tokens; reset to 0 after compression
pinnedboolPinned sessions appear first in sidebar
+
+ +

Dual-buffer design

+
+ Key invariant: session.messages is the full, unmodified conversation history — always available for display. session.context is what the LLM actually sees — may contain a compression summary instead of old messages. +
+ +

Message format

+
+ + + + + + + + + + +
FieldPresent onType
rolealluser | assistant | tool | system
contentmoststr | None
imagesuser, assistantlist[str] — base64
tool_callsassistant (when calling tools)list[ToolCallRequest]
tool_call_idtool resultsstr
nametool resultstool name
is_summarycompressed blocksbool
created_atuser/assistantISO 8601 datetime
+
+
+ + +
+

🗜️ Context Compression

+

Keeps the LLM context within the token budget. Only session.context is modified — session.messages is never touched.

+ +

Trigger points

+
+
+
Pre-turn
+
Before LLM call in run_stream()
+
Checks context_token_count against threshold
+
+
+
Post-turn (worker)
+
After StreamEnd via CompressionWorker
+
Re-checks and compresses if still needed
+
+
+ +

Algorithm

+
+
+
1
+
+ Partition into turns + Keep last context_keep_recent turns verbatim. Tool call groups never split. +
+
+
+
2
+
+ Format old turns as text + Tool args truncated to 120 chars, results to 300 chars. Total input capped at 12 000 chars. +
+
+
+
3
+
+ Summarize with LLM + think=False, bullet-point output. Same model — no model swap or extra loading. +
+
+
+
4
+
+ Replace with summary message + role=user, is_summary=True. Result: system_msgs + [summary] + recent_turns +
+
+
+ +

Config

+
+ + + + + + +
SettingDefaultDescription
CONTEXT_COMPRESSION_ENABLEDtrueEnable/disable
CONTEXT_COMPRESSION_THRESHOLD0.80Trigger at 80% of context window
CONTEXT_KEEP_RECENT10Turns kept verbatim
CONTEXT_SUMMARY_TEMPERATURE0.3Summarization temperature
+
+
+ + +
+

🔧 Built-in Tools

+

Registered in build_default_registries() as builtins. Never removed on hot-reload.

+
+ + + + + + + + + + + + + + + + + + + + +
NameClassDescription
web_searchWebSearchToolDuckDuckGo web search
web_viewWebViewToolFetch and render a URL as text
filesystemFilesystemToolRead/write/list local files (path allowlist via config)
http_requestHttpRequestToolGeneric HTTP client — GET/POST/PUT/etc.
code_execCodeExecToolExecute Python in a subprocess sandbox
terminalTerminalToolRun shell commands (command allowlist via config)
ssh_execSshExecToolSSH into remote hosts; connection pool keyed by session ID
image_viewImageViewToolLoad image from path/URL → base64 for multimodal LLM
todoTodoToolPer-session task checklist (set/update/read)
scratchpadScratchpadToolPer-session named working notes (write/append/read/clear)
reload_toolsReloadToolsToolHot-reload user tools without server restart
write_toolWriteToolToolWrite a new user tool file and reload immediately
list_toolsListToolsToolReturn the live tool list from registry
tool_manualToolManualToolReturn manuals/{name}.md or auto-generate from schema
memory_searchMemorySearchToolSearch long-term memory facts by keyword
memory_forgetMemoryForgetToolDelete a fact from long-term memory
spawn_agentSpawnAgentToolSpawn an isolated subagent (blocking, synchronous)
switch_profileSwitchProfileToolSwitch the active profile for the session
+
+
+ + +
+

🔌 User Tools

+ +
+
+

Discovery

+
    +
  • Loaded from tools/*.py at startup
  • +
  • Files starting with _ are ignored
  • +
  • tools/enabled.json — names to include in all profiles
  • +
  • Errors are isolated per file (one bad file ≠ failure)
  • +
  • Hot-reload via reload_tools or after write_tool
  • +
+
+
+

Current user tools

+
+
+
get_current_datetime
+
Returns current date/time
+
+
+
user_notes
+
Persistent personal notes store
+
+
+
+
+ +

Image tool → multimodal injection

+

When image_view succeeds, it returns metadata={is_image: true, base64: "..."}. The agent appends a synthetic user message with the image to session.context (not messages) — making it visible to the next LLM call without polluting display history.

+
+ + +
+

📝 Tool Format

+ +

Module-level format (preferred for user tools)

+
name = "my_tool"
+description = "What it does and when to use it — be specific."
+parameters = {
+    "type": "object",
+    "properties": {
+        "param": {"type": "string", "description": "..."}
+    },
+    "required": ["param"]
+}
+
+async def execute(params: dict) -> str:
+    # Return a plain string on success.
+    # Raise an exception to signal failure.
+    return "result"
+
No classes, no module-level print(). The loader wraps execute in a Tool subclass automatically.
+ +

ToolResult (class-based format)

+
+ + + + + + +
FieldTypeDescription
successboolWhether the tool succeeded
outputstrAlways a string — LLM sees this
errorstr | NoneIncluded in LLM output on failure
metadatadictInternal hints, e.g. is_image: True
+
+ +

Self-extension via write_tool

+

The agent can install new tools permanently at runtime. WriteToolTool validates, writes to tools/{name}.py, adds to tools/enabled.json, then hot-reloads. New tool is available from the next user message.

+
+ + +
+

📡 WebSocket Protocol

+ +

Endpoint: ws://host/ws/sessions/{session_id}
+ Closes with code 4004 if session not found.

+ +

Client → Server

+
{
+  "type": "message",         // required, always "message"
+  "content": "user text",    // required, non-empty
+  "images": ["base64..."],   // optional; data: URI prefix stripped server-side
+  "files": [                 // optional; from POST /sessions/{id}/files
+    {"name": "file.pdf", "path": "/abs/path/..."}
+  ]
+}
+
+ + +
+

📬 Events Reference

+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
TypeDirectionFieldsDescription
stream_startS→CAgent processing began. Block user input.
thinking_deltaS→CdeltaReasoning chunk (streaming). Accumulate until thinking_end.
thinking_endS→CReasoning phase complete. Auto-collapsed in UI.
turn_thinkingS→Cthinking, is_subagentFull reasoning block from tool-calling turn (non-streaming).
plan_readyS→CplanStep-by-step plan before execution. Rendered as 🗺️ card.
tool_startedS→Ctool, args, is_subagentTool call began. Shows pending spinner in UI immediately.
tool_callS→Ctool, args, result, success, is_subagentTool finished. Pairs with preceding tool_started.
stream_deltaS→CdeltaFinal response text chunk. Accumulate to build full content.
stream_endS→Ccontent, context_tokens, max_context_tokensFinal response complete. Unlock user input.
stream_stoppedS→CUser stopped generation via POST /sessions/{id}/stop.
context_compressedS→Cmessages_before, messages_afterContext compression ran after this turn.
profile_switchedS→Cprofile_id, profile_nameActive profile changed mid-stream by switch_profile tool.
errorS→CmessageUnhandled error. Some are recoverable, some terminate the stream.
+
+
+ + +
+

🎬 Typical Event Sequences

+ +

Simple question (no tools)

+
+
stream_start
+
thinking_delta × N // if model reasons
+
thinking_end
+
stream_delta × N
+
stream_end
+
+ +

With planning + tools

+
+
stream_start
+
plan_ready // if planning_enabled
+
turn_thinking // reasoning before tool selection
+
tool_started
+
tool_call
+
tool_started
+
tool_call
+
thinking_delta × N
+
thinking_end
+
stream_delta × N
+
stream_end
+
context_compressed // optional, if threshold hit
+
+ +

Subagent (spawn_agent)

+
+
stream_start
+
tool_started spawn_agent is_subagent=false
+
turn_thinking is_subagent=true
+
tool_started web_search is_subagent=true
+
tool_call web_search is_subagent=true
+
tool_started filesystem is_subagent=true
+
tool_call filesystem is_subagent=true
+
tool_call spawn_agent is_subagent=false
+
stream_delta × N
+
stream_end
+
+ +

Profile switch

+
+
stream_start
+
tool_started switch_profile
+
profile_switched // update UI here
+
tool_call switch_profile
+
stream_delta × N
+
stream_end
+
+
+ + +
+

🌐 REST API

+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
MethodPathDescription
GET/healthHealth check → {"status":"ok"}
GET/agents/profilesList all available profiles
GET/agents/toolsList all registered tools (builtin + user)
POST/sessionsCreate session → {session_id, profile_id, created_at}
GET/sessionsList all sessions (sorted by pinned+last_active)
GET/sessions/{id}Full session with message history (display buffer)
GET/sessions/{id}/contextLLM context (may differ from messages — for debugging)
PATCH/sessions/{id}/pinPin or unpin a session
DEL/sessions/{id}Delete session and its uploaded files
POST/sessions/{id}/filesUpload file (multipart/form-data). Max 200 MB. TTL 24h.
POST/sessions/{id}/messagesSend message, wait for full response (non-streaming)
POST/sessions/{id}/stopSignal cooperative stop for running agent
WS/ws/sessions/{id}Streaming agent interface
+
+
+ + +
+

👤 Profiles

+

Profiles define tools, system prompt, model, and behaviour per domain. Defined in navi/profiles/.

+ +
+ + + + + + + + + + + + + + + + + + + + +
Profile IDNameModelTempPlanning
secretaryPersonal Secretarygemma4:26b-a4b-it-q4_K_M0.7Yes
server_adminServer Administratorgemma4:26b-a4b-it-q4_K_M0.2Yes
smart_homeSmart Home Assistantgemma4:26b-a4b-it-q4_K_M0.3Yes
+
+ +

Per-profile scratchpad sections

+
+ + + + + +
ProfileSectionsDomain focus
secretaryfindings, sources, draftsResearch, writing, analysis
server_adminstatus, logs, errors, planRemote ops, monitoring
smart_homestate, config, errorsHome Assistant, IoT, automations
+
+ +

AgentProfile fields

+
+ + + + + + + + + + + +
FieldTypeDescription
idstrUnique identifier used in API and sessions
namestrHuman-readable name for UI
system_promptstrDomain-specific instructions (appended after persona)
enabled_toolslist[str]Tool names available to this profile
modelstrOllama model override (falls back to settings default)
temperaturefloatLLM temperature
max_iterationsintTool-calling loop limit (default 50)
planning_enabledboolRun planning phase before tool loop
llm_backendstrBackend key in BackendRegistry (default "ollama")
+
+
+ + +
+

🧠 Memory System

+

Long-term user memory: facts extracted from conversations, stored in SQLite, injected into every session.

+ +

Database schema

+
+ + + + + + + + + + + + + + + + + +
TableKey columnsPurpose
memory_facts(category, key) uniqueIndividual facts about the user — preferences, projects, environment
memory_summarySingle row (id=1)Narrative summary generated from all facts; injected into every session
session_memory_statesession_id, extracted_atTracks which sessions have been processed for extraction
+
+ +

Automatic extraction trigger

+

POST /sessions (create new session) fires _process_stale_sessions() as a background task. Processes sessions idle > 30 minutes that haven't been extracted yet.

+ +

Memory injection

+

On every run_stream() / run() call, _memory_msg() fetches the summary and returns a system message: "## What I remember about the user\n\n{summary}". Injected after main system prompt, before conversation history.

+ +

Memory tools usage rules

+
+ Call memory_search when the user mentions something personal or before making assumptions about their environment. Do not call at session start reflexively — only when context warrants it. Call memory_forget only when explicitly asked. +
+
+ + +
+

⚙️ Configuration

+

All settings read from .env via pydantic-settings. Imported as from navi.config import settings.

+ +

LLM

+
+ + + + + + +
VariableDefaultDescription
OLLAMA_HOSThttp://localhost:11434Ollama server URL
OLLAMA_DEFAULT_MODELgemma4:e2b-it-q8_0Default model (overridable per profile)
OLLAMA_NUM_CTX65536Context window size in tokens
OLLAMA_THINKtrueEnable extended reasoning
+
+ +

Security / Sandboxing

+
+ + + + + +
VariableDefaultDescription
FS_ALLOWED_PATHS*Comma-separated paths filesystem tool can access. * = no limit
TERMINAL_ALLOWED_COMMANDS*Comma-separated allowed executables. * = allow all
SSH_HOSTS_FILEssh_hosts.jsonNamed SSH connections config
+
+ +

Persona

+
+ + + + +
VariableDescription
NAVI_PERSONAInline global personality prompt
NAVI_PERSONA_FILEPath to .txt file with persona (recommended — inline doesn't parse multiline well)
+
+ +

Other

+
+ + + + + + + + +
VariableDefaultDescription
DB_PATHnavi.dbSQLite file path
LOG_LEVELINFODEBUG / INFO / WARNING / ERROR
TOOLS_DIRtoolsUser tools directory
SESSION_FILES_DIRsession_filesUploaded files directory
SESSION_FILES_MAX_SIZE_MB200Max upload size per file
SESSION_FILES_TTL_HOURS24File retention hours
+
+
+ +
+ + + +