diff --git a/docs/visual.html b/docs/visual.html
new file mode 100644
index 0000000..ce00e17
--- /dev/null
+++ b/docs/visual.html
@@ -0,0 +1,1362 @@
+
+
+
+
+
+
+
+ 🧭 Project Overview
+ Navi is a personal modular AI agent system. FastAPI backend + vanilla JS client. The agent is named Navi — female personal assistant. Runs locally via Ollama.
+
+
+
+
Entry point
+
navi/main.py
+
FastAPI app
+
+
+
Run command
+
uvicorn navi.main:app
+
--reload --port 8000
+
+
+
Default model
+
gemma4:e2b-it-q8_0
+
Ollama, 2B active params
+
+
+
Context window
+
65 536 tokens
+
OLLAMA_NUM_CTX
+
+
+
Database
+
SQLite
+
navi.db via aiosqlite
+
+
+
Thinking
+
Enabled
+
OLLAMA_THINK=true
+
+
+
+
+
+
+ 📦 Stack
+
+
+ | Layer | Technology | Notes |
+ | Web framework | FastAPI + uvicorn | ASGI, async throughout |
+ | LLM backend (primary) | Ollama | Local, OllamaBackend in navi/llm/ollama.py |
+ | LLM backend (alt) | OpenAI-compatible | navi/llm/openai_backend.py |
+ | Database | aiosqlite | Sessions + memory facts in navi.db |
+ | Config | pydantic-settings | Reads .env, typed Settings object |
+ | Logging | structlog | Structured JSON-friendly logs |
+ | Client | Vanilla JS ES modules | marked.js + highlight.js via esm.sh CDN |
+ | Markdown rendering | marked.js | In browser, assistant messages |
+
+
+
+
+
+
+ 🗂️ Component Map
+
+
+
+
+
Client (browser)
+
+ WebSocket /ws/sessions/{id}
+ REST /sessions/*
+ REST /agents/*
+
+
+
+
↓
+
+
+
FastAPI — navi/main.py
+
+ api/websocket.py · _AgentRun · stop endpoint
+ routes/sessions.py
+ routes/agents.py
+ routes/messages.py
+
+
+
+
↓
+
+
+
Agent — navi/core/agent.py
+
+ run_stream() → AsyncGenerator[AgentEvent]
+ run() → str
+ run_ephemeral() → str (subagent)
+ _run_planning()
+ _run_workers()
+
+
+
+
↓
+
+
+
Registries — navi/core/registry.py · build_default_registries()
+
+ ToolRegistry
+ ProfileRegistry
+ BackendRegistry
+
+
+
+
↓
+
+
+
+
LLM Backend
+
+ OllamaBackend
+ complete()
+ stream_complete()
+
+
+
+
SessionStore (SQLite)
+
+ messages[]
+ context[]
+
+
+
+
MemoryStore (SQLite)
+
+ memory_facts
+ summary
+
+
+
+
+
+
+
+
+
+ 🔄 Request Lifecycle
+ Streaming flow from WebSocket message to final response.
+
+
+
1
+
+ Client sends message
+ {type:"message", content:"...", images:[...]} over WebSocket
+
+
+
+
2
+
+ websocket_session() creates _AgentRun
+ Subscribes a queue, launches _run_agent() as asyncio task, sends stream_start
+
+
+
+
3
+
+ Pre-turn compression check
+ If context_token_count ≥ num_ctx × threshold → compress context before LLM call
+
+
+
+
4
+
+ Planning phase
+ If profile.planning_enabled: fast non-streaming LLM call → yields plan_ready event if plan generated
+
+
+
+
5
+
+ Tool-calling loop (max_iterations)
+ Calls llm.stream_complete() → yields thinking/text/tool events. Loops until finish_reason=stop
+
+
+
+
6
+
+ StreamEnd + workers
+ Saves session to DB. Runs post-turn workers (compression). Yields context_compressed if triggered
+
+
+
+
✓
+
+ Done
+ Events broadcast from _AgentRun to all subscriber queues → sent as JSON to WebSocket
+
+
+
+
+
+
+
+ 🔗 Context Vars
+ Thread-safe async-safe state shared between Agent and tools. Defined in navi/tools/base.py.
+
+
+ | ContextVar | Type | Set by | Used by |
+
+ | current_session_id |
+ str | None |
+ Agent before each run |
+ SSH pool, scratchpad, todo — per-session state |
+
+
+ | current_event_sink |
+ Queue | None |
+ run_stream() per tool task |
+ run_ephemeral() forwards sub-agent events to parent stream |
+
+
+ | current_stop_event |
+ Event | None |
+ _run_agent() before run_stream() |
+ Agent loop checks before each LLM call and mid-stream |
+
+
+
+
+ Never use task.cancel() for stopping generation. It corrupts Starlette's WebSocket receive state. Use current_stop_event.set() via POST /sessions/{id}/stop.
+
+
+
+
+
+ ⚙️ Agent Loop
+ Three entry points in navi/core/agent.py:
+
+
+ | Method | Returns | Persistence | Planning |
+
+ run(session_id, msg) |
+ str |
+ SQLite session |
+ No |
+
+
+ run_stream(session_id, msg) |
+ AsyncGenerator[AgentEvent] |
+ SQLite session |
+ Yes (if profile.planning_enabled) |
+
+
+ run_ephemeral(msg, profile_id) |
+ str |
+ In-memory only |
+ No |
+
+
+
+
+ System prompt construction
+ Built fresh on every LLM call — never stored in session.context.
+ NAVI_PERSONA (global personality)
+───────────────────────────────────────
+profile.system_prompt (domain rules)
+───────────────────────────────────────
+[memory injection: "## What I remember about the user"]
+───────────────────────────────────────
+session.context messages (history, no system msgs)
+
+ Sub-agent isolation
+ run_ephemeral() sets current_session_id = "subagent_<uuid12>" so each subagent has its own isolated scratchpad and SSH connection pool entry.
+
+
+
+
+ 🗺️ Planning Phase
+ Runs before the tool-calling loop when profile.planning_enabled = true.
+
+
+
+
1
+
+ LLM call: decide or plan
+ Fast non-streaming call: think=False, temperature=0.3, no tools
+
+
+
+
2
+
+ Response classification
+ Starts with DIRECT → skip planning. No numbered steps found → skip. Otherwise → real plan.
+
+
+
+
3
+
+ Plan injection
+ Appended to session.context as assistant message — model continues from it naturally
+
+
+
+
4
+
+ PlanReady event emitted
+ Rendered as collapsible 🗺️ card in UI before execution begins
+
+
+
+
+
+
+
+ 💾 Sessions
+
+ Session model (navi/core/session.py)
+
+
+ | Field | Type | Description |
+ id | UUID str | Unique session identifier |
+ profile_id | str | Active profile |
+ messages | list[Message] | Full history Never compressed. Used for UI display. |
+ context | list[Message] | LLM context May be replaced by compression summary. |
+ context_token_count | int | Accumulated tokens; reset to 0 after compression |
+ pinned | bool | Pinned sessions appear first in sidebar |
+
+
+
+ Dual-buffer design
+
+ Key invariant: session.messages is the full, unmodified conversation history — always available for display. session.context is what the LLM actually sees — may contain a compression summary instead of old messages.
+
+
+ Message format
+
+
+ | Field | Present on | Type |
+ role | all | user | assistant | tool | system |
+ content | most | str | None |
+ images | user, assistant | list[str] — base64 |
+ tool_calls | assistant (when calling tools) | list[ToolCallRequest] |
+ tool_call_id | tool results | str |
+ name | tool results | tool name |
+ is_summary | compressed blocks | bool |
+ created_at | user/assistant | ISO 8601 datetime |
+
+
+
+
+
+
+ 🗜️ Context Compression
+ Keeps the LLM context within the token budget. Only session.context is modified — session.messages is never touched.
+
+ Trigger points
+
+
+
Pre-turn
+
Before LLM call in run_stream()
+
Checks context_token_count against threshold
+
+
+
Post-turn (worker)
+
After StreamEnd via CompressionWorker
+
Re-checks and compresses if still needed
+
+
+
+ Algorithm
+
+
+
1
+
+ Partition into turns
+ Keep last context_keep_recent turns verbatim. Tool call groups never split.
+
+
+
+
2
+
+ Format old turns as text
+ Tool args truncated to 120 chars, results to 300 chars. Total input capped at 12 000 chars.
+
+
+
+
3
+
+ Summarize with LLM
+ think=False, bullet-point output. Same model — no model swap or extra loading.
+
+
+
+
4
+
+ Replace with summary message
+ role=user, is_summary=True. Result: system_msgs + [summary] + recent_turns
+
+
+
+
+ Config
+
+
+ | Setting | Default | Description |
+ CONTEXT_COMPRESSION_ENABLED | true | Enable/disable |
+ CONTEXT_COMPRESSION_THRESHOLD | 0.80 | Trigger at 80% of context window |
+ CONTEXT_KEEP_RECENT | 10 | Turns kept verbatim |
+ CONTEXT_SUMMARY_TEMPERATURE | 0.3 | Summarization temperature |
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ 📡 WebSocket Protocol
+
+ Endpoint: ws://host/ws/sessions/{session_id}
+ Closes with code 4004 if session not found.
+
+ Client → Server
+ {
+ "type": "message", // required, always "message"
+ "content": "user text", // required, non-empty
+ "images": ["base64..."], // optional; data: URI prefix stripped server-side
+ "files": [ // optional; from POST /sessions/{id}/files
+ {"name": "file.pdf", "path": "/abs/path/..."}
+ ]
+}
+
+
+
+
+ 📬 Events Reference
+
+
+ | Type | Direction | Fields | Description |
+
+ | stream_start |
+ S→C | — |
+ Agent processing began. Block user input. |
+
+
+ | thinking_delta |
+ S→C | delta |
+ Reasoning chunk (streaming). Accumulate until thinking_end. |
+
+
+ | thinking_end |
+ S→C | — |
+ Reasoning phase complete. Auto-collapsed in UI. |
+
+
+ | turn_thinking |
+ S→C | thinking, is_subagent |
+ Full reasoning block from tool-calling turn (non-streaming). |
+
+
+ | plan_ready |
+ S→C | plan |
+ Step-by-step plan before execution. Rendered as 🗺️ card. |
+
+
+ | tool_started |
+ S→C | tool, args, is_subagent |
+ Tool call began. Shows pending spinner in UI immediately. |
+
+
+ | tool_call |
+ S→C | tool, args, result, success, is_subagent |
+ Tool finished. Pairs with preceding tool_started. |
+
+
+ | stream_delta |
+ S→C | delta |
+ Final response text chunk. Accumulate to build full content. |
+
+
+ | stream_end |
+ S→C | content, context_tokens, max_context_tokens |
+ Final response complete. Unlock user input. |
+
+
+ | stream_stopped |
+ S→C | — |
+ User stopped generation via POST /sessions/{id}/stop. |
+
+
+ | context_compressed |
+ S→C | messages_before, messages_after |
+ Context compression ran after this turn. |
+
+
+ | profile_switched |
+ S→C | profile_id, profile_name |
+ Active profile changed mid-stream by switch_profile tool. |
+
+
+ | error |
+ S→C | message |
+ Unhandled error. Some are recoverable, some terminate the stream. |
+
+
+
+
+
+
+
+ 🎬 Typical Event Sequences
+
+ Simple question (no tools)
+
+
stream_start
+
thinking_delta × N // if model reasons
+
thinking_end
+
stream_delta × N
+
stream_end
+
+
+ With planning + tools
+
+
stream_start
+
plan_ready // if planning_enabled
+
turn_thinking // reasoning before tool selection
+
tool_started
+
tool_call
+
tool_started
+
tool_call
+
thinking_delta × N
+
thinking_end
+
stream_delta × N
+
stream_end
+
context_compressed // optional, if threshold hit
+
+
+ Subagent (spawn_agent)
+
+
stream_start
+
tool_started spawn_agent is_subagent=false
+
turn_thinking is_subagent=true
+
tool_started web_search is_subagent=true
+
tool_call web_search is_subagent=true
+
tool_started filesystem is_subagent=true
+
tool_call filesystem is_subagent=true
+
tool_call spawn_agent is_subagent=false
+
stream_delta × N
+
stream_end
+
+
+ Profile switch
+
+
stream_start
+
tool_started switch_profile
+
profile_switched // update UI here
+
tool_call switch_profile
+
stream_delta × N
+
stream_end
+
+
+
+
+
+ 🌐 REST API
+
+
+ | Method | Path | Description |
+
+ | GET |
+ /health |
+ Health check → {"status":"ok"} |
+
+
+ | GET |
+ /agents/profiles |
+ List all available profiles |
+
+
+ | GET |
+ /agents/tools |
+ List all registered tools (builtin + user) |
+
+
+ | POST |
+ /sessions |
+ Create session → {session_id, profile_id, created_at} |
+
+
+ | GET |
+ /sessions |
+ List all sessions (sorted by pinned+last_active) |
+
+
+ | GET |
+ /sessions/{id} |
+ Full session with message history (display buffer) |
+
+
+ | GET |
+ /sessions/{id}/context |
+ LLM context (may differ from messages — for debugging) |
+
+
+ | PATCH |
+ /sessions/{id}/pin |
+ Pin or unpin a session |
+
+
+ | DEL |
+ /sessions/{id} |
+ Delete session and its uploaded files |
+
+
+ | POST |
+ /sessions/{id}/files |
+ Upload file (multipart/form-data). Max 200 MB. TTL 24h. |
+
+
+ | POST |
+ /sessions/{id}/messages |
+ Send message, wait for full response (non-streaming) |
+
+
+ | POST |
+ /sessions/{id}/stop |
+ Signal cooperative stop for running agent |
+
+
+ | WS |
+ /ws/sessions/{id} |
+ Streaming agent interface |
+
+
+
+
+
+
+
+ 👤 Profiles
+ Profiles define tools, system prompt, model, and behaviour per domain. Defined in navi/profiles/.
+
+
+
+ | Profile ID | Name | Model | Temp | Planning |
+
+ secretary | Personal Secretary |
+ gemma4:26b-a4b-it-q4_K_M |
+ 0.7 |
+ Yes |
+
+
+ server_admin | Server Administrator |
+ gemma4:26b-a4b-it-q4_K_M |
+ 0.2 |
+ Yes |
+
+
+ smart_home | Smart Home Assistant |
+ gemma4:26b-a4b-it-q4_K_M |
+ 0.3 |
+ Yes |
+
+
+
+
+ Per-profile scratchpad sections
+
+
+ | Profile | Sections | Domain focus |
+ secretary | findings, sources, drafts | Research, writing, analysis |
+ server_admin | status, logs, errors, plan | Remote ops, monitoring |
+ smart_home | state, config, errors | Home Assistant, IoT, automations |
+
+
+
+ AgentProfile fields
+
+
+ | Field | Type | Description |
+ id | str | Unique identifier used in API and sessions |
+ name | str | Human-readable name for UI |
+ system_prompt | str | Domain-specific instructions (appended after persona) |
+ enabled_tools | list[str] | Tool names available to this profile |
+ model | str | Ollama model override (falls back to settings default) |
+ temperature | float | LLM temperature |
+ max_iterations | int | Tool-calling loop limit (default 50) |
+ planning_enabled | bool | Run planning phase before tool loop |
+ llm_backend | str | Backend key in BackendRegistry (default "ollama") |
+
+
+
+
+
+
+ 🧠 Memory System
+ Long-term user memory: facts extracted from conversations, stored in SQLite, injected into every session.
+
+ Database schema
+
+
+ | Table | Key columns | Purpose |
+
+ memory_facts |
+ (category, key) unique |
+ Individual facts about the user — preferences, projects, environment |
+
+
+ memory_summary |
+ Single row (id=1) |
+ Narrative summary generated from all facts; injected into every session |
+
+
+ session_memory_state |
+ session_id, extracted_at |
+ Tracks which sessions have been processed for extraction |
+
+
+
+
+ Automatic extraction trigger
+ POST /sessions (create new session) fires _process_stale_sessions() as a background task. Processes sessions idle > 30 minutes that haven't been extracted yet.
+
+ Memory injection
+ On every run_stream() / run() call, _memory_msg() fetches the summary and returns a system message: "## What I remember about the user\n\n{summary}". Injected after main system prompt, before conversation history.
+
+ Memory tools usage rules
+
+ Call memory_search when the user mentions something personal or before making assumptions about their environment. Do not call at session start reflexively — only when context warrants it. Call memory_forget only when explicitly asked.
+
+
+
+
+
+ ⚙️ Configuration
+ All settings read from .env via pydantic-settings. Imported as from navi.config import settings.
+
+ LLM
+
+
+ | Variable | Default | Description |
+ OLLAMA_HOST | http://localhost:11434 | Ollama server URL |
+ OLLAMA_DEFAULT_MODEL | gemma4:e2b-it-q8_0 | Default model (overridable per profile) |
+ OLLAMA_NUM_CTX | 65536 | Context window size in tokens |
+ OLLAMA_THINK | true | Enable extended reasoning |
+
+
+
+ Security / Sandboxing
+
+
+ | Variable | Default | Description |
+ FS_ALLOWED_PATHS | * | Comma-separated paths filesystem tool can access. * = no limit |
+ TERMINAL_ALLOWED_COMMANDS | * | Comma-separated allowed executables. * = allow all |
+ SSH_HOSTS_FILE | ssh_hosts.json | Named SSH connections config |
+
+
+
+ Persona
+
+
+ | Variable | Description |
+ NAVI_PERSONA | Inline global personality prompt |
+ NAVI_PERSONA_FILE | Path to .txt file with persona (recommended — inline doesn't parse multiline well) |
+
+
+
+ Other
+
+
+ | Variable | Default | Description |
+ DB_PATH | navi.db | SQLite file path |
+ LOG_LEVEL | INFO | DEBUG / INFO / WARNING / ERROR |
+ TOOLS_DIR | tools | User tools directory |
+ SESSION_FILES_DIR | session_files | Uploaded files directory |
+ SESSION_FILES_MAX_SIZE_MB | 200 | Max upload size per file |
+ SESSION_FILES_TTL_HOURS | 24 | File retention hours |
+
+
+
+
+
+
+
+
+