🧭 Project Overview
Navi is a personal modular AI agent system. FastAPI backend + vanilla JS client. The agent is named Navi — female personal assistant. Runs locally via Ollama.
Entry point
navi/main.py
FastAPI app
Run command
uvicorn navi.main:app
--reload --port 8000
Default model
gemma4:31b-cloud
Ollama, 2B active params
Context window
65 536 tokens
OLLAMA_NUM_CTX
Database
SQLite
navi.db via aiosqlite
Thinking
Enabled
OLLAMA_THINK=true
📦 Stack
| Layer | Technology | Notes |
| Web framework | FastAPI + uvicorn | ASGI, async throughout |
| LLM backend (primary) | Ollama | Local, OllamaBackend in navi/llm/ollama.py |
| LLM backend (alt) | OpenAI-compatible | navi/llm/openai_backend.py |
| Database | aiosqlite | Sessions + memory facts in navi.db |
| Config | pydantic-settings | Reads .env, typed Settings object |
| Logging | structlog | Structured JSON-friendly logs |
| Client | Vanilla JS ES modules | marked.js + highlight.js via esm.sh CDN |
| Markdown rendering | marked.js | In browser, assistant messages |
🗂️ Component Map
Client (browser)
WebSocket /ws/sessions/{id}
REST /sessions/*
REST /agents/*
↓
FastAPI — navi/main.py
api/websocket.py · _AgentRun · stop endpoint
routes/sessions.py
routes/agents.py
routes/messages.py
↓
Agent — navi/core/agent.py
run_stream() → AsyncGenerator[AgentEvent]
run() → str
run_ephemeral() → str (subagent)
_run_planning()
_run_workers()
↓
Registries — navi/core/registry.py · build_default_registries()
ToolRegistry
ProfileRegistry
BackendRegistry
↓
LLM Backend
OllamaBackend
complete()
stream_complete()
SessionStore (SQLite)
messages[]
context[]
MemoryStore (SQLite)
memory_facts
summary
🔄 Request Lifecycle
Streaming flow from WebSocket message to final response.
1
Client sends message
{type:"message", content:"...", images:[...]} over WebSocket
2
websocket_session() creates _AgentRun
Subscribes a queue, launches _run_agent() as asyncio task, sends stream_start
3
Pre-turn compression check
If context_token_count ≥ num_ctx × threshold → compress context before LLM call
4
Planning phase
If profile.planning_enabled: fast non-streaming LLM call → yields plan_ready event if plan generated
5
Tool-calling loop (max_iterations)
Calls llm.stream_complete() → yields thinking/text/tool events. Loops until finish_reason=stop
6
StreamEnd + workers
Saves session to DB. Runs post-turn workers (compression). Yields context_compressed if triggered
✓
Done
Events broadcast from _AgentRun to all subscriber queues → sent as JSON to WebSocket
🔗 Context Vars
Thread-safe async-safe state shared between Agent and tools. Defined in navi/tools/base.py.
| ContextVar | Type | Set by | Used by |
| current_session_id |
str | None |
Agent before each run |
SSH pool, scratchpad, todo — per-session state |
| current_event_sink |
Queue | None |
run_stream() per tool task |
run_ephemeral() forwards sub-agent events to parent stream |
| current_stop_event |
Event | None |
_run_agent() before run_stream() |
Agent loop checks before each LLM call and mid-stream |
Never use task.cancel() for stopping generation. It corrupts Starlette's WebSocket receive state. Use current_stop_event.set() via POST /sessions/{id}/stop.
⚙️ Agent Loop
Three entry points in navi/core/agent.py:
| Method | Returns | Persistence | Planning |
run(session_id, msg) |
str |
SQLite session |
No |
run_stream(session_id, msg) |
AsyncGenerator[AgentEvent] |
SQLite session |
Yes (if profile.planning_enabled) |
run_ephemeral(msg, profile_id) |
str |
In-memory only |
No |
System prompt construction
Built fresh on every LLM call — never stored in session.context.
NAVI_PERSONA (global personality)
───────────────────────────────────────
profile.system_prompt (domain rules)
───────────────────────────────────────
[memory injection: "## What I remember about the user"]
───────────────────────────────────────
session.context messages (history, no system msgs)
Sub-agent isolation
run_ephemeral() sets current_session_id = "subagent_<uuid12>" so each subagent has its own isolated scratchpad and SSH connection pool entry.
🗺️ Planning Phase
Runs before the tool-calling loop when profile.planning_enabled = true.
1
LLM call: decide or plan
Fast non-streaming call: think=False, temperature=0.3, no tools
2
Response classification
Starts with DIRECT → skip planning. No numbered steps found → skip. Otherwise → real plan.
3
Plan injection
Appended to session.context as assistant message — model continues from it naturally
4
PlanReady event emitted
Rendered as collapsible 🗺️ card in UI before execution begins
💾 Sessions
Session model (navi/core/session.py)
| Field | Type | Description |
id | UUID str | Unique session identifier |
profile_id | str | Active profile |
messages | list[Message] | Full history Never compressed. Used for UI display. |
context | list[Message] | LLM context May be replaced by compression summary. |
context_token_count | int | Accumulated tokens; reset to 0 after compression |
pinned | bool | Pinned sessions appear first in sidebar |
Dual-buffer design
Key invariant: session.messages is the full, unmodified conversation history — always available for display. session.context is what the LLM actually sees — may contain a compression summary instead of old messages.
Message format
| Field | Present on | Type |
role | all | user | assistant | tool | system |
content | most | str | None |
images | user, assistant | list[str] — base64 |
tool_calls | assistant (when calling tools) | list[ToolCallRequest] |
tool_call_id | tool results | str |
name | tool results | tool name |
is_summary | compressed blocks | bool |
created_at | user/assistant | ISO 8601 datetime |
🗜️ Context Compression
Keeps the LLM context within the token budget. Only session.context is modified — session.messages is never touched.
Trigger points
Pre-turn
Before LLM call in run_stream()
Checks context_token_count against threshold
Post-turn (worker)
After StreamEnd via CompressionWorker
Re-checks and compresses if still needed
Algorithm
1
Partition into turns
Keep last context_keep_recent turns verbatim. Tool call groups never split.
2
Format old turns as text
Tool args truncated to 120 chars, results to 300 chars. Total input capped at 12 000 chars.
3
Summarize with LLM
think=False, bullet-point output. Same model — no model swap or extra loading.
4
Replace with summary message
role=user, is_summary=True. Result: system_msgs + [summary] + recent_turns
Config
| Setting | Default | Description |
CONTEXT_COMPRESSION_ENABLED | true | Enable/disable |
CONTEXT_COMPRESSION_THRESHOLD | 0.80 | Trigger at 80% of context window |
CONTEXT_KEEP_RECENT | 10 | Turns kept verbatim |
CONTEXT_SUMMARY_TEMPERATURE | 0.3 | Summarization temperature |
📡 WebSocket Protocol
Endpoint: ws://host/ws/sessions/{session_id}
Closes with code 4004 if session not found.
Client → Server
{
"type": "message", // required, always "message"
"content": "user text", // required, non-empty
"images": ["base64..."], // optional; data: URI prefix stripped server-side
"files": [ // optional; from POST /sessions/{id}/files
{"name": "file.pdf", "path": "/abs/path/..."}
]
}
📬 Events Reference
| Type | Direction | Fields | Description |
| stream_start |
S→C | — |
Agent processing began. Block user input. |
| thinking_delta |
S→C | delta |
Reasoning chunk (streaming). Accumulate until thinking_end. |
| thinking_end |
S→C | — |
Reasoning phase complete. Auto-collapsed in UI. |
| turn_thinking |
S→C | thinking, is_subagent |
Full reasoning block from tool-calling turn (non-streaming). |
| plan_ready |
S→C | plan |
Step-by-step plan before execution. Rendered as 🗺️ card. |
| tool_started |
S→C | tool, args, is_subagent |
Tool call began. Shows pending spinner in UI immediately. |
| tool_call |
S→C | tool, args, result, success, is_subagent |
Tool finished. Pairs with preceding tool_started. |
| stream_delta |
S→C | delta |
Final response text chunk. Accumulate to build full content. |
| stream_end |
S→C | content, context_tokens, max_context_tokens |
Final response complete. Unlock user input. |
| stream_stopped |
S→C | — |
User stopped generation via POST /sessions/{id}/stop. |
| context_compressed |
S→C | messages_before, messages_after |
Context compression ran after this turn. |
| profile_switched |
S→C | profile_id, profile_name |
Active profile changed mid-stream by switch_profile tool. |
| error |
S→C | message |
Unhandled error. Some are recoverable, some terminate the stream. |
🎬 Typical Event Sequences
Simple question (no tools)
stream_start
thinking_delta × N // if model reasons
thinking_end
stream_delta × N
stream_end
With planning + tools
stream_start
plan_ready // if planning_enabled
turn_thinking // reasoning before tool selection
tool_started
tool_call
tool_started
tool_call
thinking_delta × N
thinking_end
stream_delta × N
stream_end
context_compressed // optional, if threshold hit
Subagent (spawn_agent)
stream_start
tool_started spawn_agent is_subagent=false
turn_thinking is_subagent=true
tool_started web_search is_subagent=true
tool_call web_search is_subagent=true
tool_started filesystem is_subagent=true
tool_call filesystem is_subagent=true
tool_call spawn_agent is_subagent=false
stream_delta × N
stream_end
Profile switch
stream_start
tool_started switch_profile
profile_switched // update UI here
tool_call switch_profile
stream_delta × N
stream_end
🌐 REST API
| Method | Path | Description |
| GET |
/health |
Health check → {"status":"ok"} |
| GET |
/agents/profiles |
List all available profiles |
| GET |
/agents/tools |
List all registered tools (builtin + user) |
| POST |
/sessions |
Create session → {session_id, profile_id, created_at} |
| GET |
/sessions |
List all sessions (sorted by pinned+last_active) |
| GET |
/sessions/{id} |
Full session with message history (display buffer) |
| GET |
/sessions/{id}/context |
LLM context (may differ from messages — for debugging) |
| PATCH |
/sessions/{id}/pin |
Pin or unpin a session |
| DEL |
/sessions/{id} |
Delete session and its uploaded files |
| POST |
/sessions/{id}/files |
Upload file (multipart/form-data). Max 200 MB. TTL 24h. |
| POST |
/sessions/{id}/messages |
Send message, wait for full response (non-streaming) |
| POST |
/sessions/{id}/stop |
Signal cooperative stop for running agent |
| WS |
/ws/sessions/{id} |
Streaming agent interface |
👤 Profiles
Profiles define tools, system prompt, model, and behaviour per domain. Defined in navi/profiles/.
| Profile ID | Name | Model | Temp | Planning |
secretary | Personal Secretary |
gemma4:31b-cloud |
0.7 |
Yes |
server_admin | Server Administrator |
gemma4:31b-cloud |
0.2 |
Yes |
smart_home | Smart Home Assistant |
gemma4:31b-cloud |
0.3 |
Yes |
Per-profile scratchpad sections
| Profile | Sections | Domain focus |
secretary | findings, sources, drafts | Research, writing, analysis |
server_admin | status, logs, errors, plan | Remote ops, monitoring |
smart_home | state, config, errors | Home Assistant, IoT, automations |
AgentProfile fields
| Field | Type | Description |
id | str | Unique identifier used in API and sessions |
name | str | Human-readable name for UI |
system_prompt | str | Domain-specific instructions (appended after persona) |
enabled_tools | list[str] | Tool names available to this profile |
model | str | Ollama model override (falls back to settings default) |
temperature | float | LLM temperature |
max_iterations | int | Tool-calling loop limit (default 50) |
planning_enabled | bool | Run planning phase before tool loop |
llm_backend | str | Backend key in BackendRegistry (default "ollama") |
🧠 Memory System
Long-term user memory: facts extracted from conversations, stored in SQLite, injected into every session.
Database schema
| Table | Key columns | Purpose |
memory_facts |
(category, key) unique |
Individual facts about the user — preferences, projects, environment |
memory_summary |
Single row (id=1) |
Narrative summary generated from all facts; injected into every session |
session_memory_state |
session_id, extracted_at |
Tracks which sessions have been processed for extraction |
Automatic extraction trigger
POST /sessions (create new session) fires _process_stale_sessions() as a background task. Processes sessions idle > 30 minutes that haven't been extracted yet.
Memory injection
On every run_stream() / run() call, _memory_msg() fetches the summary and returns a system message: "## What I remember about the user\n\n{summary}". Injected after main system prompt, before conversation history.
Memory tools usage rules
Call memory_search when the user mentions something personal or before making assumptions about their environment. Do not call at session start reflexively — only when context warrants it. Call memory_forget only when explicitly asked.
⚙️ Configuration
All settings read from .env via pydantic-settings. Imported as from navi.config import settings.
LLM
| Variable | Default | Description |
OLLAMA_HOST | http://localhost:11434 | Ollama server URL |
OLLAMA_DEFAULT_MODEL | gemma4:31b-cloud | Default model (overridable per profile) |
OLLAMA_NUM_CTX | 65536 | Context window size in tokens |
OLLAMA_THINK | true | Enable extended reasoning |
Security / Sandboxing
| Variable | Default | Description |
FS_ALLOWED_PATHS | * | Comma-separated paths filesystem tool can access. * = no limit |
TERMINAL_ALLOWED_COMMANDS | * | Comma-separated allowed executables. * = allow all |
SSH_HOSTS_FILE | ssh_hosts.json | Named SSH connections config |
Persona
| Variable | Description |
NAVI_PERSONA | Inline global personality prompt |
NAVI_PERSONA_FILE | Path to .txt file with persona (recommended — inline doesn't parse multiline well) |
Other
| Variable | Default | Description |
DB_PATH | navi.db | SQLite file path |
LOG_LEVEL | INFO | DEBUG / INFO / WARNING / ERROR |
TOOLS_DIR | tools | User tools directory |
SESSION_FILES_DIR | session_files | Uploaded files directory |
SESSION_FILES_MAX_SIZE_MB | 200 | Max upload size per file |
SESSION_FILES_TTL_HOURS | 24 | File retention hours |