diff --git a/docs/agent.md b/docs/agent.md new file mode 100644 index 0000000..a1db0b8 --- /dev/null +++ b/docs/agent.md @@ -0,0 +1,110 @@ +# Agent Loop + +The agent loop is the core execution engine. File: `navi/core/agent.py`. + +## Three entry points + +### `run(session_id, user_message)` → `str` +Non-streaming. Runs the full tool-calling loop and returns the final text. Used for REST endpoints or background tasks where streaming is not needed. No planning phase. + +### `run_stream(session_id, user_message)` → `AsyncGenerator[AgentEvent]` +Streaming. Yields `AgentEvent` objects in real time. Used by the WebSocket handler. Includes planning phase. + +### `run_ephemeral(user_message, profile_id)` → `str` +Non-persistent subagent. No DB reads/writes. Uses a temporary in-memory context. Called by `SpawnAgentTool`. Assigns a unique session ID (`subagent_`) to isolate its scratchpad from the parent and from other subagents. + +--- + +## Planning phase (`_run_planning`) + +Runs only when `profile.planning_enabled = True`, before the tool-calling loop. + +**What it does:** +1. Sends the user request to the LLM with a special system prompt: "decide if this needs a plan". +2. LLM either responds `DIRECT` (skip planning) or produces a numbered step list. +3. If a real plan is returned, it's injected into `session.context` as an assistant message — the model then sees it as its own prior statement and naturally continues from it. +4. Yields `PlanReady(plan)` event → rendered as a collapsible card in the UI. + +**Detection logic:** +- Response starts with `DIRECT` → skip (no plan needed). +- No numbered steps found (regex `^\s*\d+[\.\)]`) → skip (malformed response). +- Otherwise → inject plan, emit `PlanReady`. + +**Parameters:** `think=False`, `temperature=0.3`, no tools → fast and structured. + +--- + +## Tool-calling loop + +Runs up to `profile.max_iterations` times. + +``` +iteration: + 1. Check stop_event → yield StreamStopped and return if set + 2. Call llm.stream_complete(context, tool_schemas) + - Yields ThinkingDelta events during reasoning + - Yields TextDelta events during text generation + - Final chunk carries tool_calls or finish_reason="stop" + 3a. finish_reason == "stop" (no tool calls): + → Save session, yield StreamEnd + → Run post-turn workers (e.g. context compression) + → Return + 3b. tool_calls present: + For each tool call: + - yield ToolStarted (pending card in UI) + - Create asyncio.Task for tool execution + - Set current_event_sink to a fresh Queue + - Drain the queue (receives subagent events in real time) + - yield ToolEvent (completed card in UI) + - Append tool result to session.context + Check if profile switched → reload profile + tools + Continue to next iteration +``` + +### Sub-agent event forwarding + +When a tool (e.g. `spawn_agent`) runs a subagent internally, subagent events arrive through `current_event_sink`. The parent agent drains that queue while the tool task runs, yielding subagent `ToolStarted`/`ToolEvent` events marked with `is_subagent=True`. + +### Cooperative stop + +Stop is signalled via `current_stop_event` (an `asyncio.Event`). The agent checks it: +- Before each LLM call +- During streaming (breaks out of the stream loop → calls `aclose()` on generator → Ollama closes gracefully, model stays in VRAM) +- After tool execution + +**Never use `task.cancel()`** for stopping — it corrupts Starlette's WebSocket state. + +--- + +## Workers (`_run_workers`) + +Workers run sequentially after `StreamEnd`. Each receives a `WorkerContext` with session state, token counts, and LLM access. + +Currently registered worker: `CompressionWorker` (`navi/workers/compressor.py`). + +Worker result: `WorkerResult.events` — list of `AgentEvent` objects that are yielded after `StreamEnd`. + +Pre-turn compression also exists: before calling the LLM, `run_stream()` checks if `session.context_token_count` is over the threshold and compresses proactively. + +See [`sessions.md`](sessions.md) for compression details. + +--- + +## System prompt construction + +Each LLM call uses `_build_context()`, which injects: +1. System message: `persona + "---" + profile.system_prompt` (built fresh every call, never stored in session.context). +2. Optional memory message: `"## What I remember about the user\n\n{summary}"`. +3. Conversation messages from `session.context` (system messages stripped to avoid duplication). + +This means profile switches and persona changes take effect immediately without modifying stored history. + +--- + +## Context vars set by Agent + +Before each `run_stream()` call: `current_session_id.set(session_id)`. +Before each tool task: `current_event_sink.set(sink_queue)`. +`run_ephemeral()` sets `current_session_id` to a unique subagent ID. + +See [`architecture.md`](architecture.md) for the full ContextVar table. diff --git a/docs/architecture.md b/docs/architecture.md new file mode 100644 index 0000000..f129b99 --- /dev/null +++ b/docs/architecture.md @@ -0,0 +1,96 @@ +# Architecture + +High-level component map and data flow for the Navi backend. + +## Component diagram + +``` +┌─────────────────────────────────────────────────────────┐ +│ Client (browser) │ +│ WebSocket /ws/sessions/{id} REST /sessions/* │ +└──────────────┬──────────────────────────┬───────────────┘ + │ WS frames │ HTTP + ▼ ▼ +┌─────────────────────────────────────────────────────────┐ +│ FastAPI (navi/main.py) │ +│ ┌──────────────────┐ ┌───────────────────────────┐ │ +│ │ websocket.py │ │ routes/sessions.py │ │ +│ │ websocket.py │ │ routes/messages.py │ │ +│ │ _AgentRun │ │ routes/agents.py │ │ +│ │ stop endpoint │ │ routes/health.py │ │ +│ └────────┬─────────┘ └────────────┬──────────────┘ │ +└───────────┼─────────────────────────┼───────────────────┘ + │ │ + ▼ ▼ +┌──────────────────────────────────────────────────────────┐ +│ Agent (navi/core/agent.py) │ +│ run_stream() → AsyncGenerator[AgentEvent] │ +│ run() → str (non-streaming) │ +│ run_ephemeral() → str (subagent, no DB) │ +│ │ +│ ┌──────────────┐ ┌───────────────┐ ┌──────────────┐ │ +│ │ Planning │ │ Tool-calling │ │ Workers │ │ +│ │ _run_planning│ │ loop │ │ (compression)│ │ +│ └──────────────┘ └───────────────┘ └──────────────┘ │ +└───┬──────────────┬────────────────┬─────────────────────┘ + │ │ │ + ▼ ▼ ▼ +┌────────┐ ┌──────────────┐ ┌────────────────────────┐ +│ LLM │ │ ToolRegistry│ │ SessionStore │ +│Backend │ │ (built-ins │ │ (SQLite / in-memory) │ +│(Ollama)│ │ + user tools│ │ │ +└────────┘ └──────────────┘ └────────────────────────┘ + │ + ┌─────────┴──────────┐ + │ MemoryStore │ + │ (SQLite facts) │ + └────────────────────┘ +``` + +## Request lifecycle (streaming) + +1. Client sends `{type: "message", content: "..."}` over WebSocket. +2. `websocket_session()` creates `_AgentRun`, subscribes a queue, launches `_run_agent()` as a task. +3. `_run_agent()` calls `agent.run_stream(session_id, content)`. +4. `run_stream()`: + a. Loads session + profile from store. + b. Pre-turn: checks if context needs compression; compresses if threshold exceeded. + c. **Planning phase** (if `profile.planning_enabled`): calls LLM once (non-streaming, no tools) to produce a step plan; injects plan as assistant message. + d. **Tool-calling loop** (up to `max_iterations`): + - Calls `llm.stream_complete()` → yields `ThinkingDelta`, `TextDelta`, tool call requests. + - If tool calls: executes each tool, yields `ToolStarted` → sub-agent events → `ToolEvent`. + - If `finish_reason == stop`: yields `StreamEnd`, runs post-turn workers. + e. Saves session to DB. +5. Events are broadcast from `_AgentRun` to all subscriber queues. +6. `_stream_to_client()` drains the queue → sends JSON to WebSocket. + +## Context vars (thread-safe, async-safe) + +Defined in `navi/tools/base.py`. Set by `Agent` before each tool call; tools read them. + +| ContextVar | Type | Purpose | +|---|---|---| +| `current_session_id` | `str \| None` | Session ID for tools needing per-session state (SSH pool, scratchpad) | +| `current_event_sink` | `Queue \| None` | Queue where subagent events are written; parent drains it in real time | +| `current_stop_event` | `Event \| None` | Set by `POST /sessions/{id}/stop`; agent checks before each LLM call | + +## Registry wiring (`navi/core/registry.py`) + +`build_default_registries()` is the composition root. It: +1. Creates `ToolRegistry`, registers all built-in tools. +2. Loads user tools from `tools/` directory. +3. Creates `ProfileRegistry`, registers all profiles from `navi/profiles/`. +4. Creates `BackendRegistry`, registers Ollama backend. +5. Creates `SpawnAgentTool` and `SwitchProfileTool` (need references to other registries). +6. Patches `spawn_tool._backend_registry` after backends are built (avoids circular dep). + +Called once at startup from `navi/api/deps.py`. + +## Two-buffer session design + +Each `Session` has two message lists: + +- `messages` — full display history, **never modified by compression**. Used for UI history. +- `context` — what the LLM sees. May be replaced with a summary by the compressor. + +See [`sessions.md`](sessions.md) for details. diff --git a/docs/config.md b/docs/config.md new file mode 100644 index 0000000..86bdc31 --- /dev/null +++ b/docs/config.md @@ -0,0 +1,92 @@ +# Configuration + +All configuration is loaded from `.env` via pydantic-settings (`navi/config.py`). The global `settings` object is imported everywhere as `from navi.config import settings`. + +## LLM + +| Variable | Type | Default | Description | +|---|---|---|---| +| `OLLAMA_HOST` | str | `http://localhost:11434` | Ollama server URL | +| `OLLAMA_DEFAULT_MODEL` | str | `gemma4:e2b-it-q8_0` | Default model (can be overridden per profile) | +| `OLLAMA_NUM_CTX` | int | `65536` | Context window size in tokens | +| `OLLAMA_THINK` | bool | `true` | Enable extended reasoning (thinking) | +| `OPENAI_API_KEY` | str | `""` | OpenAI API key (if using OpenAI backend) | +| `ANTHROPIC_API_KEY` | str | `""` | Anthropic API key (if using Anthropic backend) | + +## Security / Sandboxing + +| Variable | Type | Default | Description | +|---|---|---|---| +| `FS_ALLOWED_PATHS` | str | `"*"` | Comma-separated paths the `filesystem` tool can access. `"*"` = no restriction | +| `TERMINAL_ALLOWED_COMMANDS` | str | `"*"` | Comma-separated allowed executables for `terminal`. `"*"` = allow all | +| `SSH_HOSTS_FILE` | str | `ssh_hosts.json` | Path to JSON file with named SSH connections | + +`settings.fs_allowed_paths_list` and `settings.terminal_allowed_commands_list` are computed properties that parse the comma-separated strings into lists. + +## Database + +| Variable | Type | Default | Description | +|---|---|---|---| +| `DB_PATH` | str | `navi.db` | SQLite database file path | + +## Logging + +| Variable | Type | Default | Description | +|---|---|---|---| +| `LOG_LEVEL` | str | `INFO` | Python logging level (`DEBUG`, `INFO`, `WARNING`, `ERROR`) | + +## Tools + +| Variable | Type | Default | Description | +|---|---|---|---| +| `TOOLS_DIR` | str | `tools` | Directory for user-defined tools (auto-discovered at startup) | + +## Session files + +| Variable | Type | Default | Description | +|---|---|---|---| +| `SESSION_FILES_DIR` | str | `session_files` | Directory for uploaded session files | +| `SESSION_FILES_MAX_SIZE_MB` | int | `200` | Max upload size per file in megabytes | +| `SESSION_FILES_TTL_HOURS` | int | `24` | Hours before session file directories are cleaned up | + +## Context compression + +| Variable | Type | Default | Description | +|---|---|---|---| +| `CONTEXT_COMPRESSION_ENABLED` | bool | `true` | Enable/disable automatic context compression | +| `CONTEXT_COMPRESSION_THRESHOLD` | float | `0.80` | Trigger compression at this fraction of `OLLAMA_NUM_CTX` | +| `CONTEXT_KEEP_RECENT` | int | `10` | Number of recent conversation turns to keep verbatim | +| `CONTEXT_SUMMARY_TEMPERATURE` | float | `0.3` | Temperature for the summarization LLM call | + +## Persona + +| Variable | Type | Default | Description | +|---|---|---|---| +| `NAVI_PERSONA` | str | `""` | Global personality prompt prepended to every profile's system prompt | +| `NAVI_PERSONA_FILE` | str | `""` | Path to a `.txt` file containing the persona (preferred over inline `NAVI_PERSONA`) | + +**Recommended:** use `NAVI_PERSONA_FILE=persona.txt` rather than inlining the persona in `.env`, because multi-line values don't parse reliably in `.env` files. + +The `_load_persona_from_file` validator reads the file on startup if `NAVI_PERSONA` is empty and `NAVI_PERSONA_FILE` is set. + +## Example `.env` + +```dotenv +OLLAMA_HOST=http://localhost:11434 +OLLAMA_DEFAULT_MODEL=gemma4:e2b-it-q8_0 +OLLAMA_NUM_CTX=65536 +OLLAMA_THINK=true + +FS_ALLOWED_PATHS=* +TERMINAL_ALLOWED_COMMANDS=* + +DB_PATH=navi.db +LOG_LEVEL=INFO +TOOLS_DIR=tools + +CONTEXT_COMPRESSION_ENABLED=true +CONTEXT_COMPRESSION_THRESHOLD=0.80 +CONTEXT_KEEP_RECENT=10 + +NAVI_PERSONA_FILE=persona.txt +``` diff --git a/docs/index.md b/docs/index.md new file mode 100644 index 0000000..94ad8bd --- /dev/null +++ b/docs/index.md @@ -0,0 +1,54 @@ +# Navi — Backend Documentation + +Personal modular AI agent system. FastAPI backend + WebSocket streaming + Ollama LLM. + +## Quick start + +```bash +# Install dependencies +python -m venv .venv && .venv/bin/pip install -r requirements.txt + +# Configure (copy and edit) +cp .env.example .env # set OLLAMA_HOST, NAVI_PERSONA_FILE, etc. + +# Run +.venv/bin/uvicorn navi.main:app --reload --reload-dir navi --port 8000 +``` + +Default UI: `http://localhost:8000` +Debug panel: `http://localhost:8000/debug` + +## File map + +| File | What it covers | +|---|---| +| [`architecture.md`](architecture.md) | Component diagram, data flow, dependency graph | +| [`agent.md`](agent.md) | Agent loop, planning phase, tool execution, subagents, workers | +| [`tools.md`](tools.md) | Built-in tools, user tool format, hot-reload, self-extension | +| [`sessions.md`](sessions.md) | Session model, dual-buffer design, context compression | +| [`websocket.md`](websocket.md) | WebSocket protocol — all events, stop mechanism | +| [`profiles.md`](profiles.md) | Profiles, system prompts, persona, profile switching | +| [`memory.md`](memory.md) | Long-term memory — facts, extraction, search | +| [`config.md`](config.md) | All environment variables with types and defaults | +| [`api.md`](api.md) | REST API endpoints + full WebSocket event schemas and sequences | + +## Key entry points in code + +| File | Role | +|---|---| +| `navi/main.py` | FastAPI app, router registration, startup hooks | +| `navi/core/agent.py` | Agent class — `run()`, `run_stream()`, `run_ephemeral()` | +| `navi/core/registry.py` | `build_default_registries()` — wires everything together | +| `navi/api/websocket.py` | WebSocket handler + `POST /sessions/{id}/stop` | +| `navi/config.py` | `Settings` — all config loaded from `.env` | +| `navi/profiles/` | Profile definitions (`secretary`, `server_admin`, `smart_home`) | +| `tools/` | User-defined tools (auto-discovered at startup) | + +## Stack + +- **Web framework**: FastAPI + uvicorn +- **LLM**: Ollama (primary), OpenAI-compatible backend wired in +- **Default model**: `gemma4:e2b-it-q8_0` (configurable per profile) +- **Database**: SQLite via aiosqlite +- **Logging**: structlog +- **Config**: pydantic-settings (reads `.env`) diff --git a/docs/memory.md b/docs/memory.md new file mode 100644 index 0000000..c58461a --- /dev/null +++ b/docs/memory.md @@ -0,0 +1,83 @@ +# Memory System + +Long-term user memory: facts extracted from conversations, stored in SQLite, injected into every session. + +## Storage (`navi/memory/store.py`) + +Three tables in `navi.db`: + +| Table | Purpose | +|---|---| +| `memory_facts` | Individual facts: `(category, key, value)` — unique on `(category, key)` | +| `memory_summary` | Single-row narrative summary generated from all facts | +| `session_memory_state` | Tracks which sessions have been processed (by `extracted_at`) | + +`MemoryStore` is initialized synchronously (creates tables), all operations are async via aiosqlite. + +### Key operations + +| Method | Description | +|---|---| +| `upsert_fact(category, key, value)` | Insert or update a fact | +| `search_facts(query, limit=15)` | Full-text search across category/key/value (OR across terms) | +| `delete_fact(key, category=None)` | Delete by key, optionally filtered by category | +| `get_all_facts(limit=None)` | All facts ordered by `(category, updated_at DESC)` | +| `get_summary()` | Current narrative summary text | +| `set_summary(content)` | Replace the summary | +| `mark_session_extracted(session_id)` | Record extraction timestamp | +| `get_extracted_at(session_id)` | Check if/when a session was processed | + +--- + +## Automatic extraction (`navi/memory/extractor.py`) + +Facts are extracted from stale sessions automatically. + +**Trigger:** `POST /sessions` (create new session) fires `_process_stale_sessions()` as a background task. + +**Stale criterion:** `session.last_active < now - 30 minutes` AND not yet extracted (or extracted before last activity). + +**Extraction process:** +1. Render conversation as plain text. +2. Call LLM with an extraction prompt: "extract facts the user shared about themselves, their preferences, projects, and environment." +3. Parse the response as `category: key = value` lines. +4. Upsert each fact into `memory_facts`. +5. Regenerate `memory_summary` from all current facts. +6. Mark session as extracted. + +--- + +## Memory injection into agent context + +At the start of each `run_stream()` / `run()` / `run_ephemeral()` call, `_memory_msg()` is called: + +```python +async def _memory_msg(self) -> Message | None: + summary = await self._memory.get_summary() + if not summary: + return None + return Message(role="system", content=f"## What I remember about the user\n\n{summary}") +``` + +This message is inserted after the main system message but before conversation history. The agent reads it on every turn. + +--- + +## Memory tools + +**`memory_search`** — searches facts by keyword query. Returns matching facts with category/key/value. Agent should call this when the user mentions something personal that may already be known. + +**`memory_forget`** — deletes facts matching a key (optionally filtered by category). Agent calls this when the user explicitly asks to forget something or when a fact is clearly outdated. + +--- + +## Memory usage guidelines (from persona) + +Call `memory_search` when: +- The user mentions something personal (location, project, preference, recurring task). +- About to make an assumption about the user's environment or preferences — verify first. +- The user asks about something helped with before. + +Do NOT call `memory_search` reflexively at the start of every session — only when context warrants it. + +Call `memory_forget` only when the user explicitly asks, or when a stored fact is clearly wrong or outdated. diff --git a/docs/profiles.md b/docs/profiles.md new file mode 100644 index 0000000..027eb74 --- /dev/null +++ b/docs/profiles.md @@ -0,0 +1,117 @@ +# Profiles + +Profiles define the agent's identity, tools, and behaviour for a specific domain. + +## Profile definition (`navi/profiles/base.py`) + +```python +@dataclass +class AgentProfile: + id: str # unique identifier (used in API, sessions, switch_profile) + name: str # human-readable name + description: str # shown in profile selector + system_prompt: str # domain-specific instructions + enabled_tools: list[str] # tool names available to this profile + model: str = "..." # Ollama model to use + temperature: float = 0.7 + max_iterations: int = 50 + planning_enabled: bool = False # whether to run the planning phase before the loop + llm_backend: str = "ollama" # backend key in BackendRegistry +``` + +## Built-in profiles + +| Profile ID | Name | Model | Temperature | Planning | +|---|---|---|---|---| +| `secretary` | Personal Secretary | gemma4:26b-a4b-it-q4_K_M | 0.7 | Yes | +| `server_admin` | Server Administrator | gemma4:26b-a4b-it-q4_K_M | 0.2 | Yes | +| `smart_home` | Smart Home Assistant | gemma4:26b-a4b-it-q4_K_M | 0.3 | Yes | + +All profiles have the same base tool set: +``` +todo, scratchpad, switch_profile, +web_search, web_view, http_request, +filesystem, code_exec, terminal, ssh_exec, image_view, +memory_search, memory_forget, +reload_tools, write_tool, list_tools, tool_manual, +spawn_agent +``` + +User tools from `tools/enabled.json` are merged in on top of the profile's `enabled_tools` list. + +## System prompt construction + +The final system prompt the LLM sees is: + +``` +{NAVI_PERSONA} + +--- + +{profile.system_prompt} +``` + +`NAVI_PERSONA` is the global personality layer — loaded from `settings.navi_persona` or `settings.navi_persona_file`. It contains: personality, self-extension rules, planning/scratchpad instructions, delegation guidance, memory rules. + +`profile.system_prompt` is the domain-specific layer: tool priorities, workflow rules, scratchpad section names, safety rules for that domain. + +The system message is **never stored** in `session.context`. It is injected fresh on every LLM call. Profile switches take effect immediately. + +## Adding a new profile + +1. Create `navi/profiles/my_profile.py`: +```python +from .base import AgentProfile + +my_profile = AgentProfile( + id="my_profile", + name="My Profile", + description="What this profile is for.", + system_prompt="""Mode: ... + +## Tool priorities +1. web_search — ... +2. filesystem — ... + +## Scratchpad sections +- findings, errors, plan + +## Safety rules +...""", + enabled_tools=[ + "todo", "scratchpad", "switch_profile", + "web_search", "filesystem", "code_exec", + "memory_search", "memory_forget", + "reload_tools", "write_tool", "list_tools", "tool_manual", + "spawn_agent", + ], + model="gemma4:26b-a4b-it-q4_K_M", + temperature=0.5, + planning_enabled=True, +) +``` + +2. Register it in `navi/profiles/__init__.py`: +```python +from .my_profile import my_profile +ALL_PROFILES = [..., my_profile] +``` + +## Profile switching + +`switch_profile` tool updates `session.profile_id` in the DB. In `run_stream()`, after each tool execution batch, the agent checks if `session.profile_id` changed and reloads profile + tools. The switch takes full effect on the next LLM call within the same run. + +Rules from the persona: +- Don't switch for a single off-topic question. +- Switch when the session domain clearly changes (e.g. coding → server admin). +- Never switch back and forth repeatedly in one conversation. + +## Per-profile scratchpad sections + +Each profile defines named sections appropriate for its domain: + +| Profile | Scratchpad sections | +|---|---| +| secretary | `findings`, `sources`, `drafts` | +| server_admin | `status`, `logs`, `errors`, `plan` | +| smart_home | `state`, `config`, `errors` | diff --git a/docs/sessions.md b/docs/sessions.md new file mode 100644 index 0000000..1662528 --- /dev/null +++ b/docs/sessions.md @@ -0,0 +1,108 @@ +# Sessions + +Session management, dual-buffer design, and context compression. + +## Session model (`navi/core/session.py`) + +```python +class Session(BaseModel): + id: str # UUID + profile_id: str # active profile + messages: list[Message] # full display history — never compressed + context: list[Message] # LLM context — may be replaced with summary + context_token_count: int # accumulated tokens; reset to 0 after compression + pinned: bool # pinned sessions appear first in sidebar + created_at: datetime + last_active: datetime +``` + +## Dual-buffer design + +Two separate message lists serve different purposes: + +| Buffer | Purpose | Modified by compression? | +|---|---|---| +| `session.messages` | Full display history shown in the UI | Never | +| `session.context` | What the LLM sees on each call | Yes — old turns replaced with a summary | + +Tool results, image injections, and assistant messages are appended to **both** buffers. When compression runs, only `session.context` is modified. + +**Note:** System messages are **not stored** in either buffer. They are injected fresh from the current profile on every LLM call via `_build_context()`. This makes profile switches take effect immediately. + +## Session store + +### `InMemorySessionStore` +Simple dict-backed store for testing. + +### `SQLiteSessionStore` (`navi/core/sqlite_session_store.py`) +Production store backed by SQLite via aiosqlite. + +- `create(profile_id)` → new `Session` +- `get(session_id)` → `Session | None` +- `save(session)` — serializes with `model_dump(mode='json')` (required for datetime serialization) +- `list_all()` → sorted by `(pinned DESC, last_active DESC)` +- `delete(session_id)` → `bool` +- `set_pinned(session_id, pinned)` → `bool` + +DB path: `settings.db_path` (default: `navi.db`). + +--- + +## Context compression (`navi/core/compressor.py`) + +Keeps the LLM context within the token budget by summarizing old conversation turns. + +### When it triggers + +Two trigger points: + +1. **Pre-turn** (in `run_stream()`): before calling the LLM, checks `session.context_token_count` against the threshold. Compresses if `tokens >= num_ctx * threshold`. +2. **Post-turn** (via `CompressionWorker`): after `StreamEnd`, the worker re-checks and compresses if needed. + +Config values (`settings`): +- `context_compression_enabled: bool = True` +- `context_compression_threshold: float = 0.80` — trigger at 80% of `ollama_num_ctx` +- `context_keep_recent: int = 10` — keep last N conversational turns verbatim +- `context_summary_temperature: float = 0.3` + +### Compression algorithm + +`compress_context(context, llm, model, temperature, keep_recent)`: + +1. Partition messages into `to_summarize` (old turns) and `to_keep` (recent `keep_recent` turns). + - A "turn" = one user message + all following assistant/tool messages up to the next user message. + - Tool call groups (assistant + results) are never split across the partition. + - Existing summary messages are folded into the next pass. +2. Format `to_summarize` as plain text (tool calls shown as compact previews, max 120 chars for args, max 300 chars for results). +3. Truncate formatted input to `_MAX_SUMMARY_INPUT_CHARS = 12_000` chars. +4. Call `llm.complete()` with `think=False` to produce a bullet-point summary. +5. Replace `to_summarize` with a single summary message (`role=user`, `is_summary=True`). +6. Return `system_msgs + [summary_msg] + to_keep`. + +If compression fails, the exception propagates to `CompressionWorker`, which logs a warning and continues — compression failure is non-fatal. + +### What is never compressed + +- `session.messages` — full history is always intact. +- The last `context_keep_recent` conversational turns. +- System messages (never stored in context anyway). + +--- + +## Session file uploads + +Files uploaded via `POST /sessions/{id}/files` are stored in `session_files/{session_id}/`. + +- Max size: `session_files_max_size_mb` (default: 200 MB) +- TTL: `session_files_ttl_hours` (default: 24 hours) +- A background `cleanup_loop` (started on FastAPI startup) deletes stale session directories. +- Executable files (`.sh`, `.py`, `.exe`, etc.) are rejected. +- Duplicate filenames get a numeric suffix. + +When files are uploaded via the UI, their paths are appended to the user message content: +``` +[Uploaded files on disk: +- filename.pdf → session_files/{id}/filename.pdf] +``` + +This lets the agent use `filesystem` or `code_exec` to access the files. diff --git a/docs/tools.md b/docs/tools.md new file mode 100644 index 0000000..8cd05f4 --- /dev/null +++ b/docs/tools.md @@ -0,0 +1,139 @@ +# Tool System + +Tools are the agent's actions. All tools implement the `Tool` ABC from `navi/tools/base.py`. + +## Two tiers + +### Built-in tools (`navi/tools/`) + +Registered in `build_default_registries()` as builtins. Never removed on hot-reload. + +| Tool | Name | Description | +|---|---|---| +| `WebSearchTool` | `web_search` | DuckDuckGo search | +| `WebViewTool` | `web_view` | Fetch and render a URL | +| `FilesystemTool` | `filesystem` | Read/write/list local files (path restrictions via config) | +| `HttpRequestTool` | `http_request` | Generic HTTP client (GET/POST/etc.) | +| `CodeExecTool` | `code_exec` | Execute Python in a subprocess sandbox | +| `TerminalTool` | `terminal` | Run shell commands (command allowlist via config) | +| `SshExecTool` | `ssh_exec` | SSH into remote hosts; connection pool keyed by session ID | +| `ImageViewTool` | `image_view` | Load image from path/URL → returns base64 for multimodal LLM | +| `TodoTool` | `todo` | Per-session task checklist (set/update/read) | +| `ScratchpadTool` | `scratchpad` | Per-session named working notes (write/append/read/clear) | +| `ReloadToolsTool` | `reload_tools` | Hot-reload user tools without server restart | +| `WriteToolTool` | `write_tool` | Write a new user tool file and reload immediately | +| `ListToolsTool` | `list_tools` | Return the live tool list from registry | +| `ToolManualTool` | `tool_manual` | Return manuals/{name}.md or auto-generate from schema | +| `MemorySearchTool` | `memory_search` | Search long-term memory facts | +| `MemoryForgetTool` | `memory_forget` | Delete a fact from long-term memory | +| `SpawnAgentTool` | `spawn_agent` | Spawn an isolated subagent (blocking, synchronous from caller's view) | +| `SwitchProfileTool` | `switch_profile` | Switch the active profile for a session | + +### User tools (`tools/*.py`) + +Written by the agent via `write_tool` or manually. Auto-discovered at startup. + +- Files starting with `_` are ignored. +- `tools/enabled.json` — list of user tool names to include in all profiles automatically. +- `tools/_template.py` — canonical format reference (not loaded). + +Currently present: `get_current_datetime.py`, `user_notes.py`. + +--- + +## Tool formats + +### Module-level format (preferred for user tools) + +```python +name = "my_tool" +description = "What it does and when to use it — be specific." +parameters = { + "type": "object", + "properties": { + "param": {"type": "string", "description": "..."} + }, + "required": ["param"] +} + +async def execute(params: dict) -> str: + # Return a plain string on success. + # Raise an exception to signal failure. + return "result" +``` + +No classes, no module-level `print()`. The loader wraps `execute` in a `Tool` subclass automatically. + +### Class-based format (built-in tools) + +```python +from navi.tools.base import Tool, ToolResult + +class MyTool(Tool): + name = "my_tool" + description = "..." + parameters = {"type": "object", "properties": {...}, "required": [...]} + + async def execute(self, params: dict) -> ToolResult: + return ToolResult(success=True, output="result") +``` + +`ToolResult` fields: +- `success: bool` +- `output: str` — always a string; LLM sees this +- `error: str | None` — included in output on failure via `to_message_content()` +- `metadata: dict` — internal hints (e.g. `is_image: True` → triggers image injection into context) + +--- + +## Tool loading (`navi/tools/loader.py`) + +`load_tools_from_dir(tools_dir)` returns `LoadResult(loaded, errors)`. + +Load order: +1. Try module-level format (checks for `name`, `description`, `parameters`, `execute`). +2. Fall back to class-based (scans for `Tool` subclasses). + +Errors are **isolated per file** — one broken file does not prevent others from loading. Errors are logged and returned in `LoadResult.errors`. + +--- + +## Hot-reload + +`reload_tools` tool calls `ToolRegistry.reload_user_tools(tools_dir)`: +1. Drops all tools that are NOT in `_builtin_names`. +2. Re-runs `load_tools_from_dir`. +3. New tools registered without server restart. + +New tools become available from the **next** user message (tool schemas are built at `run_stream()` entry, not during execution). + +--- + +## Self-extension via `write_tool` + +`WriteToolTool` validates the code before writing (checks for the 4 required definitions). On success: +1. Writes the file to `tools/{name}.py`. +2. Adds the name to `tools/enabled.json`. +3. Calls `reload_user_tools()` — tool is registered immediately. + +The agent should call `tool_manual("write_tool")` before using `write_tool` for the first time — the manual at `manuals/write_tool.md` has the full format reference and a complete example. + +--- + +## Scratchpad and Todo + +Both are per-session, stored in-memory keyed by `current_session_id`. + +**Scratchpad** — named sections for working notes within a task. Operations: `write`, `append`, `read`, `clear`. Subagents get isolated scratchpads (unique UUID-based session ID in `run_ephemeral()`). + +**Todo** — checklist for tracking multi-step plans. Operations: `set` (replace all tasks), `update` (set status of one task), `read`. Statuses: `pending`, `in_progress`, `done`, `failed`, `skipped`. + +--- + +## Image tool flow + +When `image_view` succeeds, it returns `ToolResult` with `metadata={"is_image": True, "base64": "..."}`. + +The agent detects this and appends a synthetic user message with the image to `session.context` (but not `session.messages`). This makes the image visible to the next LLM call without polluting the display history. + +See [`sessions.md`](sessions.md) for the dual-buffer design. diff --git a/docs/visual.html b/docs/visual.html new file mode 100644 index 0000000..ce00e17 --- /dev/null +++ b/docs/visual.html @@ -0,0 +1,1362 @@ + + + + + +Navi — Architecture & Reference + + + + + + + + +
+ + +
+

🧭 Project Overview

+

Navi is a personal modular AI agent system. FastAPI backend + vanilla JS client. The agent is named Navi — female personal assistant. Runs locally via Ollama.

+ +
+
+
Entry point
+
navi/main.py
+
FastAPI app
+
+
+
Run command
+
uvicorn navi.main:app
+
--reload --port 8000
+
+
+
Default model
+
gemma4:e2b-it-q8_0
+
Ollama, 2B active params
+
+
+
Context window
+
65 536 tokens
+
OLLAMA_NUM_CTX
+
+
+
Database
+
SQLite
+
navi.db via aiosqlite
+
+
+
Thinking
+
Enabled
+
OLLAMA_THINK=true
+
+
+
+ + +
+

📦 Stack

+
+ + + + + + + + + + +
LayerTechnologyNotes
Web frameworkFastAPI + uvicornASGI, async throughout
LLM backend (primary)OllamaLocal, OllamaBackend in navi/llm/ollama.py
LLM backend (alt)OpenAI-compatiblenavi/llm/openai_backend.py
DatabaseaiosqliteSessions + memory facts in navi.db
Configpydantic-settingsReads .env, typed Settings object
LoggingstructlogStructured JSON-friendly logs
ClientVanilla JS ES modulesmarked.js + highlight.js via esm.sh CDN
Markdown renderingmarked.jsIn browser, assistant messages
+
+
+ + +
+

🗂️ Component Map

+ +
+
+
+
Client (browser)
+
+ WebSocket /ws/sessions/{id} + REST /sessions/* + REST /agents/* +
+
+
+
+
+
+
FastAPI — navi/main.py
+
+ api/websocket.py · _AgentRun · stop endpoint + routes/sessions.py + routes/agents.py + routes/messages.py +
+
+
+
+
+
+
Agent — navi/core/agent.py
+
+ run_stream() → AsyncGenerator[AgentEvent] + run() → str + run_ephemeral() → str (subagent) + _run_planning() + _run_workers() +
+
+
+
+
+
+
Registries — navi/core/registry.py · build_default_registries()
+
+ ToolRegistry + ProfileRegistry + BackendRegistry +
+
+
+
+
+
+
+
LLM Backend
+
+ OllamaBackend + complete() + stream_complete() +
+
+
+
SessionStore (SQLite)
+
+ messages[] + context[] +
+
+
+
MemoryStore (SQLite)
+
+ memory_facts + summary +
+
+
+
+
+
+ + +
+

🔄 Request Lifecycle

+

Streaming flow from WebSocket message to final response.

+
+
+
1
+
+ Client sends message + {type:"message", content:"...", images:[...]} over WebSocket +
+
+
+
2
+
+ websocket_session() creates _AgentRun + Subscribes a queue, launches _run_agent() as asyncio task, sends stream_start +
+
+
+
3
+
+ Pre-turn compression check + If context_token_count ≥ num_ctx × threshold → compress context before LLM call +
+
+
+
4
+
+ Planning phase + If profile.planning_enabled: fast non-streaming LLM call → yields plan_ready event if plan generated +
+
+
+
5
+
+ Tool-calling loop (max_iterations) + Calls llm.stream_complete() → yields thinking/text/tool events. Loops until finish_reason=stop +
+
+
+
6
+
+ StreamEnd + workers + Saves session to DB. Runs post-turn workers (compression). Yields context_compressed if triggered +
+
+
+
+
+ Done + Events broadcast from _AgentRun to all subscriber queues → sent as JSON to WebSocket +
+
+
+
+ + +
+

🔗 Context Vars

+

Thread-safe async-safe state shared between Agent and tools. Defined in navi/tools/base.py.

+
+ + + + + + + + + + + + + + + + + + + + +
ContextVarTypeSet byUsed by
current_session_idstr | NoneAgent before each runSSH pool, scratchpad, todo — per-session state
current_event_sinkQueue | Nonerun_stream() per tool taskrun_ephemeral() forwards sub-agent events to parent stream
current_stop_eventEvent | None_run_agent() before run_stream()Agent loop checks before each LLM call and mid-stream
+
+
+ Never use task.cancel() for stopping generation. It corrupts Starlette's WebSocket receive state. Use current_stop_event.set() via POST /sessions/{id}/stop. +
+
+ + +
+

⚙️ Agent Loop

+

Three entry points in navi/core/agent.py:

+
+ + + + + + + + + + + + + + + + + + + + +
MethodReturnsPersistencePlanning
run(session_id, msg)strSQLite sessionNo
run_stream(session_id, msg)AsyncGenerator[AgentEvent]SQLite sessionYes (if profile.planning_enabled)
run_ephemeral(msg, profile_id)strIn-memory onlyNo
+
+ +

System prompt construction

+

Built fresh on every LLM call — never stored in session.context.

+
NAVI_PERSONA (global personality)
+───────────────────────────────────────
+profile.system_prompt (domain rules)
+───────────────────────────────────────
+[memory injection: "## What I remember about the user"]
+───────────────────────────────────────
+session.context messages (history, no system msgs)
+ +

Sub-agent isolation

+

run_ephemeral() sets current_session_id = "subagent_<uuid12>" so each subagent has its own isolated scratchpad and SSH connection pool entry.

+
+ + +
+

🗺️ Planning Phase

+

Runs before the tool-calling loop when profile.planning_enabled = true.

+ +
+
+
1
+
+ LLM call: decide or plan + Fast non-streaming call: think=False, temperature=0.3, no tools +
+
+
+
2
+
+ Response classification + Starts with DIRECT → skip planning. No numbered steps found → skip. Otherwise → real plan. +
+
+
+
3
+
+ Plan injection + Appended to session.context as assistant message — model continues from it naturally +
+
+
+
4
+
+ PlanReady event emitted + Rendered as collapsible 🗺️ card in UI before execution begins +
+
+
+
+ + +
+

💾 Sessions

+ +

Session model (navi/core/session.py)

+
+ + + + + + + + +
FieldTypeDescription
idUUID strUnique session identifier
profile_idstrActive profile
messageslist[Message]Full history Never compressed. Used for UI display.
contextlist[Message]LLM context May be replaced by compression summary.
context_token_countintAccumulated tokens; reset to 0 after compression
pinnedboolPinned sessions appear first in sidebar
+
+ +

Dual-buffer design

+
+ Key invariant: session.messages is the full, unmodified conversation history — always available for display. session.context is what the LLM actually sees — may contain a compression summary instead of old messages. +
+ +

Message format

+
+ + + + + + + + + + +
FieldPresent onType
rolealluser | assistant | tool | system
contentmoststr | None
imagesuser, assistantlist[str] — base64
tool_callsassistant (when calling tools)list[ToolCallRequest]
tool_call_idtool resultsstr
nametool resultstool name
is_summarycompressed blocksbool
created_atuser/assistantISO 8601 datetime
+
+
+ + +
+

🗜️ Context Compression

+

Keeps the LLM context within the token budget. Only session.context is modified — session.messages is never touched.

+ +

Trigger points

+
+
+
Pre-turn
+
Before LLM call in run_stream()
+
Checks context_token_count against threshold
+
+
+
Post-turn (worker)
+
After StreamEnd via CompressionWorker
+
Re-checks and compresses if still needed
+
+
+ +

Algorithm

+
+
+
1
+
+ Partition into turns + Keep last context_keep_recent turns verbatim. Tool call groups never split. +
+
+
+
2
+
+ Format old turns as text + Tool args truncated to 120 chars, results to 300 chars. Total input capped at 12 000 chars. +
+
+
+
3
+
+ Summarize with LLM + think=False, bullet-point output. Same model — no model swap or extra loading. +
+
+
+
4
+
+ Replace with summary message + role=user, is_summary=True. Result: system_msgs + [summary] + recent_turns +
+
+
+ +

Config

+
+ + + + + + +
SettingDefaultDescription
CONTEXT_COMPRESSION_ENABLEDtrueEnable/disable
CONTEXT_COMPRESSION_THRESHOLD0.80Trigger at 80% of context window
CONTEXT_KEEP_RECENT10Turns kept verbatim
CONTEXT_SUMMARY_TEMPERATURE0.3Summarization temperature
+
+
+ + +
+

🔧 Built-in Tools

+

Registered in build_default_registries() as builtins. Never removed on hot-reload.

+
+ + + + + + + + + + + + + + + + + + + + +
NameClassDescription
web_searchWebSearchToolDuckDuckGo web search
web_viewWebViewToolFetch and render a URL as text
filesystemFilesystemToolRead/write/list local files (path allowlist via config)
http_requestHttpRequestToolGeneric HTTP client — GET/POST/PUT/etc.
code_execCodeExecToolExecute Python in a subprocess sandbox
terminalTerminalToolRun shell commands (command allowlist via config)
ssh_execSshExecToolSSH into remote hosts; connection pool keyed by session ID
image_viewImageViewToolLoad image from path/URL → base64 for multimodal LLM
todoTodoToolPer-session task checklist (set/update/read)
scratchpadScratchpadToolPer-session named working notes (write/append/read/clear)
reload_toolsReloadToolsToolHot-reload user tools without server restart
write_toolWriteToolToolWrite a new user tool file and reload immediately
list_toolsListToolsToolReturn the live tool list from registry
tool_manualToolManualToolReturn manuals/{name}.md or auto-generate from schema
memory_searchMemorySearchToolSearch long-term memory facts by keyword
memory_forgetMemoryForgetToolDelete a fact from long-term memory
spawn_agentSpawnAgentToolSpawn an isolated subagent (blocking, synchronous)
switch_profileSwitchProfileToolSwitch the active profile for the session
+
+
+ + +
+

🔌 User Tools

+ +
+
+

Discovery

+
    +
  • Loaded from tools/*.py at startup
  • +
  • Files starting with _ are ignored
  • +
  • tools/enabled.json — names to include in all profiles
  • +
  • Errors are isolated per file (one bad file ≠ failure)
  • +
  • Hot-reload via reload_tools or after write_tool
  • +
+
+
+

Current user tools

+
+
+
get_current_datetime
+
Returns current date/time
+
+
+
user_notes
+
Persistent personal notes store
+
+
+
+
+ +

Image tool → multimodal injection

+

When image_view succeeds, it returns metadata={is_image: true, base64: "..."}. The agent appends a synthetic user message with the image to session.context (not messages) — making it visible to the next LLM call without polluting display history.

+
+ + +
+

📝 Tool Format

+ +

Module-level format (preferred for user tools)

+
name = "my_tool"
+description = "What it does and when to use it — be specific."
+parameters = {
+    "type": "object",
+    "properties": {
+        "param": {"type": "string", "description": "..."}
+    },
+    "required": ["param"]
+}
+
+async def execute(params: dict) -> str:
+    # Return a plain string on success.
+    # Raise an exception to signal failure.
+    return "result"
+
No classes, no module-level print(). The loader wraps execute in a Tool subclass automatically.
+ +

ToolResult (class-based format)

+
+ + + + + + +
FieldTypeDescription
successboolWhether the tool succeeded
outputstrAlways a string — LLM sees this
errorstr | NoneIncluded in LLM output on failure
metadatadictInternal hints, e.g. is_image: True
+
+ +

Self-extension via write_tool

+

The agent can install new tools permanently at runtime. WriteToolTool validates, writes to tools/{name}.py, adds to tools/enabled.json, then hot-reloads. New tool is available from the next user message.

+
+ + +
+

📡 WebSocket Protocol

+ +

Endpoint: ws://host/ws/sessions/{session_id}
+ Closes with code 4004 if session not found.

+ +

Client → Server

+
{
+  "type": "message",         // required, always "message"
+  "content": "user text",    // required, non-empty
+  "images": ["base64..."],   // optional; data: URI prefix stripped server-side
+  "files": [                 // optional; from POST /sessions/{id}/files
+    {"name": "file.pdf", "path": "/abs/path/..."}
+  ]
+}
+
+ + +
+

📬 Events Reference

+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
TypeDirectionFieldsDescription
stream_startS→CAgent processing began. Block user input.
thinking_deltaS→CdeltaReasoning chunk (streaming). Accumulate until thinking_end.
thinking_endS→CReasoning phase complete. Auto-collapsed in UI.
turn_thinkingS→Cthinking, is_subagentFull reasoning block from tool-calling turn (non-streaming).
plan_readyS→CplanStep-by-step plan before execution. Rendered as 🗺️ card.
tool_startedS→Ctool, args, is_subagentTool call began. Shows pending spinner in UI immediately.
tool_callS→Ctool, args, result, success, is_subagentTool finished. Pairs with preceding tool_started.
stream_deltaS→CdeltaFinal response text chunk. Accumulate to build full content.
stream_endS→Ccontent, context_tokens, max_context_tokensFinal response complete. Unlock user input.
stream_stoppedS→CUser stopped generation via POST /sessions/{id}/stop.
context_compressedS→Cmessages_before, messages_afterContext compression ran after this turn.
profile_switchedS→Cprofile_id, profile_nameActive profile changed mid-stream by switch_profile tool.
errorS→CmessageUnhandled error. Some are recoverable, some terminate the stream.
+
+
+ + +
+

🎬 Typical Event Sequences

+ +

Simple question (no tools)

+
+
stream_start
+
thinking_delta × N // if model reasons
+
thinking_end
+
stream_delta × N
+
stream_end
+
+ +

With planning + tools

+
+
stream_start
+
plan_ready // if planning_enabled
+
turn_thinking // reasoning before tool selection
+
tool_started
+
tool_call
+
tool_started
+
tool_call
+
thinking_delta × N
+
thinking_end
+
stream_delta × N
+
stream_end
+
context_compressed // optional, if threshold hit
+
+ +

Subagent (spawn_agent)

+
+
stream_start
+
tool_started spawn_agent is_subagent=false
+
turn_thinking is_subagent=true
+
tool_started web_search is_subagent=true
+
tool_call web_search is_subagent=true
+
tool_started filesystem is_subagent=true
+
tool_call filesystem is_subagent=true
+
tool_call spawn_agent is_subagent=false
+
stream_delta × N
+
stream_end
+
+ +

Profile switch

+
+
stream_start
+
tool_started switch_profile
+
profile_switched // update UI here
+
tool_call switch_profile
+
stream_delta × N
+
stream_end
+
+
+ + +
+

🌐 REST API

+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
MethodPathDescription
GET/healthHealth check → {"status":"ok"}
GET/agents/profilesList all available profiles
GET/agents/toolsList all registered tools (builtin + user)
POST/sessionsCreate session → {session_id, profile_id, created_at}
GET/sessionsList all sessions (sorted by pinned+last_active)
GET/sessions/{id}Full session with message history (display buffer)
GET/sessions/{id}/contextLLM context (may differ from messages — for debugging)
PATCH/sessions/{id}/pinPin or unpin a session
DEL/sessions/{id}Delete session and its uploaded files
POST/sessions/{id}/filesUpload file (multipart/form-data). Max 200 MB. TTL 24h.
POST/sessions/{id}/messagesSend message, wait for full response (non-streaming)
POST/sessions/{id}/stopSignal cooperative stop for running agent
WS/ws/sessions/{id}Streaming agent interface
+
+
+ + +
+

👤 Profiles

+

Profiles define tools, system prompt, model, and behaviour per domain. Defined in navi/profiles/.

+ +
+ + + + + + + + + + + + + + + + + + + + +
Profile IDNameModelTempPlanning
secretaryPersonal Secretarygemma4:26b-a4b-it-q4_K_M0.7Yes
server_adminServer Administratorgemma4:26b-a4b-it-q4_K_M0.2Yes
smart_homeSmart Home Assistantgemma4:26b-a4b-it-q4_K_M0.3Yes
+
+ +

Per-profile scratchpad sections

+
+ + + + + +
ProfileSectionsDomain focus
secretaryfindings, sources, draftsResearch, writing, analysis
server_adminstatus, logs, errors, planRemote ops, monitoring
smart_homestate, config, errorsHome Assistant, IoT, automations
+
+ +

AgentProfile fields

+
+ + + + + + + + + + + +
FieldTypeDescription
idstrUnique identifier used in API and sessions
namestrHuman-readable name for UI
system_promptstrDomain-specific instructions (appended after persona)
enabled_toolslist[str]Tool names available to this profile
modelstrOllama model override (falls back to settings default)
temperaturefloatLLM temperature
max_iterationsintTool-calling loop limit (default 50)
planning_enabledboolRun planning phase before tool loop
llm_backendstrBackend key in BackendRegistry (default "ollama")
+
+
+ + +
+

🧠 Memory System

+

Long-term user memory: facts extracted from conversations, stored in SQLite, injected into every session.

+ +

Database schema

+
+ + + + + + + + + + + + + + + + + +
TableKey columnsPurpose
memory_facts(category, key) uniqueIndividual facts about the user — preferences, projects, environment
memory_summarySingle row (id=1)Narrative summary generated from all facts; injected into every session
session_memory_statesession_id, extracted_atTracks which sessions have been processed for extraction
+
+ +

Automatic extraction trigger

+

POST /sessions (create new session) fires _process_stale_sessions() as a background task. Processes sessions idle > 30 minutes that haven't been extracted yet.

+ +

Memory injection

+

On every run_stream() / run() call, _memory_msg() fetches the summary and returns a system message: "## What I remember about the user\n\n{summary}". Injected after main system prompt, before conversation history.

+ +

Memory tools usage rules

+
+ Call memory_search when the user mentions something personal or before making assumptions about their environment. Do not call at session start reflexively — only when context warrants it. Call memory_forget only when explicitly asked. +
+
+ + +
+

⚙️ Configuration

+

All settings read from .env via pydantic-settings. Imported as from navi.config import settings.

+ +

LLM

+
+ + + + + + +
VariableDefaultDescription
OLLAMA_HOSThttp://localhost:11434Ollama server URL
OLLAMA_DEFAULT_MODELgemma4:e2b-it-q8_0Default model (overridable per profile)
OLLAMA_NUM_CTX65536Context window size in tokens
OLLAMA_THINKtrueEnable extended reasoning
+
+ +

Security / Sandboxing

+
+ + + + + +
VariableDefaultDescription
FS_ALLOWED_PATHS*Comma-separated paths filesystem tool can access. * = no limit
TERMINAL_ALLOWED_COMMANDS*Comma-separated allowed executables. * = allow all
SSH_HOSTS_FILEssh_hosts.jsonNamed SSH connections config
+
+ +

Persona

+
+ + + + +
VariableDescription
NAVI_PERSONAInline global personality prompt
NAVI_PERSONA_FILEPath to .txt file with persona (recommended — inline doesn't parse multiline well)
+
+ +

Other

+
+ + + + + + + + +
VariableDefaultDescription
DB_PATHnavi.dbSQLite file path
LOG_LEVELINFODEBUG / INFO / WARNING / ERROR
TOOLS_DIRtoolsUser tools directory
SESSION_FILES_DIRsession_filesUploaded files directory
SESSION_FILES_MAX_SIZE_MB200Max upload size per file
SESSION_FILES_TTL_HOURS24File retention hours
+
+
+ +
+ + + + diff --git a/docs/websocket.md b/docs/websocket.md new file mode 100644 index 0000000..80a7a3b --- /dev/null +++ b/docs/websocket.md @@ -0,0 +1,116 @@ +# WebSocket Protocol + +Full protocol reference for the streaming agent interface. File: `navi/api/websocket.py`. + +## Connection + +`ws://host/ws/sessions/{session_id}` + +The session must exist before connecting (create via `POST /sessions`). If the session is not found, the WebSocket closes with code `4004`. + +--- + +## Messages: client → server + +```json +{ + "type": "message", + "content": "user text", + "images": ["base64string", ...], + "files": [{"name": "file.pdf", "path": "/abs/path"}] +} +``` + +- `type` must be `"message"`. Other types return an error frame. +- `content` is required and must be non-empty. +- `images`: optional list of base64-encoded images (data URIs accepted; the `data:...;base64,` prefix is stripped server-side). +- `files`: optional list of uploaded file references (appended to content as `[Uploaded files on disk: ...]`). + +--- + +## Messages: server → client + +All frames are JSON objects with a `type` field. + +### Stream lifecycle + +| Frame | When | +|---|---| +| `{"type": "stream_start"}` | Before any agent output begins | +| `{"type": "stream_end", "content": "...", "context_tokens": N, "max_context_tokens": N}` | After final text, before workers | +| `{"type": "stream_stopped"}` | If the user stopped generation | +| `{"type": "error", "message": "..."}` | On any unhandled error | + +### Thinking (reasoning) + +| Frame | When | +|---|---| +| `{"type": "thinking_delta", "delta": "..."}` | Reasoning chunk during streaming | +| `{"type": "thinking_end"}` | Reasoning phase complete | +| `{"type": "turn_thinking", "thinking": "...", "is_subagent": bool}` | Full reasoning block from a tool-calling turn (complete(), non-streaming) | + +Thinking blocks are collapsible in the UI: open during reasoning, auto-collapsed on `thinking_end`. + +### Planning + +| Frame | When | +|---|---| +| `{"type": "plan_ready", "plan": "..."}` | Before tool-calling loop if `planning_enabled` and a plan was generated | + +Rendered as a collapsible plan card in the UI. + +### Tool calls + +| Frame | When | +|---|---| +| `{"type": "tool_started", "tool": "name", "args": {...}, "is_subagent": bool}` | Immediately when a tool call begins (before execution) | +| `{"type": "tool_call", "tool": "name", "args": {...}, "result": "...", "success": bool, "is_subagent": bool}` | When the tool finishes | + +`is_subagent: true` indicates the tool call was made by a nested subagent, not the top-level agent. + +### Text output + +| Frame | When | +|---|---| +| `{"type": "stream_delta", "delta": "..."}` | Text chunk of the final response | + +### Other events + +| Frame | When | +|---|---| +| `{"type": "context_compressed", "messages_before": N, "messages_after": N}` | After context compression runs | +| `{"type": "profile_switched", "profile_id": "...", "profile_name": "..."}` | When `switch_profile` tool succeeds | + +--- + +## Stopping generation + +`POST /sessions/{session_id}/stop` + +Sets `_AgentRun.stop_event`. The agent checks this event: +- Before each LLM call +- During streaming (breaks out, calls `aclose()` on the generator) +- After tool execution + +The client sends this via `fetch()`, not over the WebSocket, to avoid corrupting the WebSocket receive state. + +Response: `{"ok": true}` if a run was active, `{"ok": false, "reason": "no active run"}` otherwise. + +--- + +## Reconnection + +If the client reconnects to an in-progress run (e.g. page reload mid-stream), `websocket_session()` detects an existing `_AgentRun` in `_runs` and subscribes a new queue to it. The client resumes receiving events from that point forward. + +--- + +## Run state management + +`_runs: dict[str, _AgentRun]` — global dict of active runs, keyed by session ID. + +`_AgentRun` holds: +- `task: asyncio.Task` — the running agent task +- `stop_event: asyncio.Event` — cooperative stop signal +- `subscribers: list[Queue]` — one queue per connected WebSocket client + +Events are broadcast to all subscribers. When the run finishes, `_runs.pop(session_id)` is called from the `finally` block.