diff --git a/NAVI.md b/NAVI.md new file mode 100644 index 0000000..c3a9d19 --- /dev/null +++ b/NAVI.md @@ -0,0 +1,60 @@ +# NAVI — Project Context + +Personal modular AI agent system. FastAPI backend + Ollama LLM + Vue webclient. + +## Server + +```bash +.venv/bin/uvicorn navi.main:app --reload --reload-dir navi --port 8000 +``` + +- UI: `http://localhost:8000` +- Debug panel: `http://localhost:8000/debug` +- Default model: `gemma4:26b-a4b-it-q4_K_M` (26B, Q4) + +## Key paths + +| Path | What | +|---|---| +| `navi/core/agent.py` | Agent loop, planning, tool execution | +| `navi/profiles/` | Profile definitions (`secretary`, `server_admin`, `developer`) | +| `navi/api/websocket.py` | WebSocket handler + event replay | +| `tools/` | User tools (auto-loaded at startup) | +| `tools/enabled.json` | Tools enabled across all profiles | +| `persona.txt` | Global persona injected into every profile | +| `navi.db` | SQLite session store | +| `workspace/` | Persistent working files | +| `manuals/` | Tool manuals (served by `tool_manual`) | + +## Documentation + +Detailed reference is in `docs/`. Query a specific file when you need depth: + +``` +filesystem(action="query", path="docs/.md", question="your question") +``` + +| File | Covers | +|---|---| +| `docs/agent.md` | Agent loop, 3-phase planning, thinking mechanics flags | +| `docs/profiles.md` | Profile fields, all config flags, how to add a profile | +| `docs/tools.md` | Built-in tools, user tool format, hot-reload | +| `docs/sessions.md` | Session model, dual-buffer, context compression | +| `docs/websocket.md` | WebSocket protocol, all event types | +| `docs/memory.md` | Long-term memory system | +| `docs/api.md` | REST API endpoints | +| `docs/config.md` | All `.env` variables | + +## Tool manuals + +For detailed usage of any tool: + +``` +tool_manual("tool_name") +``` + +Manuals exist for: `write_tool`, `spawn_agent`, `reflect`, `gmail`, `share_file`. + +## Extending Navi + +To add a new tool: `tool_manual("write_tool")` — full format reference + working example. diff --git a/docs/agent.md b/docs/agent.md index a1db0b8..2860730 100644 --- a/docs/agent.md +++ b/docs/agent.md @@ -1,36 +1,66 @@ # Agent Loop -The agent loop is the core execution engine. File: `navi/core/agent.py`. +Core execution engine. File: `navi/core/agent.py`. -## Three entry points - -### `run(session_id, user_message)` → `str` -Non-streaming. Runs the full tool-calling loop and returns the final text. Used for REST endpoints or background tasks where streaming is not needed. No planning phase. +## Entry points ### `run_stream(session_id, user_message)` → `AsyncGenerator[AgentEvent]` -Streaming. Yields `AgentEvent` objects in real time. Used by the WebSocket handler. Includes planning phase. +Streaming. Yields `AgentEvent` objects in real time. Used by the WebSocket handler. Runs the planning phase if `profile.planning_enabled = True`. + +### `run(session_id, user_message)` → `str` +Non-streaming. Full tool-calling loop, returns final text. No planning phase. ### `run_ephemeral(user_message, profile_id)` → `str` -Non-persistent subagent. No DB reads/writes. Uses a temporary in-memory context. Called by `SpawnAgentTool`. Assigns a unique session ID (`subagent_`) to isolate its scratchpad from the parent and from other subagents. +Non-persistent subagent. No DB reads/writes. Temporary in-memory context. Called by `SpawnAgentTool`. Uses session ID `subagent_` to isolate scratchpad. --- ## Planning phase (`_run_planning`) -Runs only when `profile.planning_enabled = True`, before the tool-calling loop. +Runs before the tool loop when `profile.planning_enabled = True`. -**What it does:** -1. Sends the user request to the LLM with a special system prompt: "decide if this needs a plan". -2. LLM either responds `DIRECT` (skip planning) or produces a numbered step list. -3. If a real plan is returned, it's injected into `session.context` as an assistant message — the model then sees it as its own prior statement and naturally continues from it. -4. Yields `PlanReady(plan)` event → rendered as a collapsible card in the UI. +### Phase 1 — Analysis +LLM receives the user request with a classification prompt. Outputs: +- `DIRECT` → skip planning entirely (simple request). +- A structured analysis + `REFLECT: yes/no` → continue to Phase 2 or 3. -**Detection logic:** -- Response starts with `DIRECT` → skip (no plan needed). -- No numbered steps found (regex `^\s*\d+[\.\)]`) → skip (malformed response). -- Otherwise → inject plan, emit `PlanReady`. +### Phase 2 — Multi-perspective review (conditional) +Runs only when `profile.planning_reflect_enabled = True` AND Phase 1 outputs `REFLECT: yes`. +Three advisors run in parallel (via `asyncio.gather`), each receiving the full chat context and the Phase 1 analysis: +- **Critic** — what could go wrong +- **Pragmatist** — simpler/more direct path +- **Detailer** — missing requirements -**Parameters:** `think=False`, `temperature=0.3`, no tools → fast and structured. +Advisor feedback is embedded into the Phase 3 prompt. + +### Phase 3 — Execution plan +LLM produces a numbered step list. Each step is assigned an executor: +- `TOOL: tool_name` — single tool call +- `AGENT: profile_id` — delegated to a subagent via `spawn_agent` +- `SELF` — handled inline (synthesis, context-dependent action) + +**Comma test (enforced in prompt):** if a step description lists multiple things with "and" or commas, each item must be a separate step. + +The plan is injected into `session.context` as an assistant message and saved to `session.messages` with `is_plan=True` for UI rendering. The todo list is auto-populated from the plan steps. + +--- + +## Thinking mechanics + +All flags live on `AgentProfile` and can be set per-profile in `config.json`. + +| Flag | Default | What it does | +|---|---|---| +| `think_enabled` | `true` | Passes `think=True` to LLM on every main-loop call (extended reasoning) | +| `iteration_budget_enabled` | `true` | Injects remaining iteration count into context so model wraps up in time | +| `planning_reflect_enabled` | `false` | Enables Phase 2 advisor review (3 parallel LLM calls, adds latency) | +| `goal_anchoring_enabled` | `true` | Injects goal-reminder system message every N iterations | +| `goal_anchoring_interval` | `5` | N for goal anchoring | +| `anti_stall_enabled` | `true` | Detects looping without todo progress and injects a warning | +| `anti_stall_threshold` | `8` | Consecutive iterations without progress before warning fires | +| `step_validation_enabled` | `false` | Blocks marking a todo step `done` without a `validation` field | +| `adaptive_replan_enabled` | `false` | When a step is marked `failed`, queues a re-plan prompt for the next iteration | +| `subagent_planning_enabled` | `false` | Subagents run their own planning phase | --- @@ -39,72 +69,41 @@ Runs up to `profile.max_iterations` times. ``` -iteration: - 1. Check stop_event → yield StreamStopped and return if set - 2. Call llm.stream_complete(context, tool_schemas) - - Yields ThinkingDelta events during reasoning - - Yields TextDelta events during text generation - - Final chunk carries tool_calls or finish_reason="stop" - 3a. finish_reason == "stop" (no tool calls): - → Save session, yield StreamEnd - → Run post-turn workers (e.g. context compression) - → Return - 3b. tool_calls present: - For each tool call: - - yield ToolStarted (pending card in UI) - - Create asyncio.Task for tool execution - - Set current_event_sink to a fresh Queue - - Drain the queue (receives subagent events in real time) - - yield ToolEvent (completed card in UI) - - Append tool result to session.context - Check if profile switched → reload profile + tools - Continue to next iteration +Each iteration: + 1. Check stop_event → yield StreamStopped if set + 2. Build context: _build_context() injects iteration budget and goal anchor (if due) + 3. Check anti-stall: if stalled, append warning message to context + 4. Inject queued adaptive re-plan message (if a step failed last iteration) + 5. llm.stream_complete(context, tool_schemas) + → ThinkingDelta/ThinkingEnd events during reasoning + → TextDelta events during text generation + 6a. No tool calls → save session, yield StreamEnd, run workers, return + 6b. Tool calls → execute each, yield ToolEvent, append results to context + 7. Update anti-stall counters, detect newly-failed todo steps + 8. Check if profile switched → reload profile + tools ``` ### Sub-agent event forwarding - -When a tool (e.g. `spawn_agent`) runs a subagent internally, subagent events arrive through `current_event_sink`. The parent agent drains that queue while the tool task runs, yielding subagent `ToolStarted`/`ToolEvent` events marked with `is_subagent=True`. +When `spawn_agent` runs a subagent, its events arrive through `current_event_sink`. The parent drains the queue in real time, yielding subagent events marked with `is_subagent=True`. ### Cooperative stop - -Stop is signalled via `current_stop_event` (an `asyncio.Event`). The agent checks it: -- Before each LLM call -- During streaming (breaks out of the stream loop → calls `aclose()` on generator → Ollama closes gracefully, model stays in VRAM) -- After tool execution - -**Never use `task.cancel()`** for stopping — it corrupts Starlette's WebSocket state. +Stop is signalled via `current_stop_event` (an `asyncio.Event`). Checked before each LLM call, during streaming, and after tool execution. Never use `task.cancel()` — it corrupts WebSocket state. --- -## Workers (`_run_workers`) +## Workers -Workers run sequentially after `StreamEnd`. Each receives a `WorkerContext` with session state, token counts, and LLM access. +Run sequentially after `StreamEnd`. Currently: `CompressionWorker`. -Currently registered worker: `CompressionWorker` (`navi/workers/compressor.py`). - -Worker result: `WorkerResult.events` — list of `AgentEvent` objects that are yielded after `StreamEnd`. - -Pre-turn compression also exists: before calling the LLM, `run_stream()` checks if `session.context_token_count` is over the threshold and compresses proactively. - -See [`sessions.md`](sessions.md) for compression details. +Pre-turn compression also runs at the start of `run_stream()` if `session.context_token_count` exceeds the threshold. See [`sessions.md`](sessions.md). --- -## System prompt construction +## System prompt construction (`_build_context`) -Each LLM call uses `_build_context()`, which injects: -1. System message: `persona + "---" + profile.system_prompt` (built fresh every call, never stored in session.context). -2. Optional memory message: `"## What I remember about the user\n\n{summary}"`. -3. Conversation messages from `session.context` (system messages stripped to avoid duplication). +Every LLM call receives: +1. System message: `persona + "---" + profile.system_prompt` (injected fresh, never stored). +2. Optional memory message: `"## What I remember about the user\n..."`. +3. `session.context` messages (system messages stripped to avoid duplication). -This means profile switches and persona changes take effect immediately without modifying stored history. - ---- - -## Context vars set by Agent - -Before each `run_stream()` call: `current_session_id.set(session_id)`. -Before each tool task: `current_event_sink.set(sink_queue)`. -`run_ephemeral()` sets `current_session_id` to a unique subagent ID. - -See [`architecture.md`](architecture.md) for the full ContextVar table. +Profile switches and persona changes take effect immediately. diff --git a/docs/profiles.md b/docs/profiles.md index 027eb74..a73ca4c 100644 --- a/docs/profiles.md +++ b/docs/profiles.md @@ -4,114 +4,112 @@ ## Profile definition (`navi/profiles/base.py`) +Each profile is loaded from a directory under `navi/profiles//`: +- `config.json` — all fields below +- `system_prompt.txt` — domain-specific instructions +- `subagent_system_prompt.txt` — injected into subagents spawned from this profile (optional) + ```python @dataclass class AgentProfile: - id: str # unique identifier (used in API, sessions, switch_profile) - name: str # human-readable name - description: str # shown in profile selector - system_prompt: str # domain-specific instructions - enabled_tools: list[str] # tool names available to this profile - model: str = "..." # Ollama model to use + id: str # unique identifier + name: str + description: str + system_prompt: str # loaded from system_prompt.txt + enabled_tools: list[str] # tools available in the main loop + llm_backend: str = "ollama" + model: str = "gemma4:26b-a4b-it-q4_K_M" + max_iterations: int = 10 temperature: float = 0.7 - max_iterations: int = 50 - planning_enabled: bool = False # whether to run the planning phase before the loop - llm_backend: str = "ollama" # backend key in BackendRegistry + planning_enabled: bool = False + short_description: str = "" # 1-line summary shown to all profiles + full_description: dict = {} # keys: specialization, when_to_use, key_tools + + # Thinking mechanics (see docs/agent.md for details) + think_enabled: bool = True + iteration_budget_enabled: bool = True + planning_reflect_enabled: bool = False + goal_anchoring_enabled: bool = True + goal_anchoring_interval: int = 5 + anti_stall_enabled: bool = True + anti_stall_threshold: int = 8 + step_validation_enabled: bool = False + adaptive_replan_enabled: bool = False + + # Sub-agent configuration + subagent_tools: list[str] = [] + subagent_planning_enabled: bool = False + subagent_system_prompt: str = "" # loaded from subagent_system_prompt.txt ``` -## Built-in profiles - -| Profile ID | Name | Model | Temperature | Planning | -|---|---|---|---|---| -| `secretary` | Personal Secretary | gemma4:26b-a4b-it-q4_K_M | 0.7 | Yes | -| `server_admin` | Server Administrator | gemma4:26b-a4b-it-q4_K_M | 0.2 | Yes | -| `smart_home` | Smart Home Assistant | gemma4:26b-a4b-it-q4_K_M | 0.3 | Yes | - -All profiles have the same base tool set: -``` -todo, scratchpad, switch_profile, -web_search, web_view, http_request, -filesystem, code_exec, terminal, ssh_exec, image_view, -memory_search, memory_forget, -reload_tools, write_tool, list_tools, tool_manual, -spawn_agent -``` - -User tools from `tools/enabled.json` are merged in on top of the profile's `enabled_tools` list. - -## System prompt construction - -The final system prompt the LLM sees is: - -``` -{NAVI_PERSONA} - --- -{profile.system_prompt} +## Active profiles + +| ID | Name | Model | Temp | Planning | +|---|---|---|---|---| +| `secretary` | Personal Secretary | gemma4:26b-a4b-it-q4_K_M | 0.7 | Yes | +| `server_admin` | Server Administrator | gemma4:26b-a4b-it-q4_K_M | 0.2 | Yes | +| `developer` | Tool Developer | gemma4:26b-a4b-it-q4_K_M | 0.2 | Yes | + +All profiles share a base tool set. User tools from `tools/enabled.json` are merged in at runtime. + +--- + +## System prompt construction + +The LLM sees (injected fresh on every call, never stored in session): + +``` +{persona.txt content} + +--- + +{profile system_prompt.txt content} ``` -`NAVI_PERSONA` is the global personality layer — loaded from `settings.navi_persona` or `settings.navi_persona_file`. It contains: personality, self-extension rules, planning/scratchpad instructions, delegation guidance, memory rules. +`persona.txt` — global layer: personality, CONTEXT FIRST principle, self-extension rules, scratchpad/todo/memory instructions, delegation rules. -`profile.system_prompt` is the domain-specific layer: tool priorities, workflow rules, scratchpad section names, safety rules for that domain. +`system_prompt.txt` — domain layer: tool priorities, workflow, safety rules for this profile. -The system message is **never stored** in `session.context`. It is injected fresh on every LLM call. Profile switches take effect immediately. +--- -## Adding a new profile +## Adding a profile -1. Create `navi/profiles/my_profile.py`: -```python -from .base import AgentProfile - -my_profile = AgentProfile( - id="my_profile", - name="My Profile", - description="What this profile is for.", - system_prompt="""Mode: ... - -## Tool priorities -1. web_search — ... -2. filesystem — ... - -## Scratchpad sections -- findings, errors, plan - -## Safety rules -...""", - enabled_tools=[ - "todo", "scratchpad", "switch_profile", - "web_search", "filesystem", "code_exec", - "memory_search", "memory_forget", - "reload_tools", "write_tool", "list_tools", "tool_manual", - "spawn_agent", - ], - model="gemma4:26b-a4b-it-q4_K_M", - temperature=0.5, - planning_enabled=True, -) +1. Create directory `navi/profiles/my_profile/` +2. Add `config.json`: +```json +{ + "id": "my_profile", + "name": "My Profile", + "description": "...", + "short_description": "...", + "model": "gemma4:26b-a4b-it-q4_K_M", + "temperature": 0.5, + "max_iterations": 30, + "planning_enabled": true, + "think_enabled": true, + "iteration_budget_enabled": true, + "planning_reflect_enabled": false, + "goal_anchoring_enabled": true, + "goal_anchoring_interval": 5, + "anti_stall_enabled": true, + "anti_stall_threshold": 8, + "step_validation_enabled": false, + "adaptive_replan_enabled": false, + "subagent_planning_enabled": false, + "subagent_tools": ["todo", "filesystem", "terminal"], + "enabled_tools": ["todo", "scratchpad", "web_search", "filesystem"] +} ``` +3. Add `system_prompt.txt` with domain-specific instructions. +4. Optionally add `subagent_system_prompt.txt`. +5. The profile is auto-discovered at startup — no registration needed. -2. Register it in `navi/profiles/__init__.py`: -```python -from .my_profile import my_profile -ALL_PROFILES = [..., my_profile] -``` +--- ## Profile switching -`switch_profile` tool updates `session.profile_id` in the DB. In `run_stream()`, after each tool execution batch, the agent checks if `session.profile_id` changed and reloads profile + tools. The switch takes full effect on the next LLM call within the same run. +`switch_profile` tool updates `session.profile_id` in the DB. After each tool execution batch, `run_stream()` checks for a profile change and reloads profile + tools. Takes effect on the next LLM call. -Rules from the persona: -- Don't switch for a single off-topic question. -- Switch when the session domain clearly changes (e.g. coding → server admin). -- Never switch back and forth repeatedly in one conversation. - -## Per-profile scratchpad sections - -Each profile defines named sections appropriate for its domain: - -| Profile | Scratchpad sections | -|---|---| -| secretary | `findings`, `sources`, `drafts` | -| server_admin | `status`, `logs`, `errors`, `plan` | -| smart_home | `state`, `config`, `errors` | +Rules (in persona): don't switch for a single off-topic question; switch when the domain clearly changes; never switch back and forth repeatedly.