diff --git a/CLAUDE.md b/CLAUDE.md index b44a252..593d854 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -157,8 +157,6 @@ | `docs/android-client.md` | Android app — architecture, build/deploy, WebView config, platform detection | | `docs/architecture.md` | Component diagram, data flow, registry wiring | -`NAVI.md` (project root) is a lightweight hub with the server command, key paths table, and a `filesystem(action="query")` pattern for querying docs at runtime. - ## What works well - Hot-reload without server restart - Thinking display in client diff --git a/NAVI.md b/NAVI.md deleted file mode 100644 index 7815fb8..0000000 --- a/NAVI.md +++ /dev/null @@ -1,72 +0,0 @@ -# NAVI — Quick Reference - -Personal modular AI agent system. FastAPI backend + Ollama LLM + webclient. - -## Server - -```bash -.venv/bin/uvicorn navi.main:app --reload --reload-dir navi --port 8000 -``` - -- UI: `http://localhost:8000` -- Debug panel: `http://localhost:8000/debug` - -## Key paths - -| Path | What | -|---|---| -| `navi/core/agent.py` | Agent loop, planning, tool execution | -| `navi/profiles/` | Profile definitions (`secretary`, `server_admin`, `developer`) | -| `navi/api/websocket.py` | WebSocket handler + event replay | -| `tools/` | User tools (auto-loaded at startup) | -| `tools/enabled.json` | Tools enabled across all profiles | -| `persona.txt` | Global persona injected into every profile | -| `DATABASE_URL` | PostgreSQL session + memory store | -| `workspace/` | Persistent private working files for Navi | -| `session_files/{session_id}/` | Per-session uploads and publishable chat artifacts (`SESSION_FILES_DIR`) | -| `manuals/` | Tool manuals (served by `tool_manual`) | - -## Documentation - -Detailed reference is in `docs/`. 
Query a specific file when you need depth: - -| File | Covers | -|---|---| -| `docs/agent.md` | Agent loop, 3-phase planning, thinking mechanics flags | -| `docs/profiles.md` | Profile fields, all config flags, how to add a profile | -| `docs/tools.md` | Built-in tools, user tool format, hot-reload | -| `docs/sessions.md` | Session model, dual-buffer, context compression | -| `docs/websocket.md` | WebSocket protocol, all event types, reconnect replay | -| `docs/memory.md` | Long-term memory system | -| `docs/api.md` | Full REST + WebSocket API reference with schemas | -| `docs/config.md` | All `.env` variables | -| `docs/architecture.md` | Component diagram, data flow, registry wiring | - -## Tool manuals - -For detailed usage of any tool: - -Call the `tool_manual` tool with the relevant tool name. - -Manuals exist for: `write_tool`, `spawn_agent`, `reflect`, `gmail`, `share_file`, `content_publish`. - -## Extending Navi - -To add a new tool: call the `tool_manual` tool for `write_tool` — full format reference + working example. - -## Agent prompt conventions - -- Keep the main agent context small: delegate bounded subtasks that need 3+ tool calls and can be summarized independently. -- Before asking the user for project facts, proactively check nearest `NAVI.md`, then `docs/`, `manuals/`, memory, files, tool schemas, or web sources. -- For Navi internals, use `docs/index.md` as the documentation map before scanning broad source trees. -- File-location rule: `workspace/` is persistent/private working storage; `session_files/{session_id}/` is the per-session area for uploads, downloads, and inline artifacts. `share_file` copies an existing local file into the session directory and returns a download link. `content_publish` registers an existing session file for inline viewing and does not copy from `workspace/`. 
- -## Active 3D modeling improvements - -Near-term work for `modeler_3d`: - -- Convert abstract user requests into an internal technical specification before detailed planning or OpenSCAD generation. -- Make 3D implementation plans more detailed: object class, dimensions, print orientation, interfaces, tolerances, support strategy, geometry modules, verification steps. -- Use a fixed preview inspection checklist after `image_view`, comparing the rendered model against the technical specification. -- Require a critique/revision pass for medium+ 3D tasks before publishing, unless the model is a trivial primitive and the no-change rationale is explicit. -- Add a source comments contract to generated `.scad` files: purpose, units, assumptions, print orientation, key dimensions, parameters, and revision notes. diff --git a/manuals/reflect.md b/manuals/reflect.md index a455921..7007ef4 100644 --- a/manuals/reflect.md +++ b/manuals/reflect.md @@ -42,7 +42,7 @@ ### When NOT to use `reflect` - **Simple Tasks:** Do not use it for single-step operations. - **Routine Actions:** Do not use it for standard file reads, writes, or simple searches. -- **Before Basic Fact Gathering:** Do not use it as a substitute for checking `NAVI.md`, docs, files, tool schemas, or obvious command output. +- **Before Basic Fact Gathering:** Do not use it as a substitute for checking connected MCP knowledge servers, docs, files, tool schemas, or obvious command output. ### The Importance of `assumptions` The most critical input to the `reflect` tool is the `assumptions` parameter. 
@@ -53,9 +53,9 @@ ## Autonomy Rule If reflection finds missing information, resolve it through available sources first: -- nearest `NAVI.md` +- connected MCP knowledge servers exposed by the active profile - project docs or manuals -- memory +- memory for personal user facts - source files and tool schemas - command output - web research when current external facts matter @@ -70,5 +70,5 @@ **Output:** - **Critic:** "You haven't mentioned how you will handle credential rotation or whether the SSH key is protected by a passphrase." -- **Pragmatist:** "Instead of a full automation script, use the existing deployment command if the project docs or NAVI.md define one." +- **Pragmatist:** "Instead of a full automation script, use the existing deployment command if the project docs define one." - **Detailer:** "Check whether the remote user has Docker permissions and whether the required network port is open before changing deployment logic." diff --git a/mcp_servers.json b/mcp_servers.json index ac4f810..c74143e 100644 --- a/mcp_servers.json +++ b/mcp_servers.json @@ -26,6 +26,6 @@ "list_pending_changes" ] }, - "instructions": "MANDATORY: Before answering ANY question about infrastructure, servers, services, networks, documentation, or system inventory — you MUST call gnexus-book tools first.\n\nQuery mapping:\n- 'server X not working / status' → search_docs, get_inventory_item\n- 'what services run where' → list_inventory, get_relationships\n- 'update docs / fix documentation' → read_doc, propose_doc_change, commit_changes\n- 'is X up to date / freshness' → check_freshness\n- 'validate repository / repo state' → validate_repository, git_status\n\nDo NOT rely on memory or NAVI.md for infrastructure facts — they may be stale. Always pull current state from gnexus-book.\n\nAlways validate the repository before making changes. 
Do not store raw secrets in documentation.\n\nWhen you discover new infrastructure facts, service configurations, or relationships during tool execution, consider whether they should be persisted. For stable facts about servers, services, or network topology, use gnexus-book write tools (propose_doc_change, propose_inventory_item_change). For user-specific facts, use the memory tool instead. Choose the target based on scope, not habit." + "instructions": "MANDATORY for profiles that expose gnexus-book tools: Before answering any question about infrastructure, servers, services, networks, documentation, or system inventory, call gnexus-book tools first. In Navi, gnexus-book tools are exposed with the mcp_gnexus-book_ prefix; use the exact tool names from the current tool schema.\n\nQuery mapping:\n- 'server X not working / status' → mcp_gnexus-book_search_docs, mcp_gnexus-book_get_inventory_item\n- 'what services run where' → mcp_gnexus-book_list_inventory, mcp_gnexus-book_get_relationships\n- 'update docs / fix documentation' → mcp_gnexus-book_read_doc, mcp_gnexus-book_propose_doc_change, mcp_gnexus-book_commit_changes\n- 'is X up to date / freshness' → mcp_gnexus-book_check_freshness\n- 'validate repository / repo state' → mcp_gnexus-book_validate_repository, mcp_gnexus-book_git_status\n\nDo not rely on memory for infrastructure facts. Memory is only for personal user facts and preferences. Always pull infrastructure state from gnexus-book when these tools are available to the active profile.\n\nAlways validate the repository before making changes. Do not store raw secrets in documentation.\n\nBefore the final response, decide whether tool execution revealed stable reusable infrastructure facts, service configurations, or relationships. If yes, persist them with gnexus-book write tools (mcp_gnexus-book_propose_doc_change, mcp_gnexus-book_propose_inventory_item_change) before answering. 
If the fact is user-specific rather than infrastructure documentation, use the memory tool instead. Choose the target based on scope, not habit." } } diff --git a/navi/core/agent.py b/navi/core/agent.py index 27dc2b7..a7ef200 100644 --- a/navi/core/agent.py +++ b/navi/core/agent.py @@ -479,7 +479,7 @@ built_ctx: list[Message] = [subagent_sys_msg] if mem: built_ctx.append(mem) - mcp_msg = self._ctx_builder._mcp_context_msg() + mcp_msg = self._ctx_builder._mcp_context_msg(profile) if mcp_msg: built_ctx.append(mcp_msg) built_ctx.extend(m for m in context if m.role != "system") diff --git a/navi/core/context_builder.py b/navi/core/context_builder.py index 06e2bd3..dfbd1ce 100644 --- a/navi/core/context_builder.py +++ b/navi/core/context_builder.py @@ -241,7 +241,7 @@ lines.append(f"Role: {_role_var.get()}") return Message(role="system", content="\n".join(lines)) - def _mcp_context_msg(self) -> "Message | None": + def _mcp_context_msg(self, profile: "AgentProfile | None" = None) -> "Message | None": """Build a system message with MCP server instructions. 
Combines server-provided instructions (from MCP initialize handshake) @@ -249,7 +249,10 @@ """ if not self._mcp_manager: return None - instructions = self._mcp_manager.get_instructions() + if profile is not None and not profile.mcp_servers: + return None + server_names = set(profile.mcp_servers.keys()) if profile is not None else None + instructions = self._mcp_manager.get_instructions(server_names) if not instructions: return None lines = ["[MCP servers — external knowledge sources]"] @@ -295,7 +298,7 @@ result.append(policy) # Inject MCP server instructions into context - mcp_msg = self._mcp_context_msg() + mcp_msg = self._mcp_context_msg(profile) if mcp_msg: result.append(mcp_msg) diff --git a/navi/core/planning.py b/navi/core/planning.py index 2c74b45..9395f1a 100644 --- a/navi/core/planning.py +++ b/navi/core/planning.py @@ -84,6 +84,9 @@ _dbg: dict = {"timestamp": datetime.now(timezone.utc).isoformat(), "result": "plan", "phases": {}} _base_sys = system_prompt_override if system_prompt_override is not None else self._ctx_builder.build_system_prompt(profile) + _mcp_msg = self._ctx_builder._mcp_context_msg(profile) + if _mcp_msg: + _base_sys = _base_sys + "\n\n---\n\n" + (_mcp_msg.content or "") # ── Phase 1: Task analysis ──────────────────────────────────────────── analysis: str = "" @@ -103,21 +106,27 @@ "without tools — respond with exactly: DIRECT\n\n" ) + available_tools_block - + "Analyse the request and output:\n\n" + + "Knowledge store rules (critical):\n" + "- `memory` is only for personal user facts and preferences.\n" + "- Connected MCP knowledge servers are authoritative only when the active profile exposes their tools.\n" + "- If the domain is infrastructure and gnexus-book tools are available, use gnexus-book as the primary source and persistence target.\n" + "- If no relevant MCP tools are available to this profile, do not plan to call unavailable MCP tools; use docs, files, command output, web, or ask the user after checking available 
sources.\n" + "- Never use memory for infrastructure inventory, service topology, network routes, proxy mappings, server roles, or service relationships.\n\n" + "Analyse the request and output:\n\n" "TASK: [one clear sentence — what actually needs to be done]\n" "GOAL: [how you will know the task is complete]\n" "UNKNOWNS: [genuine uncertainties that could block execution, or NONE]\n" "RESOURCES:\n" "- [tool_name]: [what it does] — [limitation if any] — [alternative if limitation blocks the goal]\n" - "- context sources: [which of memory / NAVI.md / web you will check and why]\n" + "- context sources: [which of connected MCP knowledge servers / memory / docs / web you will check and why]\n" "KNOWLEDGE SOURCE ASSESSMENT:\n" - "- Domain: [user personal facts / infrastructure / own capabilities / external web]\n" - "- Primary source: [memory / connected knowledge servers / NAVI.md / docs / web]\n" + "- Domain: [user personal facts / infrastructure / project documentation / own capabilities / external web]\n" + "- Primary source: [connected knowledge servers / memory / docs / web / source files / command output]\n" "- Fallback: [alternative source if primary is unavailable]\n" "KNOWLEDGE CAPTURE:\n" "- New information to save: [specific facts, conventions, or discoveries that should persist beyond this session]\n" - "- Target: [memory / NAVI.md / docs / gnexus-book / none — choose the best persistent store]\n" - "- Duplication check: [search memory first for: ]\n" + "- Target: [memory / connected knowledge server / docs / none — choose the best persistent store available to this profile]\n" + "- Duplication check: [which target-specific search/read/list step prevents duplicates]\n" "- Rationale: [why this knowledge is stable and reusable]\n" "COMPLEXITY: simple | medium | complex — choose based on ambiguity, number of files/systems, risk, and autonomy needed.\n" "SUBTASKS:\n" @@ -129,7 +138,7 @@ "REFLECT: yes — if the task is complex (multiple unknowns, external APIs, 
" "research required, or high-stakes/irreversible actions); " "no — if it is straightforward and the path is clear.\n" - "COMMITMENTS: [follow the plan step by step using the todo tool; gather any missing context independently without asking the user]\n\n" + "COMMITMENTS: [follow the plan step by step using the todo tool; gather missing context independently before asking the user; before the final answer, run a knowledge persistence checkpoint]\n\n" "Rules: list enough subtasks to make execution unambiguous. " "Simple tasks usually need 1-3 subtasks; medium tasks 5-9; complex or autonomous tasks 8-15. " "Hard maximum: 15 subtasks. Each must be concrete and actionable. " @@ -200,8 +209,11 @@ "[PLANNING - PHASE 2: STRUCTURED REVIEW]\n\n" "Review the phase 1 task analysis before execution. " "Do not change the user's goal. Do not invent facts. " - "Prefer resolving missing information through NAVI.md, docs, manuals, memory, files, " - "tool schemas, command output, or web research before asking the user.\n\n" + "Prefer resolving missing information through connected knowledge servers, docs, manuals, memory, files, " + "tool schemas, command output, or web research before asking the user. " + "Check that the proposed knowledge source and persistence target match the fact scope: " + "memory only for personal user facts; connected MCP knowledge servers for their own canonical domains; " + "docs/manuals for project-wide documentation. 
Flag any plan that stores infrastructure facts in memory.\n\n" "Return exactly these sections:\n\n" "## Critic\n" "- 3-5 bullets: wrong or unverified assumptions, ignored risks, contradictions, " @@ -214,7 +226,7 @@ "and validation steps.\n\n" "## Plan Adjustments\n" "- Concrete changes Phase 3 must apply: add/remove/split/merge/reorder steps, " - "change TOOL/AGENT/SELF executor, verify specific facts, or defer user questions " + "change TOOL/AGENT/SELF executor, verify specific facts, correct the persistence target, add a knowledge persistence checkpoint, or defer user questions " "until available sources are checked.\n\n" "Keep output concise. No prose outside these sections.\n\n" f"PHASE 1 ANALYSIS:\n{analysis}" @@ -298,11 +310,20 @@ "- complex or autonomous: 8-15 steps\n" "- hard maximum: 15 steps\n" "Use enough steps to make execution unambiguous. Do not compress unrelated actions into one step.\n\n" - "For every non-trivial task, include steps for information gathering from project notes/docs/files/tool schemas, " - "implementation or analysis, verification, final synthesis, and knowledge persistence when stable reusable facts are discovered. " - "Choose the persistence target based on the fact's scope: memory tool for user-scoped facts, NAVI.md for project/directory conventions, " - "docs/ or manuals/ for project-wide documentation, gnexus-book for infrastructure inventory, or filesystem for standalone files. " - "Always search the target first to avoid duplicates.\n\n" + "Knowledge source and persistence rules (critical):\n" + "- `memory` is only for personal user facts and preferences.\n" + "- Never store infrastructure inventory, service topology, network routes, proxy mappings, server roles, or service relationships in memory.\n" + "- Connected MCP knowledge servers are authoritative only when the active profile exposes their tools. 
Do not plan unavailable MCP tool calls.\n" + "- If the task domain is infrastructure and gnexus-book tools are available, include a gnexus-book read/search step before answering or changing anything.\n" + "- If execution may discover durable facts, include a dedicated knowledge persistence checkpoint before final synthesis.\n\n" + "For every non-trivial task, include steps for information gathering from connected knowledge servers/docs/files/tool schemas, " + "implementation or analysis, verification, knowledge persistence checkpoint, and final synthesis. " + "Choose the persistence target based on the fact's scope and the active profile's available tools: " + "memory tool for personal user facts and preferences only; connected knowledge servers for their own canonical domains " + "(for example gnexus-book for infrastructure inventory when its MCP tools are available); docs/ or manuals/ for project-wide documentation; " + "filesystem for standalone files. " + "Always search/read/list the selected target first to avoid duplicates. " + "The checkpoint can be SELF only when the task could not have discovered durable reusable facts; otherwise assign it to the exact persistence/search tool that will be needed later.\n\n" "AGENT scoping rules (critical):\n" "- Each AGENT step is one focused, independently verifiable unit of work.\n" "- One AGENT step = one spawn_agent call later. Do NOT bundle multiple concerns.\n" @@ -322,7 +343,8 @@ "1. [description] → TOOL: tool_name\n" "2. [description] → AGENT: profile_id\n" "3. [description] → AGENT: profile_id\n" - "4. [description] → SELF\n" + "4. Knowledge persistence checkpoint: [search/read selected target; persist stable facts if discovered, or confirm none] → TOOL: tool_name OR SELF\n" + "5. [final synthesis] → SELF\n" "... 
continue to the needed depth, up to 15 steps\n\n" "**Parallel:** [step numbers that can run simultaneously, or NONE]\n" "**Risks:** [unknowns to watch for, or NONE]\n\n" diff --git a/navi/mcp/manager.py b/navi/mcp/manager.py index e52af03..dd6a6e7 100644 --- a/navi/mcp/manager.py +++ b/navi/mcp/manager.py @@ -98,18 +98,23 @@ return [] return list(cfg.groups.get(group_name, [])) - def get_instructions(self) -> dict[str, str]: + def get_instructions(self, server_names: list[str] | set[str] | None = None) -> dict[str, str]: """Return combined instructions for every connected server. Server-provided instructions (from MCP initialize handshake) are merged with the overlay ``instructions`` field from ``mcp_servers.json``. - If a server is disconnected, only the config overlay is returned. + If a selected server is disconnected, only the config overlay is returned. """ configs = load_mcp_servers(self.config_path) out: dict[str, str] = {} - for name, client in self._clients.items(): + if server_names is None: + names = set(self._clients.keys()) + else: + names = set(server_names) + for name in names: + client = self._clients.get(name) parts: list[str] = [] - if client.instructions: + if client and client.instructions: parts.append(client.instructions) cfg = configs.get(name) if cfg and cfg.instructions: diff --git a/navi/memory/extractor.py b/navi/memory/extractor.py index f4d135f..2170b52 100644 --- a/navi/memory/extractor.py +++ b/navi/memory/extractor.py @@ -39,7 +39,7 @@ - Temporary states ("was tired", "was busy today") - Information about third parties that isn't about the user - Directory-specific project notes, one-off commands, file paths, task progress -- NAVI.md content or local operational notes +- Infrastructure inventory, service topology, network routes, server facts, or local operational notes - Already-known facts that appear in the transcript For each fact, indicate its source: @@ -51,7 +51,7 @@ Schema: [ {"category": "profile", "key": "name", "value": 
"Eugene", "source": "conversation", "source_context": "user introduced themselves"}, - {"category": "technical", "key": "host_ip", "value": "192.168.1.168", "source": "tool_call", "source_context": "found via terminal ip addr"} + {"category": "preferences", "key": "prefers_dark_ui", "value": "true", "source": "conversation", "source_context": "user asked for dark UI"} ] Valid categories: profile, preferences, technical, projects, other""" @@ -63,7 +63,7 @@ Be specific and concrete. Cover the most important identifying details first, then preferences and ongoing context. Do not add facts not present below. Do not include task progress, local directory notes, -or one-off commands; those belong in NAVI.md, not user memory.""" +one-off commands, infrastructure inventory, service topology, network routes, or server facts.""" async def extract_and_update( diff --git a/navi/profiles/developer/system_prompt.txt b/navi/profiles/developer/system_prompt.txt index 93e7baf..22c0a98 100644 --- a/navi/profiles/developer/system_prompt.txt +++ b/navi/profiles/developer/system_prompt.txt @@ -49,12 +49,11 @@ ## Project knowledge Before asking the user or scanning broad code: -- Find and read/query the nearest `NAVI.md`. - Use `docs/index.md` as the map when the project has docs. - Query specific docs before reading large source files. For Navi itself, start with `docs/architecture.md`, `docs/agent.md`, `docs/tools.md`, `docs/profiles.md`, or `docs/config.md` depending on the task. - Use tool schemas and manuals as truth for tool names and parameters. -Update `NAVI.md` when you discover a stable command, convention, entry point, project decision, or local quirk that will matter in future sessions. +Update project docs or manuals when you discover a stable command, convention, entry point, project decision, or local quirk that should be preserved for future work. 
## Context drift recovery diff --git a/navi/profiles/discuss/system_prompt.txt b/navi/profiles/discuss/system_prompt.txt index e348613..dc85bc7 100644 --- a/navi/profiles/discuss/system_prompt.txt +++ b/navi/profiles/discuss/system_prompt.txt @@ -20,7 +20,7 @@ Use `web_search` + `web_view` when a factual grounding would strengthen the discussion — not for every question, only when currency or precision matters. -Use the nearest `NAVI.md` and project `docs/` when discussing an active project. Prefer `docs/index.md` as the map, then query specific docs rather than rereading broad source trees. +Use project `docs/` when discussing an active project. Prefer `docs/index.md` as the map, then query specific docs rather than rereading broad source trees. Use `scratchpad` to draft complex reasoning before presenting it, especially when synthesizing multiple ideas. diff --git a/navi/profiles/secretary/system_prompt.txt b/navi/profiles/secretary/system_prompt.txt index e0031a5..94f3081 100644 --- a/navi/profiles/secretary/system_prompt.txt +++ b/navi/profiles/secretary/system_prompt.txt @@ -41,7 +41,7 @@ ### Information gathering -Before asking the user for facts, check available sources first: nearest `NAVI.md`, relevant `docs/` or `manuals/`, memory, files, web, or tool schemas. Use documentation as the project map instead of rereading the whole codebase. +Before asking the user for facts, check available sources first: connected MCP knowledge servers exposed by the active profile, relevant `docs/` or `manuals/`, memory for personal user facts, files, web, or tool schemas. Use documentation as the project map instead of rereading the whole codebase. 
### Plan → execution binding The auto-generated plan assigns each step an executor (TOOL / AGENT / SELF): diff --git a/navi/profiles/server_admin/system_prompt.txt b/navi/profiles/server_admin/system_prompt.txt index dfd3edf..3d17b45 100644 --- a/navi/profiles/server_admin/system_prompt.txt +++ b/navi/profiles/server_admin/system_prompt.txt @@ -33,7 +33,9 @@ ### Information gathering -Before asking the user for host/project facts, check nearest `NAVI.md`, relevant docs, memory, known SSH host files, and local config. Use `NAVI.md` as the operational notebook for the current directory: read it before substantial work and update it with stable host details, commands, service facts, and deployment quirks. +Before asking the user for host/project facts, check the sources that are actually authoritative for the active profile: connected MCP knowledge servers, relevant docs, memory for personal user facts, known SSH host files, local config, and live command output. + +For infrastructure inventory, service topology, traffic routes, network layout, host roles, proxy mappings, and server/service relationships, use connected MCP knowledge servers when this profile exposes them. Do not store those facts in `memory`; memory is only for personal user facts and preferences. ### Execution flow 1. **Plan** — use the `todo` tool's set action with milestones. Assign executor to each: TOOL / AGENT / SELF. diff --git a/navi/profiles/tool_developer/system_prompt.txt b/navi/profiles/tool_developer/system_prompt.txt index 6afa924..96210ca 100644 --- a/navi/profiles/tool_developer/system_prompt.txt +++ b/navi/profiles/tool_developer/system_prompt.txt @@ -42,7 +42,7 @@ ## Build workflow -1. **Orient** — read/query nearest `NAVI.md`, then use `docs/index.md` as the map. For Navi tool work, check `docs/tools.md`, `manuals/write_tool.md`, and `tools/_template.py` before writing code. +1. **Orient** — use `docs/index.md` as the map. 
For Navi tool work, check `docs/tools.md`, `manuals/write_tool.md`, and `tools/_template.py` before writing code. 2. **Understand** — clarify what the tool does, what params it takes, where it will run, what data it may persist, and which profiles should receive it. Research first if needed; do not invent APIs. 3. **Check conflicts** — use the `filesystem` tool's list action on `tools/` to see existing tools, then inspect similar tools before copying a pattern. 4. **Write** — use the `write_tool` tool with the chosen tool name and full source code. Never use `filesystem` for initial creation — `write_tool` validates the format and registers the tool automatically. @@ -50,7 +50,7 @@ If it fails: use the `filesystem` tool's query action to locate the issue, then its smart_edit or write action to fix it, then test again. Never skip this step. 6. **Reload** — `reload_tools()` only after test_tool passes. 7. **Enable** — add tool name to `enabled_tools` in the relevant profile `config.json` files if not already added by `write_tool`. -8. **Update notes** — if you discover a stable tool convention, dependency, credential requirement, or workflow quirk, update `NAVI.md`. +8. **Update docs** — if you discover a stable tool convention, dependency, credential requirement, or workflow quirk, update the relevant project docs or manuals. 9. **Report** — what was created, what it does, which profiles it's in. 
---
diff --git a/navi/tools/filesystem.py b/navi/tools/filesystem.py
index e2a422c..0dedabc 100644
--- a/navi/tools/filesystem.py
+++ b/navi/tools/filesystem.py
@@ -617,7 +617,9 @@
         return ToolResult(success=True, output=header + "\n".join(lines))
 
     def _find_up(self, path: Path, params: dict) -> ToolResult:
-        filename = params.get("pattern", "NAVI.md")
+        filename = params.get("pattern")
+        if not filename:
+            return ToolResult(success=False, output="", error="pattern is required for find_up")
         current = path if path.is_dir() else path.parent
         checked = []
         while True:
diff --git a/navi/tools/reflect.py b/navi/tools/reflect.py
index 87c6388..6f5ff9d 100644
--- a/navi/tools/reflect.py
+++ b/navi/tools/reflect.py
@@ -154,7 +154,7 @@
             f"{detailer}\n\n"
             "---\n"
             "Integrate these perspectives into your plan.\n"
-            "First resolve missing information through available sources: NAVI.md, docs, manuals, "
+            "First resolve missing information through available sources: connected MCP knowledge servers, docs, manuals, "
             "memory, files, tool schemas, command output, or web research. "
             "Ask the user only when the missing decision is genuinely theirs to make or cannot be "
             "recovered from available sources. "
diff --git a/persona.txt b/persona.txt
index 9dfc293..681114d 100644
--- a/persona.txt
+++ b/persona.txt
@@ -10,9 +10,9 @@
 
 INFORMATION GATHERING:
 Before asking the user for anything — search first. The default order is:
-1. NAVI.md for the current project/directory (filesystem find_up, then query/read).
+1. Connected MCP knowledge servers, when the active profile exposes relevant MCP tools for the task domain.
 2. docs/ or manuals/ when the task concerns project architecture, APIs, tools, profiles, config, or workflows.
-3. Injected memory summary and the `memory` tool with action `search` for user/project facts that may survive across sessions.
+3. Injected memory summary and the `memory` tool with action `search` for personal user facts, preferences, and long-lived user context.
 4. Relevant source files, tool schemas, command output, or web research.
 
 Ask the user only after these sources do not contain the needed information or the decision is genuinely theirs to make.
@@ -136,32 +136,20 @@
 
 For tool output (terminal, file reads, API responses): synthesise by default. Include raw output verbatim only when directly relevant or explicitly requested.
 
-NAVI.MD — PROJECT CONTEXT:
-Projects and active work directories may have a NAVI.md file. Treat it as the operational notebook for that directory: what Navi is actively doing there, how to work there, known conventions, local commands, stable paths, credentials the user provided, and current project decisions. It complements long-term memory: memory stores user-wide facts; NAVI.md stores directory/project facts.
+PERSISTENT KNOWLEDGE STORES:
+Use the right persistent store for the fact's scope. Do not mix them.
 
-READ — proactively look for NAVI.md with filesystem find_up before substantial work in a directory, before asking for project facts, and when resuming a project after context has grown. If several NAVI.md files exist, use the nearest one to the directory where the work happens.
+1. `memory` is only for personal user facts and preferences: identity, stable preferences, recurring habits, personal devices as user context, and corrections the user explicitly gives about themselves.
+2. Connected MCP knowledge servers are canonical for their own domain. If the active profile exposes MCP tools for a domain, use those tools before relying on memory, docs, or guesses for that domain.
+3. Project docs and manuals are for project-wide architecture, APIs, workflows, and implementation conventions.
+4. Source files and command output are the source of truth for the current working tree or live system state.
 
-WRITE — update NAVI.md when you discover stable, reusable facts:
-- A server: IP, OS, role, services, access method.
-- A credential or connection string the user has shared or you've used.
-- A project convention, constraint, or quirk (e.g. "deployment runs via make deploy, not npm run build").
-- Status of ongoing work worth remembering across sessions.
-- A reliable local command, test, build step, entry point, or documentation map.
-
-HOW TO WRITE:
-1. query first — check if the fact is already recorded to avoid duplicates.
-2. Use the `filesystem` tool's `smart_edit` action on the NAVI.md path, targeting a specific section. Example instruction: "under ## Servers, add: 192.168.1.168 - Ubuntu 24.04, Docker, SearXNG :8088, SSH user: ubuntu".
-3. One targeted smart_edit per discovery. Never rewrite the whole file.
-4. If NAVI.md does not exist in the project root yet: create it with the `filesystem` tool's `write` action using this template:
-   # NAVI — Project Context\n\n## Environment\n\n## Notes\n
-   Then immediately add the discovered fact via smart_edit.
-
-DO NOT WRITE: task progress, session state, one-off results, anything already in your memory tool.
+Do NOT use ad-hoc local notes as a competing knowledge base. Do NOT store infrastructure inventory, service topology, network routes, or server facts in `memory` unless the fact is explicitly a personal user preference rather than infrastructure documentation.
 
 LONG-TERM MEMORY:
 You have a persistent memory system that survives across sessions. The "What I remember about the user" block injected above is a pre-built summary — treat it as ground truth.
 
-Use the `memory` tool with action `save` BEFORE finishing your response whenever you learned something stable: personal facts (name, location, devices, server IPs), preferences and habits, ongoing projects, corrections. Key naming: short, stable, snake_case. Save overwrites by key.
+Use the `memory` tool with action `save` BEFORE finishing your response whenever you learned something stable about the user personally: name, location, preferences, habits, personal devices as user context, ongoing personal projects, and corrections. Key naming: short, stable, snake_case. Save overwrites by key.
 
 Before calling `memory save`, always call `memory search` with the key or a related query to verify the fact does not already exist. If it exists, do not save a duplicate — skip or update if the value changed.
 
diff --git a/tests/unit/core/test_context_builder.py b/tests/unit/core/test_context_builder.py
index 663d063..aae60c6 100644
--- a/tests/unit/core/test_context_builder.py
+++ b/tests/unit/core/test_context_builder.py
@@ -7,6 +7,17 @@
 from tests.conftest_factory import make_profile, make_profile_registry
 
 
+class FakeMcpManager:
+    def __init__(self):
+        self.calls = []
+
+    def get_instructions(self, server_names=None):
+        self.calls.append(server_names)
+        if not server_names:
+            return {}
+        return {name: f"{name} instructions" for name in server_names}
+
+
 class TestBuildSystemPrompt:
     def test_includes_persona(self):
         import navi.config as _config
@@ -98,3 +109,25 @@
         context = [Message(role="user", content="hi")]
         result = builder.build(context, profile, mem=None, iteration=9, max_iterations=10)
         assert "CRITICAL" in result[-1].content
+
+    def test_does_not_inject_mcp_for_profiles_without_mcp_servers(self):
+        mcp = FakeMcpManager()
+        builder = ContextBuilder(profile_registry=make_profile_registry(), mcp_manager=mcp)
+        profile = make_profile("test", mcp_servers={})
+        context = [Message(role="user", content="hi")]
+
+        result = builder.build(context, profile, mem=None)
+
+        assert not any("MCP servers" in (m.content or "") for m in result)
+        assert mcp.calls == []
+
+    def test_injects_only_profile_mcp_server_instructions(self):
+        mcp = FakeMcpManager()
+        builder = ContextBuilder(profile_registry=make_profile_registry(), mcp_manager=mcp)
+        profile = make_profile("test", mcp_servers={"gnexus-book": ["read"]})
+        context = [Message(role="user", content="hi")]
+
+        result = builder.build(context, profile, mem=None)
+
+        assert any("gnexus-book instructions" in (m.content or "") for m in result)
+        assert mcp.calls == [{"gnexus-book"}]
diff --git a/tests/unit/core/test_planning.py b/tests/unit/core/test_planning.py
index 99f0f25..6ab3044 100644
--- a/tests/unit/core/test_planning.py
+++ b/tests/unit/core/test_planning.py
@@ -2,7 +2,36 @@
 
 import pytest
 
-from navi.core.planning import _parse_plan_steps
+from navi.core.planning import PlanningEngine, _parse_plan_steps
+from navi.llm.base import LLMResponse, Message
+from tests.conftest_factory import make_profile
+
+
+class RecordingLLM:
+    def __init__(self, responses):
+        self.responses = list(responses)
+        self.calls = []
+
+    async def complete(self, messages, **kwargs):
+        self.calls.append(messages)
+        return LLMResponse(
+            content=self.responses.pop(0),
+            tool_calls=None,
+            finish_reason="stop",
+        )
+
+
+class FakeContextBuilder:
+    def build_system_prompt(self, profile):
+        return "base system prompt"
+
+    def _mcp_context_msg(self, profile=None):
+        if profile and profile.mcp_servers:
+            return Message(
+                role="system",
+                content="gnexus-book instructions with mcp_gnexus-book_search_docs",
+            )
+        return None
 
 
 class TestParsePlanSteps:
@@ -25,3 +54,76 @@
     def test_no_steps_section(self):
         text = "Some random text without steps"
         assert _parse_plan_steps(text) == []
+
+
+class TestPlanningPrompt:
+    async def test_planning_prompt_includes_profile_mcp_and_persistence_rules(self):
+        profile = make_profile(
+            "server_admin",
+            planning_phase2_enabled=False,
+            mcp_servers={"gnexus-book": ["read", "write"]},
+        )
+        llm = RecordingLLM([
+            "TASK: document infra\n"
+            "GOAL: docs updated\n"
+            "UNKNOWNS: NONE\n"
+            "RESOURCES:\n"
+            "- mcp_gnexus-book_search_docs: search docs\n"
+            "- context sources: gnexus-book\n"
+            "KNOWLEDGE SOURCE ASSESSMENT:\n"
+            "- Domain: infrastructure\n"
+            "- Primary source: connected knowledge servers\n"
+            "- Fallback: docs\n"
+            "KNOWLEDGE CAPTURE:\n"
+            "- New information to save: stable infra facts\n"
+            "- Target: connected knowledge server\n"
+            "- Duplication check: search target\n"
+            "- Rationale: reusable\n"
+            "COMPLEXITY: medium\n"
+            "SUBTASKS:\n"
+            "1. Search docs\n"
+            "2. Persist facts\n"
+            "REFLECT: no\n"
+            "COMMITMENTS: checkpoint",
+            "## Plan\n\n"
+            "**Task:** document infra\n"
+            "**Goal:** docs updated\n\n"
+            "**Milestones:**\nA. Inspect\nB. Persist\nC. Report\n\n"
+            "**Steps:**\n"
+            "1. Search gnexus-book → TOOL: mcp_gnexus-book_search_docs\n"
+            "2. Knowledge persistence checkpoint → TOOL: mcp_gnexus-book_propose_doc_change\n"
+            "3. Final synthesis → SELF\n\n"
+            "**Parallel:** NONE\n"
+            "**Risks:** NONE",
+        ])
+        engine = PlanningEngine(FakeContextBuilder())
+        context = [Message(role="user", content="update infra docs")]
+
+        events = []
+        async for event in engine.run(context, profile, llm, mem=None, tool_schemas=[]):
+            events.append(event)
+
+        phase1_prompt = llm.calls[0][0].content
+        phase3_prompt = llm.calls[1][0].content
+        assert "gnexus-book instructions" in phase1_prompt
+        assert "memory` is only for personal user facts and preferences" in phase1_prompt
+        assert "Never use memory for infrastructure inventory" in phase1_prompt
+        assert "knowledge persistence checkpoint" in phase3_prompt
+        assert "Do not plan unavailable MCP tool calls" in phase3_prompt
+
+    async def test_planning_prompt_omits_mcp_when_profile_has_no_mcp_servers(self):
+        profile = make_profile(
+            "developer",
+            planning_phase2_enabled=False,
+            mcp_servers={},
+        )
+        llm = RecordingLLM(["DIRECT"])
+        engine = PlanningEngine(FakeContextBuilder())
+        context = [Message(role="user", content="hello")]
+
+        async for _event in engine.run(context, profile, llm, mem=None, tool_schemas=[]):
+            pass
+
+        phase1_prompt = llm.calls[0][0].content
+        assert "gnexus-book instructions" not in phase1_prompt
+        assert "Connected MCP knowledge servers are authoritative only when the active profile exposes their tools" in phase1_prompt
diff --git a/tests/unit/test_mcp.py b/tests/unit/test_mcp.py
index 1bbd37d..040fd2e 100644
--- a/tests/unit/test_mcp.py
+++ b/tests/unit/test_mcp.py
@@ -72,6 +72,28 @@
         tools = await manager.get_all_tools()
         assert tools == []
 
+    def test_get_instructions_returns_selected_config_overlay_when_disconnected(self, tmp_path):
+        path = tmp_path / "mcp_servers.json"
+        path.write_text(
+            '{"gnexus-book": {"transport": "sse", "url": "http://example/sse", "instructions": "Use book."}}'
+        )
+        manager = McpManager(config_path=path)
+
+        instructions = manager.get_instructions({"gnexus-book"})
+
+        assert instructions == {"gnexus-book": "Use book."}
+
+    def test_get_instructions_filters_to_selected_servers(self, tmp_path):
+        path = tmp_path / "mcp_servers.json"
+        path.write_text(
+            '{"gnexus-book": {"instructions": "Use book."}, "other": {"instructions": "Use other."}}'
+        )
+        manager = McpManager(config_path=path)
+
+        instructions = manager.get_instructions({"gnexus-book"})
+
+        assert instructions == {"gnexus-book": "Use book."}
+
 
 class TestMcpTool:
     def test_name_prefix(self):