diff --git a/docs/future_headless_nodes.md b/docs/future_headless_nodes.md
new file mode 100644
index 0000000..63fa782
--- /dev/null
+++ b/docs/future_headless_nodes.md
@@ -0,0 +1,134 @@
+# Headless Navi Nodes: Future Architecture Sketch
+
+**Status:** Research / deferred. Not in active development.
+**Date:** 2026-05-24
+
+---
+
+## Problem
+
+Navi currently runs all tools (`terminal`, `filesystem`, `code_exec`, `ssh_exec`) on the same machine as the backend server. Users who want Navi to manage their local dev machine must either:
+
+1. Run the entire Navi backend locally (heavy, requires PostgreSQL).
+2. Use `ssh_exec` to loop back to `localhost` (clunky, requires a local sshd).
+
+The original idea, a "terminal client" that lets the browser execute commands on the user's local machine, was explored and rejected.
+
+---
+
+## Rejected Approach: Browser-Based Client-Side Execution
+
+### Why it was considered
+A browser-based client could, in principle, receive a `terminal` command from the server, run it locally via a companion app or extension, and return stdout.
+
+### Why it was rejected
+1. **Sandbox impossibility.** Browsers cannot spawn local shells. A companion (Electron / Tauri / browser extension + native messaging host) is required, at which point the client is no longer a "web client".
+2. **Agent loop blocking.** `Agent.run_stream` assumes each tool call completes inline as an awaited `tool.execute()` inside a single Python process. A remote tool that waits for a browser response would block the agent loop until the browser replies, or require refactoring the loop into a full async state machine.
+3. **C2 / trust model.** The server instructing the client to execute arbitrary commands is a command-and-control pattern. Authentication, authorization, and sandboxing of the client-side executor become critical and complex.
+4. **Device ambiguity.** If a user has Navi open on desktop and mobile, which device executes `terminal("npm install")`? Resolving that requires a device registry, device affinity, and explicit routing.
+5. **Maintenance burden.** Supporting three platforms (Linux, macOS, Windows) with installable companion software is unsustainable for a personal-assistant project.
+
+**Conclusion:** Browser-based client-side execution is architecturally incompatible with Navi's current in-process tool loop and operationally too expensive.
+
+---
+
+## Preferred Approach: Headless Navi Nodes (Swarm)
+
+### Concept
+A **headless Navi node** is a lightweight instance of the Navi backend (FastAPI + agent loop) without a web client. It runs on the target machine (e.g. a user's dev laptop, a VPS, a home server) and connects back to a **central Navi server**.
+
+- The **central server** serves the web client and handles user-facing sessions and orchestration.
+- **Headless nodes** handle local tool execution on their respective hosts.
+- Both share a **common PostgreSQL database** for session persistence and scheduler state.
+- Communication is via an **outbound WebSocket** from each node to the central server, which avoids NAT traversal issues.
+
+### High-level diagram
+
+```
+┌──────────────┐       WS/HTTP       ┌──────────────┐
+│   Browser    │◄───────────────────►│   Central    │
+│ (web client) │                     │ Navi Server  │
+└──────────────┘                     └──────┬───────┘
+                                            │
+                                            │ WS outbound
+                                            │ (nodes register here)
+                                            │
+         ┌──────────────────────────────────┼──────────────────────────────────┐
+         │                                  │                                  │
+         ▼                                  ▼                                  ▼
+  ┌──────────────┐                   ┌──────────────┐                   ┌──────────────┐
+  │   Headless   │                   │   Headless   │                   │   Headless   │
+  │    Node A    │                   │    Node B    │                   │    Node C    │
+  │ (dev laptop) │                   │  (home NAS)  │                   │    (VPS)     │
+  └──────┬───────┘                   └──────┬───────┘                   └──────┬───────┘
+         │                                  │                                  │
+         ▼ local shell                      ▼ local shell                      ▼ local shell
+```
+
+### Advantages
+- **No agent loop changes.** `terminal`, `filesystem`, and `code_exec` remain ordinary in-process Python tools, now running inside the node's process.
+- **No browser sandbox issues.** Tools run in a real OS process on the target machine.
+- **NAT-friendly.** Nodes initiate outbound connections, so no reverse tunnels or port forwarding are needed.
+- **Composable.** A user can attach as many machines as needed, and the central server routes each task to the appropriate node.
+
+---
+
+## Open Questions (To Be Solved Before Implementation)
+
+### 1. Shared Database Partitioning
+If all nodes share one PostgreSQL database:
+- **Scheduler race.** Multiple nodes polling `recalls` will try to execute the same scheduled task. This needs a `claimed_by` column or per-recall leader election.
+- **Concurrent session edits.** Two nodes appending to the same session's `messages` array could overwrite each other's writes. This needs row-level locking or `instance_id` partitioning.
+- **Memory extractor storm.** Running `process_stale_sessions` on every node would duplicate embedding work. This needs `instance_id` gating so that only the central server or a designated node runs background workers.
+
+**Direction:** Tag every row with `instance_id` (central = `main`). Nodes only read and write rows assigned to them. The scheduler table gets a `claimed_by` column that nodes set with a single atomic conditional `UPDATE`.
+
+### 2. Tool Routing in the Agent
+The agent on the central server must know that `terminal` for profile `server_admin` should run on Node B, not locally.
+
+Options:
+- **Profile-level node affinity.** Profile `server_admin` has `node_id: "home-nas"`. All tools in that profile execute on that node.
+- **Remote tool proxy.** The central registry has proxy tools (`remote_terminal`, `remote_filesystem`) that forward calls to the node's REST/WebSocket API.
+- **Subagent on node.** `spawn_agent` spawns the subagent on a remote node via the node's API instead of locally.
+
+### 3. Headless Node Packaging
+- **Docker:** Easy to ship, but `terminal` and `filesystem` operate inside the container by default. Access to the host requires `--privileged`, `--pid host`, or explicit volume mounts, which weakens isolation.
+- **Systemd service / bare process:** Full host access natively, but harder to install and update across platforms.
+
+**Direction:** Provide both: Docker for sandboxed/isolated tasks, and a bare-metal install script for full host management.
+
+### 4. Authentication
+Nodes must prove their identity to the central server. Candidate mechanisms:
+- Shared secret (`NODE_API_KEY`) in `.env`.
+- mTLS (client certificates).
+- JWT registration flow (node registers once, receives a token).
+
+### 5. Communication Protocol
+- **WebSocket outbound** (`wss://central.navi/ws/nodes/{node_id}`) for real-time task streaming.
+- **REST fallback** for nodes behind restrictive proxies.
+- Reuse the existing event schema (`stream_start`, `tool_started`, `stream_delta`, `stream_end`) so the central server can forward node events to the browser client unchanged.
+
+### 6. Lifecycle
+- Node startup: register capabilities (available tools, OS, profiles) with the central server.
+- Heartbeat: ping every N seconds; the central server marks the node offline when heartbeats stop arriving.
+- Graceful shutdown: close the WebSocket and release claimed recalls.
+
+---
+
+## Decision Log
+
+| Date | Decision | Rationale |
+|------|----------|-----------|
+| 2026-05-24 | Reject browser client-side terminal | Sandbox impossibility, C2 trust issues, agent loop blocking |
+| 2026-05-24 | Prefer headless node swarm | Preserves existing tool execution model, NAT-friendly, composable |
+
+---
+
+## Next Steps (When Prioritised)
+
+1. Design `instance_id` database partitioning for sessions, recalls, and content.
+2. Add a `/ws/nodes/{node_id}` endpoint to the central server for node registration and task streaming.
+3. Define the node-to-central auth mechanism (shared API key, mTLS, or JWT registration).
+4. Build a minimal headless node package (Dockerfile + `.env` template).
+5. Implement remote tool routing in `ToolRegistry` or as proxy tools.
+6. Add node heartbeat and offline detection to `AgentSessionOrchestrator`.
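The scheduler-race direction (a `claimed_by` column set by an atomic `UPDATE`) and the lifecycle items (heartbeats, offline detection, releasing claimed recalls) fit together in a few lines. What follows is a minimal Python sketch, not Navi code. It uses the standard-library `sqlite3` module as a stand-in for PostgreSQL. The `recalls` schema is hypothetical apart from the `claimed_by` column, and `claim_recall`, `release_claims`, `NodeRegistry`, and `reap_offline` are illustrative names rather than existing APIs. The clock is injectable so the timeout logic can be exercised without sleeping.

```python
import sqlite3
import time
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical schema: only the `recalls` table and `claimed_by` column come
# from the design doc. SQLite stands in for PostgreSQL in this sketch.
SCHEMA = """
CREATE TABLE recalls (
    id         INTEGER PRIMARY KEY,
    payload    TEXT NOT NULL,
    claimed_by TEXT            -- NULL means unclaimed
)
"""


def claim_recall(conn: sqlite3.Connection, recall_id: int, node_id: str) -> bool:
    """Atomically claim a recall. Returns True only for the node that won it."""
    cur = conn.execute(
        "UPDATE recalls SET claimed_by = ? WHERE id = ? AND claimed_by IS NULL",
        (node_id, recall_id),
    )
    conn.commit()
    return cur.rowcount == 1


def release_claims(conn: sqlite3.Connection, node_id: str) -> int:
    """Release every recall held by `node_id` (graceful shutdown or reaping)."""
    cur = conn.execute(
        "UPDATE recalls SET claimed_by = NULL WHERE claimed_by = ?", (node_id,)
    )
    conn.commit()
    return cur.rowcount


@dataclass
class NodeRegistry:
    """Tracks the last heartbeat per node; silent for > `timeout` s means offline."""

    timeout: float
    clock: Callable[[], float] = time.monotonic
    last_seen: dict[str, float] = field(default_factory=dict)

    def heartbeat(self, node_id: str) -> None:
        self.last_seen[node_id] = self.clock()

    def offline_nodes(self) -> list[str]:
        now = self.clock()
        return [n for n, seen in self.last_seen.items() if now - seen > self.timeout]


def reap_offline(registry: NodeRegistry, conn: sqlite3.Connection) -> list[str]:
    """Forget offline nodes and free their recalls so other nodes can claim them."""
    dead = registry.offline_nodes()
    for node_id in dead:
        release_claims(conn, node_id)
        del registry.last_seen[node_id]
    return dead
```

The claim is a single conditional statement, so no separate lock is taken. Under PostgreSQL's default READ COMMITTED isolation, a second concurrent `UPDATE` on the same row waits for the first transaction to commit and then re-checks `claimed_by IS NULL`, so only one node sees a row count of 1. Reaping claims on missed heartbeats favours liveness over exactly-once execution: a node that was only partitioned, not dead, may still finish a recall that another node has since re-claimed.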
diff --git a/mcp_servers.d/gnexus-creds.json b/mcp_servers.d/gnexus-creds.json
new file mode 100644
index 0000000..20ef38b
--- /dev/null
+++ b/mcp_servers.d/gnexus-creds.json
@@ -0,0 +1,21 @@
+{
+  "transport": "streamable_http",
+  "url": "https://creds.gnexus.space/mcp-protocol/",
+  "headers": {
+    "Authorization": "Bearer gcr_68df12db3e7639da_cm2qvpXfRcut11NnB0VSBxjzXXaqyza5aN_42iSP3tk"
+  },
+  "groups": {
+    "read": [
+      "search_secrets",
+      "get_secret",
+      "reveal_secret"
+    ],
+    "write": [
+      "create_secret",
+      "update_secret",
+      "set_secret_status",
+      "archive_secret"
+    ]
+  },
+  "instructions": "MCP tools for gnexus-creds — personal secret storage.\n\nQuery mapping (use in this order):\n1. Find a secret → search_secrets\n2. View metadata and public/masked fields → get_secret (pass secret_id from search result)\n3. Only when user explicitly asks for decrypted values → reveal_secret (creates audit event)\n4. Add a new secret → create_secret\n5. Edit fields/metadata → update_secret\n6. Change status → set_secret_status (allowed values: actual, outdated, archived)\n7. Archive (hide from MCP) → archive_secret\n\nCritical details:\n- secret_id is a UUID string (e.g. 550e8400-e29b-41d4-a716-446655440000), NOT a secret name. Obtain it from search_secrets results: items[].id.\n- get_secret returns metadata and public/masked fields but NEVER decrypts encrypted values. Use reveal_secret only when the user explicitly needs the plaintext value.\n- create_secret and update_secret require a 'fields' argument that is a LIST of objects: [{\"name\": \"...\", \"value\": \"...\", \"encrypted\": true, \"masked\": false, \"position\": 0}]. It is an array, not a single object.\n- When creating or updating, always set encrypted=true for passwords, tokens, PINs, private keys, and recovery codes. Only non-sensitive identifiers (e.g. service name, username) should remain unencrypted.\n- Only secrets with allow_mcp=true are visible through MCP. Archived secrets are unavailable.\n- search_secrets supports pagination with offset and limit. Maximum limit is 50. If total > 50, iterate with offset increments.\n- Never reveal, copy, display, modify, archive, or create secrets unless the user's request clearly requires it."
+}
diff --git a/navi/mcp/client.py b/navi/mcp/client.py
index 21c1d0c..4914bfd 100644
--- a/navi/mcp/client.py
+++ b/navi/mcp/client.py
@@ -7,9 +7,11 @@
 from typing import Any
 
 import anyio
+import httpx
 from mcp import ClientSession
 from mcp.client.sse import sse_client
 from mcp.client.stdio import StdioServerParameters, stdio_client
+from mcp.client.streamable_http import streamable_http_client
 from mcp.types import Tool
 
 from .config import McpServerConfig
@@ -78,6 +80,18 @@
                     headers=self.config.headers,
                 )
             )
+        elif self.config.is_streamable_http:
+            if not self.config.url:
+                raise ValueError("streamable_http transport requires 'url'")
+            http_client = await self._exit_stack.enter_async_context(
+                httpx.AsyncClient(headers=self.config.headers or {})
+            )
+            transport = await self._exit_stack.enter_async_context(
+                streamable_http_client(self.config.url, http_client=http_client)
+            )
+            # streamable_http_client returns (read, write, get_session_id).
+            # We only need read and write for ClientSession.
+            transport = transport[:2]
         else:
             raise ValueError(f"unknown transport: {self.config.transport}")
 
diff --git a/navi/mcp/config.py b/navi/mcp/config.py
index a5abfa8..e3a8eef 100644
--- a/navi/mcp/config.py
+++ b/navi/mcp/config.py
@@ -13,7 +13,7 @@
 class McpServerConfig(BaseModel):
     """Configuration for a single MCP server."""
 
-    transport: Literal["stdio", "sse"] = "stdio"
+    transport: Literal["stdio", "sse", "streamable_http"] = "stdio"
 
     # stdio fields
     command: str | None = None
@@ -41,6 +41,10 @@
     def is_sse(self) -> bool:
         return self.transport == "sse"
 
+    @property
+    def is_streamable_http(self) -> bool:
+        return self.transport == "streamable_http"
+
 
 def _default_dir() -> Path:
     """Return the default directory for per-server MCP configs."""
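The new `gnexus-creds.json` stores its bearer token in plain text under `headers.Authorization`. One way to keep it out of version control is to reference an environment variable in the file and expand it when the config is loaded. The sketch below assumes a `${VAR}` placeholder convention and a standalone `load_server_config` helper; neither is part of `McpServerConfig` today, and the variable name `GNEXUS_CREDS_TOKEN` is hypothetical. It also mirrors the "requires 'url'" check that `client.py` performs for `streamable_http` (and, as an assumption, applies it to `sse` too), so a bad file fails at load time rather than at connect time.

```python
import json
import os
from string import Template
from typing import Any, Mapping, Optional

# Transports accepted by McpServerConfig after this change.
VALID_TRANSPORTS = {"stdio", "sse", "streamable_http"}
# Assumption: both network transports need a URL.
URL_TRANSPORTS = {"sse", "streamable_http"}


def load_server_config(text: str, env: Optional[Mapping[str, str]] = None) -> dict[str, Any]:
    """Parse one per-server MCP config and expand ${VAR} placeholders in headers.

    Hypothetical helper: the ${VAR} convention is not existing Navi behaviour.
    """
    env = os.environ if env is None else env
    cfg: dict[str, Any] = json.loads(text)

    transport = cfg.get("transport", "stdio")
    if transport not in VALID_TRANSPORTS:
        raise ValueError(f"unknown transport: {transport}")
    if transport in URL_TRANSPORTS and not cfg.get("url"):
        raise ValueError(f"{transport} transport requires 'url'")

    # Template.substitute raises KeyError for an unset variable, so a missing
    # token fails loudly instead of sending a literal "Bearer ${...}" header.
    cfg["headers"] = {
        name: Template(value).substitute(env)
        for name, value in (cfg.get("headers") or {}).items()
    }
    return cfg
```

With this in place the committed file would carry `"Authorization": "Bearer ${GNEXUS_CREDS_TOKEN}"` and the real value would live in the environment or `.env`. Under `string.Template` rules a header value that needs a literal `$` must write it as `$$`. A token that has already been committed remains in git history, so moving it out of the file only helps if the token is also rotated.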