
Headless Navi Nodes — Future Architecture Sketch

Status: Research / deferred. Not in active development.
Date: 2026-05-24


Problem

Navi currently runs all tools (terminal, filesystem, code_exec, ssh_exec) on the same machine as the backend server. Users who want Navi to manage their local dev machine must either:

  1. Run the entire Navi backend locally (heavy, requires PostgreSQL).
  2. Use ssh_exec to loopback to localhost (clunky, requires local sshd).

The original idea — a "terminal client" that lets the browser execute commands on the user's local machine — was explored and rejected.


Rejected Approach: Browser-Based Client-Side Execution

Why it was considered

A browser-based client could theoretically receive a terminal command from the server, run it locally via a companion app or extension, and return stdout.

Why it was rejected

  1. Sandbox impossibility. Browsers cannot spawn local shells. A companion (Electron / Tauri / browser extension + native messaging host) is required, which is no longer a "web client".
  2. Agent loop blocking. Agent.run_stream assumes every tool call is an inline await tool.execute() that completes inside a single Python process. A remote tool that waits on a browser response would freeze the entire agent loop or force a full async state-machine refactor.
  3. C2 / trust model. The server instructing the client to execute arbitrary commands is a command-and-control pattern. Authentication, authorization, and sandboxing of the client-side executor become critical and complex.
  4. Device ambiguity. If a user has Navi open on desktop and mobile, which device executes terminal("npm install")? Requires device registry, affinity, and explicit routing.
  5. Maintenance burden. Supporting three platforms (Linux, macOS, Windows) with installable companion software is unsustainable for a personal-assistant project.

Conclusion: Browser-based client-side execution is architecturally incompatible with Navi's current synchronous tool loop and operationally too expensive.


Preferred Approach: Headless Navi Nodes (Swarm)

Concept

A headless Navi node is a lightweight instance of the Navi backend (FastAPI + agent loop) without a web client. It runs on the target machine (e.g. a user's dev laptop, a VPS, a home server) and connects back to a central Navi server.

  • The central server handles user-facing sessions, web client, and orchestration.
  • Headless nodes handle local tool execution on their respective hosts.
  • Both share a common PostgreSQL database for session persistence and scheduler state.
  • Communication is via outbound WebSocket from node to central server (avoids NAT issues).

High-level diagram

┌──────────────┐      WS/HTTP       ┌──────────────┐
│   Browser    │◄───────────────────►│   Central    │
│  (web client)│                     │ Navi Server  │
└──────────────┘                     └──────┬───────┘
                                            │
                                            │ WS outbound
                                            │ (nodes register here)
                                            │
        ┌───────────────────────────────────┼───────────────────────────────────┐
        │                                   │                                   │
        ▼                                   ▼                                   ▼
 ┌──────────────┐                ┌──────────────┐                ┌──────────────┐
 │  Headless    │                │  Headless    │                │  Headless    │
 │  Node A      │                │  Node B      │                │  Node C      │
 │  (dev laptop)│                │  (home NAS)  │                │  (VPS)       │
 └──────┬───────┘                └──────┬───────┘                └──────┬───────┘
        │                              │                              │
        ▼ local shell                  ▼ local shell                  ▼ local shell

Advantages

  • No agent loop changes. terminal, filesystem, code_exec remain synchronous Python tools inside the node's process.
  • No browser sandbox issues. Tools run in a real OS process on the target machine.
  • NAT-friendly. Nodes initiate outbound connections; no reverse tunnels or port forwarding needed.
  • Composable. A user can attach as many machines as needed. Central server routes tasks to the appropriate node.

Open Questions (To Be Solved Before Implementation)

1. Shared Database Partitioning

If all nodes share one PostgreSQL database:

  • Scheduler race. Multiple nodes polling recalls will try to execute the same scheduled task. Needs a claimed_by column or a per-recall leader-election mechanism.
  • Session concurrent edits. Two nodes appending to the same session's messages array could overwrite each other. Needs row-level locking or instance_id partitioning.
  • Memory extractor storm. process_stale_sessions on every node would duplicate embedding work. Needs instance_id gating so only the central server or a designated node runs background workers.

Direction: Tag every row with instance_id (central = main). Nodes only read and write rows assigned to them. In the scheduler table, a node claims a task by setting claimed_by in a single atomic UPDATE, so only one node can win each task.
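The claimed_by direction above can be sketched as a conditional UPDATE whose row count tells the caller whether it won the claim. This is a minimal sketch, not Navi's schema: SQLite stands in for PostgreSQL so the example runs anywhere, and the table name recalls, its columns, and the helper names are all assumptions.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the scheduler table (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE recalls ("
        " id INTEGER PRIMARY KEY,"
        " claimed_by TEXT)"  # NULL means nobody has claimed the task yet
    )
    db.execute("INSERT INTO recalls (id) VALUES (1)")
    db.commit()
    return db


def claim_recall(db: sqlite3.Connection, recall_id: int, node_id: str) -> bool:
    """Try to claim one recall; return True only for the node that wins."""
    cur = db.execute(
        "UPDATE recalls SET claimed_by = ? "
        "WHERE id = ? AND claimed_by IS NULL",
        (node_id, recall_id),
    )
    db.commit()
    # rowcount is 1 if this call changed NULL to node_id, 0 if already claimed.
    return cur.rowcount == 1


def release_recall(db: sqlite3.Connection, recall_id: int, node_id: str) -> bool:
    """Release a claim (e.g. on graceful shutdown), but only if we still hold it."""
    cur = db.execute(
        "UPDATE recalls SET claimed_by = NULL WHERE id = ? AND claimed_by = ?",
        (recall_id, node_id),
    )
    db.commit()
    return cur.rowcount == 1
```

Because the WHERE clause checks claimed_by IS NULL in the same statement that sets it, two nodes racing on the same recall cannot both see a row count of 1. The same statement shape works in PostgreSQL.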

2. Tool Routing in the Agent

The agent on the central server must know that terminal for profile server_admin should run on Node B, not locally.

Options:

  • Profile-level node affinity. Profile server_admin has node_id: "home-nas". All tools in that profile execute on that node.
  • Remote tool proxy. The central registry has proxy tools (remote_terminal, remote_filesystem) that forward calls to the node's REST/WebSocket API.
  • Subagent on node. spawn_agent spawns the subagent on a remote node via the node's API instead of locally.
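As an illustration of the first option, profile-level node affinity reduces to a small lookup that runs before any tool executes. The sketch below is hypothetical: the Profile dataclass, the resolve_node helper, and the choice to raise when the target node is offline are assumptions, not existing Navi code.

```python
from dataclasses import dataclass
from typing import Optional

LOCAL = "main"  # assumed identifier for the central server itself


@dataclass(frozen=True)
class Profile:
    name: str
    node_id: Optional[str] = None  # None means "run on the central server"


def resolve_node(profile: Profile, online_nodes: set[str]) -> str:
    """Return the id of the node that should execute tools for this profile."""
    if profile.node_id is None:
        return LOCAL
    if profile.node_id not in online_nodes:
        # Fail loudly rather than silently running on the wrong machine.
        raise RuntimeError(
            f"node {profile.node_id!r} for profile {profile.name!r} is offline"
        )
    return profile.node_id
```

For example, resolve_node(Profile("server_admin", "home-nas"), {"home-nas"}) returns "home-nas", while a profile with no node_id resolves to the central server. Raising on an offline node is one possible policy. Falling back to local execution would be simpler but could run a command on an unintended host.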

3. Headless Node Packaging

  • Docker: Easy to ship, but terminal and filesystem operate inside the container by default. Access to the host requires --privileged, --pid host, or explicit volume mounts, which weakens isolation.
  • Systemd service / bare process: Full host access natively, but harder to install and update across platforms.

Direction: Provide both. Use Docker for sandboxed or isolated tasks and a bare-metal install script for full host management.

4. Authentication

Nodes must prove identity to the central server.

  • Shared secret (NODE_API_KEY) in .env.
  • mTLS (client certificates).
  • JWT registration flow (node registers once, receives token).
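For the shared-secret option, one way to avoid sending NODE_API_KEY over the wire is to have the node sign its registration request with it. The following is a sketch under assumptions: the message layout (node id plus timestamp), the 60-second skew window, and both function names are illustrative and not part of any existing Navi API.

```python
import hashlib
import hmac


def sign_registration(api_key: str, node_id: str, timestamp: int) -> str:
    """Node side: sign "node_id:timestamp" with the shared secret (HMAC-SHA256)."""
    message = f"{node_id}:{timestamp}".encode()
    return hmac.new(api_key.encode(), message, hashlib.sha256).hexdigest()


def verify_registration(
    api_key: str,
    node_id: str,
    timestamp: int,
    signature: str,
    now: int,
    max_skew: int = 60,  # assumed tolerance in seconds, limits replay of old requests
) -> bool:
    """Central side: accept only fresh requests carrying a valid signature."""
    if abs(now - timestamp) > max_skew:
        return False
    expected = sign_registration(api_key, node_id, timestamp)
    # compare_digest runs in constant time to avoid leaking how many chars matched.
    return hmac.compare_digest(expected, signature)
```

A shared secret is the simplest of the three options to deploy, but every node then holds the same key. mTLS or per-node tokens would allow revoking a single node.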

5. Communication Protocol

  • WebSocket outbound (ws://central.navi/ws/nodes/{node_id}) for real-time task streaming.
  • REST fallback for nodes behind restrictive proxies.
  • Reuse the existing event schema (stream_start, tool_started, stream_delta, stream_end) so the central server can forward node events to the browser client unchanged.
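Reusing the event schema could look like a thin envelope: the node wraps each event with its id, and the central server unwraps it and forwards the inner payload to the browser untouched. The event type names below come from the list above. The envelope fields (node_id, event) and the assumption that each event carries a type key are hypothetical.

```python
import json

# Event types named in this document; anything else is rejected at the node.
KNOWN_EVENTS = {"stream_start", "tool_started", "stream_delta", "stream_end"}


def wrap_event(node_id: str, event: dict) -> str:
    """Node side: wrap an agent event in a JSON frame for the central server."""
    if event.get("type") not in KNOWN_EVENTS:
        raise ValueError(f"unknown event type: {event.get('type')!r}")
    return json.dumps({"node_id": node_id, "event": event})


def unwrap_event(frame: str) -> tuple[str, dict]:
    """Central side: recover the node id and the unmodified event payload."""
    envelope = json.loads(frame)
    return envelope["node_id"], envelope["event"]
```

Keeping the inner event unmodified means the browser client would not need to know whether a stream originated on the central server or on a node.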

6. Lifecycle

  • Node startup: register capabilities (available tools, OS, profiles) with central server.
  • Heartbeat: the node pings every N seconds; the central server marks the node offline once pings are missed.
  • Graceful shutdown: close WS, release claimed recalls.
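The lifecycle above can be sketched as a small in-memory registry on the central server. This is an illustrative sketch only: the NodeRegistry class, its method names, and the 30-second default timeout are assumptions, and the clock is injectable purely so the behaviour can be tested without sleeping.

```python
import time
from typing import Callable


class NodeRegistry:
    """Tracks registered nodes and when each one was last heard from."""

    def __init__(self, timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout = timeout
        self._clock = clock
        self._last_seen: dict[str, float] = {}
        self._capabilities: dict[str, dict] = {}

    def register(self, node_id: str, capabilities: dict) -> None:
        """Startup: record the node's capabilities (tools, OS, profiles)."""
        self._capabilities[node_id] = capabilities
        self._last_seen[node_id] = self._clock()

    def heartbeat(self, node_id: str) -> None:
        """Refresh the last-seen time; unknown nodes must register first."""
        if node_id not in self._last_seen:
            raise KeyError(f"node {node_id!r} is not registered")
        self._last_seen[node_id] = self._clock()

    def is_online(self, node_id: str) -> bool:
        """A node is online if it was heard from within the timeout window."""
        seen = self._last_seen.get(node_id)
        return seen is not None and self._clock() - seen <= self._timeout

    def unregister(self, node_id: str) -> None:
        """Graceful shutdown: forget the node entirely."""
        self._last_seen.pop(node_id, None)
        self._capabilities.pop(node_id, None)
```

In practice the timeout would likely be a small multiple of the heartbeat interval N, so that a single dropped ping does not mark a node offline.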

Decision Log

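| Date       | Decision                            | Rationale                                                        |
|------------|-------------------------------------|------------------------------------------------------------------|
| 2026-05-24 | Reject browser client-side terminal | Sandbox impossibility, C2 trust issues, agent loop blocking      |
| 2026-05-24 | Prefer headless node swarm          | Preserves existing tool execution model, NAT-friendly, composable |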

Next Steps (When Prioritised)

  1. Design instance_id database partitioning for sessions, recalls, and content.
  2. Add /ws/nodes/{node_id} endpoint to central server for node registration and task streaming.
  3. Define node-to-central auth mechanism (API key or mTLS).
  4. Build minimal headless node package (Dockerfile + .env template).
  5. Implement remote tool routing in ToolRegistry or as proxy tools.
  6. Add node heartbeat and offline detection to AgentSessionOrchestrator.