# Headless Navi Nodes — Future Architecture Sketch

**Status:** Research / deferred. Not in active development.
**Date:** 2026-05-24

---

## Problem

Navi currently runs all tools (`terminal`, `filesystem`, `code_exec`, `ssh_exec`) on the same machine as the backend server. Users who want Navi to manage their local dev machine must either:

1. Run the entire Navi backend locally (heavy, requires PostgreSQL).
2. Use `ssh_exec` to loop back to `localhost` (clunky, requires local sshd).

The original idea — a "terminal client" that lets the browser execute commands on the user's local machine — was explored and rejected.

---

## Rejected Approach: Browser-Based Client-Side Execution

### Why it was considered
A browser-based client could theoretically receive a `terminal` command from the server, run it locally via a companion app or extension, and return stdout.

### Why it was rejected
1. **Sandbox impossibility.** Browsers cannot spawn local shells. Running commands requires a companion (Electron / Tauri / browser extension + native messaging host), at which point the client is no longer a "web client".
2. **Agent loop blocking.** `Agent.run_stream` assumes each tool call is an in-process `await tool.execute()` that returns its result within a single Python process. A remote tool that waits for a browser response would stall the agent loop until the browser answers, or would require a full async-state-machine refactor.
3. **C2 / trust model.** The server instructing the client to execute arbitrary commands is a command-and-control pattern. Authentication, authorization, and sandboxing of the client-side executor become critical and complex.
4. **Device ambiguity.** If a user has Navi open on desktop and mobile, which device executes `terminal("npm install")`? Requires device registry, affinity, and explicit routing.
5. **Maintenance burden.** Supporting three platforms (Linux, macOS, Windows) with installable companion software is unsustainable for a personal-assistant project.

**Conclusion:** Browser-based client-side execution is architecturally incompatible with Navi's current synchronous tool loop and operationally too expensive.

---

## Preferred Approach: Headless Navi Nodes (Swarm)

### Concept
A **headless Navi node** is a lightweight instance of the Navi backend (FastAPI + agent loop) without a web client. It runs on the target machine (e.g. a user's dev laptop, a VPS, a home server) and connects back to a **central Navi server**.

- The **central server** handles user-facing sessions, web client, and orchestration.
- **Headless nodes** handle local tool execution on their respective hosts.
- Both share a **common PostgreSQL database** for session persistence and scheduler state.
- Communication is via **outbound WebSocket** from node to central server (avoids NAT issues).

### High-level diagram

```
┌──────────────┐       WS/HTTP       ┌──────────────┐
│   Browser    │◄───────────────────►│   Central    │
│  (web client)│                     │ Navi Server  │
└──────────────┘                     └──────┬───────┘
                                            │
                                            │ WS outbound
                                            │ (nodes register here)
                                            │
        ┌───────────────────────────────────┼───────────────────────────────────┐
        │                                   │                                   │
        ▼                                   ▼                                   ▼
 ┌──────────────┐                    ┌──────────────┐                    ┌──────────────┐
 │  Headless    │                    │  Headless    │                    │  Headless    │
 │  Node A      │                    │  Node B      │                    │  Node C      │
 │  (dev laptop)│                    │  (home NAS)  │                    │  (VPS)       │
 └──────┬───────┘                    └──────┬───────┘                    └──────┬───────┘
        │                                   │                                   │
        ▼ local shell                       ▼ local shell                       ▼ local shell
```

### Advantages
- **No agent loop changes on the node.** `terminal`, `filesystem`, and `code_exec` remain in-process Python tools inside the node's own agent loop, executed exactly as they are today.
- **No browser sandbox issues.** Tools run in a real OS process on the target machine.
- **NAT-friendly.** Nodes initiate outbound connections; no reverse tunnels or port forwarding needed.
- **Composable.** A user can attach as many machines as needed; the central server routes tasks to the appropriate node.

---

## Open Questions (To Be Solved Before Implementation)

### 1. Shared Database Partitioning
If all nodes share one PostgreSQL database:
- **Scheduler race.** Multiple nodes polling `recalls` will try to execute the same scheduled task. Needs a `claimed_by` column or a leader-election mechanism per recall.
- **Session concurrent edits.** Two nodes appending to the same session's `messages` array could overwrite each other. Needs row-level locking or `instance_id` partitioning.
- **Memory extractor storm.** `process_stale_sessions` on every node would duplicate embedding work. Needs `instance_id` gating so only the central server or a designated node runs background workers.

**Direction:** Tag every row with `instance_id` (central = `main`). Nodes only read and write rows assigned to them. The scheduler claims a recall with an atomic `UPDATE` on a `claimed_by` column, so exactly one process wins each claim.
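
The `claimed_by` direction amounts to a compare-and-set: a process claims a due recall only if nobody has claimed it yet, and the affected row count tells it whether it won. A minimal sketch, using the standard-library `sqlite3` module as a stand-in for PostgreSQL so it runs anywhere; the `recalls` columns (`id`, `instance_id`, `due_at`, `claimed_by`) and the helper names are assumptions, not the existing schema. In PostgreSQL the same guard would typically be combined with `UPDATE ... RETURNING` or `SELECT ... FOR UPDATE SKIP LOCKED`.

```python
import sqlite3
from typing import List

# Hypothetical schema; the real `recalls` columns may differ.
SCHEMA = """
CREATE TABLE recalls (
    id          INTEGER PRIMARY KEY,
    instance_id TEXT NOT NULL,     -- owning instance, 'main' for the central server
    due_at      INTEGER NOT NULL,  -- unix time at which the task should run
    claimed_by  TEXT               -- NULL until some process claims it
)
"""


def due_recalls(conn: sqlite3.Connection, instance_id: str, now: int) -> List[int]:
    """Unclaimed recalls owned by this instance that are due, oldest first."""
    rows = conn.execute(
        "SELECT id FROM recalls"
        " WHERE instance_id = ? AND due_at <= ? AND claimed_by IS NULL"
        " ORDER BY due_at",
        (instance_id, now),
    )
    return [row[0] for row in rows]


def claim_recall(conn: sqlite3.Connection, recall_id: int, claimant: str) -> bool:
    """Compare-and-set claim: True only for the one caller whose UPDATE matched."""
    cur = conn.execute(
        # Once claimed_by is non-NULL, every later UPDATE matches zero rows.
        "UPDATE recalls SET claimed_by = ? WHERE id = ? AND claimed_by IS NULL",
        (claimant, recall_id),
    )
    conn.commit()
    return cur.rowcount == 1


def release_claims(conn: sqlite3.Connection, claimant: str) -> int:
    """Hand back everything a claimant holds, e.g. during graceful shutdown."""
    cur = conn.execute(
        "UPDATE recalls SET claimed_by = NULL WHERE claimed_by = ?", (claimant,)
    )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.execute("INSERT INTO recalls (id, instance_id, due_at) VALUES (1, 'main', 100)")
    conn.commit()
    print(claim_recall(conn, 1, "scheduler-a"))  # True
    print(claim_recall(conn, 1, "scheduler-b"))  # False: already claimed
```

Polling with `due_recalls` and then calling `claim_recall` is safe even when two processes see the same row, because only the `UPDATE` decides the winner.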

### 2. Tool Routing in the Agent
The agent on the central server must know that `terminal` for profile `server_admin` should run on Node B, not locally.

Options:
- **Profile-level node affinity.** Profile `server_admin` has `node_id: "home-nas"`. All tools in that profile execute on that node.
- **Remote tool proxy.** The central registry has proxy tools (`remote_terminal`, `remote_filesystem`) that forward calls to the node's REST/WebSocket API.
- **Subagent on node.** `spawn_agent` spawns the subagent on a remote node via the node's API instead of locally.
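
The first two options compose naturally: profile-level affinity decides *where* a tool runs, and a proxy keeps the same `execute` interface so the caller still just awaits `tool.execute()`. A minimal sketch under stated assumptions: `Profile`, `RemoteToolProxy`, `resolve_tool`, and the `send` transport callable are all hypothetical, and tools are assumed to take keyword arguments and return text; the real `ToolRegistry` and tool interface may differ.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Dict, Optional

# Hypothetical transport: (node_id, tool_name, kwargs) -> result text from the node.
NodeSender = Callable[[str, str, Dict[str, Any]], Awaitable[str]]


@dataclass
class Profile:
    name: str
    node_id: Optional[str] = None  # None means "run tools on the central server"


class RemoteToolProxy:
    """Looks like a local tool to the agent but forwards every call to a node."""

    def __init__(self, name: str, node_id: str, send: NodeSender) -> None:
        self.name = name
        self.node_id = node_id
        self._send = send

    async def execute(self, **kwargs: Any) -> str:
        # The caller still just awaits execute(); the network hop is hidden here.
        return await self._send(self.node_id, self.name, kwargs)


def resolve_tool(
    profile: Profile, tool_name: str, local_tools: Dict[str, Any], send: NodeSender
) -> Any:
    """Profile-level affinity: the local tool if no node is pinned, else a proxy."""
    if profile.node_id is None:
        return local_tools[tool_name]
    return RemoteToolProxy(tool_name, profile.node_id, send)


# --- stand-ins for a local tool and the node transport ---
class LocalTerminal:
    name = "terminal"

    async def execute(self, **kwargs: Any) -> str:
        return f"local: {kwargs['command']}"


async def fake_send(node_id: str, tool_name: str, args: Dict[str, Any]) -> str:
    return f"{node_id}: {tool_name} {args['command']}"


if __name__ == "__main__":
    tools = {"terminal": LocalTerminal()}
    admin = Profile("server_admin", node_id="home-nas")
    tool = resolve_tool(admin, "terminal", tools, fake_send)
    print(asyncio.run(tool.execute(command="uptime")))  # home-nas: terminal uptime
```

The "subagent on node" option would sit one level higher: instead of proxying individual tools, `spawn_agent` would hand the whole task to the node.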

### 3. Headless Node Packaging
- **Docker:** Easy to ship, but `terminal` and `filesystem` operate inside the container by default. Access to the host requires `--privileged`, `--pid host`, or explicit volume mounts, which weakens isolation.
- **Systemd service / bare process:** Full host access natively, but harder to install and update across platforms.

**Direction:** Provide both: Docker for sandboxed/isolated tasks, and a bare-metal install script for full host management.

### 4. Authentication
Nodes must prove identity to the central server.
- Shared secret (`NODE_API_KEY`) in `.env`.
- mTLS (client certificates).
- JWT registration flow (node registers once, receives token).

### 5. Communication Protocol
- **WebSocket outbound** (`wss://central.navi/ws/nodes/{node_id}`, i.e. WebSocket over TLS, since node credentials travel on this connection) for real-time task streaming.
- **REST fallback** for nodes behind restrictive proxies.
- Reuse the existing event schema (`stream_start`, `tool_started`, `stream_delta`, `stream_end`) so the central server can forward node events to the browser client unchanged.
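
Forwarding events "unchanged" works if the node wraps each existing event in an outer envelope that carries routing fields, and the central server simply unwraps it. A minimal sketch, assuming events are JSON objects with a `"type"` field; `NodeEnvelope` and `forward_to_browser` are hypothetical names, and the real event schema may carry more types than the four listed.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, Tuple

# Event types named in this design; the real schema may carry more.
KNOWN_EVENTS = {"stream_start", "tool_started", "stream_delta", "stream_end"}


@dataclass
class NodeEnvelope:
    node_id: str
    session_id: str
    event: Dict[str, Any]  # existing browser event, never modified in transit

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "NodeEnvelope":
        data = json.loads(raw)
        env = cls(data["node_id"], data["session_id"], data["event"])
        if env.event.get("type") not in KNOWN_EVENTS:
            raise ValueError(f"unknown event type: {env.event.get('type')!r}")
        return env


def forward_to_browser(raw: str) -> Tuple[str, Dict[str, Any]]:
    """Central side: strip the envelope, return (session_id, event) for the browser."""
    env = NodeEnvelope.from_json(raw)
    return env.session_id, env.event
```

Keeping `node_id` and `session_id` outside `event` is what lets the browser client stay unaware that a node produced the stream.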

### 6. Lifecycle
- Node startup: register capabilities (available tools, OS, profiles) with central server.
- Heartbeat: ping every N seconds; central server marks node offline if missed.
- Graceful shutdown: close WS, release claimed recalls.
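
Registration and heartbeat reduce to a small registry on the central server that records each node's capabilities and last-seen time. A minimal sketch; `NodeRegistry`, `NodeInfo`, and the tolerance of several missed beats (`missed_beats`) are assumptions, since the design only says a node is marked offline when heartbeats are missed (set `missed_beats=1` for the strict reading). The injectable clock keeps it testable.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class NodeInfo:
    node_id: str
    tools: List[str]  # e.g. ["terminal", "filesystem"]
    os: str
    profiles: List[str]
    last_seen: float = 0.0


class NodeRegistry:
    def __init__(
        self,
        heartbeat_interval: float,
        missed_beats: int = 3,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        # Assumed policy: offline after `missed_beats` intervals without a heartbeat.
        self.timeout = heartbeat_interval * missed_beats
        self._clock = clock
        self._nodes: Dict[str, NodeInfo] = {}

    def register(self, info: NodeInfo) -> None:
        info.last_seen = self._clock()
        self._nodes[info.node_id] = info

    def heartbeat(self, node_id: str) -> bool:
        """False for an unknown node, which should then re-register."""
        node = self._nodes.get(node_id)
        if node is None:
            return False
        node.last_seen = self._clock()
        return True

    def unregister(self, node_id: str) -> None:
        # Graceful shutdown path; claimed recalls would be released separately.
        self._nodes.pop(node_id, None)

    def is_online(self, node_id: str) -> bool:
        node = self._nodes.get(node_id)
        return node is not None and self._clock() - node.last_seen <= self.timeout

    def online_nodes_with_tool(self, tool: str) -> List[str]:
        return [
            n.node_id
            for n in self._nodes.values()
            if tool in n.tools and self.is_online(n.node_id)
        ]
```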

---

## Decision Log

| Date | Decision | Rationale |
|------|----------|-----------|
| 2026-05-24 | Reject browser client-side terminal | Sandbox impossibility, C2 trust issues, agent loop blocking |
| 2026-05-24 | Prefer headless node swarm | Preserves existing tool execution model, NAT-friendly, composable |

---

## Next Steps (When Prioritised)

1. Design `instance_id` database partitioning for sessions, recalls, and content.
2. Add `/ws/nodes/{node_id}` endpoint to central server for node registration and task streaming.
3. Define node-to-central auth mechanism (API key or mTLS).
4. Build minimal headless node package (Dockerfile + `.env` template).
5. Implement remote tool routing in `ToolRegistry` or as proxy tools.
6. Add node heartbeat and offline detection to `AgentSessionOrchestrator`.