Newer
Older
navi-1 / docs / tools.md

Tool System

Tools are the agent's actions. All tools implement the Tool ABC from navi/tools/base.py.

Two tiers

Built-in tools (navi/tools/)

Registered in build_default_registries() as builtins. Never removed on hot-reload.

MCP tools (external servers)

MCP tools are not built into Navi directly. They are provided by MCP servers configured in mcp_servers.d/*.json and registered at startup by McpManager. Each MCP tool name follows the format mcp__<server_name>__<tool_name>.

Server Tool name Description
navi-web mcp__navi-web__web_search Web search (SearXNG primary, DDG fallback, Brave tertiary)
navi-web mcp__navi-web__web_view Open a URL in a headless browser and return clean readable text
navi-web mcp__navi-web__http_request Raw HTTP request (GET/POST/PUT/PATCH/DELETE)
navi-3d mcp__navi-3d__compile_scad Compile an OpenSCAD script into a binary STL file
navi-3d mcp__navi-3d__lint_scad Lightweight OpenSCAD source linting before STL compilation
navi-3d mcp__navi-3d__render_stl Render preview PNG images from an STL file (up to 3 views)
gnexus-creds mcp__gnexus-creds__search_secrets Search personal secrets (UUID id, masked values)
gnexus-creds mcp__gnexus-creds__get_secret Get secret metadata and masked fields
gnexus-creds mcp__gnexus-creds__reveal_secret Decrypt and return plaintext value (audited)
gnexus-creds mcp__gnexus-creds__create_secret Create a new secret with encrypted fields
gnexus-creds mcp__gnexus-creds__update_secret Update fields/metadata of an existing secret
gnexus-creds mcp__gnexus-creds__set_secret_status Change secret status (actual / outdated / archived)
gnexus-creds mcp__gnexus-creds__archive_secret Permanently hide secret from MCP queries

MCP tools survive reload_tools because they are registered as external tools in ToolRegistry.

Tool Name Description
FilesystemTool filesystem Read/write/list/copy/grep/diff local files (path restrictions via config)
CodeExecTool code_exec Execute Python in a subprocess sandbox
TerminalTool terminal Run shell commands (command allowlist via config)
SshExecTool ssh_exec SSH exec and SCP file transfer; connection pool keyed by session ID
ImageViewTool image_view Load image from path/URL → resize to 1024px, convert to JPEG, return base64 for multimodal LLM
TodoTool todo Per-session task checklist (set/update/read)
ScratchpadTool scratchpad Per-session named working notes (write/append/read/clear)
ReloadToolsTool reload_tools Hot-reload user tools and context providers without server restart
ListToolsTool list_tools Return the live tool list from registry
ToolManualTool tool_manual Return manuals/{name}.md or auto-generate from schema
MemoryTool memory Unified memory tool: save, search, and forget facts
SpawnAgentTool spawn_agent Spawn an isolated subagent (blocking). Optional profile_id selects another profile; omitted means parent profile. inherit_system_prompt=true prepends the parent profile's full system prompt as a base layer for the subagent
SwitchProfileTool switch_profile Switch the active profile for a session
ListProfilesTool list_profiles List all available profiles
ShareFileTool share_file Copy an existing local file into session files and return a download link
ContentPublishTool content_publish Register an existing session file for inline viewing in chat
McpStatusTool mcp_status Check connectivity and list tools for configured MCP servers
CreateMcpServerTool create_mcp_server Scaffold a new MCP server directory with boilerplate
TestMcpToolTool test_mcp_tool Execute a single MCP tool call in isolation for diagnostics
ReflectTool reflect Self-reflection and analysis
ScheduleRecallTool schedule_recall Schedule a headless callback for the current session (once/recurring/immediate)
ManageRecallTool manage_recall Cancel, skip, or list scheduled recalls for the current session

User tools (tools/*.py)

Written manually or via create_mcp_server. Auto-discovered at startup.

  • Files starting with _ are ignored.
  • tools/enabled.json — list of user tool names to include in all profiles automatically.
  • tools/_template.py — canonical format reference (not loaded).

Currently present: get_current_datetime.py, gmail.py, weather.py.


Tool formats

Module-level format (preferred for user tools)

name = "my_tool"
description = "What it does and when to use it — be specific."
parameters = {
    "type": "object",
    "properties": {
        "param": {"type": "string", "description": "..."}
    },
    "required": ["param"]
}

async def execute(params: dict) -> str:
    # Return a plain string on success.
    # Raise an exception to signal failure.
    return "result"

No classes, no module-level print(). The loader wraps execute in a Tool subclass automatically.

Class-based format (built-in tools)

from navi.tools.base import Tool, ToolResult

class MyTool(Tool):
    name = "my_tool"
    description = "..."
    parameters = {"type": "object", "properties": {...}, "required": [...]}

    async def execute(self, params: dict) -> ToolResult:
        return ToolResult(success=True, output="result")

ToolResult fields:

  • success: bool
  • output: str — always a string; LLM sees this
  • error: str | None — included in output on failure via to_message_content()
  • metadata: dict — internal hints (e.g. is_image: True → triggers image injection into context)

Tool loading (navi/tools/loader.py)

load_tools_from_dir(tools_dir) returns LoadResult(loaded, errors).

Load order:

  1. Try module-level format (checks for name, description, parameters, execute).
  2. Fall back to class-based (scans for Tool subclasses).

Errors are isolated per file — one broken file does not prevent others from loading. Errors are logged and returned in LoadResult.errors.


Middleware hooks

Tools support before_execute / after_execute middleware hooks registered via ToolRegistry.add_middleware(). The built-in LoggingMiddleware logs every tool call with duration and result summary.

Use middleware for cross-cutting concerns: metrics, rate limiting, authorization, audit logging.

Hot-reload

reload_tools tool calls ToolRegistry.reload_user_tools(tools_dir):

  1. Drops all tools that are NOT in _builtin_names.
  2. Re-runs load_tools_from_dir.
  3. New tools registered without server restart.

New tools become available from the next user message (tool schemas are built at run_stream() entry, not during execution).


Self-extension

Navi supports two ways to add new capabilities:

User tools (simple scripts)

For small, single-purpose helpers, use write_tool. It writes a Python file into tools/ and hot-reloads it in one call. The new tool is added to tools/enabled.json and becomes available in all profiles from the next user message.

Requirements for a user tool:

  • Module-level name, description, parameters.
  • async def execute(params: dict) -> str.

The agent should call tool_manual("write_tool") before using it.

MCP servers (complex integrations)

For richer integrations that need their own process, dependencies, or state, scaffold an MCP server using create_mcp_server. This creates:

  1. A directory under mcp-servers/{name}/.
  2. A server.py entrypoint with stdio transport.
  3. A config file at mcp_servers.d/{name}.json.
  4. Registration via reload_tools or server restart.

The agent should call tool_manual("create_mcp_server") before using it for the first time.


Scratchpad and Todo

Both are per-session, backed by the PostgreSQL KV-store (session_store table) and survive server restarts.

Scratchpad — named sections for working notes within a task. Operations: write, append, read, clear. Subagents get isolated scratchpads (unique UUID-based session ID in run_ephemeral()).

Todo — checklist for tracking multi-step plans. Operations: set (replace all tasks), update (set status of one task), read. Statuses: pending, in_progress, done, failed, skipped.


Image tool flow

When image_view succeeds, it returns ToolResult with metadata={"is_image": True, "base64": "..."}.

The agent detects this and appends a synthetic user message with the image to session.context (but not session.messages). This makes the image visible to the next LLM call without polluting the display history.

See sessions.md for the dual-buffer design.