Memory System

Long-term user memory: facts are extracted from conversations and tool executions, stored in PostgreSQL (with SQLite as a fallback), and injected into every session.

PostgreSQL + pgvector (semantic search)

When DATABASE_URL is set, the memory system uses PostgreSQL with pgvector for semantic search via embeddings.

| Feature | SQLite | PostgreSQL |
|---|---|---|
| Storage | File-based | Server |
| Semantic search | No | Yes (cosine distance on vector(768)) |
| Embeddings | None | Generated via Ollama (nomic-embed-text:latest) |
| Metadata | category, key, value | + embedding, source, confidence, expires_at, source_context |
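
For readers unfamiliar with the metric, here is a minimal pure-Python sketch of cosine distance, the quantity pgvector computes for the semantic search row above. The function name is illustrative and not part of navi; real embeddings would have 768 dimensions rather than 2.

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity: 0 = same direction, 1 = orthogonal, 2 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)
```

Smaller values mean the two texts are semantically closer, which is why the search keeps only results under a distance cutoff.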

Dedicated embedding backend

The chat LLM backend (Ollama Cloud, OpenAI, etc.) and the embedding backend are separate.

.env
EMBEDDING_OLLAMA_HOST=http://192.168.1.168:11434  # local CPU server
EMBEDDING_OLLAMA_API_KEY=

If EMBEDDING_OLLAMA_HOST is empty, memory falls back to the main OLLAMA_HOST.
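
The fallback rule can be sketched as follows. This is an illustration rather than navi's actual code: `resolve_embedding_host` is a hypothetical helper, and treating an unset variable the same as an empty one is an assumption.

```python
import os
from typing import Mapping, Optional

def resolve_embedding_host(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Prefer EMBEDDING_OLLAMA_HOST; fall back to OLLAMA_HOST when it is empty."""
    env = os.environ if env is None else env
    # Assumption: unset and whitespace-only values count as "empty".
    dedicated = (env.get("EMBEDDING_OLLAMA_HOST") or "").strip()
    if dedicated:
        return dedicated
    return (env.get("OLLAMA_HOST") or "").strip() or None
```

Passing a plain dict instead of reading `os.environ` keeps the helper easy to test.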


Schema migration

When upgrading the memory system to a new schema (e.g. adding pgvector columns), run:

.venv/bin/python navi/memory/migrate_pgvector.py

This script:

  1. Verifies the vector extension is installed in PostgreSQL
  2. Adds missing columns: embedding, source, confidence, expires_at, last_verified_at, source_context
  3. Creates indexes: hnsw(embedding), expires, source+category

Safe to run multiple times — all operations use IF NOT EXISTS.
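
To illustrate why the migration is idempotent, the sketch below builds the kind of statements such a script might issue. The column types, index names, and the `migration_statements` helper are assumptions made for illustration; the actual contents of migrate_pgvector.py are not shown in this document.

```python
# Assumed column types; only the column names come from the list above.
COLUMNS = {
    "embedding": "vector(768)",
    "source": "TEXT",
    "confidence": "INTEGER",
    "expires_at": "TIMESTAMPTZ",
    "last_verified_at": "TIMESTAMPTZ",
    "source_context": "TEXT",
}

def migration_statements(table: str = "memory_facts") -> list[str]:
    """Build DDL that is a no-op when the column or index already exists."""
    stmts = [
        f"ALTER TABLE {table} ADD COLUMN IF NOT EXISTS {name} {sqltype}"
        for name, sqltype in COLUMNS.items()
    ]
    # Index names are hypothetical; vector_cosine_ops selects cosine distance for HNSW.
    stmts.append(
        f"CREATE INDEX IF NOT EXISTS idx_{table}_embedding "
        f"ON {table} USING hnsw (embedding vector_cosine_ops)"
    )
    stmts.append(f"CREATE INDEX IF NOT EXISTS idx_{table}_expires ON {table} (expires_at)")
    stmts.append(
        f"CREATE INDEX IF NOT EXISTS idx_{table}_source_category ON {table} (source, category)"
    )
    return stmts
```

Because every statement carries IF NOT EXISTS, running the full list a second time changes nothing.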


Backfill embeddings

After enabling pgvector, existing facts have embedding IS NULL. Generate embeddings for them:

.venv/bin/python navi/memory/backfill_embeddings.py

  • Processes facts in batches of 8
  • Sleeps 2 seconds between batches (rate-limit safety)
  • Safe to run multiple times: it only touches rows without embeddings
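
The batching behaviour can be sketched in memory as below. This is a simplified stand-in, not the script itself: facts are plain dicts, `embed` is a hypothetical callable standing in for the Ollama request, and `sleep` is injectable so the example runs without real delays.

```python
import time
from typing import Callable, Iterator

def batched(items: list, size: int) -> Iterator[list]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def backfill(
    facts: list[dict],
    embed: Callable[[str], list[float]],
    batch_size: int = 8,
    pause_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Embed every fact whose embedding is None; return the number updated."""
    pending = [f for f in facts if f.get("embedding") is None]
    updated = 0
    for index, batch in enumerate(batched(pending, batch_size)):
        if index > 0:
            sleep(pause_seconds)  # pause between batches, not before the first
        for fact in batch:
            fact["embedding"] = embed(fact["value"])
            updated += 1
    return updated
```

A second call finds no pending facts and returns 0, which mirrors the "safe to run multiple times" property.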

Storage (navi/memory/store.py)

Three tables in the database:

| Table | Purpose |
|---|---|
| memory_facts | Individual facts: (category, key, value), unique on (category, key) |
| memory_summary | Single-row narrative summary generated from all facts |
| session_memory_state | Tracks which sessions have been processed (by extracted_at) |

MemoryStore is initialized synchronously (this creates the tables). All subsequent operations are async; on PostgreSQL they go through asyncpg.
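
The uniqueness constraint on (category, key) is what turns a repeated save into an update. Below is a minimal sketch using an in-memory SQLite database; the column types and this synchronous `upsert_fact` are simplified stand-ins, not the async MemoryStore method.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE memory_facts ("
    " category TEXT NOT NULL,"
    " key TEXT NOT NULL,"
    " value TEXT NOT NULL,"
    " UNIQUE (category, key))"
)

def upsert_fact(conn: sqlite3.Connection, category: str, key: str, value: str) -> None:
    """Insert a fact, or overwrite the value if (category, key) already exists."""
    conn.execute(
        "INSERT INTO memory_facts (category, key, value) VALUES (?, ?, ?) "
        "ON CONFLICT (category, key) DO UPDATE SET value = excluded.value",
        (category, key, value),
    )

upsert_fact(conn, "preferences", "editor", "vim")
upsert_fact(conn, "preferences", "editor", "neovim")  # same key: updates in place
rows = conn.execute("SELECT category, key, value FROM memory_facts").fetchall()
```

After both calls the table holds a single row for (preferences, editor).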

Key operations

| Method | Description |
|---|---|
| upsert_fact(...) | Insert or update a fact (generates embedding if pgvector + backend available) |
| search_facts(query, limit=15) | Vector search first (cosine distance, cutoff 0.3), then ILIKE fallback |
| delete_fact(key, category=None) | Delete by key, optionally filtered by category |
| get_all_facts(limit=None) | All facts ordered by (category, updated_at DESC) |
| get_summary() | Current narrative summary text |
| set_summary(content) | Replace the summary |
| mark_session_extracted(session_id) | Record extraction timestamp |
| get_extracted_at(session_id) | Check if/when a session was processed |
| backfill_embeddings(batch_size=8) | Generate embeddings for facts with embedding IS NULL |
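
The two-stage lookup in search_facts can be sketched in memory as follows. Several details are assumptions: that the keyword fallback runs only when the vector stage returns nothing, that the cutoff is inclusive, and that the substring match (standing in for ILIKE) covers key and value. `distance_of` is a hypothetical callable that returns a fact's cosine distance to the query.

```python
from typing import Callable, Optional

def search_facts(
    facts: list[dict],
    query: str,
    distance_of: Optional[Callable[[dict], float]] = None,
    limit: int = 15,
    cutoff: float = 0.3,
) -> list[dict]:
    """Return vector hits under the cutoff, else case-insensitive keyword matches."""
    if distance_of is not None:
        scored = [(distance_of(f), f) for f in facts if f.get("embedding") is not None]
        hits = sorted((p for p in scored if p[0] <= cutoff), key=lambda p: p[0])
        if hits:
            return [fact for _, fact in hits[:limit]]
    needle = query.lower()  # lowercase both sides to mimic ILIKE
    matches = [
        f for f in facts
        if needle in f["key"].lower() or needle in f["value"].lower()
    ]
    return matches[:limit]

example_facts = [
    {"key": "city", "value": "Lives in Berlin", "embedding": [0.1], "distance": 0.12},
    {"key": "editor", "value": "Prefers Neovim", "embedding": [0.2], "distance": 0.55},
    {"key": "shell", "value": "Uses zsh", "embedding": None, "distance": None},
]
```

Facts without an embedding never appear in the vector stage, so the keyword fallback is the only way to reach them until a backfill has run.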

Automatic extraction (navi/memory/extractor.py)

Facts are extracted from stale sessions automatically.

Trigger: POST /sessions (create new session) fires _process_stale_sessions() as a background task.

Stale criterion: session.last_active < now - 30 minutes AND not yet extracted (or extracted before last activity).
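
Expressed as a predicate, the stale criterion might look like the sketch below. The `is_stale` function and its parameter names are illustrative, not navi's actual code.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

STALE_AFTER = timedelta(minutes=30)

def is_stale(last_active: datetime, extracted_at: Optional[datetime], now: datetime) -> bool:
    """True when the session has been idle for 30+ minutes and has unextracted activity."""
    idle = last_active < now - STALE_AFTER
    needs_extraction = extracted_at is None or extracted_at < last_active
    return idle and needs_extraction

now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
```

The second condition is what lets a session be re-processed if the user returns to it after an earlier extraction.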

Extraction process:

  1. Render conversation as plain text, including:
    • User messages
    • Assistant messages
    • [Tool call] tool_name(args) lines
    • [Tool result] tool_name: output lines (truncated to 500 chars)
  2. Truncate overall transcript to 12 000 chars (keep head + tail, drop middle).
  3. Call LLM with extraction prompt: "extract stable facts about the user."
  4. Parse JSON array with fields: category, key, value, source, source_context.
  5. Map confidence: tool_call/auto_discovery → 95, user_explicit → 90, default → 70.
  6. Upsert each fact into memory_facts.
  7. Regenerate memory_summary from all current facts.
  8. Mark session as extracted.
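
Steps 2 and 5 are the most mechanical, so here is a minimal sketch of both. The even head/tail split and the truncation marker are assumptions; the source states only that the head and tail are kept and the middle dropped. The function names are hypothetical.

```python
MAX_TRANSCRIPT_CHARS = 12_000
MARKER = "\n[... truncated ...]\n"  # assumed placeholder for the dropped middle

def truncate_middle(text: str, limit: int = MAX_TRANSCRIPT_CHARS) -> str:
    """Keep the head and tail of `text`, dropping the middle, within `limit` chars."""
    if len(text) <= limit:
        return text
    if limit <= len(MARKER) + 1:
        raise ValueError("limit is too small to hold the marker plus any content")
    keep = limit - len(MARKER)
    head = keep - keep // 2  # head gets the extra character when `keep` is odd
    tail = keep // 2
    return text[:head] + MARKER + text[len(text) - tail:]

# Mapping from step 5: source -> confidence, with 70 as the default.
CONFIDENCE_BY_SOURCE = {"tool_call": 95, "auto_discovery": 95, "user_explicit": 90}
DEFAULT_CONFIDENCE = 70

def confidence_for(source: str) -> int:
    """Map a fact's source to its confidence score."""
    return CONFIDENCE_BY_SOURCE.get(source, DEFAULT_CONFIDENCE)
```

Keeping both ends preserves how the conversation opened and how it concluded, which tend to carry the most stable context.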

Memory injection into agent context

At the start of each run_stream() / run() / run_ephemeral() call, _memory_msg() is called:

async def _memory_msg(self) -> Message | None:
    summary = await self._memory.get_summary()
    if not summary:
        return None
    return Message(role="system", content=f"## What I remember about the user\n\n{summary}")

This message is inserted after the main system message but before conversation history. The agent reads it on every turn.
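
The resulting message order can be sketched as follows. The `Message` dataclass here is a simplified stand-in for navi's own type, and `build_context` is a hypothetical helper that shows only the ordering.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Message:
    role: str
    content: str

def build_context(
    system: Message, memory: Optional[Message], history: list[Message]
) -> list[Message]:
    """Order: main system message, then memory (if any), then conversation history."""
    messages = [system]
    if memory is not None:
        messages.append(memory)
    messages.extend(history)
    return messages
```

When there is no summary yet, `_memory_msg()` returns None and the context is simply the system message followed by history.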


Memory tool (navi/tools/memory.py)

A single unified memory tool with an action parameter:

| Action | Description |
|---|---|
| save | Upsert a fact with category, key, value, source, confidence, expires_days, source_context |
| search | Find facts by keyword query (vector search → ILIKE fallback) |
| forget | Delete a fact by key (optionally filtered by category) |
| list | Return all stored facts |

Sources: conversation, tool_call, auto_discovery, user_explicit

Confidence: 0-100. Tool output = 95, user statement = 80, web = 50, guess = 30.
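
As an illustration of the argument shape, the sketch below checks a call against the actions, sources, and confidence range listed above. The `validate_memory_call` helper, the integer-only confidence check, and the example values are assumptions; the tool's real validation is not shown in this document.

```python
VALID_ACTIONS = {"save", "search", "forget", "list"}
VALID_SOURCES = {"conversation", "tool_call", "auto_discovery", "user_explicit"}

def validate_memory_call(args: dict) -> list[str]:
    """Return a list of problems with the call arguments (empty means valid)."""
    errors = []
    if args.get("action") not in VALID_ACTIONS:
        errors.append(f"unknown action: {args.get('action')!r}")
    source = args.get("source")
    if source is not None and source not in VALID_SOURCES:
        errors.append(f"unknown source: {source!r}")
    confidence = args.get("confidence")
    # Assumption: confidence is an integer in the documented 0-100 range.
    if confidence is not None and not (isinstance(confidence, int) and 0 <= confidence <= 100):
        errors.append("confidence must be an integer from 0 to 100")
    return errors

# Hypothetical save call for a fact learned from tool output.
example_save = {
    "action": "save",
    "category": "environment",
    "key": "os",
    "value": "Ubuntu 22.04",
    "source": "tool_call",
    "confidence": 95,
    "expires_days": 90,
    "source_context": "uname output",
}
```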


Memory usage guidelines (from persona)

Call memory_search when:

  • The user mentions something personal (location, project, preference, recurring task).
  • You are about to make an assumption about the user's environment or preferences; verify it first.
  • The user asks about something you helped with before.

Do NOT call memory_search reflexively at the start of every session — only when context warrants it.

Call memory_forget only when the user explicitly asks, or when a stored fact is clearly wrong or outdated.