Memory System

Long-term user memory: facts extracted from conversations and tool executions are stored in PostgreSQL, and a narrative summary of them is injected into every session.

PostgreSQL + pgvector + pg_trgm (semantic + text search)

The memory system requires PostgreSQL with two extensions:

Extension	Purpose	Auto-created by app?
`vector` (pgvector)	Semantic search via cosine distance on `embedding vector(768)`	Yes (`CREATE EXTENSION IF NOT EXISTS`)
`pg_trgm`	Fast ILIKE fallback via GIN trigram indexes on `category`, `key`, `value`	No — must be installed by DBA

pgvector is created automatically because the app typically runs with sufficient privileges on its own database. pg_trgm ships with PostgreSQL as a contrib extension, but installing it may require superuser privileges. If it is already installed, the app creates the GIN trigram indexes automatically. If it is not, the app falls back to plain ILIKE without indexes, which still works but is slower on large tables.

To install pg_trgm manually:

CREATE EXTENSION IF NOT EXISTS pg_trgm;
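The conditional index step described above can be sketched as a pure function. This sketch assumes the app first looks up the installed extensions (for example with `SELECT extname FROM pg_extension`) and then builds DDL from the result. The function and index names are hypothetical. `gin_trgm_ops` is pg_trgm's real GIN operator class.

```python
TRGM_COLUMNS = ("category", "key", "value")


def trigram_index_ddl(installed_extensions: set[str]) -> list[str]:
    """Return DDL for the optional trigram indexes on memory_facts.

    installed_extensions: extension names, e.g. the rows of
    `SELECT extname FROM pg_extension` (hypothetical lookup step).
    """
    if "pg_trgm" not in installed_extensions:
        # Not installed: create nothing; ILIKE still works, just without an index.
        return []
    return [
        # Index names are hypothetical; gin_trgm_ops is pg_trgm's GIN operator class.
        f"CREATE INDEX IF NOT EXISTS memory_facts_{col}_trgm_idx "
        f"ON memory_facts USING gin ({col} gin_trgm_ops)"
        for col in TRGM_COLUMNS
    ]


print(len(trigram_index_ddl({"plpgsql", "vector", "pg_trgm"})))  # 3
print(trigram_index_ddl({"plpgsql", "vector"}))                  # []
```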

Feature	SQLite	PostgreSQL
Storage	File-based	Server
Semantic search	No	Yes (cosine distance on `vector(768)`)
Embeddings	None	Generated via Ollama (`nomic-embed-text:latest`)
Metadata	`category, key, value`	`+ embedding, source, confidence, expires_at, last_verified_at, source_context`

Dedicated embedding backend

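The chat LLM backend (Ollama Cloud, OpenAI, etc.) and the embedding backend are configured separately. This means embeddings can come from a different Ollama server (for example, a local CPU machine) than the one serving chat.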

.env
EMBEDDING_OLLAMA_HOST=http://192.168.1.168:11434  # local CPU server
EMBEDDING_OLLAMA_API_KEY=

If EMBEDDING_OLLAMA_HOST is empty, the memory system falls back to the main OLLAMA_HOST.
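A minimal sketch of the host fallback and an embedding request follows. It assumes Ollama's `/api/embed` endpoint (request `{"model", "input"}`, response `{"embeddings": [[...]]}`) and the `nomic-embed-text:latest` model from the table above. The helper names are hypothetical. Sending the API key as a bearer token is also an assumption.

```python
import json
import os
import urllib.request
from typing import Mapping

EMBED_MODEL = "nomic-embed-text:latest"


def resolve_embedding_host(env: Mapping[str, str]) -> str:
    """EMBEDDING_OLLAMA_HOST wins; an empty or missing value falls back to OLLAMA_HOST."""
    host = env.get("EMBEDDING_OLLAMA_HOST", "").strip()
    if not host:
        host = env.get("OLLAMA_HOST", "").strip()
    if not host:
        raise RuntimeError("neither EMBEDDING_OLLAMA_HOST nor OLLAMA_HOST is set")
    return host.rstrip("/")


def build_embed_request(text: str, env: Mapping[str, str]) -> urllib.request.Request:
    """Build (but do not send) a POST to Ollama's /api/embed endpoint."""
    headers = {"Content-Type": "application/json"}
    api_key = env.get("EMBEDDING_OLLAMA_API_KEY", "").strip()
    if api_key:
        # Assumption: the key is sent as a bearer token.
        headers["Authorization"] = f"Bearer {api_key}"
    body = json.dumps({"model": EMBED_MODEL, "input": text}).encode()
    return urllib.request.Request(
        f"{resolve_embedding_host(env)}/api/embed", data=body, headers=headers, method="POST"
    )


def embed(text: str, env: Mapping[str, str] = os.environ) -> list[float]:
    """Send the request and return the single embedding (768 floats for nomic-embed-text)."""
    with urllib.request.urlopen(build_embed_request(text, env), timeout=30) as resp:
        return json.loads(resp.read())["embeddings"][0]
```

Keeping request construction separate from sending makes the fallback logic testable without a running Ollama server.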

Schema migration

When upgrading the memory system to a new schema (e.g. adding pgvector columns), run:

.venv/bin/python navi/memory/migrate_pgvector.py

This script:

- Verifies that the `vector` extension is installed in PostgreSQL
- Adds missing columns: `embedding`, `source`, `confidence`, `expires_at`, `last_verified_at`, `source_context`
- Creates indexes: HNSW on `embedding`, on `expires_at`, on `(source, category)`, and pg_trgm GIN indexes for the ILIKE fallback

Safe to run multiple times — all operations use IF NOT EXISTS.
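The idempotent migration steps can be sketched as a list of `IF NOT EXISTS` statements. In this sketch, only `vector(768)` comes from this document. The other column types and all index names are assumptions. The HNSW `vector_cosine_ops` index requires pgvector 0.5.0 or later. The pg_trgm GIN indexes are omitted because they depend on pg_trgm being installed.

```python
CHECK_VECTOR_SQL = "SELECT 1 FROM pg_extension WHERE extname = 'vector'"

# Column types are assumptions, except embedding, which this document gives as vector(768).
NEW_COLUMNS = {
    "embedding": "vector(768)",
    "source": "text",
    "confidence": "integer",
    "expires_at": "timestamptz",
    "last_verified_at": "timestamptz",
    "source_context": "text",
}


def migration_statements(table: str = "memory_facts") -> list[str]:
    """Every statement uses IF NOT EXISTS, so re-running the list is a no-op."""
    stmts = [
        f"ALTER TABLE {table} ADD COLUMN IF NOT EXISTS {col} {typ}"
        for col, typ in NEW_COLUMNS.items()
    ]
    stmts += [
        # Index names are hypothetical.
        f"CREATE INDEX IF NOT EXISTS {table}_embedding_hnsw "
        f"ON {table} USING hnsw (embedding vector_cosine_ops)",
        f"CREATE INDEX IF NOT EXISTS {table}_expires_idx ON {table} (expires_at)",
        f"CREATE INDEX IF NOT EXISTS {table}_source_category_idx ON {table} (source, category)",
    ]
    return stmts


async def migrate(conn) -> None:
    """conn: an asyncpg Connection (uses fetchval, transaction and execute)."""
    if not await conn.fetchval(CHECK_VECTOR_SQL):
        raise RuntimeError("pgvector is not installed; run CREATE EXTENSION vector first")
    async with conn.transaction():
        for stmt in migration_statements():
            await conn.execute(stmt)
```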

Backfill embeddings

After pgvector is enabled, existing facts have no embedding (`embedding IS NULL`). To generate embeddings for them, run:

.venv/bin/python navi/memory/backfill_embeddings.py

- Processes facts in batches of 8
- Sleeps 2 seconds between batches (rate-limit safety)
- Safe to run multiple times: only touches rows without embeddings
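The batching behaviour can be sketched as follows, assuming rows arrive as dicts. `embed_batch` and `save` are hypothetical callables standing in for the embedding call and the `UPDATE ... SET embedding`. The text that gets embedded (category, key and value joined) is an assumption.

```python
import time
from typing import Callable, Sequence


def fact_text(fact: dict) -> str:
    # Assumption: category, key and value are joined into one string to embed.
    return f"{fact['category']} / {fact['key']}: {fact['value']}"


def backfill_embeddings(
    facts: Sequence[dict],
    embed_batch: Callable[[list[str]], list[list[float]]],
    save: Callable[[int, list[float]], None],
    batch_size: int = 8,
    pause_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Embed every fact whose embedding is None; return how many were updated."""
    # Only rows with embedding IS NULL are touched, which makes re-runs safe.
    pending = [f for f in facts if f.get("embedding") is None]
    for start in range(0, len(pending), batch_size):
        if start:
            sleep(pause_s)  # pause between batches, not before the first one
        batch = pending[start:start + batch_size]
        vectors = embed_batch([fact_text(f) for f in batch])
        for fact, vec in zip(batch, vectors):
            save(fact["id"], vec)
    return len(pending)
```

Injecting `sleep` keeps the rate-limit pause testable without actually waiting.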

Storage (`navi/memory/store.py`)

Three tables in the database:

Table	Purpose
`memory_facts`	Individual facts: `(user_id, category, key, value)` — unique on `(user_id, category, key)`
`memory_summary`	Per-user narrative summary (`user_id` scoped)
`session_memory_state`	Tracks which sessions have been processed (by `extracted_at`)

`user_id` references `navi_users(id)` with ON DELETE CASCADE, so a user's memory rows are removed when the user is deleted. Facts and summaries are scoped per user. An admin holding the `navi.memory.read_all` permission can pass `user_id=None` for a global search.
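Given the unique key on `(user_id, category, key)`, an upsert maps naturally onto PostgreSQL's `INSERT ... ON CONFLICT`. This is a sketch, not the store's actual SQL. The default `source` and `confidence` values and the `upsert_args` helper are assumptions. `expires_days` mirrors the memory tool's `save` parameter.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Optional

UPSERT_FACT_SQL = """
INSERT INTO memory_facts
    (user_id, category, key, value, source, confidence, expires_at, source_context)
VALUES ($1, $2, $3, $4, $5, $6, $7, $8)
ON CONFLICT (user_id, category, key) DO UPDATE SET
    value          = EXCLUDED.value,
    source         = EXCLUDED.source,
    confidence     = EXCLUDED.confidence,
    expires_at     = EXCLUDED.expires_at,
    source_context = EXCLUDED.source_context
"""


def upsert_args(
    user_id: Any,
    category: str,
    key: str,
    value: str,
    source: str = "conversation",  # assumed default
    confidence: int = 70,          # assumed default, mirrors the extractor's fallback
    expires_days: Optional[int] = None,
    source_context: Optional[str] = None,
    now: Optional[datetime] = None,
) -> tuple[Any, ...]:
    """Positional args for UPSERT_FACT_SQL, e.g. await pool.execute(UPSERT_FACT_SQL, *args)."""
    now = now or datetime.now(timezone.utc)
    expires_at = now + timedelta(days=expires_days) if expires_days is not None else None
    return (user_id, category, key, value, source, confidence, expires_at, source_context)
```

A plain UNIQUE constraint treats NULLs as distinct. As a result, rows whose `user_id` is NULL never hit the conflict branch unless the constraint is declared `NULLS NOT DISTINCT` (PostgreSQL 15+).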

MemoryStore creates its tables lazily, on the first async operation, via _get_pool(). All operations are asynchronous and run through asyncpg (PostgreSQL).
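Lazy initialisation can be sketched with a double-checked `asyncio.Lock`, so concurrent first calls create the pool and schema only once. This is an illustration, not `MemoryStore` itself. The class name and the injected `create_pool`/`ensure_schema` callables are hypothetical. `asyncpg.create_pool` is the real asyncpg entry point.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional


async def _create_asyncpg_pool(dsn: str) -> Any:
    import asyncpg  # deferred so the sketch can be exercised without asyncpg installed
    return await asyncpg.create_pool(dsn)


class LazyMemoryStore:
    """Sketch: the pool and tables are created on first use, exactly once."""

    def __init__(
        self,
        dsn: str,
        create_pool: Callable[[str], Awaitable[Any]] = _create_asyncpg_pool,
        ensure_schema: Optional[Callable[[Any], Awaitable[None]]] = None,
    ) -> None:
        self._dsn = dsn
        self._create_pool = create_pool
        self._ensure_schema = ensure_schema  # e.g. runs CREATE TABLE IF NOT EXISTS ...
        self._pool: Any = None
        self._lock: Optional[asyncio.Lock] = None

    async def _get_pool(self) -> Any:
        if self._pool is not None:
            return self._pool  # fast path once initialised
        if self._lock is None:
            self._lock = asyncio.Lock()  # created inside the running event loop
        async with self._lock:
            if self._pool is None:  # re-check: another task may have finished first
                pool = await self._create_pool(self._dsn)
                if self._ensure_schema is not None:
                    await self._ensure_schema(pool)
                self._pool = pool
        return self._pool
```

Each store method would then begin with `pool = await self._get_pool()`.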

Key operations

Method	Description
`upsert_fact(..., user_id=None)`	Insert or update a fact scoped to user
`search_facts(query, user_id=None, limit=15)`	Vector search first (cosine distance, cutoff 0.3), then ILIKE fallback. `user_id=None` requires admin permission for global search.
`delete_fact(key, category=None, user_id=None)`	Delete by key, optionally filtered by category and user
`get_all_facts(user_id=None, all_users=False, limit=None, offset=0, search=None, sort_by="category", sort_order="desc")`	All facts ordered by `sort_by`. Pass `all_users=True` for admin global view.
`get_summary(user_id=None)`	Current narrative summary text for user
`set_summary(content, user_id=None)`	Replace the summary for user
`mark_session_extracted(session_id)`	Record extraction timestamp
`get_extracted_at(session_id)`	Check if/when a session was processed
`backfill_embeddings(batch_size=8)`	Generate embeddings for facts with `embedding IS NULL`
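The two-stage search in `search_facts` can be sketched as follows. The sketch assumes the 0.3 cutoff is a maximum cosine distance (pgvector's `<=>` operator). It also assumes the ILIKE fallback runs when the vector query returns no rows or the embedding call fails. `fetch` stands in for asyncpg's `Pool.fetch(sql, *args)` and `embed` for the embedding backend; both are injected for illustration. Passing the vector as text with a `::vector` cast avoids registering a pgvector codec.

```python
from typing import Any, Awaitable, Callable, Optional, Sequence

Fetch = Callable[..., Awaitable[list]]  # e.g. asyncpg's Pool.fetch(sql, *args)
Embed = Callable[[str], Awaitable[Sequence[float]]]


def vector_literal(vec: Sequence[float]) -> str:
    # pgvector parses the text form '[x,y,...]'; the SQL casts it with ::vector.
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"


def like_pattern(query: str) -> str:
    # Escape LIKE wildcards so user text matches literally (backslash is the default escape).
    escaped = query.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return f"%{escaped}%"


async def search_facts(
    query: str,
    *,
    fetch: Fetch,
    embed: Embed,
    user_id: Optional[Any] = None,
    limit: int = 15,
    max_distance: float = 0.3,  # assumed meaning of the documented 0.3 cutoff
) -> list:
    args: list = []
    scope = ""
    if user_id is not None:  # None means global search; permission check assumed upstream
        args.append(user_id)
        scope = f" AND user_id = ${len(args)}"
    n = len(args)  # number of scoping parameters already bound

    try:
        vec = await embed(query)
    except Exception:
        vec = None  # embedding backend unavailable: go straight to ILIKE

    if vec is not None:
        sql = (
            f"SELECT category, key, value, embedding <=> ${n + 1}::vector AS distance "
            f"FROM memory_facts WHERE embedding IS NOT NULL{scope} "
            f"AND embedding <=> ${n + 1}::vector < ${n + 2} "
            f"ORDER BY distance LIMIT ${n + 3}"
        )
        rows = await fetch(sql, *args, vector_literal(vec), max_distance, limit)
        if rows:
            return rows

    # Fallback: case-insensitive substring match (served by pg_trgm GIN indexes when present).
    sql = (
        f"SELECT category, key, value FROM memory_facts WHERE TRUE{scope} "
        f"AND (category ILIKE ${n + 1} OR key ILIKE ${n + 1} OR value ILIKE ${n + 1}) "
        f"ORDER BY category, key LIMIT ${n + 2}"
    )
    return await fetch(sql, *args, like_pattern(query), limit)
```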

Automatic extraction (`navi/memory/extractor.py`)

Facts are extracted from stale sessions automatically.

Trigger: POST /sessions (create new session) fires _process_stale_sessions() as a background task.

Stale criterion: `session.last_active < now - 30 minutes`, and the session has either never been extracted or was last extracted before its most recent activity.
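The stale criterion can be written as a small predicate, with an equivalent SQL query for selecting candidates. In the SQL, the `sessions` table and the `session_memory_state.session_id` column are hypothetical names.

```python
from datetime import datetime, timedelta
from typing import Optional

STALE_AFTER = timedelta(minutes=30)

# Hypothetical table/column names; extracted_at comes from session_memory_state.
STALE_SESSIONS_SQL = """
SELECT s.id
FROM sessions s
LEFT JOIN session_memory_state m ON m.session_id = s.id
WHERE s.last_active < now() - interval '30 minutes'
  AND (m.extracted_at IS NULL OR m.extracted_at < s.last_active)
"""


def is_stale(last_active: datetime, extracted_at: Optional[datetime], now: datetime) -> bool:
    """True when a session has been idle 30+ minutes and has activity not yet extracted."""
    idle = last_active < now - STALE_AFTER
    needs_extraction = extracted_at is None or extracted_at < last_active
    return idle and needs_extraction
```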

Extraction process:

1. Render the conversation as plain text, including:
   - User messages
   - Assistant messages
   - [Tool call] tool_name(args) lines
   - [Tool result] tool_name: output lines (truncated to 500 chars)
2. Truncate the overall transcript to 12,000 chars (keep the head and tail, drop the middle).
3. Call the LLM with the extraction prompt: "extract stable facts about the user."
4. Parse the returned JSON array; each item has the fields category, key, value, source, source_context.
5. Map confidence: tool_call/auto_discovery → 95, user_explicit → 90, default → 70.
6. Upsert each fact into memory_facts.
7. Regenerate memory_summary from all current facts.
8. Mark the session as extracted.
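Steps 1, 2, 4 and 5 can be sketched as plain functions. This sketch makes several assumptions. The message dict shape (`role` values `tool_call`/`tool_result`, plus `name`, `args`, `content`) is hypothetical. So are the `User:`/`Assistant:` labels, the `[...]` truncation marker and the `conversation` default source. The confidence values come from the list above.

```python
import json
from typing import Iterable

TOOL_RESULT_LIMIT = 500
TRANSCRIPT_LIMIT = 12_000
CONFIDENCE_BY_SOURCE = {"tool_call": 95, "auto_discovery": 95, "user_explicit": 90}
DEFAULT_CONFIDENCE = 70
REQUIRED = ("category", "key", "value")


def render_transcript(messages: Iterable[dict]) -> str:
    """Step 1: flatten messages (hypothetical dict shape) into plain text."""
    lines = []
    for m in messages:
        role = m["role"]
        if role in ("user", "assistant"):
            lines.append(f"{role.capitalize()}: {m['content']}")
        elif role == "tool_call":
            lines.append(f"[Tool call] {m['name']}({m['args']})")
        elif role == "tool_result":
            lines.append(f"[Tool result] {m['name']}: {m['content'][:TOOL_RESULT_LIMIT]}")
    return "\n".join(lines)


def truncate_middle(text: str, limit: int = TRANSCRIPT_LIMIT) -> str:
    """Step 2: keep head and tail, drop the middle; the result is at most `limit` chars."""
    if len(text) <= limit:
        return text
    marker = "\n[...]\n"
    keep = limit - len(marker)
    head = keep // 2
    return text[:head] + marker + text[len(text) - (keep - head):]


def parse_facts(llm_output: str) -> list[dict]:
    """Steps 4-5: parse the JSON array and attach a confidence per source."""
    try:
        items = json.loads(llm_output)
    except json.JSONDecodeError:
        return []  # assumption: unparseable output yields no facts
    if not isinstance(items, list):
        return []
    facts = []
    for item in items:
        if not isinstance(item, dict) or not all(item.get(k) for k in REQUIRED):
            continue  # skip malformed entries
        source = item.get("source", "conversation")
        facts.append({
            "category": item["category"],
            "key": item["key"],
            "value": item["value"],
            "source": source,
            "source_context": item.get("source_context"),
            "confidence": CONFIDENCE_BY_SOURCE.get(source, DEFAULT_CONFIDENCE),
        })
    return facts
```

Keeping both the head and the tail preserves the opening context and the most recent turns, which is where stable facts tend to be stated or corrected.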

Memory injection into agent context

At the start of each run_stream() / run() / run_ephemeral() call, _memory_msg() is called:

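async def _memory_msg(self) -> Message | None:
    # Fetch the narrative summary regenerated after each extraction.
    summary = await self._memory.get_summary()
    if not summary:
        # Nothing remembered yet: skip the message rather than inject an empty one.
        return None
    # Injected as an extra system message ahead of the conversation history.
    return Message(role="system", content=f"## What I remember about the user\n\n{summary}")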

This message is inserted after the main system message but before conversation history. The agent reads it on every turn.
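The ordering described above can be sketched as follows. `Message` here is a stand-in dataclass, not the app's type, and `build_context` is a hypothetical helper. In the app, the equivalent step presumably happens inside run_stream()/run()/run_ephemeral().

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:  # stand-in for the app's Message type
    role: str
    content: str


def build_context(system: Message, memory: Optional[Message], history: list[Message]) -> list[Message]:
    """Main system prompt first, then the memory summary (if any), then the conversation."""
    msgs = [system]
    if memory is not None:
        msgs.append(memory)
    msgs.extend(history)
    return msgs
```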

Memory tool (`navi/tools/memory.py`)

A single, unified memory tool; the `action` parameter selects the operation:

Action	Description
`save`	Upsert a fact with `category`, `key`, `value`, `source`, `confidence`, `expires_days`, `source_context`
`search`	Find facts by keyword query (vector search → ILIKE fallback)
`forget`	Delete a fact by `key` (optionally filtered by `category`)
`list`	Return all stored facts

Sources: conversation, tool_call, auto_discovery, user_explicit

Confidence ranges from 0 to 100. Suggested values: tool output 95, user statement 80, web 50, guess 30.
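A sketch of validating a memory tool call before it reaches the store follows. The action and source names come from the lists above. The `query` parameter name for `search` is an assumption. So are the defaults (source `conversation`, confidence 70, mirroring the extractor).

```python
from typing import Any

ACTIONS = {"save", "search", "forget", "list"}
SOURCES = {"conversation", "tool_call", "auto_discovery", "user_explicit"}


def validate_memory_call(action: str, **params: Any) -> dict:
    """Check a memory tool call and return normalised parameters."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action {action!r}; expected one of {sorted(ACTIONS)}")
    if action == "save":
        for field in ("category", "key", "value"):
            if not params.get(field):
                raise ValueError(f"save requires {field!r}")
        source = params.get("source", "conversation")  # assumed default
        if source not in SOURCES:
            raise ValueError(f"unknown source {source!r}")
        confidence = int(params.get("confidence", 70))  # assumed default
        if not 0 <= confidence <= 100:
            raise ValueError("confidence must be between 0 and 100")
        params.update(source=source, confidence=confidence)
    elif action == "search" and not params.get("query"):
        raise ValueError("search requires 'query'")
    elif action == "forget" and not params.get("key"):
        raise ValueError("forget requires 'key'")
    return {"action": action, **params}
```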

Memory usage guidelines (from persona)

Call memory_search when:

- The user mentions something personal (location, project, preference, recurring task).
- You are about to make an assumption about the user's environment or preferences; verify it first.
- The user asks about something you helped with before.

Do NOT call memory_search reflexively at the start of every session — only when context warrants it.

Call memory_forget only when the user explicitly asks, or when a stored fact is clearly wrong or outdated.