Memory System

Long-term user memory: facts are extracted from conversations and tool executions, stored in PostgreSQL (with a SQLite fallback), and injected into every session.

PostgreSQL + pgvector (semantic search)

When DATABASE_URL is set, the memory system uses PostgreSQL with pgvector for semantic search via embeddings.

Feature	SQLite	PostgreSQL
Storage	File-based	Server
Semantic search	No	Yes (cosine distance on `vector(768)`)
Embeddings	None	Generated via Ollama (`nomic-embed-text:latest`)
Metadata	`category, key, value`	`+ embedding, source, confidence, expires_at, source_context`
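
To make the "cosine distance" entry in the table concrete, here is a minimal pure-Python sketch of the quantity that pgvector's `<=>` operator computes: 1 minus the cosine similarity of two vectors. The function name `cosine_distance` is illustrative and not taken from the navi codebase, and real embeddings here would have 768 dimensions rather than 2 or 3.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity: 0.0 = same direction, 2.0 = opposite."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)
```

Smaller values mean closer meaning, which is why a search cutoff is expressed as a maximum distance.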

Dedicated embedding backend

The chat LLM backend (Ollama Cloud, OpenAI, etc.) and the embedding backend are separate.

.env
EMBEDDING_OLLAMA_HOST=http://192.168.1.168:11434  # local CPU server
EMBEDDING_OLLAMA_API_KEY=

If EMBEDDING_OLLAMA_HOST is empty, memory falls back to the main OLLAMA_HOST.
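
The fallback rule can be sketched as a small resolver. This is an illustration only: `resolve_embedding_host` is a hypothetical helper, and treating a whitespace-only value as empty is an assumption rather than documented behaviour.

```python
import os
from typing import Mapping, Optional


def resolve_embedding_host(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Pick the host used for embeddings (hypothetical helper)."""
    env = os.environ if env is None else env
    # Prefer the dedicated embedding backend when it is configured.
    dedicated = env.get("EMBEDDING_OLLAMA_HOST", "").strip()
    if dedicated:
        return dedicated
    # Otherwise fall back to the main chat backend host.
    main = env.get("OLLAMA_HOST", "").strip()
    return main or None
```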

Schema migration

When upgrading the memory system to a new schema (e.g. adding pgvector columns), run:

.venv/bin/python navi/memory/migrate_pgvector.py

This script:

Verifies the vector extension is installed in PostgreSQL
Adds missing columns: embedding, source, confidence, expires_at, last_verified_at, source_context
Creates indexes: hnsw(embedding), expires, source+category

Safe to run multiple times — all operations use IF NOT EXISTS.
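
As a rough illustration of why re-running is safe, the sketch below builds the kind of idempotent DDL such a migration might issue. It is not copied from `migrate_pgvector.py`: the target table (`memory_facts`), the column types other than `vector(768)`, the index names, and the `vector_cosine_ops` operator class are all assumptions.

```python
# Hypothetical DDL generator; types and index names are assumptions.
TABLE = "memory_facts"

# Query that checks whether the pgvector extension is installed.
EXTENSION_CHECK = "SELECT 1 FROM pg_extension WHERE extname = 'vector'"

COLUMNS = {
    "embedding": "vector(768)",
    "source": "TEXT",
    "confidence": "INTEGER",
    "expires_at": "TIMESTAMPTZ",
    "last_verified_at": "TIMESTAMPTZ",
    "source_context": "TEXT",
}


def migration_statements() -> list[str]:
    """Return DDL where every statement is guarded by IF NOT EXISTS."""
    stmts = [
        f"ALTER TABLE {TABLE} ADD COLUMN IF NOT EXISTS {name} {sql_type}"
        for name, sql_type in COLUMNS.items()
    ]
    stmts += [
        f"CREATE INDEX IF NOT EXISTS idx_{TABLE}_embedding "
        f"ON {TABLE} USING hnsw (embedding vector_cosine_ops)",
        f"CREATE INDEX IF NOT EXISTS idx_{TABLE}_expires ON {TABLE} (expires_at)",
        f"CREATE INDEX IF NOT EXISTS idx_{TABLE}_source_category "
        f"ON {TABLE} (source, category)",
    ]
    return stmts
```

Because each statement is a no-op when its column or index already exists, running the list twice leaves the schema unchanged.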

Backfill embeddings

After enabling pgvector, existing facts have embedding IS NULL. Generate embeddings for them:

.venv/bin/python navi/memory/backfill_embeddings.py

Batches of 8 facts at a time
2-second sleep between batches (rate-limit safety)
Safe to run multiple times — only touches rows without embeddings
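
The batching behaviour can be sketched over an in-memory list. This is a simplified stand-in for the real script: `backfill`, the dict-shaped facts, and the choice to embed the `value` field are assumptions, and `embed` is any async callable you supply.

```python
import asyncio
from typing import Awaitable, Callable

Embedder = Callable[[str], Awaitable[list[float]]]


async def backfill(
    facts: list[dict],
    embed: Embedder,
    batch_size: int = 8,
    pause_seconds: float = 2.0,
) -> int:
    """Fill in missing embeddings in place; return how many rows were updated."""
    # Only rows without an embedding are touched, so re-runs are harmless.
    pending = [f for f in facts if f.get("embedding") is None]
    updated = 0
    for start in range(0, len(pending), batch_size):
        for fact in pending[start:start + batch_size]:
            fact["embedding"] = await embed(fact["value"])
            updated += 1
        # Sleep between batches (but not after the last one) to limit request rate.
        if start + batch_size < len(pending):
            await asyncio.sleep(pause_seconds)
    return updated
```

A second call over the same list returns 0, which mirrors the "only touches rows without embeddings" guarantee.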

Storage (`navi/memory/store.py`)

Three tables in the database:

Table	Purpose
`memory_facts`	Individual facts: `(category, key, value)` — unique on `(category, key)`
`memory_summary`	Single-row narrative summary generated from all facts
`session_memory_state`	Tracks which sessions have been processed (by `extracted_at`)

MemoryStore is initialized synchronously (it creates the tables); all operations are async, via asyncpg on PostgreSQL.

Key operations

Method	Description
`upsert_fact(...)`	Insert or update a fact (generates embedding if pgvector + backend available)
`search_facts(query, limit=15)`	Vector search first (cosine distance, cutoff 0.3), then ILIKE fallback
`delete_fact(key, category=None)`	Delete by key, optionally filtered by category
`get_all_facts(limit=None)`	All facts ordered by `(category, updated_at DESC)`
`get_summary()`	Current narrative summary text
`set_summary(content)`	Replace the summary
`mark_session_extracted(session_id)`	Record extraction timestamp
`get_extracted_at(session_id)`	Check if/when a session was processed
`backfill_embeddings(batch_size=8)`	Generate embeddings for facts with `embedding IS NULL`
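
The two-stage lookup in `search_facts` can be sketched in memory as follows. Several details are assumptions made for illustration: each fact carries a precomputed cosine distance to the query, the cutoff is inclusive, the keyword pass runs only when the vector pass finds nothing, and the keyword pass matches against `key` and `value`. The real method runs these steps as SQL.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fact:
    category: str
    key: str
    value: str
    distance: Optional[float] = None  # cosine distance to the query, if embedded


def search_facts(
    facts: list[Fact], query: str, limit: int = 15, cutoff: float = 0.3
) -> list[Fact]:
    # Vector pass: keep embedded facts within the cutoff, nearest first.
    close = sorted(
        (f for f in facts if f.distance is not None and f.distance <= cutoff),
        key=lambda f: f.distance,
    )
    if close:
        return close[:limit]
    # Fallback pass: case-insensitive substring match, like ILIKE '%query%'.
    needle = query.lower()
    hits = [f for f in facts if needle in f.key.lower() or needle in f.value.lower()]
    return hits[:limit]
```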

Automatic extraction (`navi/memory/extractor.py`)

Facts are extracted from stale sessions automatically.

Trigger: POST /sessions (create new session) fires _process_stale_sessions() as a background task.

Stale criterion: session.last_active < now - 30 minutes AND not yet extracted (or extracted before last activity).
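
Written as a predicate, the stale criterion looks roughly like this. The function name `is_stale` and its signature are illustrative; the sketch assumes timezone-aware timestamps and that `extracted_at` is `None` for sessions that were never processed.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

STALE_AFTER = timedelta(minutes=30)


def is_stale(
    last_active: datetime, extracted_at: Optional[datetime], now: datetime
) -> bool:
    """True if the session is idle and has unextracted activity."""
    idle_long_enough = last_active < now - STALE_AFTER
    # Never extracted, or new activity arrived after the last extraction.
    needs_extraction = extracted_at is None or extracted_at < last_active
    return idle_long_enough and needs_extraction
```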

Extraction process:

Render conversation as plain text, including:
- User messages
- Assistant messages
- [Tool call] tool_name(args) lines
- [Tool result] tool_name: output lines (truncated to 500 chars)
Truncate overall transcript to 12 000 chars (keep head + tail, drop middle).
Call LLM with extraction prompt: "extract stable facts about the user."
Parse JSON array with fields: category, key, value, source, source_context.
Map confidence: tool_call/auto_discovery → 95, user_explicit → 90, default → 70.
Upsert each fact into memory_facts.
Regenerate memory_summary from all current facts.
Mark session as extracted.
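
Three of the steps above (tool-output truncation, head-and-tail transcript truncation, and the confidence mapping) are simple enough to sketch directly. The helper names are hypothetical, and the even split between head and tail is an assumption; the extractor may weight them differently or insert a marker where the middle was dropped.

```python
MAX_TOOL_RESULT_CHARS = 500
MAX_TRANSCRIPT_CHARS = 12_000

CONFIDENCE_BY_SOURCE = {"tool_call": 95, "auto_discovery": 95, "user_explicit": 90}
DEFAULT_CONFIDENCE = 70


def truncate_tool_output(output: str) -> str:
    """Keep only the first 500 characters of a tool result."""
    return output[:MAX_TOOL_RESULT_CHARS]


def truncate_middle(text: str, max_chars: int = MAX_TRANSCRIPT_CHARS) -> str:
    """Keep the head and tail of text, dropping the middle if it is too long."""
    if max_chars <= 0:
        return ""
    if len(text) <= max_chars:
        return text
    head = max_chars // 2          # assumed even split
    tail = max_chars - head
    return text[:head] + text[-tail:]


def confidence_for(source: str) -> int:
    """Map a fact's source to a confidence score, defaulting to 70."""
    return CONFIDENCE_BY_SOURCE.get(source, DEFAULT_CONFIDENCE)
```

Keeping both ends preserves the opening context of a session and its most recent turns, which tend to carry the facts worth extracting.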

Memory injection into agent context

At the start of each run_stream() / run() / run_ephemeral() call, _memory_msg() is called:

async def _memory_msg(self) -> Message | None:
    summary = await self._memory.get_summary()
    if not summary:
        return None
    return Message(role="system", content=f"## What I remember about the user\n\n{summary}")

This message is inserted after the main system message but before conversation history. The agent reads it on every turn.
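
The resulting message order can be sketched as below. `Message` here is a minimal stand-in for the project's own class, and `build_context` is a hypothetical helper used only to show the ordering.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    role: str
    content: str


def build_context(
    system: Message, memory: Optional[Message], history: list[Message]
) -> list[Message]:
    """Order: main system message, optional memory message, then history."""
    context = [system]
    if memory is not None:  # no summary yet means no memory message
        context.append(memory)
    context.extend(history)
    return context
```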

Memory tool (`navi/tools/memory.py`)

A single unified memory tool with an action parameter:

Action	Description
`save`	Upsert a fact with `category`, `key`, `value`, `source`, `confidence`, `expires_days`, `source_context`
`search`	Find facts by keyword query (vector search → ILIKE fallback)
`forget`	Delete a fact by `key` (optionally filtered by `category`)
`list`	Return all stored facts

Sources: conversation, tool_call, auto_discovery, user_explicit

Confidence: 0-100. Tool output = 95, user statement = 80, web = 50, guess = 30.
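
A `save` payload could be validated along these lines before it reaches the store. This is a sketch under stated assumptions: `build_save_fact` is a hypothetical helper, rejecting unknown sources is an assumption, and so is the conversion of `expires_days` into an `expires_at` timestamp by adding days to the current time.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

ALLOWED_SOURCES = {"conversation", "tool_call", "auto_discovery", "user_explicit"}


def build_save_fact(
    category: str,
    key: str,
    value: str,
    *,
    source: str,
    confidence: int,
    expires_days: Optional[int] = None,
    source_context: Optional[str] = None,
    now: Optional[datetime] = None,
) -> dict:
    """Validate a save request and return the row to upsert (hypothetical)."""
    if source not in ALLOWED_SOURCES:
        raise ValueError(f"unknown source: {source!r}")
    if not 0 <= confidence <= 100:
        raise ValueError("confidence must be between 0 and 100")
    now = now or datetime.now(timezone.utc)
    # Assumed mapping: expires_days is relative to the time of saving.
    expires_at = now + timedelta(days=expires_days) if expires_days is not None else None
    return {
        "category": category,
        "key": key,
        "value": value,
        "source": source,
        "confidence": confidence,
        "expires_at": expires_at,
        "source_context": source_context,
    }
```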

Memory usage guidelines (from persona)

Call memory_search when:

The user mentions something personal (location, project, preference, recurring task).
You are about to make an assumption about the user's environment or preferences (verify first).
The user asks about something you helped with before.

Do NOT call memory_search reflexively at the start of every session — only when context warrants it.

Call memory_forget only when the user explicitly asks, or when a stored fact is clearly wrong or outdated.