Sessions

Session management, dual-buffer design, and context compression.

Session model (navi/core/session.py)

class Session(BaseModel):
    id: str                          # UUID
    profile_id: str                  # active profile
    messages: list[Message]          # full display history — never compressed
    context: list[Message]           # LLM context — may be replaced with summary
    context_token_count: int         # accumulated tokens; reset to 0 after compression
    pinned: bool                     # pinned sessions appear first in sidebar
    name: str | None                 # auto-generated display name (set after first exchange)
    created_at: datetime
    last_active: datetime
    planning_logs: list[dict]        # raw planning phase outputs per turn (debug)

Message flags

Messages in session.messages carry optional flags beyond role/content:

| Flag | Purpose |
| --- | --- |
| `is_plan: bool` | Message is a planning phase output (shown as a plan card in the UI, not as text) |
| `is_compression: bool` | Marker message injected when context compression ran |
| `is_summary: bool` | Summary message that replaces compressed history in session.context |
| `thinking: str \| None` | LLM reasoning captured during a tool-calling turn |

Dual-buffer design

Two separate message lists serve different purposes:

| Buffer | Purpose | Modified by compression? |
| --- | --- | --- |
| session.messages | Full display history shown in the UI | Never |
| session.context | What the LLM sees on each call | Yes: old turns are replaced with a summary |

Tool results, image injections, and assistant messages are appended to both buffers. When compression runs, only session.context is modified.

Note: System messages are not stored in either buffer. They are injected fresh from the current profile on every LLM call via _build_context(). This makes profile switches take effect immediately.
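
A minimal sketch of the dual-buffer behaviour described above. It uses plain dataclasses as stand-ins for the real Pydantic models, and `append_message` and `build_context` are hypothetical helpers written for illustration (the real `_build_context()` may differ).

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    role: str
    content: str
    is_summary: bool = False


@dataclass
class Session:
    messages: list[Message] = field(default_factory=list)  # display history
    context: list[Message] = field(default_factory=list)   # LLM context


def append_message(session: Session, msg: Message) -> None:
    """Append a new message to both buffers (hypothetical helper)."""
    session.messages.append(msg)
    session.context.append(msg)


def build_context(session: Session, system_prompt: str) -> list[Message]:
    """Prepend a fresh system message; nothing is written back to the session."""
    return [Message("system", system_prompt)] + session.context


session = Session()
append_message(session, Message("user", "hi"))
append_message(session, Message("assistant", "hello"))

# Simulate compression: only the context buffer is replaced.
session.context = [Message("user", "Summary: greeting exchanged", is_summary=True)]

llm_input = build_context(session, "You are Navi.")
```

After the simulated compression, `session.messages` still holds both original messages, while `llm_input` starts with the system message followed by the summary.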

Session store

InMemorySessionStore

Simple dict-backed store for testing.

PgSessionStore (navi/core/pg_session_store.py)

Production store backed by PostgreSQL via asyncpg.

  • create(profile_id) → new Session
  • get(session_id) → Session | None
  • save(session): serializes with model_dump(mode='json') (required for datetime serialization)
  • list_all() → sessions sorted by (pinned DESC, last_active DESC)
  • delete(session_id) → bool
  • set_pinned(session_id, pinned) → bool
  • set_name(session_id, name) → bool

Requires the DATABASE_URL environment variable (e.g. postgresql://user:pass@localhost/navi).
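
To illustrate the store interface and the list_all() ordering, here is a rough dict-backed sketch. It is synchronous and uses a reduced dataclass `Session`, both of which are simplifying assumptions; the class name `InMemoryStore` is hypothetical and this is not the project's InMemorySessionStore.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Session:
    id: str
    profile_id: str
    pinned: bool = False
    last_active: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


class InMemoryStore:
    """Dict-backed store mirroring part of the interface listed above."""

    def __init__(self) -> None:
        self._sessions: dict[str, Session] = {}

    def create(self, profile_id: str) -> Session:
        session = Session(id=str(uuid.uuid4()), profile_id=profile_id)
        self._sessions[session.id] = session
        return session

    def get(self, session_id: str) -> Optional[Session]:
        return self._sessions.get(session_id)

    def delete(self, session_id: str) -> bool:
        return self._sessions.pop(session_id, None) is not None

    def set_pinned(self, session_id: str, pinned: bool) -> bool:
        session = self._sessions.get(session_id)
        if session is None:
            return False
        session.pinned = pinned
        return True

    def list_all(self) -> list[Session]:
        # Pinned sessions first, then most recently active first.
        return sorted(
            self._sessions.values(),
            key=lambda s: (s.pinned, s.last_active),
            reverse=True,
        )


store = InMemoryStore()
a = store.create("default")
b = store.create("default")
b.last_active = a.last_active + timedelta(minutes=5)  # b is more recent
store.set_pinned(a.id, True)                          # but a is pinned
order = [s.id for s in store.list_all()]
```

Here `a` sorts ahead of `b` even though `b` was active more recently, because pinning takes precedence over recency.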


Context compression (navi/core/compressor.py)

Keeps the LLM context within the token budget by summarizing old conversation turns.

When it triggers

Two trigger points:

  1. Pre-turn (in run_stream()): before calling the LLM, compares session.context_token_count against the threshold and compresses if context_token_count >= num_ctx * context_compression_threshold.
  2. Post-turn (via CompressionWorker): after StreamEnd, the worker re-checks and compresses if needed.

Config values (settings):

  • context_compression_enabled: bool = True
  • context_compression_threshold: float = 0.80 — trigger at 80% of ollama_num_ctx
  • context_keep_recent: int = 10 — keep last N conversational turns verbatim
  • context_summary_temperature: float = 0.3
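
The trigger condition can be sketched as a small predicate. The function name `should_compress` and the 8192-token window below are illustrative assumptions; only the comparison itself comes from the description above.

```python
def should_compress(
    context_token_count: int,
    num_ctx: int,
    threshold: float = 0.80,
    enabled: bool = True,
) -> bool:
    """Return True once the token count reaches the configured share of num_ctx."""
    if not enabled:
        return False
    return context_token_count >= num_ctx * threshold


# With a hypothetical 8192-token window the cutoff is 8192 * 0.80 = 6553.6 tokens.
below = should_compress(6553, num_ctx=8192)
above = should_compress(6554, num_ctx=8192)
disabled = should_compress(8000, num_ctx=8192, enabled=False)
```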

Compression algorithm

compress_context(context, llm, model, temperature, keep_recent):

  1. Partition messages into to_summarize (old turns) and to_keep (recent keep_recent turns).
    • A "turn" = one user message + all following assistant/tool messages up to the next user message.
    • Tool call groups (assistant + results) are never split across the partition.
    • Existing summary messages are folded into the next pass.
  2. Format to_summarize as plain text (tool calls shown as compact previews, max 120 chars for args, max 300 chars for results).
  3. Truncate formatted input to _MAX_SUMMARY_INPUT_CHARS = 12_000 chars.
  4. Call llm.complete() with think=False to produce a bullet-point summary.
  5. Replace to_summarize with a single summary message (role=user, is_summary=True).
  6. Return system_msgs + [summary_msg] + to_keep.

If compression fails, the exception propagates to CompressionWorker, which logs a warning and continues — compression failure is non-fatal.
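
Step 1 of the algorithm above (partitioning on turn boundaries) might look roughly like the following. This is a sketch under assumptions: `Message` is a reduced dataclass, `split_turns` and `partition` are hypothetical names, and an existing summary message is treated as the start of the oldest group so that it is folded into the next summary.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str            # "user", "assistant", or "tool"
    content: str
    is_summary: bool = False


def split_turns(context: list[Message]) -> list[list[Message]]:
    """Group messages into turns; each real user message starts a new turn."""
    turns: list[list[Message]] = []
    for msg in context:
        starts_turn = msg.role == "user" and not msg.is_summary
        if starts_turn or not turns:
            turns.append([msg])
        else:
            turns[-1].append(msg)
    return turns


def partition(
    context: list[Message], keep_recent: int
) -> tuple[list[Message], list[Message]]:
    """Split into (to_summarize, to_keep) without cutting through a turn."""
    turns = split_turns(context)
    if keep_recent <= 0:
        old, recent = turns, []
    else:
        old, recent = turns[:-keep_recent], turns[-keep_recent:]
    to_summarize = [m for turn in old for m in turn]
    to_keep = [m for turn in recent for m in turn]
    return to_summarize, to_keep


context = [
    Message("user", "q1"),
    Message("assistant", "calling tool"),
    Message("tool", "result"),
    Message("assistant", "a1"),
    Message("user", "q2"),
    Message("assistant", "a2"),
    Message("user", "q3"),
    Message("assistant", "a3"),
]
to_summarize, to_keep = partition(context, keep_recent=2)
```

Because the split happens only between turns, the assistant tool call and its result in the first turn always land on the same side of the partition.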

What is never compressed

  • session.messages — full history is always intact.
  • The last context_keep_recent conversational turns.
  • System messages (never stored in context anyway).

Session file uploads

Files uploaded via POST /sessions/{id}/files are stored in session_files/{session_id}/.

  • Max size: session_files_max_size_mb (default: 200 MB)
  • TTL: session_files_ttl_hours (default: 24 hours)
  • A background cleanup_loop (started on FastAPI startup) deletes stale session directories.
  • Executable files (.sh, .py, .exe, etc.) are rejected.
  • Duplicate filenames get a numeric suffix.
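
The rejection and de-duplication rules could be sketched as below. The blocked set contains only the three extensions named above (the real list is longer), and the `_1`, `_2` suffix format is an assumption, since the exact naming scheme is not specified here.

```python
from pathlib import Path

# Illustrative subset only: the three extensions named in the list above.
BLOCKED_SUFFIXES = {".sh", ".py", ".exe"}


def is_rejected(filename: str) -> bool:
    """Return True if the file extension is on the blocked list."""
    return Path(filename).suffix.lower() in BLOCKED_SUFFIXES


def dedupe_name(filename: str, existing: set[str]) -> str:
    """Return filename, or a variant with a numeric suffix if it is taken."""
    if filename not in existing:
        return filename
    path = Path(filename)
    n = 1
    while True:
        # Assumed format: stem, underscore, counter, original extension.
        candidate = f"{path.stem}_{n}{path.suffix}"
        if candidate not in existing:
            return candidate
        n += 1


first = dedupe_name("report.pdf", set())
second = dedupe_name("report.pdf", {"report.pdf"})
third = dedupe_name("report.pdf", {"report.pdf", "report_1.pdf"})
```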

When files are uploaded via the UI, their paths are appended to the user message content:

[Uploaded files on disk:
- filename.pdf → session_files/{id}/filename.pdf]

This lets the agent use filesystem or code_exec to access the files.
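
As a sketch of how such a note could be assembled, the hypothetical helper below appends the block shown above to a user message. The function name and the blank line separating the note from the message text are assumptions.

```python
def with_upload_note(content: str, session_id: str, filenames: list[str]) -> str:
    """Append the uploaded-files note to a user message, if any files exist."""
    if not filenames:
        return content
    lines = [
        f"- {name} → session_files/{session_id}/{name}" for name in filenames
    ]
    note = "[Uploaded files on disk:\n" + "\n".join(lines) + "]"
    return f"{content}\n\n{note}"


message = with_upload_note("Summarize this", "abc", ["filename.pdf"])
```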


Debug endpoints

  • GET /sessions/{id}/context — returns what the LLM actually sees (may differ from messages after compression).
  • GET /sessions/{id}/planning — returns session.planning_logs: raw planning phase outputs per turn.