@Eugene Sukhodolskiy Eugene Sukhodolskiy on 14 Apr 4 KB Add backend documentation

Sessions

Session management, dual-buffer design, and context compression.

Session model (navi/core/session.py)

class Session(BaseModel):
    id: str                          # UUID
    profile_id: str                  # active profile
    messages: list[Message]          # full display history — never compressed
    context: list[Message]           # LLM context — may be replaced with summary
    context_token_count: int         # accumulated tokens; reset to 0 after compression
    pinned: bool                     # pinned sessions appear first in sidebar
    created_at: datetime
    last_active: datetime

Dual-buffer design

Two separate message lists serve different purposes:

| Buffer | Purpose | Modified by compression? |
| --- | --- | --- |
| session.messages | Full display history shown in the UI | Never |
| session.context | What the LLM sees on each call | Yes: old turns replaced with a summary |

Tool results, image injections, and assistant messages are appended to both buffers. When compression runs, only session.context is modified.

Note: System messages are not stored in either buffer. They are injected fresh from the current profile on every LLM call via _build_context(). This makes profile switches take effect immediately.
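As a minimal sketch of the dual-buffer idea, the following uses plain dataclasses rather than the real pydantic models. `append_message` and `build_context` are hypothetical stand-ins (the latter for `_build_context()`), and the system prompt strings are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    role: str
    content: str


@dataclass
class Session:
    messages: list[Message] = field(default_factory=list)  # display history
    context: list[Message] = field(default_factory=list)   # LLM context


def append_message(session: Session, msg: Message) -> None:
    """Append a new message to both buffers (hypothetical helper)."""
    session.messages.append(msg)
    session.context.append(msg)


def build_context(session: Session, system_prompt: str) -> list[Message]:
    """Prepend a fresh system message; nothing is written back to the session."""
    return [Message("system", system_prompt)] + session.context


session = Session()
append_message(session, Message("user", "hello"))
append_message(session, Message("assistant", "hi there"))

first = build_context(session, "You are profile A.")
second = build_context(session, "You are profile B.")  # after a profile switch
```

Because the system message is rebuilt on every call and never stored, the second call picks up the new profile without touching either buffer.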

Session store

InMemorySessionStore

Simple dict-backed store for testing.

SQLiteSessionStore (navi/core/sqlite_session_store.py)

Production store backed by SQLite via aiosqlite.

  • create(profile_id) → new Session
  • get(session_id) → Session | None
  • save(session) — serializes with model_dump(mode='json') (required for datetime serialization)
  • list_all() → sorted by (pinned DESC, last_active DESC)
  • delete(session_id) → bool
  • set_pinned(session_id, pinned) → bool

DB path: settings.db_path (default: navi.db).
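The ordering used by list_all() can be illustrated with a small sketch. It uses the synchronous stdlib `sqlite3` module instead of aiosqlite so that it runs standalone, and the table layout, column names, and dict-based sessions are assumptions, not the real schema.

```python
import json
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sessions ("
    " id TEXT PRIMARY KEY,"
    " pinned INTEGER NOT NULL,"
    " last_active TEXT NOT NULL,"
    " data TEXT NOT NULL)"
)


def save(session: dict) -> None:
    """Upsert a session; the datetime is stored as an ISO 8601 string."""
    payload = dict(session, last_active=session["last_active"].isoformat())
    conn.execute(
        "INSERT OR REPLACE INTO sessions VALUES (?, ?, ?, ?)",
        (session["id"], int(session["pinned"]), payload["last_active"],
         json.dumps(payload)),
    )


def list_all() -> list[str]:
    """Return session ids, pinned first, then most recently active first."""
    rows = conn.execute(
        "SELECT id FROM sessions ORDER BY pinned DESC, last_active DESC"
    )
    return [row[0] for row in rows]


def at(hour: int) -> datetime:
    return datetime(2024, 1, 1, hour, tzinfo=timezone.utc)


save({"id": "a", "pinned": False, "last_active": at(12)})
save({"id": "b", "pinned": True, "last_active": at(9)})
save({"id": "c", "pinned": False, "last_active": at(15)})
order = list_all()
```

The pinned session comes first even though it is the least recently active. Converting the datetime to a string before `json.dumps` plays the role that model_dump(mode='json') plays for the real pydantic model.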


Context compression (navi/core/compressor.py)

Keeps the LLM context within the token budget by summarizing old conversation turns.

When it triggers

Two trigger points:

  1. Pre-turn (in run_stream()): before calling the LLM, checks session.context_token_count against the threshold. Compresses if context_token_count >= ollama_num_ctx * context_compression_threshold.
  2. Post-turn (via CompressionWorker): after StreamEnd, the worker re-checks and compresses if needed.

Config values (settings):

  • context_compression_enabled: bool = True
  • context_compression_threshold: float = 0.80 — trigger at 80% of ollama_num_ctx
  • context_keep_recent: int = 10 — keep last N conversational turns verbatim
  • context_summary_temperature: float = 0.3
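The trigger condition can be sketched as a small predicate. `CompressionSettings` and `should_compress` are hypothetical names, and the 8192-token context size is an assumed value chosen only for the example.

```python
from dataclasses import dataclass


@dataclass
class CompressionSettings:
    context_compression_enabled: bool = True
    context_compression_threshold: float = 0.80
    ollama_num_ctx: int = 8192  # assumed value for illustration


def should_compress(token_count: int, settings: CompressionSettings) -> bool:
    """True when the accumulated token count reaches the configured share of the context window."""
    if not settings.context_compression_enabled:
        return False
    limit = settings.ollama_num_ctx * settings.context_compression_threshold
    return token_count >= limit


settings = CompressionSettings()
below = should_compress(6553, settings)
above = should_compress(6554, settings)
```

With these assumed values the limit is 8192 * 0.80 = 6553.6 tokens, so 6553 does not trigger compression and 6554 does.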

Compression algorithm

compress_context(context, llm, model, temperature, keep_recent):

  1. Partition messages into to_summarize (old turns) and to_keep (recent keep_recent turns).
    • A "turn" = one user message + all following assistant/tool messages up to the next user message.
    • Tool call groups (assistant + results) are never split across the partition.
    • Existing summary messages are folded into the next pass.
  2. Format to_summarize as plain text (tool calls shown as compact previews, max 120 chars for args, max 300 chars for results).
  3. Truncate formatted input to _MAX_SUMMARY_INPUT_CHARS = 12_000 chars.
  4. Call llm.complete() with think=False to produce a bullet-point summary.
  5. Replace to_summarize with a single summary message (role=user, is_summary=True).
  6. Return system_msgs + [summary_msg] + to_keep.
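The steps above might look roughly like the following sketch. It is not the code in navi/core/compressor.py: the `Message` dataclass, `split_turns`, and the `summarize` callable (standing in for llm.complete()) are assumptions, and the compact tool-call previews from step 2 are omitted.

```python
from dataclasses import dataclass
from typing import Callable

_MAX_SUMMARY_INPUT_CHARS = 12_000


@dataclass
class Message:
    role: str
    content: str
    is_summary: bool = False


def split_turns(messages: list[Message]) -> list[list[Message]]:
    """Group messages into turns; each user message opens a new turn."""
    turns: list[list[Message]] = []
    for msg in messages:
        if msg.role == "user" or not turns:
            turns.append([])
        turns[-1].append(msg)
    return turns


def compress_context(
    context: list[Message],
    summarize: Callable[[str], str],
    keep_recent: int,
) -> list[Message]:
    system_msgs = [m for m in context if m.role == "system"]
    prior_summaries = [m for m in context if m.is_summary]
    body = [m for m in context if m.role != "system" and not m.is_summary]

    turns = split_turns(body)
    cut = len(turns) - keep_recent
    if cut <= 0:
        return context  # nothing old enough to summarize

    # Splitting on turn boundaries keeps assistant/tool groups together.
    to_summarize = prior_summaries + [m for turn in turns[:cut] for m in turn]
    to_keep = [m for turn in turns[cut:] for m in turn]

    text = "\n".join(f"{m.role}: {m.content}" for m in to_summarize)
    summary = summarize(text[:_MAX_SUMMARY_INPUT_CHARS])
    summary_msg = Message("user", summary, is_summary=True)
    return system_msgs + [summary_msg] + to_keep


def fake_summarize(text: str) -> str:
    """Stand-in for the LLM call; reports how many lines it was given."""
    return f"- {len(text.splitlines())} lines summarized"


context: list[Message] = []
for i in range(4):
    context.append(Message("user", f"q{i}"))
    context.append(Message("assistant", f"a{i}"))

compressed = compress_context(context, fake_summarize, keep_recent=2)
```

Here the two oldest turns collapse into one summary message and the two most recent turns survive verbatim. Running the function again on a longer context folds the earlier summary into the new one, so at most one summary message remains.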

If compression fails, the exception propagates to CompressionWorker, which logs a warning and continues — compression failure is non-fatal.
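A non-fatal wrapper of this kind could be sketched as below. `compress_safely` is a hypothetical helper, not the actual CompressionWorker, and falling back to the uncompressed context is an assumption about what "continues" means here.

```python
import logging
from typing import Callable

logger = logging.getLogger("compression_worker")


def compress_safely(context: list, compress: Callable[[list], list]) -> list:
    """Run compression; on any error, log a warning and keep the old context."""
    try:
        return compress(context)
    except Exception as exc:
        logger.warning("context compression failed: %s", exc)
        return context


def failing(context: list) -> list:
    raise RuntimeError("LLM unavailable")


result = compress_safely(["m1", "m2"], failing)
```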

What is never compressed

  • session.messages — full history is always intact.
  • The last context_keep_recent conversational turns.
  • System messages (never stored in context anyway).

Session file uploads

Files uploaded via POST /sessions/{id}/files are stored in session_files/{session_id}/.

  • Max size: session_files_max_size_mb (default: 200 MB)
  • TTL: session_files_ttl_hours (default: 24 hours)
  • A background cleanup_loop (started on FastAPI startup) deletes stale session directories.
  • Executable files (.sh, .py, .exe, etc.) are rejected.
  • Duplicate filenames get a numeric suffix.
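The extension check and duplicate handling could be sketched as follows. The blocked set lists only the three extensions named above (the real list is longer), and the `_1`, `_2` suffix format is an assumption, since the exact naming scheme is not specified here.

```python
from pathlib import PurePosixPath

# Partial list taken from the examples above; the real set is longer.
BLOCKED_EXTENSIONS = {".sh", ".py", ".exe"}


def is_allowed(filename: str) -> bool:
    """Reject files whose extension is in the blocked set (case-insensitive)."""
    return PurePosixPath(filename).suffix.lower() not in BLOCKED_EXTENSIONS


def dedupe_name(filename: str, existing: set[str]) -> str:
    """Return filename unchanged, or with a numeric suffix if it is already taken."""
    if filename not in existing:
        return filename
    path = PurePosixPath(filename)
    counter = 1
    while True:
        candidate = f"{path.stem}_{counter}{path.suffix}"  # assumed format
        if candidate not in existing:
            return candidate
        counter += 1


stored = {"report.pdf", "report_1.pdf"}
new_name = dedupe_name("report.pdf", stored)
```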

When files are uploaded via the UI, their paths are appended to the user message content:

[Uploaded files on disk:
- filename.pdf → session_files/{id}/filename.pdf]

This lets the agent use filesystem or code_exec to access the files.
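A minimal sketch of building that note is shown below. `with_upload_note` is a hypothetical helper, and the blank line separating the note from the user's text is an assumption.

```python
def with_upload_note(content: str, session_id: str, filenames: list[str]) -> str:
    """Append the uploaded-file listing to the user message content."""
    if not filenames:
        return content
    lines = [f"- {name} → session_files/{session_id}/{name}" for name in filenames]
    return content + "\n\n[Uploaded files on disk:\n" + "\n".join(lines) + "]"


message = with_upload_note("Summarize this", "abc", ["filename.pdf"])
```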