Sessions

Session management, dual-buffer design, and context compression.

Session model (`navi/core/session.py`)

from datetime import datetime

from pydantic import BaseModel


class Session(BaseModel):
    id: str                          # UUID
    profile_id: str                  # active profile
    messages: list[Message]          # full display history — never compressed
    context: list[Message]           # LLM context — may be replaced with summary
    context_token_count: int         # accumulated tokens; reset to 0 after compression
    pinned: bool                     # pinned sessions appear first in sidebar
    name: str | None                 # auto-generated display name (set after first exchange)
    created_at: datetime
    last_active: datetime
    planning_logs: list[dict]        # raw planning phase outputs per turn (debug)

Message flags

Messages in session.messages carry optional flags beyond role/content:

Flag	Purpose
`is_plan: bool`	Message is a planning phase output (shown as plan card in UI, not text)
`is_compression: bool`	Marker message injected when context compression ran
`is_summary: bool`	A summary message replacing compressed history in `session.context`
`thinking: str | None`	LLM reasoning captured during a tool-calling turn
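
A minimal sketch of how these flags might sit on a message model. It uses a standard-library dataclass as a stand-in: the real `Message` class is not shown here, so the defaults, the `role`/`content` types, and the `plain_text_messages` helper are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    """Stand-in for the real Message model (assumed shape, not the actual class)."""
    role: str
    content: str
    is_plan: bool = False           # planning phase output, rendered as a plan card
    is_compression: bool = False    # marker injected when context compression ran
    is_summary: bool = False        # summary replacing compressed history
    thinking: Optional[str] = None  # reasoning captured during a tool-calling turn


def plain_text_messages(messages: list[Message]) -> list[Message]:
    """Hypothetical helper: messages a UI would render as text, i.e. not plan cards."""
    return [m for m in messages if not m.is_plan]
```

All flags default to "off", so an ordinary user or assistant message needs only `role` and `content`.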

Dual-buffer design

Two separate message lists serve different purposes:

Buffer	Purpose	Modified by compression?
`session.messages`	Full display history shown in the UI	Never
`session.context`	What the LLM sees on each call	Yes — old turns replaced with a summary

Tool results, image injections, and assistant messages are appended to both buffers. When compression runs, only session.context is modified.

Note: System messages are not stored in either buffer. They are injected fresh from the current profile on every LLM call via _build_context(), so a profile switch takes effect on the next LLM call.
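
The dual-buffer idea can be sketched in a few lines. This is an illustration under assumptions, not the project's code: `append_message` and `build_context_sketch` are hypothetical names, and the real `_build_context()` signature is not documented here.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    role: str
    content: str


@dataclass
class Session:
    messages: list[Message] = field(default_factory=list)  # display history
    context: list[Message] = field(default_factory=list)   # what the LLM sees


def append_message(session: Session, msg: Message) -> None:
    """Append to both buffers so the UI history and the LLM context stay in step."""
    session.messages.append(msg)
    session.context.append(msg)


def build_context_sketch(session: Session, system_prompt: str) -> list[Message]:
    """Prepend a fresh system message; nothing is written back to the session."""
    return [Message("system", system_prompt)] + session.context
```

Because the system message is built per call and never stored, replacing `session.context` with a summary leaves `session.messages` untouched, and a different `system_prompt` can be passed on the very next call.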

Session store

`InMemorySessionStore`

Simple dict-backed store for testing.

`PgSessionStore` (`navi/core/pg_session_store.py`)

Production store backed by PostgreSQL via asyncpg.

create(profile_id) → new Session
get(session_id) → Session | None
save(session) — serializes with model_dump(mode='json') (required for datetime serialization)
list_all() → sorted by (pinned DESC, last_active DESC)
delete(session_id) → bool
set_pinned(session_id, pinned) → bool
set_name(session_id, name) → bool

Requires DATABASE_URL env variable (e.g. postgresql://user:pass@localhost/navi).
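
As a rough illustration of the store interface, here is a dict-backed sketch covering a subset of the methods listed above. It is synchronous for brevity (the asyncpg-backed store is presumably async), and the class name `DictSessionStore` and the trimmed `Session` fields are assumptions.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Session:
    id: str
    profile_id: str
    pinned: bool = False
    name: Optional[str] = None
    last_active: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class DictSessionStore:
    """Hypothetical dict-backed store mirroring part of the method list above."""

    def __init__(self) -> None:
        self._sessions: dict[str, Session] = {}

    def create(self, profile_id: str) -> Session:
        session = Session(id=str(uuid.uuid4()), profile_id=profile_id)
        self._sessions[session.id] = session
        return session

    def get(self, session_id: str) -> Optional[Session]:
        return self._sessions.get(session_id)

    def list_all(self) -> list[Session]:
        # Pinned sessions first, then most recently active first.
        return sorted(
            self._sessions.values(),
            key=lambda s: (s.pinned, s.last_active),
            reverse=True,
        )

    def set_pinned(self, session_id: str, pinned: bool) -> bool:
        session = self._sessions.get(session_id)
        if session is None:
            return False
        session.pinned = pinned
        return True

    def delete(self, session_id: str) -> bool:
        return self._sessions.pop(session_id, None) is not None
```

Sorting on the tuple `(pinned, last_active)` in reverse gives the same ordering as `pinned DESC, last_active DESC` in SQL.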

Context compression (`navi/core/compressor.py`)

Keeps the LLM context within the token budget by summarizing old conversation turns.

When it triggers

Two trigger points:

Pre-turn (in run_stream()): before calling the LLM, checks session.context_token_count against the threshold and compresses if context_token_count >= ollama_num_ctx * context_compression_threshold.
Post-turn (via CompressionWorker): after StreamEnd, the worker re-checks and compresses if needed.

Config values (settings):

context_compression_enabled: bool = True
context_compression_threshold: float = 0.80 — trigger at 80% of ollama_num_ctx
context_keep_recent: int = 10 — keep last N conversational turns verbatim
context_summary_temperature: float = 0.3
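
The trigger condition reduces to a single comparison. The sketch below assumes a standalone helper named `should_compress`; the actual check lives inside run_stream() and CompressionWorker and may be structured differently.

```python
def should_compress(
    context_token_count: int,
    num_ctx: int,
    threshold: float = 0.80,
    enabled: bool = True,
) -> bool:
    """Return True when the accumulated token count reaches the trigger point."""
    if not enabled:
        return False
    return context_token_count >= num_ctx * threshold
```

For example, with a context window of 8192 tokens and the default threshold of 0.80, the trigger point is 6553.6 tokens, so 6554 accumulated tokens is the first count that triggers compression.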

Compression algorithm

compress_context(context, llm, model, temperature, keep_recent):

1. Partition messages into to_summarize (old turns) and to_keep (the most recent keep_recent turns).
   - A "turn" = one user message + all following assistant/tool messages up to the next user message.
   - Tool call groups (assistant + results) are never split across the partition.
   - Existing summary messages are folded into the next pass.
2. Format to_summarize as plain text (tool calls shown as compact previews, max 120 chars for args, max 300 chars for results).
3. Truncate the formatted input to _MAX_SUMMARY_INPUT_CHARS = 12_000 chars.
4. Call llm.complete() with think=False to produce a bullet-point summary.
5. Replace to_summarize with a single summary message (role=user, is_summary=True).
6. Return system_msgs + [summary_msg] + to_keep.
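
The partition step is the part most easily gotten wrong, so here is a sketch of one way to cut only on turn boundaries. The function names `split_turns` and `partition` and the trimmed `Message` class are hypothetical, and treating an existing summary as the oldest group is an assumption about how "folded into the next pass" works.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str            # "user", "assistant", or "tool"
    content: str
    is_summary: bool = False


def split_turns(context: list[Message]) -> list[list[Message]]:
    """Group messages into turns; each non-summary user message starts a new turn."""
    turns: list[list[Message]] = []
    for msg in context:
        starts_turn = msg.role == "user" and not msg.is_summary
        if starts_turn or not turns:
            turns.append([])
        turns[-1].append(msg)
    return turns


def partition(
    context: list[Message], keep_recent: int
) -> tuple[list[Message], list[Message]]:
    """Return (to_summarize, to_keep), cutting only between whole turns."""
    turns = split_turns(context)
    cut = max(len(turns) - keep_recent, 0)
    to_summarize = [m for turn in turns[:cut] for m in turn]
    to_keep = [m for turn in turns[cut:] for m in turn]
    return to_summarize, to_keep
```

Because the cut index always falls between turns, an assistant tool call and its tool results stay on the same side of the partition. When there are no more than keep_recent turns, to_summarize is empty and a caller would skip the summary call.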

If compression fails, the exception propagates to CompressionWorker, which logs a warning and continues — compression failure is non-fatal.

What is never compressed

session.messages — full history is always intact.
The last context_keep_recent conversational turns.
System messages (never stored in context anyway).

Session file uploads

Files uploaded via POST /sessions/{id}/files are stored in session_files/{session_id}/.

Max size: session_files_max_size_mb (default: 200 MB)
TTL: session_files_ttl_hours (default: 24 hours)
A background cleanup_loop (started on FastAPI startup) deletes stale session directories.
Executable files (.sh, .py, .exe, etc.) are rejected.
Duplicate filenames get a numeric suffix.
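
The extension check and the duplicate-name rule might look like the sketch below. The blocklist contains only the three extensions named above (the full list is not given), and the `_1`, `_2` suffix format is an assumption; the real implementation may number duplicates differently.

```python
from pathlib import Path

# Assumed blocklist: only .sh, .py, and .exe are named explicitly in this document.
BLOCKED_SUFFIXES = {".sh", ".py", ".exe"}


def is_rejected(filename: str) -> bool:
    """True when the file extension is on the executable blocklist."""
    return Path(filename).suffix.lower() in BLOCKED_SUFFIXES


def unique_name(filename: str, existing: set[str]) -> str:
    """Append a numeric suffix to the stem until the name no longer collides."""
    if filename not in existing:
        return filename
    path = Path(filename)
    counter = 1
    while f"{path.stem}_{counter}{path.suffix}" in existing:
        counter += 1
    return f"{path.stem}_{counter}{path.suffix}"
```

Lowercasing the suffix before the lookup keeps names such as `run.SH` from slipping past the check.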

When files are uploaded via the UI, their paths are appended to the user message content:

[Uploaded files on disk:
- filename.pdf → session_files/{id}/filename.pdf]

This lets the agent use filesystem or code_exec to access the files.
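
A small sketch of how that note could be built and appended. The helper name `append_upload_note` and the blank line separating the note from the user's text are assumptions; only the bracketed format itself comes from the example above.

```python
def append_upload_note(content: str, session_id: str, filenames: list[str]) -> str:
    """Append the uploaded-files note to a user message; no-op without files."""
    if not filenames:
        return content
    lines = [f"- {name} → session_files/{session_id}/{name}" for name in filenames]
    note = "[Uploaded files on disk:\n" + "\n".join(lines) + "]"
    # Assumed separator: a blank line between the user's text and the note.
    return f"{content}\n\n{note}"
```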

Debug endpoints

GET /sessions/{id}/context — returns what the LLM actually sees (may differ from messages after compression).
GET /sessions/{id}/planning — returns session.planning_logs: raw planning phase outputs per turn.