# Sessions

Session management, dual-buffer design, and context compression.

## Session model (`navi/core/session.py`)

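```python
from datetime import datetime

from pydantic import BaseModel

# `Message` is the project's message model (see "Message flags").


class Session(BaseModel):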
    id: str                          # UUID
    profile_id: str                  # active profile
    messages: list[Message]          # full display history — never compressed
    context: list[Message]           # LLM context — may be replaced with summary
    context_token_count: int         # accumulated tokens; reset to 0 after compression
    pinned: bool                     # pinned sessions appear first in sidebar
    name: str | None                 # auto-generated display name (set after first exchange)
    created_at: datetime
    last_active: datetime
    planning_logs: list[dict]        # raw planning phase outputs per turn (debug)
```

## Message flags

Messages carry optional flags beyond role/content:

| Flag | Purpose |
|---|---|
| `is_plan: bool` | Message is a planning phase output (shown as plan card in UI, not text) |
| `is_compression: bool` | Marker message injected when context compression ran |
| `is_summary: bool` | A summary message replacing compressed history in `session.context` |
| `thinking: str \| None` | LLM reasoning captured during a tool-calling turn |

## Dual-buffer design

Two separate message lists serve different purposes:

| Buffer | Purpose | Modified by compression? |
|---|---|---|
| `session.messages` | Full display history shown in the UI | Never |
| `session.context` | What the LLM sees on each call | Yes — old turns replaced with a summary |

Tool results, image injections, and assistant messages are appended to **both** buffers. When compression runs, only `session.context` is modified.

**Note:** System messages are **not stored** in either buffer. They are injected fresh from the current profile on every LLM call via `_build_context()`. This makes profile switches take effect immediately.
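
The dual-buffer behaviour can be illustrated with a minimal sketch. `Msg`, `SessionSketch`, `append_both`, and `build_context` are hypothetical stand-ins and not the project's actual classes; the real `_build_context()` may do more than prepend a system message.

```python
from dataclasses import dataclass, field


@dataclass
class Msg:
    role: str
    content: str


@dataclass
class SessionSketch:
    # Hypothetical stand-in for Session, reduced to the two buffers.
    messages: list[Msg] = field(default_factory=list)  # display history
    context: list[Msg] = field(default_factory=list)   # LLM context


def append_both(session: SessionSketch, msg: Msg) -> None:
    """Append a message to the display history and to the LLM context."""
    session.messages.append(msg)
    session.context.append(msg)


def build_context(session: SessionSketch, system_prompt: str) -> list[Msg]:
    """Prepend a fresh system message; it is never stored on the session."""
    return [Msg("system", system_prompt)] + session.context
```

Because the system message is built per call, replacing `session.context` with a summary or switching the profile prompt never requires rewriting stored messages.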

## Session store

### `InMemorySessionStore`
Simple dict-backed store for testing.

### `PgSessionStore` (`navi/core/pg_session_store.py`)
Production store backed by PostgreSQL via asyncpg.

- `create(profile_id)` → new `Session`
- `get(session_id)` → `Session | None`
- `save(session)` — serializes with `model_dump(mode='json')` (required for datetime serialization)
- `list_all()` → sorted by `(pinned DESC, last_active DESC)`
- `delete(session_id)` → `bool`
- `set_pinned(session_id, pinned)` → `bool`
- `set_name(session_id, name)` → `bool`

Requires the `DATABASE_URL` environment variable (e.g. `postgresql://user:pass@localhost/navi`).
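
The `list_all()` ordering can be reproduced in plain Python, which may be useful for a dict-backed store. This is a sketch only: `SessionRow` and `sort_sessions` are hypothetical names, and the production store presumably sorts in SQL instead.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class SessionRow:
    # Hypothetical reduced view of a session: only the fields used for sorting.
    id: str
    pinned: bool
    last_active: datetime


def sort_sessions(rows: list[SessionRow]) -> list[SessionRow]:
    """Order by (pinned DESC, last_active DESC)."""
    # True sorts after False, so reversing puts pinned sessions first
    # and, within each group, the most recently active session first.
    return sorted(rows, key=lambda r: (r.pinned, r.last_active), reverse=True)
```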

---

## Context compression (`navi/core/compressor.py`)

Keeps the LLM context within the token budget by summarizing old conversation turns.

### When it triggers

Two trigger points:

1. **Pre-turn** (in `run_stream()`): before calling the LLM, compares `session.context_token_count` with the threshold and compresses if `context_token_count >= ollama_num_ctx * context_compression_threshold`.
2. **Post-turn** (via `CompressionWorker`): after `StreamEnd`, the worker re-checks and compresses if needed.

Config values (`settings`):
- `context_compression_enabled: bool = True`
- `context_compression_threshold: float = 0.80` — trigger at 80% of `ollama_num_ctx`
- `context_keep_recent: int = 10` — keep last N conversational turns verbatim
- `context_summary_temperature: float = 0.3`
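
As a sketch of the trigger condition, the check reduces to one comparison. `should_compress` is a hypothetical helper name; the parameters mirror the settings listed above.

```python
def should_compress(
    tokens: int,
    num_ctx: int,
    threshold: float = 0.80,
    enabled: bool = True,
) -> bool:
    """Return True when the accumulated token count reaches the threshold."""
    return enabled and tokens >= num_ctx * threshold
```

With `num_ctx = 8192` and the default threshold, the cutoff is 6553.6 tokens, so compression would first trigger at a count of 6554.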

### Compression algorithm

`compress_context(context, llm, model, temperature, keep_recent)`:

1. Partition messages into `to_summarize` (old turns) and `to_keep` (recent `keep_recent` turns).
   - A "turn" = one user message + all following assistant/tool messages up to the next user message.
   - Tool call groups (assistant + results) are never split across the partition.
   - An existing summary message from an earlier compression is included in `to_summarize`, so it is folded into the new summary.
2. Format `to_summarize` as plain text (tool calls shown as compact previews, max 120 chars for args, max 300 chars for results).
3. Truncate formatted input to `_MAX_SUMMARY_INPUT_CHARS = 12_000` chars.
4. Call `llm.complete()` with `think=False` to produce a bullet-point summary.
5. Replace `to_summarize` with a single summary message (`role=user`, `is_summary=True`).
6. Return `system_msgs + [summary_msg] + to_keep`. Since system messages are not stored in the context, `system_msgs` is normally empty.
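
Step 1 can be sketched as follows. This is an illustration under stated assumptions, not the code in `compressor.py`: `Msg` and `partition_turns` are hypothetical names, and the sketch assumes that every existing summary message goes into `to_summarize`.

```python
from dataclasses import dataclass


@dataclass
class Msg:
    role: str
    content: str
    is_summary: bool = False


def partition_turns(context: list[Msg], keep_recent: int) -> tuple[list[Msg], list[Msg]]:
    """Split the context into (to_summarize, to_keep) on turn boundaries."""
    # Assumption: earlier summaries are always folded into the next pass.
    summaries = [m for m in context if m.is_summary]

    # Group messages into turns: each user message opens a new turn, and
    # assistant/tool messages attach to the current one, so a tool call and
    # its results always stay in the same turn.
    turns: list[list[Msg]] = []
    for msg in context:
        if msg.is_summary:
            continue
        if msg.role == "user" or not turns:
            turns.append([msg])
        else:
            turns[-1].append(msg)

    split = max(len(turns) - keep_recent, 0)
    to_summarize = summaries + [m for turn in turns[:split] for m in turn]
    to_keep = [m for turn in turns[split:] for m in turn]
    return to_summarize, to_keep
```

Splitting on whole turns is what guarantees that an assistant tool call is never separated from its results.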

If compression fails, the exception propagates to `CompressionWorker`, which logs a warning and continues — compression failure is non-fatal.
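
The non-fatal failure handling might look like the sketch below. `safe_compress` is a hypothetical wrapper, and returning the original context unchanged on failure is an assumption about what "continues" means here; the real `CompressionWorker` may be asynchronous and structured differently.

```python
import logging
from typing import Callable

logger = logging.getLogger("navi.compression")  # logger name is an assumption


def safe_compress(compress: Callable[[list], list], context: list) -> list:
    """Run a compression callable; on any error, log a warning and keep the context."""
    try:
        return compress(context)
    except Exception as exc:
        # Compression is an optimization, so a failure must not break the session.
        logger.warning("context compression failed: %s", exc)
        return context
```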

### What is never compressed

- `session.messages` — full history is always intact.
- The last `context_keep_recent` conversational turns.
- System messages (never stored in context anyway).

---

## Session file uploads

Files uploaded via `POST /sessions/{id}/files` are stored in `session_files/{session_id}/`.

- Max size: `session_files_max_size_mb` (default: 200 MB)
- TTL: `session_files_ttl_hours` (default: 24 hours)
- A background `cleanup_loop` (started on FastAPI startup) deletes stale session directories.
- Executable files (`.sh`, `.py`, `.exe`, etc.) are rejected.
- Duplicate filenames get a numeric suffix.

When files are uploaded via the UI, their paths are appended to the user message content:
```
[Uploaded files on disk:
- filename.pdf → session_files/{id}/filename.pdf]
```

This lets the agent use `filesystem` or `code_exec` to access the files.
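
Two of the upload behaviours above can be sketched in a few lines. Both helpers are hypothetical: the source does not specify the suffix format for duplicate names (the sketch assumes `name_1.ext`), nor how several files are laid out in the note (the sketch assumes one line per file).

```python
from pathlib import PurePosixPath


def dedupe_name(name: str, existing: set[str]) -> str:
    """Return a filename that does not collide with names already stored."""
    if name not in existing:
        return name
    path = PurePosixPath(name)
    n = 1
    # Assumed suffix scheme: report.pdf -> report_1.pdf -> report_2.pdf ...
    while f"{path.stem}_{n}{path.suffix}" in existing:
        n += 1
    return f"{path.stem}_{n}{path.suffix}"


def upload_note(session_id: str, filenames: list[str]) -> str:
    """Build the note appended to the user message content."""
    lines = [f"- {name} → session_files/{session_id}/{name}" for name in filenames]
    return "[Uploaded files on disk:\n" + "\n".join(lines) + "]"
```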

---

## Debug endpoints

- `GET /sessions/{id}/context` — returns what the LLM actually sees (may differ from `messages` after compression).
- `GET /sessions/{id}/planning` — returns `session.planning_logs`: raw planning phase outputs per turn.
