Add context compression: rolling summarization when context fills up

Mechanism:
- After streaming ends, if context_tokens >= threshold (80% of num_ctx),
  compress old turns into a summary message using the same LLM
- Partition: keep system msg + last N turns verbatim (default 6);
  everything older goes to the summarizer
- Tool call groups (assistant + tool results) never split across boundary
- Existing summary messages are folded into each new compression pass,
  so summaries never stack up
- Summary stored as Message(role=user, is_summary=True) after system msg
- On failure: logged, session left unchanged (non-fatal)
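
The threshold check and the partition step can be sketched roughly as
follows. This is a minimal illustration, not the code in compressor.py:
the Message dataclass is a stand-in, the function signatures are
assumptions, and keep_recent is assumed to count messages rather than
user/assistant pairs.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Message:
    # Stand-in for the project's Message type (fields assumed).
    role: str  # "system", "user", "assistant", or "tool"
    content: str
    is_summary: bool = False


def should_compress(context_tokens: int, num_ctx: int,
                    threshold: float = 0.80) -> bool:
    """True once the context has reached the given fraction of num_ctx."""
    return context_tokens >= threshold * num_ctx


def partition_messages(
    messages: List[Message], keep_recent: int = 6
) -> Tuple[List[Message], List[Message], List[Message]]:
    """Split messages into (system, to_summarize, recent)."""
    system = [m for m in messages if m.role == "system"]
    rest = [m for m in messages if m.role != "system"]
    cut = max(len(rest) - keep_recent, 0)
    # Move the boundary back while it points at a tool result, so the
    # assistant message that issued the call stays with its results.
    while 0 < cut < len(rest) and rest[cut].role == "tool":
        cut -= 1
    return system, rest[:cut], rest[cut:]
```

Because an earlier summary sits right after the system message, it
falls into the to_summarize part on the next pass, which is how the
sketch avoids accumulating summaries.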

New files:
- navi/core/compressor.py: should_compress, partition_messages,
  compress_session (pure logic, testable without agent)
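
A hedged sketch of how compress_session might tie the pieces together,
including the non-fatal failure path. The signature is hypothetical: it
takes already partitioned lists and an injected summarize callable
(standing in for the LLM call), which keeps it testable without an
agent. The Message dataclass is again a stand-in.

```python
import logging
from dataclasses import dataclass
from typing import Callable, List

log = logging.getLogger("compressor_sketch")  # logger name is illustrative


@dataclass
class Message:
    # Stand-in for the project's Message type (fields assumed).
    role: str
    content: str
    is_summary: bool = False


def compress_session(
    system: List[Message],
    old: List[Message],
    recent: List[Message],
    summarize: Callable[[List[Message]], str],
) -> List[Message]:
    """Return the new message list, or the original one on failure."""
    if not old:
        return system + recent  # nothing to compress
    try:
        # `old` may contain an earlier summary; it is summarized along
        # with the other old turns, so only one summary survives.
        text = summarize(old)
    except Exception:
        log.exception("context compression failed; session left unchanged")
        return system + old + recent
    summary = Message(role="user", content=text, is_summary=True)
    return system + [summary] + recent
```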

New config (navi/config.py):
- context_compression_enabled: bool = True
- context_compression_threshold: float = 0.80
- context_keep_recent: int = 6
- context_summary_temperature: float = 0.3
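
For illustration, the four settings and the token limit they imply
could look like this. The class name CompressionSettings and the
token_threshold helper are hypothetical; how navi/config.py actually
declares these fields is not shown in this commit message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompressionSettings:  # hypothetical container, defaults from above
    context_compression_enabled: bool = True
    context_compression_threshold: float = 0.80
    context_keep_recent: int = 6
    context_summary_temperature: float = 0.3

    def token_threshold(self, num_ctx: int) -> float:
        """Token count at which compression would trigger."""
        return self.context_compression_threshold * num_ctx


settings = CompressionSettings()
limit = settings.token_threshold(8192)  # 80% of an 8192-token window
```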

New agent event: ContextCompressed(messages_before, messages_after)
Message.is_summary: bool field marks compressed history blocks
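
A possible shape for the event and its WebSocket payload, as a sketch
only. The integer field types, the to_ws_payload helper, and the JSON
layout with a "type" key are assumptions; only the event name, its two
fields, and the context_compressed message name come from this commit.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ContextCompressed:
    messages_before: int  # message count before compression (type assumed)
    messages_after: int   # message count after compression (type assumed)


def to_ws_payload(event: ContextCompressed) -> str:
    """Serialize the event for the client (hypothetical wire format)."""
    return json.dumps({"type": "context_compressed", **asdict(event)})
```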

Client:
- context_compressed WS event → subtle inline notice in message list
- loadHistory: is_summary messages rendered as collapsible summary cards
- style.css: .summary-card, .compression-notice

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent 72f5a6f, commit 802c186a667ff187d584c999741e2431f5b44f3d
Eugene Sukhodolskiy authored on 8 Apr
Showing 9 changed files
client/js/app.js
client/js/chat.js
client/style.css
navi/api/websocket.py
navi/config.py
navi/core/__init__.py
navi/core/agent.py
navi/core/compressor.py 0 → 100644
navi/llm/base.py