|
Add context compression: rolling summarization when context fills up

Mechanism:
- After streaming ends, if context_tokens >= threshold (80% of num_ctx), compress old turns into a summary message using the same LLM
- Partition: keep the system msg + last N turns verbatim (default 6); everything older goes to the summarizer
- Tool call groups (assistant + tool results) are never split across the boundary
- Existing summary messages are folded into the new compression pass, so summaries do not stack
- Summary stored as Message(role=user, is_summary=True) after the system msg
- On failure: logged, session left unchanged (non-fatal)

New files:
- navi/core/compressor.py: should_compress, partition_messages, compress_session (pure logic, testable without agent)

New config (navi/config.py):
- context_compression_enabled: bool = True
- context_compression_threshold: float = 0.80
- context_keep_recent: int = 6
- context_summary_temperature: float = 0.3

New agent event: ContextCompressed(messages_before, messages_after)

Message.is_summary: bool field marks compressed history blocks

Client:
- context_compressed WS event → subtle inline notice in message list
- loadHistory: is_summary messages rendered as collapsible summary cards
- style.css: .summary-card, .compression-notice

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> |
|---|
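The threshold check and partition step described in the commit message could look roughly like the sketch below. It is not the code in navi/core/compressor.py: the `Message` and `CompressionConfig` dataclasses are stand-ins, the function signatures are guesses, and a "turn" is treated as a single message for simplicity. Only the function names, config field names, and defaults come from the commit message.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Message:
    # Stand-in for the project's Message type (assumed fields).
    role: str
    content: str = ""
    is_summary: bool = False


@dataclass
class CompressionConfig:
    # Hypothetical container; field names and defaults follow the commit message.
    context_compression_enabled: bool = True
    context_compression_threshold: float = 0.80
    context_keep_recent: int = 6


def should_compress(context_tokens: int, num_ctx: int, cfg: CompressionConfig) -> bool:
    """True once the context has reached the configured fraction of num_ctx."""
    if not cfg.context_compression_enabled or num_ctx <= 0:
        return False
    return context_tokens >= cfg.context_compression_threshold * num_ctx


def partition_messages(
    messages: List[Message], keep_recent: int = 6
) -> Tuple[List[Message], List[Message], List[Message]]:
    """Split history into (system, old, recent).

    Assumption: keep_recent counts messages, and tool results carry role "tool".
    """
    system = [m for m in messages[:1] if m.role == "system"]
    rest = messages[len(system):]

    boundary = max(len(rest) - keep_recent, 0)
    # Never start the verbatim tail on a tool result: walk back until the
    # assistant message that issued the tool calls is inside the tail too.
    while boundary > 0 and rest[boundary].role == "tool":
        boundary -= 1

    # An earlier summary sits right after the system message, so it lands in
    # `old` and is folded into the next summary instead of stacking up.
    return system, rest[:boundary], rest[boundary:]


history = [
    Message("system", "You are Navi."),
    Message("user", "earlier summary", is_summary=True),
    Message("user", "q1"),
    Message("assistant", "calling a tool"),
    Message("tool", "result a"),
    Message("tool", "result b"),
    Message("assistant", "a1"),
    Message("user", "q2"),
]
system, old, recent = partition_messages(history, keep_recent=3)
```

With `keep_recent=3` the naive cut would land on the second tool result, so the boundary moves back to the assistant message that issued the calls and the whole group stays in the verbatim tail.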
| client/js/app.js |
|---|
| client/js/chat.js |
|---|
| client/style.css |
|---|
| navi/api/websocket.py |
|---|
| navi/config.py |
|---|
| navi/core/__init__.py |
|---|
| navi/core/agent.py |
|---|
| navi/core/compressor.py 0 → 100644 |
|---|
| navi/llm/base.py |
|---|