Add explicit output token budget for summarizer (context_summary_max_tokens)


root / navi-1



Add explicit output token budget for summarizer (context_summary_max_tokens)

Previously there was no num_predict set for the summarization LLM call,
so Ollama used its server default (often 128 tokens — very short summaries).

- Add max_tokens param to LLMBackend.complete() and OllamaBackend (→ num_predict)
- Add context_summary_max_tokens: int = 1024 to config
- Thread it through compress_context() and CompressionWorker

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feature/navi-code master vmkdemo

1 parent 96548a1 commit 4b647631191f741d5483a0c4dbfe3e5a2c4cb245

Eugene Sukhodolskiy authored on 15 Apr
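
The three changes listed in the commit message could look roughly like the sketch below. It is not the actual diff: only the names `LLMBackend.complete()`, `OllamaBackend`, `max_tokens`, `num_predict`, `context_summary_max_tokens` (default 1024) and `compress_context()` come from the commit message. The `Config` class, the `send` transport callable, `build_payload()`, the prompt text and all signatures are assumptions made for illustration. The payload shape follows Ollama's `/api/generate` request, where `num_predict` goes under `options`.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Config:
    # Field name and default come from the commit message;
    # the surrounding Config class is an assumption.
    context_summary_max_tokens: int = 1024


class LLMBackend(ABC):
    @abstractmethod
    def complete(self, prompt: str, max_tokens: Optional[int] = None) -> str:
        """Return a completion; max_tokens caps the output length if set."""


class OllamaBackend(LLMBackend):
    def __init__(self, model: str, send: Callable[[dict], dict]):
        self.model = model
        # Hypothetical transport: takes the request payload, returns the
        # decoded JSON response. Injected so the sketch needs no network.
        self._send = send

    def build_payload(self, prompt: str, max_tokens: Optional[int] = None) -> dict:
        payload = {"model": self.model, "prompt": prompt, "stream": False}
        if max_tokens is not None:
            # Without this option Ollama falls back to its server default.
            payload["options"] = {"num_predict": max_tokens}
        return payload

    def complete(self, prompt: str, max_tokens: Optional[int] = None) -> str:
        return self._send(self.build_payload(prompt, max_tokens))["response"]


def compress_context(backend: LLMBackend, messages: List[str], config: Config) -> str:
    # Illustrative prompt only; the real summarization prompt is not shown
    # in the commit message.
    prompt = "Summarize the following conversation:\n" + "\n".join(messages)
    return backend.complete(prompt, max_tokens=config.context_summary_max_tokens)
```

Leaving `max_tokens` as `None` by default keeps existing callers of `complete()` unchanged, so only the summarization path opts into the explicit budget.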


Showing 5 changed files

- navi/config.py
- navi/core/compressor.py
- navi/llm/base.py
- navi/llm/ollama.py
- navi/workers/compressor.py
