Add explicit output token budget for summarizer (context_summary_max_tokens)

Previously the summarization LLM call did not set num_predict, so Ollama
fell back to its server default (often 128 tokens), which produced very
short summaries.

- Add max_tokens param to LLMBackend.complete() and OllamaBackend (→ num_predict)
- Add context_summary_max_tokens: int = 1024 to config
- Thread it through compress_context() and CompressionWorker
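
The wiring described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: only the names LLMBackend.complete(), OllamaBackend, compress_context(), max_tokens, num_predict and context_summary_max_tokens (default 1024) come from the message. The Config class, the build_payload() helper, the prompt text and all signatures are assumptions, and the HTTP call to Ollama is left out.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Config:
    # Default value taken from the commit message; all other fields omitted.
    context_summary_max_tokens: int = 1024


class LLMBackend(ABC):
    @abstractmethod
    def complete(self, prompt: str, max_tokens: Optional[int] = None) -> str:
        """Return a completion, optionally capped at max_tokens output tokens."""


class OllamaBackend(LLMBackend):
    def __init__(self, model: str) -> None:
        self.model = model

    def build_payload(self, prompt: str, max_tokens: Optional[int] = None) -> dict:
        # Hypothetical helper: builds the JSON body for Ollama's generate endpoint.
        options = {}
        if max_tokens is not None:
            # Ollama calls the output token limit "num_predict".
            options["num_predict"] = max_tokens
        return {
            "model": self.model,
            "prompt": prompt,
            "stream": False,
            "options": options,
        }

    def complete(self, prompt: str, max_tokens: Optional[int] = None) -> str:
        # A real backend would POST build_payload(...) to the Ollama server here.
        raise NotImplementedError("HTTP call omitted in this sketch")


def compress_context(messages: List[str], backend: LLMBackend, config: Config) -> str:
    # Assumed signature: the summarizer passes the configured budget to the backend.
    prompt = "Summarize the following conversation:\n" + "\n".join(messages)
    return backend.complete(prompt, max_tokens=config.context_summary_max_tokens)
```

When max_tokens is None the sketch leaves num_predict out of the request, so the server default still applies to callers that do not ask for a budget.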

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent 96548a1 commit 4b647631191f741d5483a0c4dbfe3e5a2c4cb245
Eugene Sukhodolskiy authored on 15 Apr
Showing 5 changed files
navi/config.py
navi/core/compressor.py
navi/llm/base.py
navi/llm/ollama.py
navi/workers/compressor.py