|
Add explicit output token budget for summarizer (context_summary_max_tokens)
Previously, no num_predict was set for the summarization LLM call, so Ollama fell back to its server default (often 128 tokens, which produces very short summaries).

- Add max_tokens param to LLMBackend.complete() and OllamaBackend (→ num_predict)
- Add context_summary_max_tokens: int = 1024 to config
- Thread it through compress_context() and CompressionWorker

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> |
|---|
|
|
| navi/config.py |
|---|
| navi/core/compressor.py |
|---|
| navi/llm/base.py |
|---|
| navi/llm/ollama.py |
|---|
| navi/workers/compressor.py |
|---|
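
The change described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the actual navi code: `Config`, `build_ollama_payload`, and `summary_payload` are hypothetical names, and the only detail taken from Ollama itself is that the output token limit is passed as `num_predict` inside the request's `options` object. No network call is made; the sketch only shows how the budget is threaded from config into the request body.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class Config:
    # Mirrors the new setting named in the commit message.
    context_summary_max_tokens: int = 1024


def build_ollama_payload(
    model: str, prompt: str, max_tokens: Optional[int] = None
) -> Dict[str, Any]:
    """Hypothetical helper: build a request body for Ollama's generate endpoint."""
    payload: Dict[str, Any] = {"model": model, "prompt": prompt, "stream": False}
    if max_tokens is not None:
        # Ollama reads the output token limit from options.num_predict.
        payload["options"] = {"num_predict": max_tokens}
    # When max_tokens is None, no limit is sent and the server default applies.
    return payload


def summary_payload(config: Config, model: str, prompt: str) -> Dict[str, Any]:
    """Hypothetical stand-in for the summarizer passing the budget through."""
    return build_ollama_payload(
        model, prompt, max_tokens=config.context_summary_max_tokens
    )
```

Leaving `max_tokens` optional keeps other callers of the backend unchanged: only the summarizer opts in to an explicit budget, and everything else continues to use the server default.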