root/navi-1

2026-04-29	7b672c3 Remove SQLite legacy support
	    SQLite is no longer supported; PostgreSQL is now required.
	    - Delete navi/core/sqlite_session_store.py
	    - Delete navi/memory/sqlite_store.py
	    - Remove SqliteSessionStore from navi/core/__init__.py exports
	    - deps.py: drop SQLite fallback, raise RuntimeError if DATABASE_URL missing
	    - config.py: remove db_path setting
	    - pyproject.toml & requirements.txt: drop aiosqlite dependency
	    - .gitignore: remove navi.db entry
	    - tech_debt_review_2026-04-29.md: mark #8 as REMOVED
	    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
	    Eugene Sukhodolskiy committed on 29 Apr
2026-04-29	098401a Stability fixes batch — tech debt review 2026-04-29
	    Critical:
	    - Concurrent WS run race guard (#1)
	    - Tool task cancellation on generator teardown (#2)
	    - StopAsyncIteration kills fallback chain (#3)
	    - Session loading race with _lastLoadId guard (#4)
	    - ContentCard .match() crash on non-string result (#5)
	    - Image data type guard in buildMessageList (#6)
	    High:
	    - Cap WS replay buffer at 500 events (#7)
	    - Deduplicate memory extraction task with asyncio.Lock (#9)
	    - TTL-based fallback blacklisting (5 min) (#10)
	    - Subagent tool exception isolation (#11)
	    - Inline image size/count validation on WS (#12)
	    - Clean up orphaned file on DB insert failure (#13)
	    - Deep watch streamingMsg for auto-scroll (#14)
	    - WS_SCHEME wss:// support for HTTPS (#15)
	    - Sending guard against duplicate message sends (#16)
	    - Global unhandledrejection listener in API layer (#17)
	    Medium:
	    - Cap planning_logs at 20 entries (#22)
	    - Store cleanup_loop task reference (#23)
	    - BaseException → Exception in _run_with_sentinel (#24)
	    - Propagate SystemExit in agent loop (#25)
	    - Configurable output_reserve_tokens (#26)
	    - Always reloadSession on session_sync (#30)
	    - FIFO queue for confirm dialogs (#31)
	    - Reset body.overflow on ImageLightbox unmount (#32)
	    - try/finally in fallback copy (#33)
	    - _isConnecting guard in WS send() (#34)
	    Low:
	    - Lazy-init deps.py singletons (#36)
	    - Replace __import__ with direct imports (#38)
	    - Preserve token count 0 in ollama.py (#39)
	    - Clear orphaned streamingMsg on reconnect reload (#43)
	    - Escape single quote in UserMessage (#44)
	    - Polyfill-free findLast replacement (#48)
	    - Match <table> tags with attributes in markdown (#49)
	    - Attach copy buttons only when msg.done (#50)
	    - Fix hasMeta falsy-0 bug (#53)
	    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
	    Eugene Sukhodolskiy committed on 29 Apr
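Item #10 above (TTL-based fallback blacklisting, 5 min) could look roughly like the sketch below. This is not the repository's code: the class name `FallbackBlacklist`, its methods, and the injectable `clock` parameter are all hypothetical, chosen only to illustrate the idea of skipping a recently failed backend until its timeout expires.

```python
import time
from typing import Callable, Dict, Iterable, Optional


class FallbackBlacklist:
    """Hypothetical helper: skip backends that failed within the last TTL."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds          # 300 s matches the "5 min" in the commit
        self._clock = clock              # injectable so the TTL can be tested
        self._blocked_until: Dict[str, float] = {}

    def block(self, name: str) -> None:
        """Record a failure; the backend is skipped until the TTL elapses."""
        self._blocked_until[name] = self._clock() + self._ttl

    def is_blocked(self, name: str) -> bool:
        deadline = self._blocked_until.get(name)
        if deadline is None:
            return False
        if self._clock() >= deadline:
            # Entry expired: forget it so the backend is retried.
            del self._blocked_until[name]
            return False
        return True

    def first_available(self, names: Iterable[str]) -> Optional[str]:
        """Return the first backend in priority order that is not blocked."""
        for name in names:
            if not self.is_blocked(name):
                return name
        return None
```

A monotonic clock is used so that wall-clock adjustments cannot shorten or extend a block; whether the project does the same is not stated in the commit.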
2026-04-28	c6ad794 Add SVG/HTML/XML tag formatting rule to persona
	    Prevents model from generating doubled/escaped tags like <<svgsvg> by explicitly instructing single-angle-bracket markup in code output.
	    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
	    Eugene Sukhodolskiy committed on 28 Apr
	cabfce8 Fix system prompt leakage into chat history; polish content cards
	    Backend:
	    - websocket.py + agent.py: separate user-visible display_message from LLM user_message. System hints (image/file attachments) no longer leak into session.messages or reappear after page reload.
	    - main.py: add ensure_tables() on startup so session_content table is created before first publish.
	    - profiles: add kimi-k2.6:cloud to all model lists as fallback.
	    Frontend:
	    - ContentCard.vue: remove border-radius, add scrollbar styles, fix metadata fallback parsing so cards survive page reload.
	    - content-viewers/*.html: add matching scrollbar styles.
	    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
	    Eugene Sukhodolskiy committed on 28 Apr
	b88b7c0 Add content hosting system with inline viewers
	    Backend:
	    - Add navi/content/ directory for published files
	    - Add content_store.py with publish/list/delete/cleanup functions
	    - Add content_publish tool for publishing files as viewable content
	    - Add /content static file mount in main.py
	    - Add /content-viewers mount for viewer pages
	    - Extend ToolEvent with metadata field
	    - Forward metadata through websocket tool_call events
	    - Update Agent to include metadata in ToolEvent
	    Frontend:
	    - Add ContentCard.vue component for displaying published content
	    - Add viewer pages: stl.html (Three.js), svg.html, html.html, pdf.html
	    - Update AssistantMessage.vue to render ContentCard for content_publish
	    - Update chat store to preserve metadata in tool cards
	    - Update websocket protocol docs with metadata field
	    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
	    Eugene Sukhodolskiy committed on 28 Apr
	dba307e Update memory docs to reflect pgvector + dedicated embedding backend
	    - Add dedicated embedding backend section (.env variables)
	    - Add backfill_embeddings script documentation
	    - Update storage methods: upsert_fact generates embeddings, search_facts uses vector search with cosine distance fallback to ILIKE
	    - Update extractor process: tool calls/results in transcript, source/confidence
	    - Replace memory_search/memory_forget references with unified memory tool
	    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
	    Eugene Sukhodolskiy committed on 28 Apr
	cbb1e5d Add dedicated CPU embedding server for memory backfill
	    - Install Ollama CPU-only on 192.168.1.168 server
	    - Pull nomic-embed-text:latest on server
	    - Create systemd service ollama-embed.service (0.0.0.0:11434)
	    - Add embedding_ollama_host / embedding_ollama_api_key to config.py
	    - Update deps.py to build separate embedding backend when host configured
	    - Update backfill_embeddings.py to use dedicated embedding backend
	    - Add _generate_embeddings batch helper and backfill_embeddings to store.py
	    - Backfilled 119 existing facts with embeddings
	    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
	    Eugene Sukhodolskiy committed on 28 Apr
	b5f2793 Enrich memory extractor with tool calls/results in transcript
	    - _EXTRACT_SYSTEM now explains 4 transcript entry types and instructs LLM to trust tool results over chat, return source/source_context
	    - _extract_facts builds tool_call_map, appends [Tool call] and [Tool result] lines with truncation (500/200 chars)
	    - Transcript capped at 12k chars (head+tail, drop middle)
	    - Parse source/source_context from LLM response; map confidence: tool_call/auto_discovery=95, user_explicit=90, default=70
	    - Add TODO comment about deferred semantic deduplication
	    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
	    Eugene Sukhodolskiy committed on 28 Apr
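The transcript cap and confidence mapping described in b5f2793 can be sketched as follows. The numbers (12k chars, 95/90/70) come from the commit message; everything else is an assumption made for illustration: the function names `cap_transcript` and `confidence_for`, the omission marker text, and the even head/tail split.

```python
from typing import Optional

MAX_TRANSCRIPT_CHARS = 12_000  # "12k chars" from the commit message
# Assumed marker text; the commit does not say what replaces the dropped middle.
_MARKER = "\n[... middle of transcript omitted ...]\n"

# Source-to-confidence mapping as listed in the commit message.
CONFIDENCE_BY_SOURCE = {"tool_call": 95, "auto_discovery": 95, "user_explicit": 90}
DEFAULT_CONFIDENCE = 70


def cap_transcript(text: str, limit: int = MAX_TRANSCRIPT_CHARS) -> str:
    """Keep the head and tail of an over-long transcript and drop the middle."""
    if len(text) <= limit:
        return text
    budget = limit - len(_MARKER)
    if budget <= 0:
        return text[:limit]
    head = (budget + 1) // 2       # assumed even split between head and tail
    tail = budget - head
    # Guard tail == 0: text[-0:] would return the whole string.
    return text[:head] + _MARKER + (text[-tail:] if tail else "")


def confidence_for(source: Optional[str]) -> int:
    """Map a fact's source label to a confidence score, defaulting to 70."""
    return CONFIDENCE_BY_SOURCE.get(source or "", DEFAULT_CONFIDENCE)
```

Keeping both ends preserves the opening request and the most recent turns, which is presumably why the middle is the part that gets dropped.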
	56fa7ac Document pgvector migration in memory system
	    - Add PostgreSQL vs SQLite comparison table
	    - Document migrate_pgvector.py usage
	    - Update storage section to mention asyncpg + aiosqlite
	    Eugene Sukhodolskiy committed on 28 Apr
	aeac902 Add pgvector migration script for memory_facts
	    - ALTER TABLE memory_facts: embedding, source, confidence, expires_at, source_context
	    - CREATE INDEX: hnsw(embedding), expires, source+category
	    - Safe to run multiple times (IF NOT EXISTS)
	    - Reads DATABASE_URL from settings
	    Eugene Sukhodolskiy committed on 28 Apr
	c874cbe Wire pgvector semantic search into memory system
	    - Add vector(768) column + HNSW index to memory_facts
	    - Add LLMBackend.embed() with Ollama + fallback implementation
	    - MemoryStore: cosine-distance search with ILIKE fallback
	    - New memory tool params: source, confidence, expires_days, source_context
	    - Update extractor, sqlite_store, deps wiring
	    - Add pgvector to requirements
	    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
	    Eugene Sukhodolskiy committed on 28 Apr
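The "cosine-distance search with ILIKE fallback" behaviour from c874cbe can be illustrated in plain Python, without a database. This is only a model of the semantics, not MemoryStore itself: `search_facts` here is a hypothetical in-memory stand-in, the facts are made up, and the zero-vector handling is an assumption.

```python
import math
from typing import List, Optional, Sequence, Tuple

# (fact text, embedding or None when no embedding has been generated yet)
Fact = Tuple[str, Optional[Sequence[float]]]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity, the quantity pgvector's <=> operator computes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 1.0  # assumption: treat a zero vector as maximally unrelated
    return 1.0 - dot / norm


def search_facts(facts: Sequence[Fact], query: str,
                 query_embedding: Optional[Sequence[float]],
                 limit: int = 5) -> List[str]:
    """Rank by cosine distance; fall back to a case-insensitive substring match."""
    if query_embedding is not None:
        scored = [(cosine_distance(query_embedding, emb), text)
                  for text, emb in facts if emb is not None]
        if scored:
            scored.sort(key=lambda pair: pair[0])
            return [text for _, text in scored[:limit]]
    # Fallback path, analogous to SQL ILIKE '%query%'.
    needle = query.lower()
    return [text for text, _ in facts if needle in text.lower()][:limit]
```

The fallback matters when the embedding backend is unreachable or a query cannot be embedded: the store still answers, just with lexical rather than semantic matching.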
	a643119 Slim eval rubric to 3 levels with one reference per axis
	    Five anchors per axis (10/30/50/75/100, even after the earlier shift) were both redundant and amplified the model's snap-to-round-numbers prior. Cut to three level descriptions per axis (weak / typical / strong) with a single non-round reference score (53) on `typical`. Re-state the scale as open-ended with no upper bound to make the "future Navi may exceed past ceilings" intent explicit.
	    - rubric_v1.yaml: anchors → levels (5 → 3 per axis), reference score 53 only on typical, scale framed as fully open-ended.
	    - judge.py: render_rubric_for_prompt walks the new `levels` shape and surfaces the reference score only when present.
	    - expert prompts (strict_critic, pragmatist, tech_lead): drop the example output blocks (their concrete numbers were misleading the judges), rewrite the scale paragraph for the new structure.
	    - schema.py: docstring no longer pins ">100" as the open-scale marker.
	    User intent: dynamics, not absolute scores. Weekly aggregates over three averaged experts smooth individual snap-to-5 into continuous trends; the rubric is a calibration aid, not a grading ceiling.
	    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
	    Eugene Sukhodolskiy committed on 28 Apr
	9d96249 Fight rubric-anchor snapping in eval judges
	    Judges were clustering scores onto the rubric's round anchor values (30, 50, 75, 100) instead of producing fine-grained continuous scores, which made small differences between sessions invisible.
	    - rubric_v1.yaml: shift anchors off round numbers (33/51/77/102), reframe the scale as open-ended integers ≥ 0 with illustrative level descriptions, and tell judges explicitly not to round to anchors.
	    - expert prompts (strict_critic, pragmatist, tech_lead): mirror the scale framing and add an example output with deliberately non-round scores between anchors.
	    - judge.py: bump expert temperature 0.2 → 0.5 so the judges produce more varied, non-deterministic scores.
	    Old v1 evaluations in the DB are not comparable to new ones; user intends to wipe and re-run from scratch, so versions are not bumped.
	    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
	    Eugene Sukhodolskiy committed on 28 Apr
	d9e9f4d Stop image_view hallucinations on inline-attached images
	    The model was inventing fake paths/URLs (e.g. files.oaiusercontent.com, /home/ubuntu/navi-1/input_file_0.png) and calling image_view on them when the user attached an image directly in chat — the image was already in the multimodal context, but the tool description and lack of a signal pushed the model to "load" it anyway.
	    - websocket.py: when a user message has inline images, append a brief note that they are already in context.
	    - image_view.py: soften the description — keep proactive use for paths and URLs the model genuinely cannot see, but tell it inline images don't need this tool.
	    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
	    Eugene Sukhodolskiy committed on 28 Apr
2026-04-26	3da9773 Rewrite eval_system.md as user guide; preserve original spec as eval_system_design.md
	    - docs/eval_system.md: replaced stale spec with current user guide covering UI tabs, CLI, rubric, experts, versioning, API
	    - docs/eval_system_design.md: preserved original design spec for reference
	    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
	    Eugene Sukhodolskiy committed on 26 Apr
	307f639 Add eval system Phase 5 — debug UI
	    Self-contained SPA at /debug/eval (route already wired in 8e0eed6). Single index.html in the existing debug/ style — vanilla JS, embedded CSS, no framework, no build step.
	    Four tabs:
	    - Sessions — filterable table (profile / status / limit), eval status pill, headline avg scores, click-through to detail
	    - Detail — session metadata + every stored eval run, axes laid out as axis × expert grids with inline averages, expert comments, button to re-evaluate this single session
	    - Stats — weekly per-axis means table, optional complexity-bucket split
	    - Run — form to trigger any scope (unevaluated / single / all), live status panel polling /eval/run/{id} every 2.5s, run history with click-to-attach
	    Hash routing: #detail/<session_id> deep-links to a session.
	    Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
	    Eugene Sukhodolskiy committed on 26 Apr
	8e0eed6 changed llm
	    Eugene Sukhodolskiy committed on 26 Apr
	8d5c351 Add eval system Phase 4 — read endpoints and background runner
	    REST surface for the debug UI:
	    - GET /eval/sessions — overview list with eval status / latest avg / feedback counts (single SQL: sessions ⨝ feedback ⨝ latest run)
	    - GET /eval/sessions/{id} — session detail with all evaluations
	    - GET /eval/stats — weekly per-axis means; optional complexity-bucket split
	    - POST /eval/run — fire-and-forget background eval, returns run_id
	    - GET /eval/run/{id}, GET /eval/runs — poll progress and history
	    Pulled the runner loop out of cli into runner.py so both the CLI and the REST endpoint share the same loop. State for in-flight runs lives in an in-memory registry (single-process, cleared on restart).
	    Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
	    Eugene Sukhodolskiy committed on 26 Apr
	864261a Add eval system Phase 3 — judge runner end to end
	    Fills in the stubs from Phase 2:
	    - judge.render_session: full transcript with tool_call/tool_result folding, reactions inlined per assistant block, planning_logs appendix, no compression-summary substitution
	    - judge.run_expert: real LLM call, fence-tolerant JSON parse, single retry with corrective nudge on schema or parse error
	    - judge.evaluate_session: asyncio.gather across the three experts
	    - db.EvalDB: insert_evaluation_run (txn), list_evaluations, evaluated_session_ids, feedback_by_index helper
	    - cli `run` (filters: --session, --since, --limit, --re-evaluate-all, --dry-run, --model, --backend) and `show` (groups by eval_run_id, prints per-expert axes plus averaged scores)
	    Verified end-to-end against a real 10-message secretary session: all three experts returned valid JSON first try; spread between strict critic and the others surfaced as expected.
	    Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
	    Eugene Sukhodolskiy committed on 26 Apr
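A "fence-tolerant JSON parse", as mentioned for judge.run_expert, usually means accepting a model reply whether or not the JSON is wrapped in a Markdown code fence. The sketch below is one way to do that with the standard library only; `parse_json_reply` is a hypothetical name and the recovery strategy is an assumption, not the project's parser.

```python
import json
import re
from typing import Any

_FENCE = "`" * 3  # built at runtime so this example can itself live in a fence
_FENCE_RE = re.compile(_FENCE + r"(?:json)?\s*(.*?)" + _FENCE,
                       re.DOTALL | re.IGNORECASE)


def parse_json_reply(reply: str) -> Any:
    """Parse JSON from an LLM reply, tolerating code fences and stray prose."""
    match = _FENCE_RE.search(reply)
    # Prefer the contents of the first fenced block when one is present.
    candidate = (match.group(1) if match else reply).strip()
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        # Last resort: take the outermost {...} span and try again.
        start, end = candidate.find("{"), candidate.rfind("}")
        if start == -1 or end <= start:
            raise
        return json.loads(candidate[start:end + 1])
```

If both attempts fail, the `json.JSONDecodeError` propagates, which is where a caller could apply the single corrective retry the commit describes.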
	e477127 Add eval system Phase 2 — rubric, expert prompts, judge skeleton
	    Drafts the v1 rubric (7 axes, anchors at 10/30/50/75/100, open scale), three independent expert prompts (strict_critic / pragmatist / tech_lead) that all return the same JSON shape, and the orchestration scaffolding: schema.py (pydantic models), judge.py (rubric loader, score averaging, fence-tolerant JSON parser, new_run_metadata), cli.py with argparse for run / show / stats.
	    Real LLM calls and transcript rendering land in Phase 3 — the stubs raise NotImplementedError. `python -m debug.eval` works as the entry point.
	    Anchor `examples` are left empty for now; user fills them with real session_ids later without bumping rubric_version.
	    Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
	    Eugene Sukhodolskiy committed on 26 Apr
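The score averaging across the three experts can be pictured with a small sketch. The expert names come from the commit; the function name `average_axis_scores`, the axis names, the sample numbers, and the choice to average each axis over whichever experts reported it are all assumptions for illustration.

```python
from statistics import fmean
from typing import Dict, List, Mapping


def average_axis_scores(
    expert_scores: Mapping[str, Mapping[str, float]]
) -> Dict[str, float]:
    """Average each rubric axis over the experts that scored it."""
    by_axis: Dict[str, List[float]] = {}
    for scores in expert_scores.values():
        for axis, value in scores.items():
            by_axis.setdefault(axis, []).append(float(value))
    return {axis: fmean(values) for axis, values in by_axis.items()}


# Made-up scores on two hypothetical axes.
sample = {
    "strict_critic": {"accuracy": 40, "clarity": 50},
    "pragmatist": {"accuracy": 55, "clarity": 56},
    "tech_lead": {"accuracy": 58, "clarity": 62},
}
averaged = average_axis_scores(sample)  # {"accuracy": 51.0, "clarity": 56.0}
```

Averaging independent judges is what lets a consistently harsh expert such as strict_critic pull the mean down without dominating it.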
	5817fb9 Add eval system Phase 1 — message feedback signal
	    Spec at docs/eval_system.md describes the full LLM-as-judge plan; this commit lands only the in-app feedback layer:
	    - debug/eval/ Python package with EvalDB (asyncpg) and FastAPI router exposing /eval/feedback (set / clear / list)
	    - message_feedback postgres table keyed by (session_id, message_index)
	    - thumbs up / down on each completed assistant block in the webclient, optimistic update with rollback on failure
	    Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
	    Eugene Sukhodolskiy committed on 26 Apr
	b5b11be changed llm & new ollama param
	    Eugene Sukhodolskiy committed on 26 Apr
2026-04-25	8a6c198 changed model
	    Eugene Sukhodolskiy committed on 25 Apr
	b016a54 Strengthen todo progress discipline
	    Eugene Sukhodolskiy committed on 25 Apr
	d878bb0 Add structured planning review and adaptive depth
	    Eugene Sukhodolskiy committed on 25 Apr
	c2a2186 Tune reflect autonomy guidance
	    Eugene Sukhodolskiy committed on 25 Apr
	e67e7a5 Improve compression and memory prompts
	    Eugene Sukhodolskiy committed on 25 Apr
	9f16714 Remove tool-call-like examples from prompts
	    Eugene Sukhodolskiy committed on 25 Apr
	557ce4e Improve prompt resilience and project context use
	    Eugene Sukhodolskiy committed on 25 Apr
	90118cb Fix profile prompt inconsistencies
	    Eugene Sukhodolskiy committed on 25 Apr