History for navi-1 / debug / eval / runner.py

2026-04-26

8d5c351

Add eval system Phase 4 — read endpoints and background runner ...

REST surface for the debug UI:
- GET /eval/sessions  — overview list with eval status / latest avg /
  feedback counts (single SQL: sessions ⨝ feedback ⨝ latest run)
- GET /eval/sessions/{id} — session detail with all evaluations
- GET /eval/stats — weekly per-axis means; optional complexity-bucket split
- POST /eval/run — fire-and-forget background eval, returns run_id
- GET /eval/run/{id}, GET /eval/runs — poll progress and history

Pulled the runner loop out of cli into runner.py so both the CLI and
the REST endpoint share the same loop. State for in-flight runs lives
in an in-memory registry (single-process, cleared on restart).
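
For illustration only, here is a minimal sketch of what a single-process, in-memory registry for fire-and-forget runs could look like. It is not the code in runner.py: the names RunState, RunRegistry, start, get, all_runs and wait, and the evaluate callback, are all hypothetical, and a plain thread is assumed as the background mechanism. The idea is that a POST /eval/run handler could call start() and return the run_id right away, while GET /eval/run/{id} and GET /eval/runs read snapshots from the same registry.

```python
import threading
import uuid
from dataclasses import dataclass, replace
from typing import Callable, Dict, Iterable, List, Optional


@dataclass
class RunState:
    """Progress snapshot for one background run (hypothetical shape)."""
    run_id: str
    total: int
    done: int = 0
    status: str = "running"  # "running", "finished" or "failed"
    error: Optional[str] = None


class RunRegistry:
    """Process-local registry; everything in it is lost on restart."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._runs: Dict[str, RunState] = {}
        self._threads: Dict[str, threading.Thread] = {}

    def start(self, session_ids: Iterable[str],
              evaluate: Callable[[str], None]) -> str:
        """Register a run, launch the loop in a thread, return its id at once."""
        ids = list(session_ids)
        state = RunState(run_id=uuid.uuid4().hex, total=len(ids))
        thread = threading.Thread(
            target=self._loop, args=(state, ids, evaluate), daemon=True
        )
        with self._lock:
            self._runs[state.run_id] = state
            self._threads[state.run_id] = thread
        thread.start()
        return state.run_id

    def _loop(self, state: RunState, ids: List[str],
              evaluate: Callable[[str], None]) -> None:
        """Shared runner loop: evaluate each session and record progress."""
        try:
            for session_id in ids:
                evaluate(session_id)
                with self._lock:
                    state.done += 1
            with self._lock:
                state.status = "finished"
        except Exception as exc:  # record the failure instead of losing it
            with self._lock:
                state.status = "failed"
                state.error = str(exc)

    def get(self, run_id: str) -> Optional[RunState]:
        """Return a copy of the run's state, or None for an unknown id."""
        with self._lock:
            state = self._runs.get(run_id)
            return None if state is None else replace(state)

    def all_runs(self) -> List[RunState]:
        """Return copies of every run known to this process."""
        with self._lock:
            return [replace(state) for state in self._runs.values()]

    def wait(self, run_id: str, timeout: Optional[float] = None) -> bool:
        """Block until the run's thread exits; True if it is no longer alive."""
        with self._lock:
            thread = self._threads.get(run_id)
        if thread is None:
            return False
        thread.join(timeout)
        return not thread.is_alive()
```

Under these assumptions a CLI could call start() followed by wait(), and an HTTP handler could call start() and let the client poll get(). Returning copies from get() and all_runs() keeps callers from seeing a state object that is being updated by the worker thread.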

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Eugene Sukhodolskiy committed on 26 Apr