Add eval system Phase 4: read endpoints and background runner

REST surface for the debug UI:
- GET /eval/sessions: overview list with eval status, latest average,
  and feedback counts (single SQL query: sessions ⨝ feedback ⨝ latest run)
- GET /eval/sessions/{id}: session detail with all evaluations
- GET /eval/stats: weekly per-axis means; optional complexity-bucket split
- POST /eval/run: fire-and-forget background eval, returns run_id
- GET /eval/run/{id}, GET /eval/runs: poll progress and history
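
For reviewers, a minimal sketch of how the overview list can be produced by one query. The table and column names (sessions, feedback, eval_runs, avg_score) are hypothetical and are not taken from this commit; the sketch also assumes run ids grow over time, so MAX(id) picks the latest run.

```python
import sqlite3

# Hypothetical schema: sessions(id), feedback(session_id),
# eval_runs(id, session_id, status, avg_score).
OVERVIEW_SQL = """
SELECT s.id,
       r.status         AS eval_status,
       r.avg_score      AS latest_avg,
       COALESCE(f.n, 0) AS feedback_count
FROM sessions AS s
LEFT JOIN (SELECT session_id, COUNT(*) AS n
           FROM feedback
           GROUP BY session_id) AS f
       ON f.session_id = s.id
LEFT JOIN (SELECT session_id, MAX(id) AS latest_id
           FROM eval_runs
           GROUP BY session_id) AS lr
       ON lr.session_id = s.id
LEFT JOIN eval_runs AS r
       ON r.id = lr.latest_id
ORDER BY s.id
"""


def session_overview(conn: sqlite3.Connection) -> list:
    """Return one row per session: (id, eval_status, latest_avg, feedback_count)."""
    return conn.execute(OVERVIEW_SQL).fetchall()
```

Feedback is aggregated in a subquery before the join so that sessions with several feedback rows and several runs do not multiply into duplicate rows.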

Moved the runner loop out of the CLI module into runner.py so that the
CLI and the REST endpoint share the same loop. State for in-flight runs
lives in an in-memory registry, so it is visible only within a single
process and is lost on restart.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eugene Sukhodolskiy committed on 26 Apr