Add eval system Phase 4 — read endpoints and background runner
REST surface for the debug UI:
- GET /eval/sessions — overview list with eval status / latest avg /
feedback counts (single SQL: sessions ⨝ feedback ⨝ latest run)
- GET /eval/sessions/{id} — session detail with all evaluations
- GET /eval/stats — weekly per-axis means; optional complexity-bucket split
- POST /eval/run — fire-and-forget background eval, returns run_id
- GET /eval/run/{id}, GET /eval/runs — poll progress and history
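
As an illustration of the single-query overview behind GET /eval/sessions, here is a minimal sketch using sqlite3. The table and column names (`sessions`, `feedback`, `evaluations`, `run_id`, `avg_score`) are assumptions made for the example and may not match debug/eval/schema.py; only the join shape (sessions, feedback counts, latest run) comes from the description above.

```python
import sqlite3

# Hypothetical schema for illustration only; the real tables may differ.
SCHEMA = """
CREATE TABLE sessions (id TEXT PRIMARY KEY);
CREATE TABLE feedback (session_id TEXT, rating INTEGER);
CREATE TABLE evaluations (session_id TEXT, run_id INTEGER, avg_score REAL);
"""

# One statement: sessions joined to aggregated feedback and to the row
# from the most recent run (highest run_id) for each session.
OVERVIEW_SQL = """
SELECT s.id,
       e.avg_score      AS latest_avg,
       COALESCE(f.n, 0) AS feedback_count
FROM sessions s
LEFT JOIN (SELECT session_id, COUNT(*) AS n
           FROM feedback
           GROUP BY session_id) f ON f.session_id = s.id
LEFT JOIN evaluations e
       ON e.session_id = s.id
      AND e.run_id = (SELECT MAX(run_id) FROM evaluations
                      WHERE session_id = s.id)
ORDER BY s.id
"""


def overview(conn):
    """Return (session_id, latest_avg, feedback_count) for every session."""
    return conn.execute(OVERVIEW_SQL).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO sessions VALUES (?)", [("a",), ("b",)])
conn.executemany("INSERT INTO feedback VALUES (?, ?)", [("a", 1), ("a", -1)])
conn.executemany(
    "INSERT INTO evaluations VALUES (?, ?, ?)",
    [("a", 1, 3.0), ("a", 2, 4.5)],
)
rows = overview(conn)
```

Sessions that have never been evaluated still appear in the result, with `latest_avg` set to NULL, because both joins are LEFT JOINs.
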
Pulled the runner loop out of cli into runner.py so that the CLI and
the REST endpoint share it. State for in-flight runs lives in an
in-memory registry (single-process, cleared on restart).
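
A rough sketch of how a fire-and-forget run with an in-memory registry could look, assuming a thread-based worker. The names `start_run`, `get_run`, `run_evals`, and `_RUNS` are hypothetical and are not taken from runner.py; `run_evals` is a stand-in for the shared runner loop.

```python
import threading
import uuid

# run_id -> state dict. Lives only in this process, so it is lost on restart.
_RUNS = {}
_LOCK = threading.Lock()


def run_evals(session_ids, on_progress):
    """Stand-in for the shared runner loop (hypothetical)."""
    for done, _session_id in enumerate(session_ids, start=1):
        on_progress(done)


def start_run(session_ids):
    """Register a run, start it in the background, and return immediately."""
    run_id = uuid.uuid4().hex
    with _LOCK:
        _RUNS[run_id] = {"status": "running", "done": 0, "total": len(session_ids)}

    def _progress(done):
        with _LOCK:
            _RUNS[run_id]["done"] = done

    def _worker():
        try:
            run_evals(session_ids, _progress)
            status = "finished"
        except Exception:
            status = "failed"
        with _LOCK:
            _RUNS[run_id]["status"] = status

    thread = threading.Thread(target=_worker, daemon=True)
    thread.start()
    # The thread is returned only so callers (or tests) can join it.
    return run_id, thread


def get_run(run_id):
    """Return a snapshot of the run state, or None for an unknown id."""
    with _LOCK:
        state = _RUNS.get(run_id)
        return dict(state) if state else None
```

In this sketch a POST /eval/run handler would call `start_run` and return the `run_id`, while GET /eval/run/{id} would return `get_run(run_id)`. An unknown id after a restart maps to None, which matches the "cleared on restart" caveat.
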
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Changed files:
- debug/eval/api.py
- debug/eval/db.py
- debug/eval/runner.py (new file, mode 0 → 100644)
- debug/eval/schema.py