| 2026-06-12 |
feat: background queue worker for async pipeline processing
...
- Endpoint /ingest now only validates payload, creates raw_data with
status=pending, commits and returns 202 (no longer blocks on AI).
- QueueWorker polls DB for pending jobs every 1s, grabs one with
FOR UPDATE SKIP LOCKED, marks it processing, runs PropertyPipeline.
- PipelineFactory extracted from deps.py for reuse by both HTTP deps
and the background worker.
- Lifespan starts QueueWorker as asyncio.Task; on shutdown signals
stop_event, awaits worker termination (60s timeout) before closing
Ollama client and active_jobs.
- Worker checks pipeline result status and logs completed/invalid/failed
appropriately. Unhandled exceptions mark raw_data failed explicitly.
Co-Authored-By: Claude <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
1 day ago
|
feat: classify AI errors — ValidationError→invalid, fatal/network→failed
...
Item 6 from REVIEW_FOLLOWUP:
- ai_normalizer: pydantic.ValidationError returns is_real_estate=False (invalid)
OllamaFatalError / unexpected exceptions now raise → pipeline status=failed
- ai_enricher: ValidationError returns None (skip enrichment), fatal returns None,
unexpected raise → pipeline status=failed
- pipeline._stage_enrich: propagate OllamaRetryableError so pipeline can mark failed
Co-Authored-By: Claude <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
1 day ago
|
| 2026-06-11 |
feat: implement review items 8-14
...
- Soft-delete/archive for listings: archived_at column + archive-check endpoint
- Rate limiting on /ingest: slowapi with 60/minute per source_slug
- Prometheus metrics: /metrics endpoint + custom counters/histograms
- Graceful shutdown: track active jobs in app.state, wait up to 30s
- Prompt injection protection: wrap user data in <user_data> XML tags
- Image download size limit: 50MB max with httpx streaming
- Raw data cleanup: admin endpoint to delete completed raw data older than N days
Co-Authored-By: Claude <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
1 day ago
|
feat: implement review items 1-7
...
- Decompose PropertyPipeline into 8 explicit stages with PipelineContext
- Add tenacity retry (3 attempts, exponential backoff) to OllamaClient and ImageDownloader
- Add simple in-memory circuit breaker for Ollama calls
- Resize images to 1024px before base64 encoding for vision model
- Add /health endpoint (DB, Ollama, disk checks)
- Add DB performance indexes + alembic migration
- Classify AI errors: OllamaRetryableError vs OllamaFatalError
- Add strict PayloadSchema validation for ingest endpoint
Co-Authored-By: Claude <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
1 day ago
|
feat: core pipeline + FastAPI API (Phases 0-6)
...
Implemented the full VMK Data Collector foundation and processing pipeline:
- Config, logging, exception hierarchy
- DB models (listings, raw data, images, snapshots, enrichments, custom fields)
- Alembic async migrations
- Repositories with upsert/snapshot support
- Domain entities and Pydantic schemas
- Ollama AI client with mock mode
- AI normalizer, image analyzer, enricher
- Image downloader with SHA-256 dedup
- PropertyPipeline: raw -> AI validate -> upsert/snapshot -> images -> enrich
- FastAPI app with /api/v1/ingest endpoint
Co-Authored-By: Claude <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
1 day ago
|
|
|