| 2026-06-12 |
docs: update README with Docker guide and parser API reference
- Add Docker quick-start instructions
- Add manual dev setup instructions
- Document POST /api/v1/ingest endpoint for parsers
- Add payload field reference table
- Add response codes and processing statuses
- Add curl examples
- Add configuration reference table
- Add architecture diagram
- Add logging/monitoring section
Co-Authored-By: Claude <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
1 day ago
|
feat: dockerize app, add structured logging, fix rate limiter
Changes:
- Add Dockerfile with python:3.12-slim, alembic migrations, uvicorn
- Add .dockerignore
- Update docker-compose.yml with app service, port 8020, external Ollama
- Configure alembic/env.py to read DATABASE_URL from env
- Update .env.example with port 8020, Ollama host 192.168.1.75, gemma4 model
- Fix slowapi rate limiter: use a sync key_func instead of an async one
- Add structured JSON logging (structlog) to ingest endpoint, pipeline stages
- Fix logging output via logging.basicConfig for Docker stdout
Co-Authored-By: Claude <noreply@anthropic.com>
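
The logging change above can be illustrated with a minimal sketch. The commit itself uses structlog; this version deliberately swaps in the standard library only, to show the idea of one JSON object per line on stdout, where `docker logs` can read it. The names `JsonFormatter` and `configure_logging`, and the field names `level`, `logger`, `event` and `context`, are assumptions made for illustration and are not taken from the repository.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object (hypothetical field names)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname.lower(),
            "logger": record.name,
            "event": record.getMessage(),
        }
        # Extra key/value context passed via `extra={"context": {...}}`.
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload, ensure_ascii=False)


def configure_logging(stream=sys.stdout, level=logging.INFO) -> None:
    """Send all records to `stream` so the container runtime can collect them."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    # force=True replaces handlers installed by earlier basicConfig calls.
    logging.basicConfig(level=level, handlers=[handler], force=True)


if __name__ == "__main__":
    configure_logging()
    log = logging.getLogger("ingest")
    log.info("payload_accepted", extra={"context": {"source_slug": "demo"}})
```

Writing to stdout rather than to a file keeps the container stateless and leaves log shipping to the Docker logging driver.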
Eugene Sukhodolskiy
committed
1 day ago
|
fix: code review critical and high issues
- tenacity+structlog: replace before_sleep_log with structlog-compatible lambda
to prevent TypeError on retry
- NormalizedProperty: filter AI response dict by allowed dataclass fields before
unpacking to avoid TypeError on unknown keys
- property_pipeline: remove duplicate update_status(failed) from _stage_normalize
- security: add URL validator (SSRF protection) for ImageDownloader and archive-check
- ai prompts: replace raw <user_data> tags with JSON-serialized payload
to mitigate prompt injection
- queue_worker: wrap _process_one in try/except so DB errors don't kill the loop
- image processing: parallelize with asyncio.gather + Semaphore(3)
- ai services: unify OllamaFatalError handling (every service now propagates the error instead of swallowing it)
- router_properties: catch only pydantic.ValidationError/ValueError in ingest,
let infrastructure errors return 500
Co-Authored-By: Claude <noreply@anthropic.com>
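
The NormalizedProperty fix described above (dropping unknown keys before unpacking) might look roughly like the sketch below. The fields shown on `NormalizedProperty` and the helper name `from_ai_response` are assumptions; the real dataclass is not shown in this log.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class NormalizedProperty:
    # Illustrative fields only; the real dataclass is not shown in the log.
    title: str
    price: Optional[float] = None
    is_real_estate: bool = True


def from_ai_response(data: dict[str, Any]) -> NormalizedProperty:
    """Build the dataclass from a model response, ignoring unknown keys."""
    allowed = {f.name for f in fields(NormalizedProperty)}
    known = {key: value for key, value in data.items() if key in allowed}
    # Without the filter, NormalizedProperty(**data) raises TypeError
    # as soon as the model invents a key such as "mood".
    return NormalizedProperty(**known)


if __name__ == "__main__":
    raw = {"title": "2-room flat", "price": 55000, "mood": "cozy"}
    print(from_ai_response(raw))
```

Filtering by `dataclasses.fields` keeps the allow-list in sync with the dataclass definition, so adding a field later needs no second edit.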
Eugene Sukhodolskiy
committed
1 day ago
|
feat: background queue worker for async pipeline processing
- Endpoint /ingest now only validates payload, creates raw_data with
status=pending, commits and returns 202 (no longer blocks on AI).
- QueueWorker polls DB for pending jobs every 1s, grabs one with
FOR UPDATE SKIP LOCKED, marks it processing, runs PropertyPipeline.
- PipelineFactory extracted from deps.py for reuse by both HTTP deps
and the background worker.
- Lifespan starts QueueWorker as an asyncio.Task; on shutdown it signals
stop_event, then awaits worker termination (60s timeout) before closing
the Ollama client and active_jobs.
- Worker checks pipeline result status and logs completed/invalid/failed
appropriately. Unhandled exceptions mark raw_data failed explicitly.
Co-Authored-By: Claude <noreply@anthropic.com>
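
The worker lifecycle described above can be sketched with asyncio alone. This is not the repository's implementation: an in-memory deque stands in for the `FOR UPDATE SKIP LOCKED` query, `stop_event` is assumed to be an `asyncio.Event`, and the constructor parameters and the `handler` callback are hypothetical.

```python
import asyncio
from collections import deque
from typing import Awaitable, Callable, Deque


class QueueWorker:
    """Polls a job source until asked to stop (in-memory stand-in for the DB)."""

    def __init__(self, jobs: Deque[int], handler: Callable[[int], Awaitable[None]],
                 poll_interval: float = 1.0) -> None:
        self._jobs = jobs
        self._handler = handler
        self._poll_interval = poll_interval
        self.stop_event = asyncio.Event()

    async def run(self) -> None:
        while not self.stop_event.is_set():
            if self._jobs:
                job_id = self._jobs.popleft()
                try:
                    await self._handler(job_id)
                except Exception as exc:  # one bad job must not kill the loop
                    print(f"job {job_id} failed: {exc!r}")
                continue
            # Nothing pending: sleep, but wake up early if stop is requested.
            try:
                await asyncio.wait_for(self.stop_event.wait(), self._poll_interval)
            except asyncio.TimeoutError:
                pass


async def main() -> list[int]:
    done: list[int] = []

    async def handler(job_id: int) -> None:
        if job_id == 2:
            raise RuntimeError("simulated DB error")
        done.append(job_id)

    worker = QueueWorker(deque([1, 2, 3]), handler, poll_interval=0.01)
    task = asyncio.create_task(worker.run())
    await asyncio.sleep(0.05)                 # let the worker drain the queue
    worker.stop_event.set()                   # shutdown signal
    await asyncio.wait_for(task, timeout=60)  # bounded wait for termination
    return done


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Waiting on the event with a timeout, instead of a plain sleep, lets shutdown interrupt an idle poll immediately rather than after the full interval.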
Eugene Sukhodolskiy
committed
1 day ago
|
feat: classify AI errors — ValidationError→invalid, fatal/network→failed
Item 6 from REVIEW_FOLLOWUP:
- ai_normalizer: pydantic.ValidationError returns is_real_estate=False (invalid);
OllamaFatalError and unexpected exceptions are now raised → pipeline status=failed
- ai_enricher: ValidationError returns None (enrichment is skipped), fatal errors also return None,
unexpected exceptions are raised → pipeline status=failed
- pipeline._stage_enrich: propagate OllamaRetryableError so the pipeline can mark the job as failed
Co-Authored-By: Claude <noreply@anthropic.com>
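
The classification rule for the normalizer stated above (validation problems mean invalid, everything else means failed) can be written as a small mapping. This is a dependency-free sketch: `ResponseValidationError` is a stand-in for pydantic.ValidationError, the two Ollama exception classes are redefined locally, and the function name `classify` is hypothetical.

```python
class OllamaRetryableError(Exception):
    """Transient failure such as a timeout or connection reset."""


class OllamaFatalError(Exception):
    """Non-recoverable failure reported by the AI client."""


class ResponseValidationError(ValueError):
    """Stand-in for pydantic.ValidationError so the sketch needs no dependencies."""


def classify(exc: Exception) -> str:
    """Map an exception raised during normalization to a pipeline status."""
    if isinstance(exc, ResponseValidationError):
        # The model answered, but the answer does not describe a valid listing.
        return "invalid"
    # Fatal, network and unexpected errors all mean the job itself failed.
    return "failed"


if __name__ == "__main__":
    for error in (ResponseValidationError("bad shape"),
                  OllamaFatalError("model not found"),
                  OllamaRetryableError("timeout"),
                  KeyError("unexpected")):
        print(type(error).__name__, "->", classify(error))
```

The distinction matters for operators: an invalid record is a data problem and should not be retried, while a failed one usually points at infrastructure.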
Eugene Sukhodolskiy
committed
1 day ago
|
| 2026-06-11 |
feat: implement review items 8-14
- Soft-delete/archive for listings: archived_at column + archive-check endpoint
- Rate limiting on /ingest: slowapi with 60/minute per source_slug
- Prometheus metrics: /metrics endpoint + custom counters/histograms
- Graceful shutdown: track active jobs in app.state, wait up to 30s for them to finish
- Prompt injection protection: wrap user data in <user_data> XML tags
- Image download size limit: 50MB max with httpx streaming
- Raw data cleanup: admin endpoint to delete completed raw data older than N days
Co-Authored-By: Claude <noreply@anthropic.com>
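
The image size limit listed above relies on streaming, so an oversized body can be rejected before it is fully read. The sketch below shows only the accumulation logic over an iterable of chunks; `read_limited` and `ImageTooLargeError` are hypothetical names. With httpx, the chunks would presumably come from `response.iter_bytes()` (or `aiter_bytes()` in async code) inside a `client.stream(...)` block.

```python
from typing import Iterable

MAX_IMAGE_BYTES = 50 * 1024 * 1024  # 50MB, the limit named in the commit


class ImageTooLargeError(Exception):
    """Hypothetical error type for oversized downloads."""


def read_limited(chunks: Iterable[bytes], max_bytes: int = MAX_IMAGE_BYTES) -> bytes:
    """Accumulate streamed chunks, aborting as soon as the limit is exceeded."""
    buffer = bytearray()
    for chunk in chunks:
        if len(buffer) + len(chunk) > max_bytes:
            raise ImageTooLargeError(f"body exceeds {max_bytes} bytes")
        buffer.extend(chunk)
    return bytes(buffer)


if __name__ == "__main__":
    print(len(read_limited([b"a" * 10, b"b" * 20])))
```

Counting bytes while reading is more robust than trusting a Content-Length header, which a server can omit or misreport.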
Eugene Sukhodolskiy
committed
1 day ago
|
feat: implement review items 1-7
- Decompose PropertyPipeline into 8 explicit stages with PipelineContext
- Add tenacity retry (3 attempts, exponential backoff) to OllamaClient and ImageDownloader
- Add simple in-memory circuit breaker for Ollama calls
- Resize images to 1024px before base64 encoding for vision model
- Add /health endpoint (DB, Ollama, disk checks)
- Add DB performance indexes + alembic migration
- Classify AI errors: OllamaRetryableError vs OllamaFatalError
- Add strict PayloadSchema validation for ingest endpoint
Co-Authored-By: Claude <noreply@anthropic.com>
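
A simple in-memory circuit breaker of the kind mentioned above could look like the following sketch. The thresholds, the class and method names, and the injectable `clock` parameter are all assumptions; the log does not describe the actual design.

```python
import time
from typing import Callable, Optional


class CircuitOpenError(Exception):
    """Raised when calls are short-circuited (hypothetical name)."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._threshold = failure_threshold
        self._reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, func: Callable[[], object]) -> object:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_timeout:
                raise CircuitOpenError("circuit is open; skipping call")
            # Timeout elapsed: allow one trial call (half-open state).
            self._opened_at = None
            self._failures = self._threshold - 1
        try:
            result = func()
        except Exception:
            self._failures += 1
            if self._failures >= self._threshold:
                self._opened_at = self._clock()
            raise
        self._failures = 0  # any success closes the circuit again
        return result


if __name__ == "__main__":
    breaker = CircuitBreaker(failure_threshold=2, reset_timeout=5.0)
    print(breaker.call(lambda: "pong"))
```

Combined with retries, a breaker stops the pipeline from repeatedly hammering an Ollama host that is already down, since calls fail fast until the reset timeout passes.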
Eugene Sukhodolskiy
committed
1 day ago
|
feat: core pipeline + FastAPI API (Phases 0-6)
Implemented the full VMK Data Collector foundation and processing pipeline:
- Config, logging, exception hierarchy
- DB models (listings, raw data, images, snapshots, enrichments, custom fields)
- Alembic async migrations
- Repositories with upsert/snapshot support
- Domain entities and Pydantic schemas
- Ollama AI client with mock mode
- AI normalizer, image analyzer, enricher
- Image downloader with SHA-256 dedup
- PropertyPipeline: raw -> AI validate -> upsert/snapshot -> images -> enrich
- FastAPI app with /api/v1/ingest endpoint
Co-Authored-By: Claude <noreply@anthropic.com>
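
The SHA-256 dedup in the image downloader listed above comes down to keying stored images by a hash of their bytes. In this sketch an in-memory dict stands in for the images table; `ImageStore` and `save` are hypothetical names.

```python
import hashlib


class ImageStore:
    """In-memory stand-in for the image table, keyed by content hash."""

    def __init__(self) -> None:
        self._by_hash: dict[str, bytes] = {}

    def save(self, content: bytes) -> tuple[str, bool]:
        """Return (sha256 hex digest, True if the image was newly stored)."""
        digest = hashlib.sha256(content).hexdigest()
        if digest in self._by_hash:
            return digest, False  # identical bytes seen before: skip storage
        self._by_hash[digest] = content
        return digest, True


if __name__ == "__main__":
    store = ImageStore()
    print(store.save(b"same bytes")[1], store.save(b"same bytes")[1])
```

Hashing the content rather than the URL catches the same photo served from different addresses, which is common when listings are re-posted.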
Eugene Sukhodolskiy
committed
1 day ago
|