| 2026-06-12 |
feat: multipart/form-data image upload endpoint with inline pipeline
- Add POST /api/v1/ingest/with-images accepting metadata (JSON Form) + images (UploadFile list)
- Stream images to temp storage and run pipeline inline with fresh AsyncSessionLocal
- Mark raw status=processing immediately to prevent queue worker race condition
- Add process_local_file() to ImageDownloader for handling already-downloaded images
- Split pipeline image processing: _stage_process_uploaded_images vs _stage_process_remote_images
- Process uploaded images sequentially to avoid SQLAlchemy concurrent flush errors
- Commit pipeline session explicitly after inline processing
- Clean up temp files in finally block regardless of pipeline outcome
- Add python-multipart dependency for FastAPI multipart parsing
- Update README with multipart endpoint docs and examples
Co-Authored-By: Claude <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
1 day ago
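A minimal sketch of the temp-file handling described in this commit: uploads are written to temporary storage, processed one at a time, and removed in a `finally` block whether or not processing succeeds. The `run_inline` function and the body of `process_local_file` are assumptions made for illustration; the real endpoint receives FastAPI `UploadFile` objects and runs the full pipeline, neither of which is reproduced here.

```python
import asyncio
import tempfile
from pathlib import Path


async def process_local_file(path: Path) -> int:
    """Hypothetical stand-in for the per-image step; returns the file size."""
    return path.stat().st_size


async def run_inline(uploads: list[tuple[str, bytes]]) -> list[int]:
    """Write uploads to temp files, process them sequentially, always clean up."""
    temp_paths: list[Path] = []
    try:
        for filename, data in uploads:
            suffix = Path(filename).suffix
            # delete=False: the file must outlive this handle so it can be re-read.
            with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
                # Register the path first so a failed write is still cleaned up.
                temp_paths.append(Path(tmp.name))
                tmp.write(data)

        results: list[int] = []
        # Sequential on purpose: one image at a time, no concurrent session use.
        for path in temp_paths:
            results.append(await process_local_file(path))
        return results
    finally:
        # Runs on success and on failure alike.
        for path in temp_paths:
            path.unlink(missing_ok=True)
```

Processing in a plain loop rather than with `asyncio.gather` mirrors the commit's choice to avoid concurrent flushes on a single database session.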
|
fix: pipeline type coercion — bool, datetime, enum mappings
- Add _to_bool() for boolean fields (has_balcony, has_loggia, etc.)
- Add _to_datetime() for publish_date / archived_at
- Add _to_enum() with Russian→English mappings for all DB enums:
building_type, renovation_status, deal_type, layout,
bathroom_type, parking_type, heating_type, window_view,
metro_distance_type, listing_status
- Change pipeline except blocks to re-raise exceptions
so worker handles rollback + _mark_failed properly
- Fixes stuck 'processing' jobs caused by SQL type errors
Co-Authored-By: Claude <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
1 day ago
|
feat: dockerize app, add structured logging, fix rate limiter
Changes:
- Add Dockerfile with python:3.12-slim, alembic migrations, uvicorn
- Add .dockerignore
- Update docker-compose.yml with app service, port 8020, external Ollama
- Configure alembic/env.py to read DATABASE_URL from env
- Update .env.example with port 8020, Ollama host 192.168.1.75, gemma4 model
- Fix slowapi rate limiter: sync key_func instead of async
- Add structured JSON logging (structlog) to ingest endpoint, pipeline stages
- Fix logging output via logging.basicConfig for Docker stdout
Co-Authored-By: Claude <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
1 day ago
|
fix: code review critical and high issues
- tenacity+structlog: replace before_sleep_log with structlog-compatible lambda
to prevent TypeError on retry
- NormalizedProperty: filter AI response dict by allowed dataclass fields before
unpacking to avoid TypeError on unknown keys
- property_pipeline: remove duplicate update_status(failed) from _stage_normalize
- security: add URL validator (SSRF protection) for ImageDownloader and archive-check
- ai prompts: replace raw <user_data> tags with JSON-serialized payload
to mitigate prompt injection
- queue_worker: wrap _process_one in try/except so DB errors don't kill the loop
- image processing: parallelize with asyncio.gather + Semaphore(3)
- ai services: unify OllamaFatalError handling — all propagate instead of swallow
- router_properties: catch only pydantic.ValidationError/ValueError in ingest,
let infrastructure errors return 500
Co-Authored-By: Claude <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
1 day ago
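The `NormalizedProperty` fix in this commit (filtering the AI response by the dataclass's own fields before unpacking) can be sketched as follows. The fields shown and the `from_ai_response` helper are hypothetical; only the class name and the technique come from the commit message.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class NormalizedProperty:  # hypothetical fields, for illustration only
    title: str
    price: Optional[float] = None
    rooms: Optional[int] = None


def from_ai_response(payload: dict[str, Any]) -> NormalizedProperty:
    """Build the dataclass from a model response, ignoring unknown keys."""
    allowed = {f.name for f in fields(NormalizedProperty)}
    known = {key: value for key, value in payload.items() if key in allowed}
    # Without the filter, an extra key would raise
    # TypeError: __init__() got an unexpected keyword argument.
    return NormalizedProperty(**known)
```

A missing required field (here `title`) still raises `TypeError`; the filter only guards against extra keys.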
|
feat: classify AI errors — ValidationError→invalid, fatal/network→failed
Item 6 from REVIEW_FOLLOWUP:
- ai_normalizer: pydantic.ValidationError now returns is_real_estate=False (invalid);
OllamaFatalError and unexpected exceptions now raise → pipeline status=failed
- ai_enricher: ValidationError and OllamaFatalError return None (enrichment skipped);
unexpected exceptions raise → pipeline status=failed
- pipeline._stage_enrich: propagate OllamaRetryableError so pipeline can mark failed
Co-Authored-By: Claude <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
1 day ago
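The normalizer rule in this commit can be sketched as below. The exception classes are local stand-ins (the real code uses `pydantic.ValidationError` and the project's own Ollama exceptions), and `normalize`, `run_pipeline` and the `"completed"` status name are assumptions for illustration.

```python
from typing import Callable


class ValidationError(Exception):
    """Stand-in for pydantic.ValidationError."""


class OllamaFatalError(Exception):
    """Stand-in for the project's fatal AI error."""


def normalize(call_model: Callable[[], dict]) -> dict:
    """Schema violations mean 'not a valid listing'; everything else propagates."""
    try:
        return call_model()
    except ValidationError:
        return {"is_real_estate": False}


def run_pipeline(call_model: Callable[[], dict]) -> str:
    """Map the normalizer outcome to a job status."""
    try:
        result = normalize(call_model)
    except Exception:
        # Fatal, network and unexpected errors all end up here.
        return "failed"
    return "completed" if result.get("is_real_estate") else "invalid"
```

The distinction matters for retries: an `invalid` job describes bad input and should not be retried, while a `failed` job describes an infrastructure problem and may be.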
|
| 2026-06-11 |
feat: implement review items 8-14
- Soft-delete/archive for listings: archived_at column + archive-check endpoint
- Rate limiting on /ingest: slowapi with 60/minute per source_slug
- Prometheus metrics: /metrics endpoint + custom counters/histograms
- Graceful shutdown: track active jobs in app.state, wait up to 30s
- Prompt injection protection: wrap user data in <user_data> XML tags
- Image download size limit: 50MB max with httpx streaming
- Raw data cleanup: admin endpoint to delete completed raw data older than N days
Co-Authored-By: Claude <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
1 day ago
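The image size limit in this commit relies on reading the response in chunks and stopping as soon as the cap is exceeded. The sketch below shows that idea on a plain iterable of byte chunks; the real code streams from httpx, and `read_capped`, `ImageTooLargeError` and `MAX_IMAGE_BYTES` are hypothetical names.

```python
from typing import Iterable

MAX_IMAGE_BYTES = 50 * 1024 * 1024  # 50 MB, as stated in the commit


class ImageTooLargeError(Exception):
    """Raised when a download exceeds the configured limit."""


def read_capped(chunks: Iterable[bytes], limit: int = MAX_IMAGE_BYTES) -> bytes:
    """Accumulate chunks, aborting before the buffer grows past the limit."""
    buffer = bytearray()
    for chunk in chunks:
        if len(buffer) + len(chunk) > limit:
            raise ImageTooLargeError(f"image exceeds {limit} bytes")
        buffer.extend(chunk)
    return bytes(buffer)
```

Checking the running total per chunk, rather than trusting a `Content-Length` header, also covers servers that omit or misreport the header.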
|
feat: implement review items 1-7
- Decompose PropertyPipeline into 8 explicit stages with PipelineContext
- Add tenacity retry (3 attempts, exponential backoff) to OllamaClient and ImageDownloader
- Add simple in-memory circuit breaker for Ollama calls
- Resize images to 1024px before base64 encoding for vision model
- Add /health endpoint (DB, Ollama, disk checks)
- Add DB performance indexes + alembic migration
- Classify AI errors: OllamaRetryableError vs OllamaFatalError
- Add strict PayloadSchema validation for ingest endpoint
Co-Authored-By: Claude <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
1 day ago
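A simple in-memory circuit breaker of the kind mentioned in this commit might look like the sketch below. It is synchronous for brevity (the Ollama client is presumably async), and the class name, thresholds and injectable `clock` parameter are assumptions, not the project's actual implementation.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,
        reset_after: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, func: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                raise CircuitOpenError("circuit is open; call skipped")
            # Cool-down elapsed: allow one trial call (half-open state).
            self._opened_at = None
            self._failures = self.failure_threshold - 1
        try:
            result = func()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        self._failures = 0  # any success closes the circuit fully
        return result
```

Combined with retries, the breaker keeps a dead backend from absorbing three attempts per job: once open, calls fail immediately until the cool-down passes.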
|
feat: core pipeline + FastAPI API (Phases 0-6)
Implemented the full VMK Data Collector foundation and processing pipeline:
- Config, logging, exception hierarchy
- DB models (listings, raw data, images, snapshots, enrichments, custom fields)
- Alembic async migrations
- Repositories with upsert/snapshot support
- Domain entities and Pydantic schemas
- Ollama AI client with mock mode
- AI normalizer, image analyzer, enricher
- Image downloader with SHA-256 dedup
- PropertyPipeline: raw -> AI validate -> upsert/snapshot -> images -> enrich
- FastAPI app with /api/v1/ingest endpoint
Co-Authored-By: Claude <noreply@anthropic.com>
Eugene Sukhodolskiy
committed
1 day ago
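The SHA-256 deduplication mentioned for the image downloader can be illustrated with a small in-memory index keyed by content hash. The `ImageStore` class is hypothetical; the project presumably keeps the hash in the images table instead.

```python
import hashlib


class ImageStore:
    """Remembers the first name seen for each distinct image content."""

    def __init__(self) -> None:
        self._by_hash: dict[str, str] = {}

    def add(self, name: str, data: bytes) -> tuple[str, bool]:
        """Return (stored_name, is_new); identical bytes are stored only once."""
        digest = hashlib.sha256(data).hexdigest()
        if digest in self._by_hash:
            return self._by_hash[digest], False
        self._by_hash[digest] = name
        return name, True
```

Hashing the bytes rather than comparing URLs catches the common case where the same photo is served from several addresses.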
|