Плейбук поэтапной разработки. Каждая фаза содержит конкретные файлы, классы и функции.
git init (уже есть).gitignore:
.env __pycache__/ *.pyc .pytest_cache/ .mypy_cache/ *.egg-info/ dist/ build/ alembic/versions/*.pyc .coverage htmlcov/ /var/lib/vmk/images/
docker compose up -d postgresdocker logs vmk_postgres → database system is readymkdir -p src/vmk_data_collector/{core,api/v1,domain,schemas,services,db/repositories,models}
mkdir -p tests/{unit,integration}
mkdir -p alembic/versions
mkdir -p docs
pyproject.tomlСоздать /home/gmikcon/Projects/vmk/data_collector/pyproject.toml:
.env.exampleСоздать /home/gmikcon/Projects/vmk/data_collector/.env.example:
DATABASE_URL=postgresql+asyncpg://postgres:postgres@localhost:5432/vmk_data DATABASE_POOL_SIZE=20 DATABASE_MAX_OVERFLOW=10 DATABASE_ECHO=false APP_HOST=0.0.0.0 APP_PORT=8000 LOG_LEVEL=info DEBUG=false OLLAMA_BASE_URL=http://localhost:11434 OLLAMA_TEXT_MODEL=llama3.2 OLLAMA_VISION_MODEL=llava OLLAMA_TIMEOUT=120 OLLAMA_MOCK=false IMAGE_STORAGE_PATH=/var/lib/vmk/images ENABLE_IMAGE_ANALYSIS=true ENABLE_PRICE_ESTIMATION=true
docker-compose.ymlСоздать /home/gmikcon/Projects/vmk/data_collector/docker-compose.yml:
postgres: image postgres:16-alpine, env POSTGRES_USER/POSTGRES_PASSWORD/POSTGRES_DB, порт 5432, volume vmk_postgres_datapg_isreadyvmk_postgres_datapython -m venv .venv source .venv/bin/activate pip install -e ".[dev]"
src/vmk_data_collector/core/config.py)Создать Settings(BaseSettings):
model_config = SettingsConfigDict(env_file=".env", env_file_encoding="utf-8", extra="ignore")database_url: str, database_pool_size: int=20, database_max_overflow: int=10, database_echo: bool=Falseapp_host: str, app_port: int, log_level: str, debug: bool=Falseollama_base_url: str, ollama_text_model: str, ollama_vision_model: str, ollama_timeout: int=120, ollama_mock: bool=Falseimage_storage_path: str, enable_image_analysis: bool=True, enable_price_estimation: bool=True@property def database_url_async(self) -> str — возвращает URL с +asyncpg@property def image_storage_path_abs(self) -> Path — Path(self.image_storage_path).resolve()src/vmk_data_collector/core/logging.py)Создать функцию configure_logging(log_level: str) -> None:
filter_by_level, add_logger_name, add_log_level, format_exc_info, TimeStamper, JSONRendererdebug=True — ConsoleRenderer(colors=True)src/vmk_data_collector/core/exceptions.py)Создать иерархию:
class AppException(Exception): ...class ValidationError(AppException): ...class AIProcessingError(AppException): ...class NotRealEstateError(ValidationError): ... — специфичный rejectclass DatabaseError(AppException): ...pip install -e ".[dev]" проходит без ошибокpython -c "from vmk_data_collector.core.config import Settings; print(Settings())" читает .envdocker compose up -d запускает PostgreSQLruff check src/ проходит без ошибокalembic init alembic
alembic.ini: sqlalchemy.url = postgresql+asyncpg://postgres:postgres@localhost:5432/vmk_dataalembic/env.py:
Base из vmk_data_collector.db.basetarget_metadata = Base.metadatarun_migrations_online() с AsyncEngine из create_async_enginesrc/vmk_data_collector/db/base.py)Создать:
from sqlalchemy.orm import DeclarativeBase
class Base(DeclarativeBase):
pass
src/vmk_data_collector/domain/enums.py)Создать:
class RawDataStatus(str, Enum): pending, processing, completed, failed, invalidclass ValidationResult(str, Enum): valid, invalid, uncertainclass DealType(str, Enum): sale, rent_long, rent_shortclass ListingStatus(str, Enum): active, sold, rented, removed, archivedclass BuildingType(str, Enum): brick, panel, monolith, gas_block, woodclass RenovationStatus(str, Enum): cosmetic, euro, designer, none, constructionclass BathroomType(str, Enum): combined, separate, multipleclass ParkingType(str, Enum): ground, underground, none, garageclass HeatingType(str, Enum): central, autonomous, floor, noneclass LayoutType(str, Enum): studio, separate, adjacentclass WindowView(str, Enum): yard, street, park, water, forestclass MetroDistanceType(str, Enum): walk, transportclass ImageDownloadStatus(str, Enum): pending, downloaded, failedclass ImageAnalysisStatus(str, Enum): pending, completed, failedclass CustomFieldType(str, Enum): str, int, float, bool, date, jsonsrc/vmk_data_collector/models/data_source.pyclass DataSource(Base):
__tablename__ = "data_sources"
id: Mapped[int] = mapped_column(Integer, primary_key=True)
slug: Mapped[str] = mapped_column(String(64), unique=True, nullable=False)
name: Mapped[str] = mapped_column(String(255), nullable=False)
url_pattern: Mapped[str | None] = mapped_column(String(512))
description: Mapped[str | None] = mapped_column(Text)
created_at: Mapped[datetime] = mapped_column(TIMESTAMP(timezone=True), server_default=func.now())
src/vmk_data_collector/models/property_type.pyclass PropertyType(Base):
__tablename__ = "property_types"
id: Mapped[int] = mapped_column(Integer, primary_key=True)
slug: Mapped[str] = mapped_column(String(64), unique=True, nullable=False)
name: Mapped[str] = mapped_column(String(128), nullable=False)
description: Mapped[str | None] = mapped_column(Text)
src/vmk_data_collector/models/raw_parsing_data.pyclass RawParsingData(Base):
__tablename__ = "raw_parsing_data"
id: Mapped[int] = mapped_column(Integer, primary_key=True)
source_id: Mapped[int | None] = mapped_column(ForeignKey("data_sources.id"))
external_id: Mapped[str | None] = mapped_column(String(255))
payload: Mapped[dict] = mapped_column(JSONB, default={})
status: Mapped[RawDataStatus] = mapped_column(Enum(RawDataStatus, name="raw_data_status"), default=RawDataStatus.pending)
validation_result: Mapped[ValidationResult | None] = mapped_column(Enum(ValidationResult, name="validation_result"))
error_message: Mapped[str | None] = mapped_column(Text)
received_at: Mapped[datetime] = mapped_column(TIMESTAMP(timezone=True), server_default=func.now())
processed_at: Mapped[datetime | None] = mapped_column(TIMESTAMP(timezone=True))
__table_args__ = (UniqueConstraint("source_id", "external_id"),)
src/vmk_data_collector/models/property_listing.pyСоздать с всеми полями из SPECIFICATION.md:
__table_args__: UniqueConstraint("source_id", "external_id")src/vmk_data_collector/models/property_image.pyclass PropertyImage(Base):
__tablename__ = "property_images"
id: Mapped[int] = mapped_column(Integer, primary_key=True)
property_id: Mapped[int] = mapped_column(ForeignKey("property_listings.id", ondelete="CASCADE"))
url: Mapped[str] = mapped_column(Text, nullable=False)
local_path: Mapped[str | None] = mapped_column(String(512))
hash: Mapped[str | None] = mapped_column(String(64))
file_size: Mapped[int | None] = mapped_column(Integer)
width: Mapped[int | None] = mapped_column(SmallInteger)
height: Mapped[int | None] = mapped_column(SmallInteger)
download_status: Mapped[ImageDownloadStatus] = mapped_column(Enum(ImageDownloadStatus, name="image_download_status"), default=ImageDownloadStatus.pending)
ai_description: Mapped[str | None] = mapped_column(Text)
analysis_status: Mapped[ImageAnalysisStatus] = mapped_column(Enum(ImageAnalysisStatus, name="image_analysis_status"), default=ImageAnalysisStatus.pending)
order_index: Mapped[int] = mapped_column(SmallInteger, default=0)
src/vmk_data_collector/models/property_custom_field.pyclass PropertyCustomField(Base):
__tablename__ = "property_custom_fields"
id: Mapped[int] = mapped_column(Integer, primary_key=True)
property_id: Mapped[int] = mapped_column(ForeignKey("property_listings.id", ondelete="CASCADE"))
field_name: Mapped[str] = mapped_column(String(128), nullable=False)
field_value: Mapped[str] = mapped_column(Text)
field_type: Mapped[CustomFieldType] = mapped_column(Enum(CustomFieldType, name="custom_field_type"), default=CustomFieldType.str)
__table_args__ = (UniqueConstraint("property_id", "field_name"),)
src/vmk_data_collector/models/property_snapshot.pyclass PropertySnapshot(Base):
__tablename__ = "property_snapshots"
id: Mapped[int] = mapped_column(Integer, primary_key=True)
property_id: Mapped[int] = mapped_column(ForeignKey("property_listings.id", ondelete="CASCADE"))
snapshot_data: Mapped[dict] = mapped_column(JSONB, default={})
changed_fields: Mapped[dict] = mapped_column(JSONB, default={})
created_at: Mapped[datetime] = mapped_column(TIMESTAMP(timezone=True), server_default=func.now())
src/vmk_data_collector/models/ai_enrichment.pyclass AiEnrichment(Base):
__tablename__ = "ai_enrichments"
id: Mapped[int] = mapped_column(Integer, primary_key=True)
property_id: Mapped[int] = mapped_column(ForeignKey("property_listings.id", ondelete="CASCADE"), unique=True)
extracted_features: Mapped[dict] = mapped_column(JSONB, default={})
price_assessment: Mapped[dict] = mapped_column(JSONB, default={})
listing_quality_score: Mapped[int | None] = mapped_column(SmallInteger)
reliability_rating: Mapped[int | None] = mapped_column(SmallInteger)
sentiment_score: Mapped[float | None] = mapped_column(Numeric(3, 2))
classification: Mapped[str | None] = mapped_column(String(64))
image_analysis_results: Mapped[dict] = mapped_column(JSONB, default={})
generated_description: Mapped[str | None] = mapped_column(Text)
summary: Mapped[str | None] = mapped_column(Text)
model_version: Mapped[str | None] = mapped_column(String(64))
processing_time_ms: Mapped[int | None] = mapped_column(Integer)
created_at: Mapped[datetime] = mapped_column(TIMESTAMP(timezone=True), server_default=func.now())
src/vmk_data_collector/models/__init__.pyИмпортировать все модели для Base.metadata:
from .data_source import DataSource from .property_type import PropertyType from .raw_parsing_data import RawParsingData from .property_listing import PropertyListing from .property_image import PropertyImage from .property_custom_field import PropertyCustomField from .property_snapshot import PropertySnapshot from .ai_enrichment import AiEnrichment
alembic revision --autogenerate -m "initial" alembic upgrade head
Создать src/vmk_data_collector/db/seed.py:
async def seed_property_types(session: AsyncSession) -> Noneproperty_types (apartment, house, townhouse, commercial, land, garage, office, warehouse, retail, cottage, room, new_building)deal_types (sale, rent_long, rent_short)alembic upgrade head проходит без ошибок\dt в psql показывает все таблицыseed_property_types() заполняет справочникиsrc/vmk_data_collector/db/engine.py)from sqlalchemy.ext.asyncio import create_async_engine, async_sessionmaker, AsyncSession
from vmk_data_collector.core.config import settings
engine = create_async_engine(
settings.database_url_async,
pool_size=settings.database_pool_size,
max_overflow=settings.database_max_overflow,
echo=settings.database_echo,
)
src/vmk_data_collector/db/session.py)from .engine import engine
AsyncSessionLocal = async_sessionmaker(engine, class_=AsyncSession, expire_on_commit=False)
async def get_db_session() -> AsyncSession:
async with AsyncSessionLocal() as session:
yield session
src/vmk_data_collector/db/repositories/base.py)class BaseRepository:
def __init__(self, session: AsyncSession) -> None:
self._session = session
async def add(self, obj: T) -> T:
self._session.add(obj)
await self._session.flush()
return obj
async def delete(self, obj: T) -> None:
await self._session.delete(obj)
async def commit(self) -> None:
await self._session.commit()
src/vmk_data_collector/db/repositories/raw_data.py)Методы:
async def create(self, source_id: int | None, external_id: str | None, payload: dict) -> RawParsingDataasync def get_by_id(self, raw_data_id: int) -> RawParsingData | Noneasync def get_by_source_and_external(self, source_id: int, external_id: str) -> RawParsingData | Noneasync def update_status(self, raw_data_id: int, status: RawDataStatus, error_message: str | None = None) -> Noneasync def set_processed(self, raw_data_id: int) -> Nonesrc/vmk_data_collector/db/repositories/property.py)Методы:
async def create(self, **kwargs) -> PropertyListingasync def get_by_id(self, property_id: int) -> PropertyListing | Noneasync def get_by_source_and_external(self, source_id: int, external_id: str) -> PropertyListing | Noneasync def update(self, property_id: int, **kwargs) -> PropertyListingasync def delete_custom_fields(self, property_id: int) -> Nonesrc/vmk_data_collector/db/repositories/image.py)Методы:
async def create(self, property_id: int, url: str, order_index: int = 0) -> PropertyImageasync def get_by_property(self, property_id: int) -> list[PropertyImage]async def get_by_hash(self, property_id: int, image_hash: str) -> PropertyImage | Noneasync def update_downloaded(self, image_id: int, local_path: str, file_size: int, width: int, height: int, image_hash: str) -> Noneasync def update_analysis(self, image_id: int, ai_description: str) -> Nonesrc/vmk_data_collector/db/repositories/custom_field.py)Методы:
async def create(self, property_id: int, field_name: str, field_value: str, field_type: CustomFieldType = CustomFieldType.str) -> PropertyCustomFieldasync def bulk_create(self, property_id: int, fields: list[dict]) -> Noneasync def delete_by_property(self, property_id: int) -> Nonesrc/vmk_data_collector/db/repositories/snapshot.py)Методы:
async def create(self, property_id: int, snapshot_data: dict, changed_fields: dict) -> PropertySnapshotsrc/vmk_data_collector/db/repositories/ai_enrichment.py)Методы:
async def create(self, property_id: int, **kwargs) -> AiEnrichmentasync def delete_by_property(self, property_id: int) -> Nonesrc/vmk_data_collector/db/repositories/data_source.py)Методы:
async def get_or_create_by_slug(self, slug: str, name: str | None = None) -> DataSourceget_db_session() создаёт валидную async сессиюsrc/vmk_data_collector/domain/entities.py)@dataclass
class NormalizedProperty:
property_type: str
deal_type: str
title: str | None
description: str | None
price: float | None
currency: str | None
total_area: float | None
... (все поля из property_listings)
custom_fields: dict[str, Any]
images: list[str]
@dataclass
class AiImageAnalysis:
overall_condition: str | None
rooms_observed: int | None
issues_found: list[str]
positive_highlights: list[str]
view_from_window: str | None
furniture_included: bool | None
appliances_included: list[str]
@dataclass
class AiEnrichmentResult:
extracted_features: dict[str, Any]
price_assessment: dict[str, Any]
listing_quality_score: int | None
reliability_rating: int | None
sentiment_score: float | None
classification: str | None
image_analysis_results: dict[str, Any]
generated_description: str | None
summary: str | None
model_version: str | None
processing_time_ms: int | None
src/vmk_data_collector/schemas/raw_data.py)class RawDataIngestRequest(BaseModel):
source_slug: strexternal_id: strpayload: dict[str, Any]class IngestResponse(BaseModel):
job_id: intproperty_id: int | Nonestatus: strreason: str | None = Nonemessage: strsnapshot_id: int | None = Nonesrc/vmk_data_collector/schemas/ai_response.py)class AiNormalizerResponse(BaseModel):
is_real_estate: boolreason: str | Nonenormalized: NormalizedPropertySchema | Noneclass AiImageAnalysisResponse(BaseModel):
overall_condition: str | Nonerooms_observed: int | Noneissues_found: list[str]positive_highlights: list[str]view_from_window: str | Nonefurniture_included: bool | Noneappliances_included: list[str]class AiEnrichmentResponse(BaseModel):
extracted_features: dict[str, Any]price_assessment: dict[str, Any]listing_quality_score: int | Nonereliability_rating: int | Nonesentiment_score: float | Noneclassification: str | Noneimage_analysis_results: dict[str, Any]generated_description: str | Nonesummary: str | Nonemodel_version: str | Noneprocessing_time_ms: int | Nonesrc/vmk_data_collector/schemas/normalized.py)Pydantic-модель с всеми полями property_listings + custom_fields: dict[str, Any].
RawDataIngestRequest работает с минимальным и полным payloadAiNormalizerResponse корректно парсит JSON с is_real_estate: falseNormalizedPropertySchema маппится из словаря без ошибокsrc/vmk_data_collector/services/ollama_client.py)class OllamaClient:
def __init__(self, base_url: str, timeout: int) -> None:
self._client = httpx.AsyncClient(base_url=base_url, timeout=timeout)
async def chat(self, model: str, messages: list[dict], json_mode: bool = False) -> dict:
payload = {"model": model, "messages": messages, "stream": False}
if json_mode:
payload["format"] = "json"
response = await self._client.post("/api/chat", json=payload)
response.raise_for_status()
return response.json()
async def chat_with_images(self, model: str, messages: list[dict], images_base64: list[str]) -> dict:
# messages[0]["images"] = images_base64
...
async def close(self) -> None:
await self._client.aclose()
src/vmk_data_collector/services/ai_normalizer.py)class AiNormalizer:
SYSTEM_PROMPT = """Ты — анализатор объявлений о недвижимости.
Определи, является ли текст объявлением о недвижимости.
Если нет — верни {"is_real_estate": false, "reason": "..."}.
Если да — верни {"is_real_estate": true, "normalized": {...}}.
Ответь ТОЛЬКО JSON."""
async def normalize(self, payload: dict) -> AiNormalizerResponse:
# 1. Формируем текст из payload (title + description + params)
# 2. Вызываем ollama_client.chat(json_mode=True)
# 3. Парсим ответ через AiNormalizerResponse
# 4. Если ollama_mock — возвращаем фиксированный мок
src/vmk_data_collector/services/ai_image_analyzer.py)class AiImageAnalyzer:
SYSTEM_PROMPT = """Опиши состояние объекта недвижимости на фото.
Ответь ТОЛЬКО JSON с полями: overall_condition, rooms_observed, issues_found, positive_highlights, view_from_window, furniture_included, appliances_included."""
async def analyze(self, image_base64: str) -> AiImageAnalysisResponse:
# 1. Вызываем ollama_client.chat_with_images()
# 2. Парсим AiImageAnalysisResponse
src/vmk_data_collector/services/ai_enricher.py)class AiEnricher:
SYSTEM_PROMPT = """Проанализируй объявление о недвижимости.
Верни ТОЛЬКО JSON с полями:
extracted_features, price_assessment, listing_quality_score (1-10),
reliability_rating (1-5), sentiment_score (-1..1), classification,
generated_description, summary, language."""
async def enrich(self, normalized: NormalizedProperty, image_analysis_results: dict) -> AiEnrichmentResult:
# 1. Формируем prompt из текста + image_analysis
# 2. Вызываем ollama_client.chat(json_mode=True)
# 3. Парсим AiEnrichmentResponse
# 4. Маппим в AiEnrichmentResult
src/vmk_data_collector/services/image_downloader.py)class ImageDownloader:
def __init__(self, storage_path: Path) -> None:
self._storage_path = storage_path
async def download(self, property_id: int, image_url: str, order_index: int) -> PropertyImageDownloadResult:
# 1. httpx async GET image_url
# 2. SHA-256 content
# 3. ext из Content-Type или URL
# 4. local_path = storage_path / str(property_id) / f"{hash}.{ext}"
# 5. Сохранить на диск
# 6. Pillow: width, height
# 7. file_size = len(content)
# 8. Вернуть dataclass с local_path, hash, width, height, file_size
OllamaClient.chat() мокается в тестах (respx/httpx_mock)AiNormalizer корректно reject'ит авто-объявлениеAiNormalizer корректно accept'ит квартиру из каши текстаImageDownloader скачивает картинку, считает hash, извлекает размерыAiImageAnalyzer возвращает структурированный JSONAiEnricher возвращает enrichment с quality_score и reliability_ratingsrc/vmk_data_collector/services/property_pipeline.py)class PropertyPipeline:
def __init__(
self,
raw_repo: RawDataRepository,
property_repo: PropertyRepository,
image_repo: ImageRepository,
custom_field_repo: CustomFieldRepository,
snapshot_repo: SnapshotRepository,
enrichment_repo: AiEnrichmentRepository,
data_source_repo: DataSourceRepository,
normalizer: AiNormalizer,
image_downloader: ImageDownloader,
image_analyzer: AiImageAnalyzer,
enricher: AiEnricher,
) -> None:
...
async def process(self, raw_data_id: int) -> IngestResponse:
# 1. raw_repo.get_by_id(raw_data_id)
# 2. raw_repo.update_status → processing
# 3. normalizer.normalize(raw.payload)
# 4. Если not is_real_estate:
# raw_repo.update_status → invalid
# return IngestResponse(status="invalid", reason=...)
# 5. data_source = data_source_repo.get_or_create_by_slug(source_slug)
# 6. existing = property_repo.get_by_source_and_external(...)
# 7. Если existing:
# snapshot_repo.create(existing)
# property_repo.update(existing.id, **normalized_data)
# custom_field_repo.delete_by_property(existing.id)
# property_id = existing.id
# Иначе:
# property = property_repo.create(**normalized_data)
# property_id = property.id
# 8. custom_field_repo.bulk_create(property_id, normalized.custom_fields)
# 9. images: for url in normalized.images:
# image = image_repo.create(property_id, url, order)
# result = await image_downloader.download(property_id, url, order)
# image_repo.update_downloaded(image.id, result.local_path, ...)
# analysis = await image_analyzer.analyze(base64)
# image_repo.update_analysis(image.id, analysis.overall_condition)
# 10. enrichment = await enricher.enrich(normalized, aggregated_image_analysis)
# 11. enrichment_repo.delete_by_property(property_id)
# 12. enrichment_repo.create(property_id, **enrichment)
# 13. raw_repo.set_processed(raw_data_id)
# 14. return IngestResponse(status="completed", property_id=...)
src/vmk_data_collector/main.py)@asynccontextmanager
async def lifespan(app: FastAPI):
# startup: configure_logging, create image storage dir
yield
# shutdown: close ollama client
app = FastAPI(title="VMK Data Collector", version="0.1.0", lifespan=lifespan)
app.include_router(properties_router, prefix="/api/v1")
src/vmk_data_collector/api/deps.py)async def get_db() -> AsyncSession:
async with AsyncSessionLocal() as session:
yield session
async def get_property_pipeline(db: AsyncSession = Depends(get_db)) -> PropertyPipeline:
# Собрать все зависимости вручную (или использовать DI-контейнер)
...
src/vmk_data_collector/api/v1/router_properties.py)router = APIRouter(prefix="/properties")
@router.post("/ingest", response_model=IngestResponse, status_code=202)
async def ingest_property(
request: RawDataIngestRequest,
pipeline: PropertyPipeline = Depends(get_property_pipeline),
) -> IngestResponse:
# 1. Создать raw_parsing_data через raw_repo
# 2. Запустить pipeline.process(raw_data_id)
# 3. Вернуть IngestResponse
src/vmk_data_collector/core/exceptions.py + в main.py)AppException → 500 с деталямиValidationError → 422NotRealEstateError → 202 (но со статусом invalid)AIProcessingError → 202 (но со статусом failed)POST /api/v1/properties/ingest возвращает 202Фикстуры:
async_engine — create_async_engine("postgresql+asyncpg://postgres:postgres@localhost:5432/vmk_data_test")init_db(async_engine) — Base.metadata.create_alldb_session(async_engine) — async session с begin() + rollbackclient(db_session) — FastAPI TestClient с переопределённым get_dbmock_ollama_client() — фиксированные ответыsample_payload_minimal() — только title, url, imagessample_payload_full() — все поляsample_payload_car() — объявление про машину (для reject)tests/unit/)test_ai_normalizer.py:
test_reject_car_listing() — is_real_estate=Falsetest_accept_apartment_from_messy_text() — извлекает rooms, area, floor из кашиtest_mock_mode_returns_fixed_json() — при OLLAMA_MOCK=truetest_ai_image_analyzer.py:
test_analyze_returns_structured_response()test_analyze_with_empty_image()test_ai_enricher.py:
test_enrich_returns_quality_score_and_reliability()test_enrich_with_image_analysis()test_image_downloader.py:
test_download_creates_file_and_returns_hash() (mock httpx + tempfile)test_dedup_returns_existing_hash()test_property_pipeline.py:
test_new_listing_creates_all_records() (моки всех репозиториев)test_existing_listing_creates_snapshot() (моки)test_invalid_payload_returns_invalid_status()tests/integration/)test_api_ingest.py:
test_ingest_minimal_payload_202() — проверяет создание raw + listingtest_ingest_full_payload_populates_all_fields()test_ingest_duplicate_external_id_updates_and_creates_snapshot()test_ingest_car_payload_returns_invalid() — проверяет, что listing не созданtest_db_pipeline.py:
test_upsert_creates_snapshot_with_correct_data()test_custom_fields_deleted_on_update()test_images_deduplicated_by_hash()pytest tests/unit -v -m unit pytest tests/integration -v -m integration pytest --cov=src --cov-report=term-missing
pytest проходит в чистом окружении.PHONY: dev test lint migrate
dev:
docker compose up -d
uvicorn vmk_data_collector.main:app --reload
test:
pytest -v
lint:
ruff check src tests
black --check src tests
mypy src
migrate:
alembic upgrade head
seed:
python -c "import asyncio; from vmk_data_collector.db.seed import seed_all; asyncio.run(seed_all())"
FROM python:3.12-slim WORKDIR /app COPY pyproject.toml . RUN pip install -e ".[dev]" COPY . . CMD ["uvicorn", "vmk_data_collector.main:app", "--host", "0.0.0.0", "--port", "8000"]
make dev запускает сервисmake test проходитmake lint проходит| # | Путь | Фаза | Описание |
|---|---|---|---|
| 1 | pyproject.toml |
1 | Зависимости, конфиги инструментов |
| 2 | .env.example |
1 | Шаблон env |
| 3 | docker-compose.yml |
1 | PostgreSQL |
| 4 | .gitignore |
0 | Исключения git |
| 5 | src/vmk_data_collector/core/config.py |
1 | Pydantic Settings |
| 6 | src/vmk_data_collector/core/logging.py |
1 | structlog |
| 7 | src/vmk_data_collector/core/exceptions.py |
1 | Исключения |
| 8 | src/vmk_data_collector/db/base.py |
2 | DeclarativeBase |
| 9 | src/vmk_data_collector/domain/enums.py |
2 | Все ENUM |
| 10 | src/vmk_data_collector/models/data_source.py |
2 | ORM DataSource |
| 11 | src/vmk_data_collector/models/property_type.py |
2 | ORM PropertyType |
| 12 | src/vmk_data_collector/models/raw_parsing_data.py |
2 | ORM RawParsingData |
| 13 | src/vmk_data_collector/models/property_listing.py |
2 | ORM PropertyListing |
| 14 | src/vmk_data_collector/models/property_image.py |
2 | ORM PropertyImage |
| 15 | src/vmk_data_collector/models/property_custom_field.py |
2 | ORM PropertyCustomField |
| 16 | src/vmk_data_collector/models/property_snapshot.py |
2 | ORM PropertySnapshot |
| 17 | src/vmk_data_collector/models/ai_enrichment.py |
2 | ORM AiEnrichment |
| 18 | src/vmk_data_collector/models/__init__.py |
2 | Импорты |
| 19 | alembic/env.py |
2 | Alembic async env |
| 20 | src/vmk_data_collector/db/seed.py |
2 | Seed справочников |
| 21 | src/vmk_data_collector/db/engine.py |
3 | AsyncEngine |
| 22 | src/vmk_data_collector/db/session.py |
3 | AsyncSessionLocal |
| 23 | src/vmk_data_collector/db/repositories/base.py |
3 | BaseRepository |
| 24 | src/vmk_data_collector/db/repositories/raw_data.py |
3 | RawDataRepository |
| 25 | src/vmk_data_collector/db/repositories/property.py |
3 | PropertyRepository |
| 26 | src/vmk_data_collector/db/repositories/image.py |
3 | ImageRepository |
| 27 | src/vmk_data_collector/db/repositories/custom_field.py |
3 | CustomFieldRepository |
| 28 | src/vmk_data_collector/db/repositories/snapshot.py |
3 | SnapshotRepository |
| 29 | src/vmk_data_collector/db/repositories/ai_enrichment.py |
3 | AiEnrichmentRepository |
| 30 | src/vmk_data_collector/db/repositories/data_source.py |
3 | DataSourceRepository |
| 31 | src/vmk_data_collector/domain/entities.py |
4 | Dataclass сущности |
| 32 | src/vmk_data_collector/schemas/raw_data.py |
4 | API схемы |
| 33 | src/vmk_data_collector/schemas/ai_response.py |
4 | AI схемы |
| 34 | src/vmk_data_collector/schemas/normalized.py |
4 | NormalizedPropertySchema |
| 35 | src/vmk_data_collector/services/ollama_client.py |
5 | HTTP клиент Ollama |
| 36 | src/vmk_data_collector/services/ai_normalizer.py |
5 | AiNormalizer |
| 37 | src/vmk_data_collector/services/ai_image_analyzer.py |
5 | AiImageAnalyzer |
| 38 | src/vmk_data_collector/services/ai_enricher.py |
5 | AiEnricher |
| 39 | src/vmk_data_collector/services/image_downloader.py |
5 | ImageDownloader |
| 40 | src/vmk_data_collector/services/property_pipeline.py |
6 | PropertyPipeline |
| 41 | src/vmk_data_collector/api/deps.py |
6 | DI-зависимости |
| 42 | src/vmk_data_collector/api/v1/router_properties.py |
6 | FastAPI router |
| 43 | src/vmk_data_collector/main.py |
6 | Точка входа |
| 44 | tests/conftest.py |
7 | Фикстуры |
| 45 | tests/unit/test_ai_normalizer.py |
7 | Unit тесты нормализатора |
| 46 | tests/unit/test_ai_image_analyzer.py |
7 | Unit тесты анализатора |
| 47 | tests/unit/test_ai_enricher.py |
7 | Unit тесты enricher |
| 48 | tests/unit/test_image_downloader.py |
7 | Unit тесты downloader |
| 49 | tests/unit/test_property_pipeline.py |
7 | Unit тесты pipeline |
| 50 | tests/integration/test_api_ingest.py |
7 | Integration API |
| 51 | tests/integration/test_db_pipeline.py |
7 | Integration DB |
| 52 | docs/SPECIFICATION.md |
8 | ТЗ |
| 53 | docs/ARCHITECTURE.md |
8 | Архитектура |
| 54 | README.md |
8 | README |
| 55 | Makefile |
8 | Команды |
| 56 | Dockerfile |
8 | Docker образ |
Фаза 0 (env)
│
▼
Фаза 1 (config) ─────────────────────────┐
│ │
▼ │
Фаза 2 (models) ──► Фаза 3 (repos) ──► Фаза 4 (schemas)
│ │ │
▼ │ ▼
Alembic migrate ◄────────────────────────┘ Фаза 5 (AI services)
│
▼
Фаза 6 (Pipeline + API)
│
▼
Фаза 7 (Tests)
│
▼
Фаза 8 (Docs + Deploy)
Можно параллельно: Фаза 4 (schemas) и Фаза 5 (AI services) не зависят друг от друга. Оба зависят от Фазы 1.
.env скопирован из .env.example и настроенpip install -e ".[dev]" прошёл успешноalembic инициализирован, alembic.ini настроен на async