Architecture overview
Purpose
This document describes the high-level architecture of the intelligent machine-vision system for real-time defect detection on polyurethane shoe soles.
Full project context is in project_context.md.
System diagram summary
 User
   │
   ▼
WEB UI ◄──────► Main Server ◄──────► DB
                     │
         ┌───────────┼───────────┐
         ▼           ▼           ▼
    YOLO Inst 1 YOLO Inst 2     ...
         ▲           ▲
         │           │
    IP Camera 1 IP Camera 2     ...
Main components
1. IP Cameras / Raspberry Pi
- One camera per inspection channel / tab.
- Captures frames in Full HD (1920×1080).
- May be real IP cameras or Raspberry Pi with camera modules.
2. Video stream → frame converter
- Receives continuous video stream from each camera.
- Extracts single frames for analysis.
- Triggered either by:
  - a position sensor / conveyor signal, or
  - an AI-based position determination module.
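The trigger-driven extraction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: `FrameExtractor`, `on_frame`, and `on_trigger` are hypothetical names, and frames are treated as opaque objects so the sketch does not depend on any particular video library.

```python
import threading
from typing import Any, Optional


class FrameExtractor:
    """Keeps the most recent frame of one camera stream and hands it out on a trigger.

    Hypothetical sketch: class and method names are assumptions, not project code.
    """

    def __init__(self, channel_id: int) -> None:
        self.channel_id = channel_id
        self._latest: Optional[Any] = None
        self._lock = threading.Lock()

    def on_frame(self, frame: Any) -> None:
        # Called by the stream reader for every decoded frame;
        # only the newest frame is kept.
        with self._lock:
            self._latest = frame

    def on_trigger(self) -> Optional[Any]:
        # Called by the position sensor or the AI-based position module.
        # Returns the newest frame and clears it, so the same frame is
        # never analyzed twice. Returns None if no frame has arrived yet.
        with self._lock:
            frame, self._latest = self._latest, None
            return frame
```

Keeping only the newest frame means the extractor never builds up a backlog when triggers arrive more slowly than frames, which is the usual case for a conveyor.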
3. Position determination
- Detects when a sole is in the correct position for capture.
- Can use a physical sensor or an AI model.
- Sends trigger to the frame extractor.
4. Image preparation
Pre-processing steps applied before inference:
- normalizing;
- filtering;
- cropping;
- resizing;
- rotating (to simulate and correct small deviations).
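As a rough sketch of how some of these steps compose, the example below chains cropping, resizing, and normalizing on a grayscale image stored as nested lists. Filtering and rotating are left out for brevity. The function names, the step order, and the nearest-neighbour resize are all assumptions made for illustration; a real pipeline would more likely rely on an image-processing library.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale image as rows of pixel values


def crop(img: Image, top: int, left: int, height: int, width: int) -> Image:
    # Keep only the region of interest.
    return [row[left:left + width] for row in img[top:top + height]]


def resize_nearest(img: Image, out_h: int, out_w: int) -> Image:
    # Nearest-neighbour resize: each output pixel copies one source pixel.
    in_h, in_w = len(img), len(img[0])
    return [
        [img[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def normalize(img: Image, max_value: float = 255.0) -> Image:
    # Scale pixel values into the range 0..1.
    return [[px / max_value for px in row] for row in img]


def prepare(img: Image, roi: Tuple[int, int, int, int], size: Tuple[int, int]) -> Image:
    # Assumed order: crop, then resize, then normalize.
    return normalize(resize_nearest(crop(img, *roi), *size))
```

Here `roi` is `(top, left, height, width)` and `size` is `(out_h, out_w)`, so `prepare(frame, (0, 0, 1080, 1080), (640, 640))` would crop a square from a Full HD frame and scale it down.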
5. YOLO instances
- One YOLO detector instance per camera / channel.
- Receives the prepared frame from the Image preparation module.
- Outputs detected defect candidates.
6. AI Model
- Central model management component.
- YOLO instances load their model weights through this layer.
- Supports retraining from collected expert-verified data.
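One possible shape for this layer is a small registry that YOLO instances ask for weights by path. The sketch below is an assumption-heavy illustration: `ModelRegistry`, `get`, and `reload` are hypothetical names, the caching behaviour is a design choice made here rather than something the project specifies, and the actual loading is delegated to a caller-supplied function so no detector library API is assumed.

```python
from typing import Any, Callable, Dict


class ModelRegistry:
    """Central place where weights are loaded (hypothetical sketch)."""

    def __init__(self, loader: Callable[[str], Any]) -> None:
        # `loader` wraps whatever call the detector library uses to load weights.
        self._loader = loader
        self._cache: Dict[str, Any] = {}

    def get(self, weights_path: str) -> Any:
        # Load each weights file at most once and reuse the result.
        if weights_path not in self._cache:
            self._cache[weights_path] = self._loader(weights_path)
        return self._cache[weights_path]

    def reload(self, weights_path: str) -> Any:
        # Force a fresh load, for example after retraining produced new weights.
        self._cache[weights_path] = self._loader(weights_path)
        return self._cache[weights_path]
```

Routing all loads through one object gives retraining a single place to swap weights without each channel having to know where the files live.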
7. Main Server
- Orchestrates the pipeline:
  - receives frame or prepared image;
  - routes to the correct YOLO instance;
  - collects inference results;
  - stores events in DB;
  - serves the WEB UI.
- Reads JSON configuration for channels, model paths, thresholds, etc.
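To make the configuration and routing roles more concrete, here is a minimal sketch. The JSON layout, every key name, the example URLs and paths, and the `Router` class are assumptions for illustration only; the project's real configuration schema is not described in this document. Detectors are plain callables so the sketch stays independent of any YOLO library.

```python
import json
from typing import Any, Callable, Dict, List

# Hypothetical configuration; every key name and value is an assumption.
CONFIG_JSON = """
{
  "channels": [
    {"id": 1, "camera_url": "rtsp://camera-1.example/stream",
     "model_path": "models/channel1.pt", "confidence_threshold": 0.5},
    {"id": 2, "camera_url": "rtsp://camera-2.example/stream",
     "model_path": "models/channel2.pt", "confidence_threshold": 0.6}
  ]
}
"""

Detection = Dict[str, Any]
Detector = Callable[[Any], List[Detection]]


class Router:
    """Routes a prepared image to the detector registered for its channel."""

    def __init__(self, config: Dict[str, Any]) -> None:
        self._thresholds = {
            ch["id"]: ch["confidence_threshold"] for ch in config["channels"]
        }
        self._detectors: Dict[int, Detector] = {}

    def register(self, channel_id: int, detector: Detector) -> None:
        if channel_id not in self._thresholds:
            raise KeyError(f"channel {channel_id} is not configured")
        self._detectors[channel_id] = detector

    def infer(self, channel_id: int, image: Any) -> List[Detection]:
        # Run the channel's detector and drop low-confidence candidates.
        detector = self._detectors[channel_id]
        threshold = self._thresholds[channel_id]
        return [d for d in detector(image) if d["confidence"] >= threshold]


config = json.loads(CONFIG_JSON)
router = Router(config)
```

Keeping the threshold per channel matches the idea that each tab has independent settings.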
8. WEB UI
Provides:
- inspection history view;
- validation status display;
- expert feedback: correct / incorrect result;
- multi-channel tabs (camera 1, 2, 3) with independent YOLO instances and settings;
- live camera preview for setup;
- settings panel;
- retraining trigger with date-range filter.
9. Database
Stores:
- inspection events;
- raw and annotated images;
- expert labels;
- configuration.
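A possible storage layout for inspection events, sketched with SQLite from the Python standard library. The table name, the column names, the choice to store image paths rather than image bytes, and the choice of SQLite itself are assumptions; the configuration table is omitted. The query shows how expert-verified events could be selected by date range for retraining.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical schema; all names are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS inspection_event (
    id                   INTEGER PRIMARY KEY,
    channel_id           INTEGER NOT NULL,
    captured_at          TEXT    NOT NULL,  -- ISO 8601 timestamp
    raw_image_path       TEXT    NOT NULL,
    annotated_image_path TEXT,
    detections_json      TEXT    NOT NULL,  -- detector output serialized as JSON
    expert_label         TEXT CHECK (expert_label IN ('correct', 'incorrect'))
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def verified_events(conn: sqlite3.Connection, start: str, end: str) -> List[Tuple]:
    # Events an expert has reviewed, captured in the half-open range [start, end).
    # ISO 8601 strings in one consistent format sort chronologically as text.
    rows = conn.execute(
        "SELECT id, channel_id, raw_image_path, expert_label "
        "FROM inspection_event "
        "WHERE expert_label IS NOT NULL AND captured_at >= ? AND captured_at < ? "
        "ORDER BY captured_at",
        (start, end),
    )
    return rows.fetchall()
```

An event that has not been reviewed yet simply has a NULL `expert_label`, so it is stored but excluded from retraining data.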
10. Learning Dataset
- Generates synthetic / augmented inputs for development and testing.
- Allows offline debugging when real cameras are unavailable.
- Sources:
  - initial real dataset;
  - artificially generated images (noise, lighting, dirt, rotation).
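Two of the listed perturbations, noise and lighting, can be sketched in a few lines. The function names and the Gaussian noise model are assumptions chosen for illustration; dirt and rotation are not shown. Images are grayscale nested lists with values 0..255 to keep the example free of external dependencies.

```python
import random
from typing import List

Image = List[List[int]]  # grayscale image, pixel values 0..255


def _clamp(value: float) -> int:
    # Round and keep the result inside the valid pixel range.
    return max(0, min(255, int(round(value))))


def adjust_lighting(img: Image, gain: float) -> Image:
    # gain > 1 brightens the image, gain < 1 darkens it.
    return [[_clamp(px * gain) for px in row] for row in img]


def add_noise(img: Image, sigma: float, rng: random.Random) -> Image:
    # Add zero-mean Gaussian noise with standard deviation `sigma` to each pixel.
    return [[_clamp(px + rng.gauss(0.0, sigma)) for px in row] for row in img]
```

Passing an explicit `random.Random(seed)` keeps generated test images reproducible, which helps when debugging offline.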
Data flow
- Position determination triggers capture.
- Video stream → frame converter extracts a frame.
- Image preparation module normalizes / filters / crops / resizes / rotates the frame.
- Main server routes the prepared image to the right YOLO instance.
- YOLO instance runs inference using the AI Model.
- Results return to Main Server.
- Main Server writes an event to DB.
- WEB UI reads events and images from DB / server.
- Expert reviews results in WEB UI and marks correct / incorrect.
- Verified data flows back to the Learning Dataset for retraining.