Architecture overview

Purpose

This document describes the high-level architecture of the intelligent machine-vision system for real-time defect detection on polyurethane shoe soles.

Full project context is in project_context.md.

System diagram summary

User
 │
 ▼
WEB UI ◄──────► Main Server ◄──────► DB
                    │
        ┌───────────┼───────────┐
        ▼           ▼           ▼
   YOLO Inst 1  YOLO Inst 2  ...
        ▲           ▲
        │           │
   IP Camera 1  IP Camera 2  ...

Main components

1. IP Cameras / Raspberry Pi

One camera per inspection channel / tab.
Captures frames in Full HD (1920 × 1080).
May be dedicated IP cameras or Raspberry Pi boards with camera modules.

2. Video stream → frame converter

Receives the continuous video stream from each camera.
Extracts single frames for analysis.
Triggered either by:
- a position sensor / conveyor signal, or
- an AI-based position determination module.
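A minimal sketch of trigger-driven extraction follows. It assumes a reader thread pushes every decoded frame into a per-camera buffer, and that a trigger simply takes the most recent frame. `LatestFrameBuffer` and its methods are hypothetical names. Frames are treated as opaque objects, so no particular video library is implied.

```python
import threading
from typing import Any, Optional


class LatestFrameBuffer:
    """Keeps only the most recent frame of one camera stream (hypothetical helper)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._frame: Optional[Any] = None
        self._index = 0  # number of frames received so far

    def push(self, frame: Any) -> None:
        # Called by the stream-reader thread for every decoded frame;
        # the previous frame is overwritten, so memory use stays constant.
        with self._lock:
            self._frame = frame
            self._index += 1

    def on_trigger(self) -> Optional[tuple[int, Any]]:
        # Called when the position sensor or AI module fires.
        # Returns (frame_index, frame), or None if no frame has arrived yet.
        with self._lock:
            if self._frame is None:
                return None
            return self._index, self._frame
```

Keeping only the latest frame prevents a backlog from building up when inference is slower than the camera frame rate. The trade-off is that frames arriving between triggers are discarded.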

3. Position determination

Detects when a sole is in the correct position for capture.
Can use a physical sensor or an AI model.
Sends a trigger to the video stream → frame converter.
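One way to make the trigger fire exactly once per sole is to use edge detection over a capture window. The sketch below assumes the AI module reports the sole's estimated offset along the conveyor, or `None` when no sole is visible. The class name and window bounds are hypothetical. A physical sensor fits the same interface if its on/off signal is mapped to a position inside or outside the window.

```python
from typing import Optional


class PositionTrigger:
    """Fires once per sole when its position enters the capture window (hypothetical)."""

    def __init__(self, window_start: float, window_end: float) -> None:
        self.window_start = window_start
        self.window_end = window_end
        self._armed = True  # ready to fire for the next sole

    def update(self, position: Optional[float]) -> bool:
        # `position`: estimated offset along the conveyor, or None if no sole is seen.
        inside = (
            position is not None
            and self.window_start <= position <= self.window_end
        )
        if inside and self._armed:
            self._armed = False  # fire once, then wait until the sole leaves
            return True
        if not inside:
            self._armed = True  # sole has left the window: re-arm
        return False
```

For the readings `None, 10, 48, 50, 52, 70, None, 49` with a window of 45 to 55, the trigger fires only at `48` and at `49`. The sole sits inside the window for several readings, but it is captured once.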

4. Image preparation

Pre-processing steps applied before inference:

- normalizing;
- filtering;
- cropping;
- resizing;
- rotating (to simulate and correct small deviations).
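The preparation steps can be expressed as an ordered, configurable pipeline. The sketch below assumes a grayscale image stored as nested lists and uses plain Python so it runs without dependencies. A production version would more likely use an image library such as OpenCV. Filtering and rotation are omitted for brevity and would be added as further `Step` functions. All function names are hypothetical.

```python
from functools import reduce
from typing import Callable

Image = list[list[float]]          # grayscale, row-major
Step = Callable[[Image], Image]    # one preparation step


def normalize(img: Image) -> Image:
    # Scale 8-bit intensities to [0, 1].
    return [[px / 255.0 for px in row] for row in img]


def crop(top: int, left: int, height: int, width: int) -> Step:
    # Keep a fixed region of interest, e.g. where the sole lies on the conveyor.
    def _crop(img: Image) -> Image:
        return [row[left:left + width] for row in img[top:top + height]]
    return _crop


def resize(out_h: int, out_w: int) -> Step:
    # Nearest-neighbour resize to the detector's input size.
    def _resize(img: Image) -> Image:
        in_h, in_w = len(img), len(img[0])
        return [
            [img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)
        ]
    return _resize


def prepare(img: Image, steps: list[Step]) -> Image:
    # Apply the configured steps in order.
    return reduce(lambda acc, step: step(acc), steps, img)
```

Because each step is a plain function, the per-channel settings can decide which steps run and with which parameters. For example, `prepare(frame, [crop(1, 1, 2, 2), resize(4, 4), normalize])` crops a region, upsamples it, and scales it to [0, 1].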

5. YOLO instances

One YOLO detector instance per camera / channel.
Receives the prepared frame from the image preparation module, routed by the Main Server.
Outputs detected defect candidates (bounding boxes with class labels and confidence scores).
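The sketch below shows how a channel might turn raw defect candidates into a pass/fail result. It assumes a per-channel confidence threshold and a simple rule: a sole is flagged if any candidate survives the threshold. The `Detection` structure and both functions are hypothetical. They are not the output format of any particular YOLO library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One defect candidate from a YOLO instance (hypothetical structure)."""
    label: str                                # defect class name
    confidence: float                         # detector score in [0, 1]
    box: tuple[float, float, float, float]    # x_min, y_min, x_max, y_max in pixels


def accept(candidates: list[Detection], threshold: float) -> list[Detection]:
    # Keep candidates at or above the channel's threshold, highest confidence first.
    kept = [d for d in candidates if d.confidence >= threshold]
    return sorted(kept, key=lambda d: d.confidence, reverse=True)


def verdict(candidates: list[Detection], threshold: float) -> str:
    # Assumed rule: any accepted candidate marks the sole as defective.
    return "defect" if accept(candidates, threshold) else "ok"
```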

6. AI Model

Central model management component.
YOLO instances load their model weights through this layer.
Supports retraining on collected, expert-verified data.
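A minimal sketch of the model-management layer follows. It assumes weights are files tracked per channel, and that a retrained version stays inactive until someone activates it. `ModelRegistry`, its methods and the example paths are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ModelRegistry:
    """Tracks weight files per channel (hypothetical model-management layer)."""
    versions: dict[str, list[str]] = field(default_factory=dict)  # channel -> paths, oldest first
    active: dict[str, int] = field(default_factory=dict)          # channel -> active version index

    def register(self, channel: str, weights_path: str, activate: bool = False) -> int:
        # Add a version (e.g. produced by retraining); the first one is activated automatically.
        paths = self.versions.setdefault(channel, [])
        paths.append(weights_path)
        if activate or channel not in self.active:
            self.active[channel] = len(paths) - 1
        return len(paths) - 1

    def activate(self, channel: str, index: int) -> None:
        # Switch to, or roll back to, a specific version.
        if not 0 <= index < len(self.versions.get(channel, [])):
            raise IndexError(f"no version {index} for channel {channel!r}")
        self.active[channel] = index

    def weights_for(self, channel: str) -> Optional[str]:
        # What a YOLO instance would query at start-up to find its weights.
        if channel not in self.active:
            return None
        return self.versions[channel][self.active[channel]]
```

Keeping a retrained model inactive until it is explicitly activated allows it to be checked before live channels switch to it. Rolling back is just `activate(channel, older_index)`.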

7. Main Server

Orchestrates the pipeline:
- receives a frame or a prepared image;
- routes to the correct YOLO instance;
- collects inference results;
- stores events in DB;
- serves the WEB UI.
Reads a JSON configuration defining channels, model paths, thresholds, etc.
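The sketch below covers configuration loading and per-channel routing. The JSON layout is an assumption: a `channels` list with `id`, `camera_url`, `model_path` and an optional `confidence_threshold` that defaults to 0.5 here. The hostnames and paths are placeholders. Detectors are injected as plain callables, so the example does not depend on any YOLO library.

```python
import json
from dataclasses import dataclass
from typing import Callable

# Hypothetical configuration layout; keys and values are placeholders.
EXAMPLE_CONFIG = """
{
  "channels": [
    {"id": "cam1", "camera_url": "rtsp://camera-1.local/stream",
     "model_path": "models/cam1.pt", "confidence_threshold": 0.6},
    {"id": "cam2", "camera_url": "rtsp://camera-2.local/stream",
     "model_path": "models/cam2.pt"}
  ]
}
"""


@dataclass(frozen=True)
class ChannelConfig:
    channel_id: str
    camera_url: str
    model_path: str
    confidence_threshold: float


def load_config(text: str) -> dict[str, ChannelConfig]:
    # Parse the JSON configuration into one entry per channel.
    channels: dict[str, ChannelConfig] = {}
    for entry in json.loads(text)["channels"]:
        cfg = ChannelConfig(
            channel_id=entry["id"],
            camera_url=entry["camera_url"],
            model_path=entry["model_path"],
            confidence_threshold=float(entry.get("confidence_threshold", 0.5)),
        )
        if cfg.channel_id in channels:
            raise ValueError(f"duplicate channel id {cfg.channel_id!r}")
        channels[cfg.channel_id] = cfg
    return channels


Detector = Callable[[object], list]  # prepared image -> defect candidates


class Router:
    """Sends each prepared image to the YOLO instance of its channel."""

    def __init__(self, detectors: dict[str, Detector]) -> None:
        self._detectors = detectors

    def route(self, channel_id: str, image: object) -> list:
        if channel_id not in self._detectors:
            raise KeyError(f"no YOLO instance for channel {channel_id!r}")
        return self._detectors[channel_id](image)
```

Rejecting duplicate channel IDs at load time catches a misconfigured tab before any frame is routed to the wrong detector.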

8. WEB UI

Provides:

- inspection history view;
- validation status display;
- expert feedback (marking each result as correct / incorrect);
- multi-channel tabs (camera 1, 2, 3) with independent YOLO instances and settings;
- live camera preview for setup;
- settings panel;
- retraining trigger with date-range filter.
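The expert-feedback and date-range retraining features could be backed by logic like the sketch below. It assumes a binary "ok"/"defect" verdict, so an "incorrect" mark simply flips the label. With multiple defect classes, the expert would have to supply the correct class instead. All names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional


@dataclass
class InspectionEvent:
    event_id: int
    channel_id: str
    timestamp: datetime
    predicted: str                           # "defect" or "ok" (assumed binary)
    expert_correct: Optional[bool] = None    # None until an expert reviews it


def record_feedback(events: dict[int, InspectionEvent], event_id: int, correct: bool) -> None:
    # Store the expert's correct / incorrect verdict for one result.
    events[event_id].expert_correct = correct


def verified_label(event: InspectionEvent) -> Optional[str]:
    # Ground-truth label implied by the expert's verdict (binary case only).
    if event.expert_correct is None:
        return None
    if event.expert_correct:
        return event.predicted
    return "ok" if event.predicted == "defect" else "defect"


def retraining_set(events: dict[int, InspectionEvent], start: date, end: date) -> list[tuple[int, str]]:
    # Reviewed events inside the inclusive date range chosen in the UI.
    selected = []
    for ev in events.values():
        label = verified_label(ev)
        if label is not None and start <= ev.timestamp.date() <= end:
            selected.append((ev.event_id, label))
    return selected
```

Unreviewed events are left out, so only expert-verified data reaches retraining.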

9. Database

Stores:

- inspection events;
- raw and annotated images;
- expert labels;
- configuration.
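A possible relational layout for these four kinds of data is sketched below using Python's built-in `sqlite3`. Table and column names are hypothetical. Images are stored as file paths rather than blobs; this is a common choice to keep the database small, but it is an assumption, not something the design above specifies.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS inspection_events (
    id          INTEGER PRIMARY KEY,
    channel_id  TEXT NOT NULL,
    captured_at TEXT NOT NULL,      -- ISO 8601 timestamp
    verdict     TEXT NOT NULL,      -- e.g. 'ok' or 'defect'
    detections  TEXT NOT NULL       -- JSON list of defect candidates
);
CREATE TABLE IF NOT EXISTS images (
    event_id    INTEGER NOT NULL REFERENCES inspection_events(id),
    kind        TEXT NOT NULL CHECK (kind IN ('raw', 'annotated')),
    path        TEXT NOT NULL,      -- image file location
    PRIMARY KEY (event_id, kind)
);
CREATE TABLE IF NOT EXISTS expert_labels (
    event_id    INTEGER PRIMARY KEY REFERENCES inspection_events(id),
    is_correct  INTEGER NOT NULL CHECK (is_correct IN (0, 1)),
    reviewed_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS configuration (
    key         TEXT PRIMARY KEY,
    value       TEXT NOT NULL       -- JSON-encoded value
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn
```

With foreign keys enabled, an image or expert label cannot be stored for an inspection event that does not exist.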

10. Fake input data / factory environment emulation

Generates synthetic / augmented inputs for development and testing.
Allows offline debugging when real cameras are unavailable.
Sources:
- initial real dataset;
- artificially generated images (noise, lighting, dirt, rotation).
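The artificial variations listed above (lighting, noise, dirt) can be generated along the lines of the sketch below. It assumes 8-bit grayscale images stored as nested lists. The gain range, noise level and spot darkening factor are arbitrary illustrative values, not tuned parameters. Rotation is left out here; it would be added as a separate transform. A seeded `random.Random` makes each synthetic sample reproducible for debugging.

```python
import random

Image = list[list[int]]  # 8-bit grayscale, row-major


def clamp(v: float) -> int:
    # Round and keep the value inside the valid 8-bit range.
    return max(0, min(255, int(round(v))))


def augment(img: Image, rng: random.Random) -> Image:
    """Return a perturbed copy: lighting change, sensor noise and one dirt spot."""
    h, w = len(img), len(img[0])
    gain = rng.uniform(0.7, 1.3)  # global lighting change (illustrative range)
    out = [[clamp(px * gain + rng.gauss(0.0, 8.0)) for px in row] for row in img]

    # Darken a small circular "dirt" spot at a random position.
    cy, cx = rng.randrange(h), rng.randrange(w)
    radius = max(1, min(h, w) // 10)
    for y in range(max(0, cy - radius), min(h, cy + radius + 1)):
        for x in range(max(0, cx - radius), min(w, cx + radius + 1)):
            if (y - cy) ** 2 + (x - cx) ** 2 <= radius ** 2:
                out[y][x] = clamp(out[y][x] * 0.3)
    return out
```

Calling `augment(image, random.Random(seed))` with the same seed always yields the same image. This makes it possible to replay a failing synthetic case offline.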

Data flow

1. The position determination module triggers a capture.
2. The video stream → frame converter extracts a frame.
3. The image preparation module normalizes, filters, crops, resizes and rotates the frame.
4. The Main Server routes the prepared image to the appropriate YOLO instance.
5. The YOLO instance runs inference using weights loaded through the AI Model layer.
6. The results return to the Main Server.
7. The Main Server writes an inspection event to the DB.
8. The WEB UI reads events and images from the DB / Main Server.
9. An expert reviews the results in the WEB UI and marks each one as correct or incorrect.
10. Verified data flows back into the Learning Dataset for retraining.
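The synchronous part of this flow, from trigger to stored event, can be wired together as in the sketch below. Every stage is injected as a callable, so real components and test stubs are interchangeable. The `Pipeline` class, its field names and the "defect if any candidate" rule are hypothetical. Expert review and retraining happen later through the WEB UI, so they are not part of this path.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Pipeline:
    """Wires the capture-to-storage steps for one channel (hypothetical)."""
    grab_frame: Callable[[], Any]        # frame converter, called on trigger
    prepare: Callable[[Any], Any]        # image preparation
    detect: Callable[[Any], list]        # YOLO instance, reached via Main Server routing
    save_event: Callable[[dict], int]    # DB write, returns the new event id

    def on_trigger(self, channel_id: str) -> int:
        frame = self.grab_frame()
        if frame is None:
            raise RuntimeError(f"no frame available for channel {channel_id!r}")
        detections = self.detect(self.prepare(frame))
        event = {
            "channel_id": channel_id,
            "verdict": "defect" if detections else "ok",
            "detections": detections,
        }
        return self.save_event(event)
```

Injecting the stages keeps the orchestration testable offline with the fake input data instead of live cameras.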