diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 0000000..71411cd
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,48 @@
+# Agent Notes
+
+> Concise operating instructions for autonomous work in this repo. Keep this file small so it fits into small-LLM context windows and remains useful for agents with limited token budgets.
+
+## Project phase
+
+Bootstrap. Intelligent machine-vision system for real-time defect detection on polyurethane shoe soles. Full context is in `docs/project_context.md`.
+
+Before adding substantial code, pick and document the tech stack in `README.md` and create the matching manifest (e.g. `pyproject.toml`, `package.json`, `Cargo.toml`, `go.mod`).
+
+## Autonomous work rules
+
+- Do not mutate git history (`git reset`, `rebase`, `push --force`) unless explicitly asked.
+- Do not create pull requests, push commits, or change git remotes unless explicitly asked.
+- Keep changes minimal and focused on the task at hand.
+- Prefer editing existing files over creating new ones when it satisfies the requirement.
+- Ask the user before deleting files, directories, or significant blocks of code.
+- If a task is unclear or ambiguous, pause and ask for clarification instead of guessing.
+
+## Context management for small LLMs
+
+- Read `docs/project_context.md` at the start of each autonomous task.
+- Before editing, read only the files directly related to the change. Avoid dumping large unrelated files into context.
+- When touching multiple files, process them one logical step at a time and run verification after each step.
+- Summarize long outputs before storing or returning them. Do not paste large logs verbatim unless specifically requested.
+- If a command output is large, prefer to save it to a file and return a summary with the file path.
+
+## Adding code
+
+- Follow the style and conventions of the existing codebase once code exists.
+- Add or update tests for new logic and run them before finishing.
+- Do not commit secrets, credentials, large binaries, or generated build artifacts.
+- Use dependency manifests and lockfiles; avoid global installs without user confirmation.
+- If the project is Python, keep a virtual environment inside the workspace and do not install into the system Python.
+- Prefer small, focused functions and modules. Avoid huge files that do not fit into small context windows.
+
+## Documentation
+
+- Update `README.md` when the stack, the setup steps, or the major architecture change.
+- Keep project documentation in `docs/`. Add design docs, API specs, runbooks, and ADRs there, not in `README.md`.
+- Update `AGENTS.md` when repo-specific commands, conventions, testing quirks, or environment setup change.
+- Write docstrings / comments only for non-obvious behavior; do not add noise.
+
+## Verification habit
+
+- Before declaring a task done, run the relevant check: tests, lint, typecheck, or build.
+- If a check does not exist yet, state that it should be added rather than skipping verification.
+- Record verification commands and results in the response so the user can reproduce them quickly.
diff --git a/README.md b/README.md
index a48e66a..25bd710 100644
--- a/README.md
+++ b/README.md
@@ -1 +1,19 @@
-# sups_yolo
\ No newline at end of file
+# sups_yolo
+
+Intelligent information-measuring system for real-time control of geometric and physico-mechanical parameters of polyurethane shoe soles.
+
+## Goal
+
+Detect and classify defects on polyurethane soles in several categories, despite moderate disturbances such as dust, glare, and varying lighting conditions.
+
+## System overview
+
+- **Vision hardware**: 2–3 Raspberry Pi or IP cameras with web access + a workstation with an RTX 2060; Full HD cameras.
+- **Software**: Linux-like OS, logging of processed data, one YOLO-based detection instance per camera.
+- **User web interface**: history view, validation status, expert feedback (correct/incorrect), multi-channel tabs (camera 1/2/3 with independent YOLO instances), live camera preview for setup, settings section, retraining with date-restricted data.
+- **Event record per sole**: sole ID, defect photo, defect probability, annotated photo with defect zone.
+- **Performance target**: at most 15 seconds per image for analysis and result description.
+
+## Documentation
+
+Project documentation lives in [`docs/`](docs/).
diff --git a/docs/AI_AGENT_GUIDE.md b/docs/AI_AGENT_GUIDE.md
new file mode 100644
index 0000000..b867d05
--- /dev/null
+++ b/docs/AI_AGENT_GUIDE.md
@@ -0,0 +1,39 @@
+# AI agent guide
+
+> How to work with this repository's architecture documentation.
+
+## Start here
+
+Before any autonomous task, read in this order:
+
+1. [`project_context.md`](project_context.md): what the system does and why.
+2. [`architecture_overview.md`](architecture_overview.md): high-level components and data flow.
+3. [`architecture_components.md`](architecture_components.md): responsibilities of each module.
+4. [`architecture_data_flow.md`](architecture_data_flow.md): step-by-step flows for inspection, retraining, and debug.
+5. [`architecture_testing.md`](architecture_testing.md): testing constraints and datasets.
+
+## When implementing a component
+
+1. Find the component name in [`architecture_components.md`](architecture_components.md).
+2. Check [`architecture_data_flow.md`](architecture_data_flow.md) for inputs and outputs.
+3. Check [`architecture_testing.md`](architecture_testing.md) for related test data / constraints.
+4. Add or update tests before finishing.
+5. Run verification if available; if not, state what should be added.
+
+## When changing data schema
+
+- Update the data-flow descriptions in [`architecture_data_flow.md`](architecture_data_flow.md).
+- Update DB and event definitions if they exist.
+- Ensure the WEB UI and Main Server agree on the new fields.
+ +## When adding a new camera channel + +- Each channel maps to one YOLO instance with its own config (model, threshold, preprocessing). +- Update JSON config, Main Server routing, and WEB UI tabs together. +- See [`architecture_components.md`](architecture_components.md) for per-channel pipeline details. + +## When modifying models or training + +- Keep model versions and metrics. +- Document camera / dataset changes in [`architecture_testing.md`](architecture_testing.md). +- Verify retraining flow end-to-end before declaring done. diff --git a/docs/ARC.drawio.xml b/docs/ARC.drawio.xml new file mode 100644 index 0000000..1f2c71f --- /dev/null +++ b/docs/ARC.drawio.xml @@ -0,0 +1,255 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/docs/architecture_components.md b/docs/architecture_components.md new file mode 100644 index 0000000..8baacf9 --- /dev/null +++ b/docs/architecture_components.md @@ -0,0 +1,138 @@ +# Architecture components + +This file lists each component from the architecture diagram in detail. Use it as a reference when implementing or modifying a module. + +## Per-camera pipeline + +### IP Camera / Raspberry Pi + +- Role: image acquisition. +- Expected count: 2–3 channels. +- Output: continuous video stream. +- Notes: + - Cameras may be replaced; recognition quality impact must be measurable. + - Positioning tolerance: rotation ±5°, displacement up to 10% of frame. + +### Position determination + +- Role: decide when to capture a frame. 
+- Options:
+  - Physical sensor on conveyor (simpler, deterministic);
+  - AI-based detector (more flexible, needs training).
+- Output: trigger signal to frame extractor.
+
+### Video stream → frame
+
+- Role: extract still frames from the camera stream on demand.
+- Input: trigger signal.
+- Output: raw image frame.
+
+### Image preparing
+
+- Role: make the image suitable for the YOLO model.
+- Operations:
+  - normalize pixel values / brightness / contrast;
+  - filter noise;
+  - crop to region of interest;
+  - resize to model input size;
+  - rotate to compensate for small alignment errors.
+- Must be configurable per camera channel via JSON config.
+
+## Inference layer
+
+### YOLO instance
+
+- Role: run defect detection on a prepared image.
+- One instance per camera / tab.
+- Each instance has:
+  - own model weights;
+  - own confidence threshold;
+  - own class mapping;
+  - own preprocessing parameters.
+
+### AI Model
+
+- Role: manage model artifacts and retraining lifecycle.
+- Responsibilities:
+  - load weights for YOLO instances;
+  - export updated weights after retraining;
+  - keep versioned model history;
+  - serve as an abstraction between training pipeline and inference.
+
+## Central services
+
+### Main Server
+
+- Role: coordination and API.
+- Responsibilities:
+  - read JSON config at startup;
+  - manage camera channels;
+  - dispatch prepared images to correct YOLO instance;
+  - collect detection results;
+  - persist events to DB;
+  - expose HTTP/WebSocket API for WEB UI;
+  - handle retraining requests.
+
+### Database (DB)
+
+- Role: persistent storage.
+- Stores:
+  - inspection events (sole ID, timestamp, channel, result);
+  - original images;
+  - annotated images with bounding boxes;
+  - expert verification labels;
+  - model versions;
+  - configuration snapshots.
+
+## User interface
+
+### WEB UI
+
+- Role: human operator and expert interface.
+- Views:
+  - history list with filtering;
+  - detail view: original image, annotated image, probability, status;
+  - expert feedback buttons: correct / incorrect;
+  - multi-channel tabs (1, 2, 3) with per-channel YOLO settings;
+  - live camera preview for mechanical setup;
+  - settings form;
+  - retraining panel with date-range restriction.
+
+## Development / testing components
+
+### Fake input data (factory environment emulation)
+
+- Role: offline input generator.
+- Used when cameras are not connected.
+- Produces synthetic frames or replays recorded frames.
+
+### Artificial image generator
+
+- Role: augment datasets to simulate factory disturbances.
+- Applied to:
+  - initial dataset;
+  - learning dataset.
+- Transformations:
+  - lighting effects;
+  - PNG pattern overlays (dust, dirt, lens contamination);
+  - rotation (±5°);
+  - other noise.
+
+### Learning Dataset
+
+- Role: curated data used to train / retrain the model.
+- Sources:
+  - initial dataset;
+  - artificial generator output;
+  - verified production data from expert feedback.
+
+### Learning / Training module
+
+- Role: run model training and retraining.
+- Inputs:
+  - Learning Dataset;
+  - configuration (hyperparameters, date range).
+- Outputs:
+  - new model weights;
+  - training metrics;
+  - updated AI Model entry.
diff --git a/docs/architecture_data_flow.md b/docs/architecture_data_flow.md
new file mode 100644
index 0000000..f367c3b
--- /dev/null
+++ b/docs/architecture_data_flow.md
@@ -0,0 +1,53 @@
+# Architecture data flow
+
+This file describes the step-by-step data flow for a single inspection cycle and for retraining.
+
+## Normal inspection cycle
+
+1. **Sole enters camera view.**
+2. **Position determination** fires a trigger (sensor or AI).
+3. **Video stream → frame** captures one still frame from the active camera.
+4. **Image preparing** normalizes, filters, crops, resizes, and rotates the frame according to the channel config.
+5. **Main Server** receives the prepared image.
+6. **Main Server** routes the image to the matching **YOLO instance** for that channel.
+7. **YOLO instance** loads weights from **AI Model** and runs inference.
+8. **YOLO instance** returns:
+   - detected defect classes;
+   - bounding boxes;
+   - confidence scores.
+9. **Main Server** builds an event record:
+   - sole ID;
+   - channel ID;
+   - timestamp;
+   - original image reference;
+   - annotated image reference;
+   - defect probability / class.
+10. **Main Server** writes the event to **DB**.
+11. **WEB UI** fetches and displays the event.
+12. **Expert / operator** reviews the result and marks it **correct** or **incorrect**.
+13. **WEB UI** sends the label back to **Main Server**, which updates the event in **DB**.
+
+## Retraining flow
+
+1. **Operator** selects a date range in the WEB UI retraining panel.
+2. **WEB UI** requests retraining from **Main Server**.
+3. **Main Server** queries **DB** for verified events in the selected range.
+4. **Main Server** assembles / augments images into the **Learning Dataset**.
+5. Optionally, **Artificial image generator** adds synthetic disturbances.
+6. **Learning module** trains / fine-tunes a model on the dataset.
+7. **Learning module** stores new weights in **AI Model**.
+8. **YOLO instances** can be reloaded with the new weights (hot-reload or restart).
+9. **WEB UI** shows retraining status and metrics.
+
+## Development / debug flow
+
+1. **Fake input data** produces frames offline.
+2. **OR gate** selects between real camera pipeline and fake input.
+3. Selected frames go through **Image preparing** and then the same inference path.
+4. Results are stored in DB and shown in WEB UI, exactly like production.
+
+## File / data references
+
+- Event record keeps references, not embedded binary images (unless required).
+- Images should be stored on disk with stable paths; DB stores the paths.
+- Annotated images are generated after inference and saved alongside originals.
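
The "File / data references" rules at the end of `architecture_data_flow.md` could look roughly like the sketch below. It assumes Python, which this repo has not chosen yet, and every name in it (`InspectionEvent`, `annotated_path_for`, `build_event`, the `.annotated` suffix) is hypothetical and not defined anywhere in the diff.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def annotated_path_for(original: Path) -> Path:
    """Return a sibling path for the annotated copy (hypothetical naming rule)."""
    return original.with_name(f"{original.stem}.annotated{original.suffix}")


@dataclass
class InspectionEvent:
    """Event record that stores image paths, not image bytes."""
    sole_id: str
    channel_id: int
    timestamp: datetime
    original_image: Path
    annotated_image: Path
    defect_class: Optional[str]
    defect_probability: float
    expert_verdict: Optional[bool] = None  # None until an expert reviews it


def build_event(sole_id: str, channel_id: int, original_image: Path,
                defect_class: Optional[str],
                defect_probability: float) -> InspectionEvent:
    """Assemble the record; the annotated image lives next to the original."""
    return InspectionEvent(
        sole_id=sole_id,
        channel_id=channel_id,
        timestamp=datetime.now(timezone.utc),
        original_image=original_image,
        annotated_image=annotated_path_for(original_image),
        defect_class=defect_class,
        defect_probability=defect_probability,
    )
```

Deriving the annotated path from the original keeps the two files in the same directory, so the DB row only needs stable paths and the UI can find both images from one record.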
diff --git a/docs/architecture_overview.md b/docs/architecture_overview.md
new file mode 100644
index 0000000..33e05c8
--- /dev/null
+++ b/docs/architecture_overview.md
@@ -0,0 +1,119 @@
+# Architecture overview
+
+## Purpose
+
+This document describes the high-level architecture of the intelligent machine-vision system for real-time defect detection on polyurethane shoe soles.
+
+Full project context is in [`project_context.md`](project_context.md).
+
+## System diagram summary
+
+```
+User
+  │
+  ▼
+WEB UI ◄──────► Main Server ◄──────► DB
+                     │
+         ┌───────────┼───────────┐
+         ▼           ▼           ▼
+    YOLO Inst 1 YOLO Inst 2     ...
+         ▲           ▲
+         │           │
+    IP Camera 1 IP Camera 2     ...
+```
+
+## Main components
+
+### 1. IP Cameras / Raspberry Pi
+
+- One camera per inspection channel / tab.
+- Captures frames in Full HD.
+- May be real IP cameras or Raspberry Pi with camera modules.
+
+### 2. Video stream → frame converter
+
+- Receives continuous video stream from each camera.
+- Extracts single frames for analysis.
+- Triggered either by:
+  - a position sensor / conveyor signal, or
+  - an AI-based position determination module.
+
+### 3. Position determination
+
+- Detects when a sole is in the correct position for capture.
+- Can use a physical sensor or an AI model.
+- Sends trigger to the frame extractor.
+
+### 4. Image preparation
+
+Pre-processing steps applied before inference:
+
+- normalizing;
+- filtering;
+- cropping;
+- resizing;
+- rotating (to correct small positioning deviations).
+
+### 5. YOLO instances
+
+- One YOLO detector instance per camera / channel.
+- Receives the prepared frame from the Image preparation module.
+- Outputs detected defect candidates.
+
+### 6. AI Model
+
+- Central model management component.
+- YOLO instances load their model weights through this layer.
+- Supports retraining from collected expert-verified data.
+
+### 7. Main Server
+
+- Orchestrates the pipeline:
+  - receives frame or prepared image;
+  - routes to the correct YOLO instance;
+  - collects inference results;
+  - stores events in DB;
+  - serves the WEB UI.
+- Reads JSON configuration for channels, model paths, thresholds, etc.
+
+### 8. WEB UI
+
+Provides:
+
+- inspection history view;
+- validation status display;
+- expert feedback: correct / incorrect result;
+- multi-channel tabs (camera 1, 2, 3) with independent YOLO instances and settings;
+- live camera preview for setup;
+- settings panel;
+- retraining trigger with date-range filter.
+
+### 9. Database
+
+Stores:
+
+- inspection events;
+- raw and annotated images;
+- expert labels;
+- configuration.
+
+### 10. Fake input data / factory environment emulation
+
+- Generates synthetic / augmented inputs for development and testing.
+- Allows offline debugging when real cameras are unavailable.
+- Sources:
+  - initial real dataset;
+  - artificially generated images (noise, lighting, dirt, rotation).
+
+## Data flow
+
+1. Position determination triggers capture.
+2. Video stream → frame converter extracts a frame.
+3. Image preparation module normalizes / filters / crops / resizes / rotates the frame.
+4. Main Server routes the prepared image to the right YOLO instance.
+5. YOLO instance runs inference using the AI Model.
+6. Results return to Main Server.
+7. Main Server writes an event to DB.
+8. WEB UI reads events and images from DB / server.
+9. Expert reviews results in WEB UI and marks correct / incorrect.
+10. Verified data flows back to the Learning Dataset for retraining.
diff --git a/docs/architecture_testing.md b/docs/architecture_testing.md
new file mode 100644
index 0000000..09dc7d1
--- /dev/null
+++ b/docs/architecture_testing.md
@@ -0,0 +1,67 @@
+# Architecture testing notes
+
+This file collects testing-related facts implied by the architecture diagram and project context.
+
+## Timing requirement
+
+- Analysis + result description must fit within **15 seconds per image**.
+- Measure end-to-end time: trigger → inference → DB write → UI update.
+
+## Test data sets
+
+### Initial dataset
+
+- Real photos of polyurethane soles.
+- Used for the first training run.
+
+### Learning dataset
+
+- Combined from:
+  - initial dataset;
+  - artificially generated / augmented images;
+  - expert-verified production data.
+
+### Separate test set
+
+- Must not overlap with training data.
+- Size:
+  - 20 soles without defects;
+  - 3 soles with defects.
+
+## Artificial disturbances
+
+Use the artificial image generator to simulate production conditions:
+
+- lighting effects;
+- PNG pattern overlays (dust, lens dirt, other obstacles);
+- rotation (±5° to match allowed positioning tolerance);
+- noise generator for dust emulation.
+
+## Position tolerance
+
+- Rotation: ±5° relative to camera.
+- Displacement: up to 10% of frame.
+- Test inference quality across the full tolerance range.
+
+## Camera replacement study
+
+- Compare recognition quality when swapping camera models.
+- Document changes in preprocessing parameters needed after replacement.
+
+## Testing environment
+
+- Run under conditions close to real production.
+- Initial development can happen at home / lab using fake input data and artificial disturbances.
+- Physical dimensions of soles must stabilize before testing (fixed time after casting).
+
+## Expert feedback loop
+
+- Every production event should be reviewable by an operator / expert.
+- Expert verdicts (correct / incorrect) feed the learning dataset.
+- Retraining can be filtered by date to avoid including low-quality old data.
+
+## Model versioning
+
+- Each trained / retrained model must be versioned.
+- Keep metrics for each version so performance can be compared.
+- Allow rollback to a previous model if retraining degrades quality.
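
The "Model versioning" bullets that close `architecture_testing.md` can be pictured with a small in-memory registry. This is only a sketch under assumptions: Python is not a confirmed stack choice, `ModelRegistry` and its methods are hypothetical names, and the metric key `mAP50` and the weight paths are placeholders, not values from the project.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ModelVersion:
    version: int
    weights_path: str
    metrics: Dict[str, float]


@dataclass
class ModelRegistry:
    """Keeps every trained model, its metrics, and which version is active."""
    versions: List[ModelVersion] = field(default_factory=list)
    active: Optional[int] = None

    def register(self, weights_path: str, metrics: Dict[str, float]) -> ModelVersion:
        # New versions are numbered sequentially and become active immediately.
        entry = ModelVersion(len(self.versions) + 1, weights_path, dict(metrics))
        self.versions.append(entry)
        self.active = entry.version
        return entry

    def get(self, version: int) -> ModelVersion:
        for entry in self.versions:
            if entry.version == version:
                return entry
        raise KeyError(f"unknown model version {version}")

    def regressed(self, metric: str) -> bool:
        """True if the active version scores lower than the one before it."""
        if self.active is None or self.active < 2:
            return False
        current = self.get(self.active)
        previous = self.get(self.active - 1)
        return current.metrics[metric] < previous.metrics[metric]

    def rollback(self, version: int) -> ModelVersion:
        # Rollback only switches the active pointer; no history is deleted.
        entry = self.get(version)
        self.active = entry.version
        return entry
```

A caller might register new weights after each retraining run, check `regressed(...)` on the agreed metric, and call `rollback(...)` when quality dropped. A real implementation would presumably persist this in the DB's "model versions" table instead of memory.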
diff --git a/docs/project_context.md b/docs/project_context.md
new file mode 100644
index 0000000..857b0cc
--- /dev/null
+++ b/docs/project_context.md
@@ -0,0 +1,62 @@
+# Project context
+
+## Theme
+
+Development of an intelligent information-measuring system for real-time control of geometric and physico-mechanical parameters of polyurethane shoe soles.
+
+## Goal
+
+Determine and classify (into several categories) the presence of defects on a shoe sole despite moderate disturbances such as dust, glare, and varying lighting.
+
+## Functional requirements
+
+### User web interface
+
+- Display inspection history on screen.
+- Show validation status.
+- Expert can mark the recognition result as correct or incorrect.
+- Multi-channel support: tabs 1, 2, 3 for different cameras / Raspberry Pi instances, each with its own YOLO instance and settings.
+- Display camera image for initial setup.
+- Settings section.
+- Retraining function with date restriction.
+
+### Machine vision system
+
+- Hardware:
+  - 2–3 Raspberry Pi devices or IP cameras with web access.
+  - Workstation with RTX 2060 GPU.
+  - Full HD cameras.
+  - Investigate recognition-quality change when replacing the camera with another model.
+
+- Software:
+  - Linux-like operating system.
+  - Log of processed data.
+  - Event record per inspected sole:
+    - sole number;
+    - defect photo;
+    - probability score for defect presence;
+    - separate annotated photo marking the defect zone.
+  - 15-second budget for image analysis and description output.
+
+## Operating conditions
+
+- Describe working conditions: lighting level, type and intensity of disturbances, and estimated probability of influence on the result.
+- Determine optimal positioning of the sole relative to the camera.
+- Allowable part position: rotation ±5° relative to the camera, displacement up to 10% of the frame.
+
+## Testing and validation
+
+- Test and debug the system under conditions close to production (in a home/lab environment).
+- Noise generator for dust emulation.
+- Separate photo set not used during model training:
+  - 20 pieces without defects;
+  - 3 pieces with defects.
+- Photo effects to simulate different lighting conditions.
+- Overlay PNG patterns to simulate lens contamination and other obstacles.
+- Photo rotation to simulate positioning deviations.
+- Testing is performed after a fixed time from the casting process, when the geometric dimensions have stabilized and no longer change.
+
+## Retraining
+
+- Retrain the model using data collected during operation.
+- The operator acts as an expert and verifies the model result.
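
The retraining requirement (operator-verified data, restricted by date) amounts to a simple filter over reviewed events. The sketch below assumes Python, which the repo has not settled on, and uses hypothetical names (`ReviewedEvent`, `select_for_retraining`). It also assumes that both "correct" and "incorrect" verdicts count as verified, which the diff does not state explicitly.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Iterable, List, Optional


@dataclass
class ReviewedEvent:
    sole_id: str
    timestamp: datetime
    # True = expert marked the result correct, False = incorrect,
    # None = not reviewed yet.
    expert_verdict: Optional[bool]


def select_for_retraining(events: Iterable[ReviewedEvent],
                          start: date, end: date) -> List[ReviewedEvent]:
    """Keep expert-reviewed events whose date falls in [start, end]."""
    return [
        event for event in events
        if event.expert_verdict is not None
        and start <= event.timestamp.date() <= end
    ]
```

Treating the range as inclusive on both ends matches how a date picker in a UI is usually read. Unreviewed events are skipped so that only expert-verified data reaches the learning dataset.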