## General

- **Communicate with the user in Russian.** All explanations, reasoning, and feedback should be written in Russian unless explicitly asked otherwise.

## Quick Commands

- **Run server (Fish Speech default):** `python -m voice_tts.main`
- **Dummy backend for fast local tests:** `TTS_BACKEND=dummy python -m voice_tts.main`
- **XTTS-v2 backend:** `TTS_BACKEND=xtts_v2 python -m voice_tts.main`
- **Console script (installed):** `voice-tts`
- **Health check:** `curl http://localhost:8765/health`
- **Browser test client:** `cd examples && python -m http.server 8080` → open `http://localhost:8080/client_browser.html`
- **Browser test (dummy):** `TTS_BACKEND=dummy python -m voice_tts.main` plus the HTTP server from `examples/`

## Project Layout

```
scripts/           — standalone utilities (benchmark, download)
src/voice_tts/     — package entry points
  main.py          — uvicorn.run app (the console-script target)
  config.py        — pydantic-settings (Settings class); .env is auto-loaded here
  api/server.py    — FastAPI + WebSocket session loop; _create_engine() picks backend by TTS_BACKEND env var
  api/protocol.py  — Pydantic msg models for /ws protocol
  session/state.py — SessionState, VoiceProfile
  tts/engine.py    — TTSEngine ABC, DummyTTSEngine
  tts/fish_speech_backend.py — Fish Speech 1.5 implementation
  tts/f5_backend.py        — F5-TTS v1 implementation
  tts/xtts_backend.py      — XTTS-v2 implementation (auto-downloads from Coqui)
  tts/segmenter.py         — sentence-break + comma fallback segmentation
  tts/utils.py              — preprocess_text_for_tts()
  audio/formats.py          — float32→PCM16→base64, WAV header generation
tests/               — pytest files
models/              — local model checkpoints (gitignored)
voices/              — reference audio (wavs/flac); .wav files gitignored but .lab files are kept and used by Fish Speech
```
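
The `audio/formats.py` entry above describes a float32→PCM16→base64 pipeline plus WAV header generation. A minimal standard-library sketch of that idea follows. The function names `float32_to_pcm16_base64` and `wav_header` are hypothetical, and little-endian mono output is an assumption; the real module may use NumPy and different signatures.

```python
import base64
import struct


def float32_to_pcm16_base64(samples):
    """Clip floats to [-1.0, 1.0], scale to 16-bit integers, return base64 text.

    Hypothetical helper; assumes little-endian PCM16.
    """
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, s))          # clip so the scaled value fits in int16
        ints.append(int(round(s * 32767)))
    pcm = struct.pack(f"<{len(ints)}h", *ints)
    return base64.b64encode(pcm).decode("ascii")


def wav_header(data_bytes, sample_rate, channels=1, bits=16):
    """Build a standard 44-byte RIFF/WAVE header for uncompressed PCM data."""
    block_align = channels * bits // 8
    byte_rate = sample_rate * block_align
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + data_bytes, b"WAVE",
        b"fmt ", 16, 1, channels, sample_rate, byte_rate, block_align, bits,
        b"data", data_bytes,
    )
```

Prepending `wav_header(len(pcm), sample_rate)` to the raw PCM bytes should yield a playable WAV file, which can be handy when inspecting synthesized segments by hand.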

## Python & Dependencies

Python 3.10–3.12 is required (set in `pyproject.toml`). PyTorch must be installed with CUDA support before other deps:

```bash
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu126
pip install -r requirements.txt
```

## Configuration

All settings live in `config.py`; the Settings class auto-loads from `.env` via pydantic-settings.

Key variables:

| Variable | Default | Notes |
|---|---|---|
| `TTS_BACKEND` | `fish_speech` | One of: dummy / f5_tts / xtts_v2 / fish_speech. Switching backends requires a clean restart (engine is built lazily on first connection). |
| `TTS_MODEL_PATH` | — | Fish Speech checkpoint folder (contains model.pth, firefly-gan-vq-fsq-8x1024-21hz-generator.pth, tokenizer.tiktoken, config.json) |
| `TTS_VOCAB_PATH` | — | Fish Speech v1.5 source tree path (used to import firefly_gan / FSQ modules) |
| `TTS_MODEL_NAME` | `tts_models/multilingual/multi-dataset/xtts_v2` | Coqui model manager path; xtts_v2 downloads this on first use |
| `FISH_COMPILE` | `false` | Avoid setting to true. Enables torch.compile but causes CUDAGraphs tensor-overwrite errors on repeated inference. |
| `FISH_CHUNK_LENGTH` | `200` | Chunk length for Fish Speech (100–300). Higher = more GPU work per call, higher latency. |
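
To make the table concrete, here is a rough standard-library sketch of how these variables and defaults could be resolved. It is not the project's `Settings` class (which uses pydantic-settings and also reads `.env`). The function `load_settings`, the returned key names, and the enforcement of the 100–300 range are assumptions for illustration only.

```python
import os

# Defaults copied from the table above; variables without a default are omitted.
DEFAULTS = {
    "TTS_BACKEND": "fish_speech",
    "TTS_MODEL_NAME": "tts_models/multilingual/multi-dataset/xtts_v2",
    "FISH_COMPILE": "false",
    "FISH_CHUNK_LENGTH": "200",
}
VALID_BACKENDS = {"dummy", "f5_tts", "xtts_v2", "fish_speech"}


def load_settings(env=None):
    """Hypothetical loader: resolve variables from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    raw = {key: env.get(key, default) for key, default in DEFAULTS.items()}

    backend = raw["TTS_BACKEND"]
    if backend not in VALID_BACKENDS:
        raise ValueError(f"unsupported TTS_BACKEND: {backend!r}")

    chunk = int(raw["FISH_CHUNK_LENGTH"])
    if not 100 <= chunk <= 300:  # range check is an assumption, not confirmed behavior
        raise ValueError("FISH_CHUNK_LENGTH must be within 100-300")

    return {
        "backend": backend,
        "model_path": env.get("TTS_MODEL_PATH"),   # no default in the table
        "vocab_path": env.get("TTS_VOCAB_PATH"),   # no default in the table
        "model_name": raw["TTS_MODEL_NAME"],
        "fish_compile": raw["FISH_COMPILE"].strip().lower() in {"1", "true", "yes"},
        "fish_chunk_length": chunk,
    }
```

Passing a plain dict instead of `os.environ` keeps the sketch easy to exercise in tests without touching the process environment.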

## WebSocket Protocol (`/ws`)

- Server at `ws://localhost:8765/ws`
- Messages are JSON. Client-sent types: `init`, `text`, `flush`, `stop`, `emotion`, `config`
- Server sends back: `status` (session_ready / segment_started / stopped / config_updated), `audio` (sample_rate + base64 data), plus error messages on failure.
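
The message flow above can be sketched on the client side as follows. The exact schemas live in `api/protocol.py` and are not reproduced here, so several details are assumptions: that the discriminator key is `type`, that an `audio` message carries `sample_rate` and `data` keys, and that the payload is little-endian PCM16. The helpers `make_message` and `decode_audio` are hypothetical.

```python
import base64
import json
import struct

# Client-sent types listed above.
CLIENT_TYPES = {"init", "text", "flush", "stop", "emotion", "config"}


def make_message(msg_type, **fields):
    """Serialize a client message; the "type" key name is an assumption."""
    if msg_type not in CLIENT_TYPES:
        raise ValueError(f"unknown client message type: {msg_type!r}")
    return json.dumps({"type": msg_type, **fields})


def decode_audio(raw):
    """Return (sample_rate, samples) for an audio message, or None for other types.

    Assumes "data" holds base64-encoded little-endian PCM16.
    """
    msg = json.loads(raw)
    if msg.get("type") != "audio":
        return None
    pcm = base64.b64decode(msg["data"])
    if len(pcm) % 2:
        raise ValueError("PCM16 payload must have an even byte length")
    samples = struct.unpack(f"<{len(pcm) // 2}h", pcm)
    return msg["sample_rate"], samples
```

A real client would send `make_message(...)` strings over `ws://localhost:8765/ws` with any WebSocket library and pass each received frame through `decode_audio`.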

## Testing

```bash
pytest tests/        # asyncio_mode = auto, paths in tests/
```

Fixtures and reference audio live in `tests/`. No external services are required: the dummy backend works for unit-level tests without a GPU. The Fish Speech backend needs the local checkpoint in `models/fishaudio_fish-speech-1.5/` (gitignored).

## Scripts

- `scripts/benchmark_backends.py` — compare inference times across backends
- `scripts/download_f5_tts.py` — downloads F5-TTS v1 model files into `models/F5TTS_v1_Base/`
- `scripts/benchmark_compile.py` — torch.compile benchmarking utility

## Important Gotchas

- **Engine is built lazily on first `/ws` connection** in `_create_engine()` inside `api/server.py`. Changing `TTS_BACKEND` requires a full server restart, not just a message-level config change.
- **All GPU calls are serialized through one `_synth_lock`.** Concurrent sessions share a single inference thread, so only one synthesis runs at a time. This exists to avoid CUDA contention and OOM on multi-GPU setups.
- `.env` is gitignored but `.env.example` is the source of truth for supported variables. `config.py` line 50 sets `env_file = ".env"`.
- The dummy backend runs via a transient event loop (see `_sync_synthesize` in server.py:291), so a test that modifies global asyncio state can break other tests. Run such tests independently or set `asyncio_mode=auto`.
- **Space insertion between text payloads.** In `_handle_text` (server.py:152–157), a space is automatically inserted between consecutive payloads if neither side has whitespace at the join point. This prevents word merging when clients send word-by-word without trailing spaces (e.g. the browser client). Clients should not include leading/trailing spaces in payloads — the server handles spacing.
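
The space-insertion rule in the last gotcha can be sketched as below. This is a simplified illustration of the described behavior, not the code in `_handle_text`; the function name `join_payload` is hypothetical.

```python
def join_payload(buffer, payload):
    """Append payload to buffer, inserting one space if neither side has whitespace
    at the join point (hypothetical helper mirroring the rule described above)."""
    if buffer and payload and not buffer[-1].isspace() and not payload[0].isspace():
        return buffer + " " + payload
    return buffer + payload


# Word-by-word streaming without trailing spaces, as the browser client does.
text = ""
for word in ["Hello", "streaming", "world"]:
    text = join_payload(text, word)
```

Under the rule as stated, a payload that starts with punctuation would also receive a leading space; whether the real handler special-cases that is not covered here.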