# Voice TTS

A local, GPU-powered, real-time text-to-speech pipeline with a WebSocket API, designed to give a voice to AI agents that stream text in chunks.

## Features

- **Streaming input**: accepts partial text as it is generated by the LLM/agent.
- **Streaming output**: returns PCM audio chunks over WebSocket as soon as they are synthesized.
- **Voice cloning**: single speaker cloned from reference audio, with optional per-emotion references.
- **Interrupt / stop**: the agent can stop playback immediately when the user interrupts the AI.
- **Emotion control**: switch emotion on the fly (requires matching reference audio or a supported backend).
- **Local GPU**: runs entirely on your NVIDIA GPU (RTX 3090 / 3060 compatible).
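
As an illustration of the streaming-input side, an agent may prefer to forward sentence-sized pieces rather than individual LLM tokens. The sketch below is a client-side helper only; `sentence_chunks` and `max_chars` are hypothetical names that are not part of this project, and the server may already perform its own segmentation (see the protocol documentation before relying on this).

```python
import re
from typing import Iterable, Iterator

# Split on whitespace that follows sentence-ending punctuation.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def sentence_chunks(tokens: Iterable[str], max_chars: int = 200) -> Iterator[str]:
    """Group streamed text fragments into sentence-sized chunks.

    Hypothetical helper: not part of voice_tts.
    """
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _BOUNDARY.split(buffer)
        # Every part except the last one ends at a sentence boundary.
        for complete in parts[:-1]:
            if complete:
                yield complete
        buffer = parts[-1]
        # Flush long unpunctuated text so latency stays bounded.
        if len(buffer) >= max_chars:
            yield buffer
            buffer = ""
    # Emit whatever is left once the stream ends.
    if buffer.strip():
        yield buffer.strip()
```

Each yielded chunk could then be sent as one text message. A smaller `max_chars` trades prosody for lower time-to-first-audio.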

## Project status

- Working WebSocket server with streaming text input, streaming audio output, and instant stop/resume.
- F5-TTS backend installed, GPU-ready, and producing real audio (`models/F5TTS_v1_Base/` downloaded).
- Dummy backend available for fast offline tests.
- Startup warm-up caches the default reference and primes CUDA.
- Next: multilingual evaluation, latency optimization, and client examples.

## Quick start

```bash
# Create virtual environment (Python 3.10-3.12 recommended)
python3.11 -m venv .venv
source .venv/bin/activate

# Install PyTorch with CUDA 12.6 support first
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu126

# Install remaining dependencies
pip install -r requirements.txt

# (Optional) Download the F5-TTS model beforehand
python scripts/download_f5_tts.py --model F5TTS_v1_Base

# Run the server
python -m voice_tts.main

# Or run in dummy test mode
TTS_BACKEND=dummy python -m voice_tts.main
```

The server listens on `ws://localhost:8765/ws`.
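
To check the endpoint from Python, a minimal client sketch follows. It rests on several assumptions that this README does not confirm: the third-party `websockets` package is installed, audio arrives as binary frames, and the PCM is 16-bit mono at 24 kHz. The function names are hypothetical, and the messages passed in must already be encoded as described in `docs/03_websocket_protocol.md`.

```python
import asyncio
import io
import wave
from typing import Iterable, List

SERVER_URI = "ws://localhost:8765/ws"  # address given in this README


def pcm_to_wav(chunks: Iterable[bytes], sample_rate: int = 24000,
               channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM chunks in a WAV container.

    ASSUMPTION: 16-bit mono PCM at 24 kHz; adjust to the real format.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        for chunk in chunks:
            wav.writeframes(chunk)
    return buffer.getvalue()


async def collect_audio(messages: List[str], uri: str = SERVER_URI,
                        idle_timeout: float = 2.0) -> List[bytes]:
    """Send pre-encoded protocol messages, then gather binary frames."""
    # Imported lazily so pcm_to_wav works without the dependency.
    import websockets
    from websockets.exceptions import ConnectionClosed

    audio: List[bytes] = []
    async with websockets.connect(uri) as ws:
        for message in messages:
            await ws.send(message)
        while True:
            try:
                frame = await asyncio.wait_for(ws.recv(), timeout=idle_timeout)
            except (asyncio.TimeoutError, ConnectionClosed):
                break  # no frame within the timeout, or the server closed
            # ASSUMPTION: audio is sent as binary frames; text frames are skipped.
            if isinstance(frame, bytes):
                audio.append(frame)
    return audio
```

With a server running, `asyncio.run(collect_audio(messages))` returns the received chunks, and `pcm_to_wav(...)` turns them into bytes that can be written to a `.wav` file for a quick listening test.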

## WebSocket protocol

See the full protocol documentation in [`docs/03_websocket_protocol.md`](docs/03_websocket_protocol.md).

## Architecture and roadmap

- [`docs/01_overview.md`](docs/01_overview.md)
- [`docs/02_architecture.md`](docs/02_architecture.md)
- [`docs/04_roadmap.md`](docs/04_roadmap.md)
- [`docs/05_usage.md`](docs/05_usage.md)
- [`docs/06_technical_notes.md`](docs/06_technical_notes.md)
