---
library_name: coqui
license: mpl-2.0
tags:
- text-to-speech
- tts
- xtts-v2
- voice-cloning
- multilingual
- coqui
language:
- en
- th
- es
- fr
- de
- it
- pt
- pl
- tr
- ru
- nl
- cs
- ar
- zh
---

# XTTS-v2 Model Mirror for Quantum Sync

This is a mirror/backup of the **Coqui XTTS-v2** model for use with the [Quantum Sync](https://github.com/Useforclaude/quantum-sync-v5) project.

## 🎯 Purpose

This mirror serves as:

- **Backup** in case the original model becomes unavailable
- **Faster access** for Quantum Sync users
- **Stable reference** for production deployments

## 📋 Model Information

**Original Model:** [coqui/XTTS-v2](https://huggingface.co/coqui/XTTS-v2)
**Architecture:** XTTS-v2 (zero-shot multilingual TTS)
**Model Size:** ~1.87 GB
**Supported Languages:** 14 languages

- English (en)
- Thai (th)
- Spanish (es)
- French (fr)
- German (de)
- Italian (it)
- Portuguese (pt)
- Polish (pl)
- Turkish (tr)
- Russian (ru)
- Dutch (nl)
- Czech (cs)
- Arabic (ar)
- Chinese (zh-cn)

## 🚀 Usage

### With Quantum Sync (Recommended)

```bash
git clone https://github.com/Useforclaude/quantum-sync-v5.git
cd quantum-sync-v5/quantum-sync-v11-production

# Configure to use this mirror
# Edit tts_engines/xtts.py, change model_name to:
# model_name = "useclaude/quantum-sync-xtts-v2"

python main_v11.py input/file.srt \
    --voice MyVoice \
    --voice-sample /path/to/voice.wav \
    --tts-engine xtts-v2 \
    --tts-language en
```

### Direct Usage with TTS Library

```python
from TTS.api import TTS

# Use this mirror
tts = TTS(model_name="useclaude/quantum-sync-xtts-v2")

# Generate speech
tts.tts_to_file(
    text="Hello, this is a test.",
    speaker_wav="reference_voice.wav",
    language="en",
    file_path="output.wav"
)
```

### Voice Cloning Example

```python
from TTS.api import TTS

# Initialize
tts = TTS(model_name="useclaude/quantum-sync-xtts-v2")

# Clone voice from reference audio (6-30 seconds)
tts.tts_to_file(
    text="The quick brown fox jumps over the lazy dog.",
    speaker_wav="my_voice_sample.wav",  # Your voice reference
    language="en",
    file_path="output_cloned.wav"
)
```

## 📊 Performance

**From Quantum Sync Production Tests (2025-10-13):**

| Metric | Value |
|--------|-------|
| **Synthesis Speed** | ~3.7 segments/minute |
| **Processing Time** | 17 min for 277 segments (23 min of audio) |
| **Audio Coverage** | ~87% audio, ~13% silence gaps |
| **Timeline Drift** | -1.7% (excellent) |
| **Voice Quality** | 8/10 |
| **Cloning Accuracy** | Excellent |
| **VRAM Usage** | 6-8 GB |

**Comparison:**

- **XTTS-v2**: 15-17 min, 8/10 quality, FREE, 87% audio
- **F5-TTS**: 20-25 min, 7/10 quality, FREE, 55% audio
- **AWS Polly**: 5 min, 9/10 quality, ~$0.06, no cloning

## 🎛️ Advanced Parameters

```python
# Speed control (0.5-2.0)
tts.tts_to_file(
    text="Hello world",
    speaker_wav="voice.wav",
    language="en",
    speed=0.8,  # Slower speech
    file_path="output.wav"
)

# Temperature control (0.1-1.0)
tts.tts_to_file(
    text="Hello world",
    speaker_wav="voice.wav",
    language="en",
    temperature=0.75,  # More expressive
    file_path="output.wav"
)
```

## 📦 Model Files

```
quantum-sync-xtts-v2/
├── model.pth          (1.87 GB - neural network weights)
├── config.json        (model configuration)
├── vocab.json         (vocabulary for tokenization)
├── speakers_xtts.pth  (speaker embeddings)
├── dvae.pth           (DVAE component)
├── mel_stats.pth      (mel-spectrogram statistics)
├── LICENSE            (MPL 2.0)
└── README.md          (this file)
```
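If you prefer to work with these files directly instead of going through `TTS.api`, a minimal sketch using Coqui's low-level XTTS classes might look like the following. It assumes the files are first fetched locally (here via `huggingface_hub.snapshot_download`); treat it as a sketch, not this project's official loading code.

```python
import torch
import torchaudio
from huggingface_hub import snapshot_download
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

# Download the model files listed above into a local cache directory
# (repo id taken from this mirror's Links section)
model_dir = snapshot_download("useclaude/quantum-sync-xtts-v2")

# Build the model from its config and load the checkpoint weights
config = XttsConfig()
config.load_json(f"{model_dir}/config.json")
model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir=model_dir, eval=True)
if torch.cuda.is_available():
    model.cuda()

# Zero-shot synthesis from a reference clip
outputs = model.synthesize(
    "Hello, this is a test.",
    config,
    speaker_wav="reference_voice.wav",  # your reference clip
    language="en",
)

# outputs["wav"] is a 24 kHz waveform; save it to disk
torchaudio.save(
    "output_direct.wav",
    torch.as_tensor(outputs["wav"]).unsqueeze(0),
    24000,
)
```

This follows the low-level loading path documented in the Coqui TTS repository; `snapshot_download` is just one convenient way to fetch the files.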
## 📜 License

**Mozilla Public License 2.0 (MPL 2.0)**

This model is licensed under the Mozilla Public License 2.0. You can:

- ✅ Use commercially
- ✅ Modify the model
- ✅ Distribute the model
- ✅ Use in proprietary software

**Requirements:**

- Include the license and copyright notice
- State changes if you modify the model
- Make the source of modified MPL-covered files available

**Full License:** [LICENSE](./LICENSE)

## 🙏 Attribution

**Original Work:**

- **Project:** [Coqui TTS](https://github.com/coqui-ai/TTS)
- **Model:** XTTS-v2
- **Authors:** Coqui TTS Team
- **License:** Mozilla Public License 2.0

**This Mirror:**

- **Purpose:** Backup for the Quantum Sync project
- **Maintained by:** [Your Name/Organization]
- **Original Source:** https://huggingface.co/coqui/XTTS-v2

All credit goes to the original Coqui TTS team. This is simply a mirror for backup and convenience.

## 📚 Documentation

**Quantum Sync Documentation:**

- [XTTS-v2 Quick Start Guide](https://github.com/Useforclaude/quantum-sync-v5/blob/tts-experiments/quantum-sync-v11-production/XTTS-QUICK-START.md)
- [Paperspace Testing Guide](https://github.com/Useforclaude/quantum-sync-v5/blob/tts-experiments/quantum-sync-v11-production/PAPERSPACE-TTS-TESTING.md)

**Original Documentation:**

- [Coqui TTS GitHub](https://github.com/coqui-ai/TTS)
- [XTTS-v2 Paper](https://arxiv.org/abs/2406.04904)

## 🔗 Links

- **This Mirror:** https://huggingface.co/useclaude/quantum-sync-xtts-v2
- **Original Model:** https://huggingface.co/coqui/XTTS-v2
- **Quantum Sync Project:** https://github.com/Useforclaude/quantum-sync-v5
- **TTS Library:** https://github.com/coqui-ai/TTS

## ⚠️ Disclaimer

This is an unofficial mirror maintained for backup purposes. For the latest version and official support, please refer to the [original model](https://huggingface.co/coqui/XTTS-v2) and the [Coqui TTS repository](https://github.com/coqui-ai/TTS).

## 📊 Model Card

### Model Description

XTTS-v2 is a state-of-the-art zero-shot multilingual text-to-speech model that can clone voices from short audio samples (6-30 seconds).

**Key Features:**

- Zero-shot voice cloning
- Multilingual support (14 languages)
- High-quality natural speech
- No fine-tuning required
- Commercial use allowed

### Intended Use

**Primary Use Cases:**

- Voice cloning for content creation
- Multilingual speech synthesis
- Accessibility applications
- Audiobook narration
- Video dubbing

**Out-of-Scope Use:**

- Impersonation without consent
- Generating misleading content
- Illegal activities

### Training Data

XTTS-v2 was trained on diverse multilingual speech data. For details, see the [original model card](https://huggingface.co/coqui/XTTS-v2).

### Performance

See the **Performance** section above for detailed benchmarks from the Quantum Sync project.
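The benchmark table comes from the project's own test run. To check the segments-per-minute figure on your hardware, a minimal timing sketch (with hypothetical segment texts and reference file) could look like this:

```python
import time
from TTS.api import TTS

# Load the model once; the first call downloads and caches the weights
tts = TTS(model_name="useclaude/quantum-sync-xtts-v2")

# A handful of hypothetical test segments
segments = [
    "First test segment.",
    "Second test segment.",
    "Third test segment.",
]

start = time.perf_counter()
for i, text in enumerate(segments):
    tts.tts_to_file(
        text=text,
        speaker_wav="reference_voice.wav",  # your reference clip
        language="en",
        file_path=f"segment_{i}.wav",
    )
elapsed = time.perf_counter() - start

# Throughput in the same units as the Performance table
print(f"{len(segments) / (elapsed / 60):.1f} segments/minute")
```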
### Ethical Considerations

**Voice Cloning Ethics:**

- Always obtain consent before cloning someone's voice
- Clearly label AI-generated content
- Do not use for impersonation or fraud
- Follow local regulations on synthetic media

### Limitations

- May not perfectly preserve all voice characteristics
- Quality varies with the quality of the reference audio
- Requires a GPU for reasonable speed
- ~6-8 GB VRAM recommended
- Some languages may have better quality than others

## 🛠️ Technical Specifications

**Model Type:** Autoregressive Transformer-based TTS
**Framework:** PyTorch
**Input:** Text + reference audio (6-30 s WAV)
**Output:** 24 kHz WAV audio
**Inference Time:** ~3-5 seconds per segment (GPU)

**Hardware Requirements:**

- GPU: NVIDIA with CUDA support
- VRAM: 6-8 GB recommended
- RAM: 16 GB
- Disk: ~2 GB for the model

**Software Requirements:**

- Python 3.9+
- PyTorch 2.0+
- TTS library
- CUDA 11.8+ (for GPU)

## 📞 Support

**For this mirror:**

- Issues: [Quantum Sync GitHub Issues](https://github.com/Useforclaude/quantum-sync-v5/issues)

**For the original model:**

- Issues: [Coqui TTS GitHub Issues](https://github.com/coqui-ai/TTS/issues)

---

**Last Updated:** 2025-10-13
**Mirror Version:** 1.0
**Model Version:** XTTS-v2 (latest as of upload date)