Hindi TTS Fine-tuned XTTS v2 Model 🇮🇳
This is a fine-tuned XTTS v2 model specifically optimized for Hindi text-to-speech with excellent pronunciation quality.
Model Description
- Base Model: Coqui XTTS v2
- Language: Hindi (hi)
- Fine-tuned on: IndicTTS Phase 3 Hindi Dataset
- Training Steps: 3,430+
- Sample Rate: 22,050 Hz
- Voice: Female Hindi speaker
- Model Type: Zero-shot voice cloning with fine-tuned Hindi pronunciation
Features
- ✅ Excellent Hindi pronunciation - Fine-tuned specifically for the Hindi language
- ✅ Voice cloning - Clone any voice with just 3-10 seconds of audio
- ✅ Natural sounding - High-quality speech synthesis
- ✅ Fast inference - Real-time capable with GPU
- ✅ Production ready - Tested and optimized
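The 3-10 second guidance for reference audio can be checked before synthesis. The following is a minimal sketch using only the Python standard library; the helper names `clip_duration_seconds` and `is_good_reference` are hypothetical, and the bounds are simply the figures quoted above, not limits enforced by XTTS itself. It assumes the clip is an uncompressed PCM WAV file.

```python
import wave

# Bounds taken from the "3-10 seconds" guidance in this card (an assumption,
# not a hard limit of the model).
MIN_SECONDS = 3.0
MAX_SECONDS = 10.0


def clip_duration_seconds(source):
    """Return the duration of a PCM WAV clip given a path or a binary file object."""
    with wave.open(source, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def is_good_reference(source):
    """True if the clip length falls inside the suggested 3-10 second window."""
    return MIN_SECONDS <= clip_duration_seconds(source) <= MAX_SECONDS
```

Calling `is_good_reference("path/to/reference_speaker.wav")` before passing the file as `speaker_wav` gives an early warning about clips that are too short or too long.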
Usage
Quick Start with TTS Library
import torch
from TTS.api import TTS
# Initialize TTS with your fine-tuned model
device = "cuda" if torch.cuda.is_available() else "cpu"
# Load the model (for XTTS, model_path is the directory that holds the downloaded checkpoint)
tts = TTS(model_path="path/to/model", config_path="path/to/config.json").to(device)
# Generate speech
tts.tts_to_file(
    text="नमस्ते, आज मौसम कैसा है?",
    file_path="output.wav",
    speaker_wav="path/to/reference_speaker.wav",  # Reference clip of the voice to clone
    language="hi",
)
Using with Coqui TTS
from TTS.tts.models.xtts import Xtts
from TTS.tts.configs.xtts_config import XttsConfig
import torch
# Load config
config = XttsConfig()
config.load_json("config.json")
# Load model
model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir=".")
# Move to GPU
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
# Generate speech
outputs = model.synthesize(
    text="नमस्ते, मैं आपकी हिंदी टीटीएस मॉडल हूं।",
    config=config,
    speaker_wav="reference_speaker.wav",
    language="hi",
    temperature=0.65,
    repetition_penalty=10.0,
    top_p=0.85,
)
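# Save audio. XTTS v2 generates audio at config.audio.output_sample_rate
# (24 kHz by default), which differs from the 22.05 kHz input sample_rate.
import torchaudio
wav = torch.from_numpy(outputs["wav"]).unsqueeze(0)
torchaudio.save("output.wav", wav, config.audio.output_sample_rate)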
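When several sentences are spoken in the same cloned voice, the speaker conditioning only needs to be computed once. The sketch below assumes `model` is a loaded `Xtts` instance as in the example above and relies on its `get_conditioning_latents` and `inference` methods; the wrapper `synthesize_batch` is a hypothetical helper, not part of Coqui TTS.

```python
def synthesize_batch(model, texts, reference_wav, language="hi", **sampling):
    """Synthesize several texts with one speaker, computing speaker latents once.

    `model` is expected to be a loaded Xtts instance. Returns a list with one
    waveform per input text.
    """
    # Speaker conditioning depends only on the reference clip, so do it once.
    gpt_cond_latent, speaker_embedding = model.get_conditioning_latents(
        audio_path=[reference_wav]
    )
    waveforms = []
    for text in texts:
        out = model.inference(
            text,
            language,
            gpt_cond_latent,
            speaker_embedding,
            **sampling,  # e.g. temperature=0.65, top_p=0.85
        )
        waveforms.append(out["wav"])
    return waveforms
```

For example, `synthesize_batch(model, ["नमस्ते, कैसे हैं आप?", "आज मौसम बहुत अच्छा है।"], "reference_speaker.wav", temperature=0.65)` would return two waveforms that share the same voice.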
Command Line (if you have TTS installed)
tts --model_path . \
    --config_path ./config.json \
    --text "नमस्ते, कैसे हैं आप?" \
    --speaker_wav reference.wav \
    --language_idx hi \
    --out_path output.wav
Installation
# Install Coqui TTS (run in a shell)
pip install TTS torch torchaudio
# Download this model from the Hugging Face Hub (run in Python)
from huggingface_hub import snapshot_download
model_path = snapshot_download(repo_id="YOUR_USERNAME/hindi-tts-finetuned")
Training Details
Training Data
- Dataset: IndicTTS Phase 3 Hindi
- Speakers: 2 female Hindi speakers
- Total Samples: 1,329 training samples + validation set
- Audio Quality: 22.05 kHz, mono
Training Configuration
- Base Model: XTTS v2 (multilingual)
- Fine-tuning Method: GPT encoder training
- Optimizer: Adam
- Learning Rate: 1e-6
- Batch Size: 1 (with gradient accumulation)
- Training Steps: 3,430+
- GPU: CUDA enabled
- Precision: FP16 mixed precision
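The figures above allow a rough estimate of how many passes over the training data the run covered. This is only back-of-the-envelope arithmetic: it assumes the step count refers to optimizer updates, and `grad_accum_steps=1` is a placeholder because the card does not state the accumulation factor. The function name `approx_epochs` is hypothetical.

```python
def approx_epochs(steps, batch_size, grad_accum_steps, num_samples):
    """Approximate passes over the training set.

    Assumes `steps` counts optimizer updates, so each step consumes
    batch_size * grad_accum_steps samples.
    """
    samples_seen = steps * batch_size * grad_accum_steps
    return samples_seen / num_samples


# 3,430 steps, batch size 1, 1,329 training samples (figures from this card).
# grad_accum_steps=1 is a placeholder; the real value is not stated.
print(round(approx_epochs(3430, 1, 1, 1329), 2))  # 2.58
```

With a larger accumulation factor the estimate scales linearly, for example about 10.3 passes if four micro-batches were accumulated per step.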
Model Checkpoints
This repository contains:
- model.pth - Final trained model
- best_model.pth - Best performing checkpoint
- config.json - Model configuration
- vocab.json - Vocabulary (if applicable)
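After downloading, it can help to confirm that the files listed above are actually present before loading the model. The sketch below uses only `pathlib`; the helper name `missing_files` is hypothetical, and since `vocab.json` is marked "if applicable", a missing entry is a hint to investigate rather than proof of a broken download.

```python
from pathlib import Path

# File names as listed in this card.
EXPECTED_FILES = ("model.pth", "best_model.pth", "config.json", "vocab.json")


def missing_files(model_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in expected if not (root / name).is_file()]
```

For instance, `missing_files(model_path)` can be called on the directory returned by `snapshot_download`; an empty list means every listed file was found.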
Performance
- Quality: ⭐⭐⭐⭐⭐ Excellent for pure Hindi text
- Speed: Real-time capable with GPU
- Pronunciation: Highly accurate for Hindi phonetics
- Naturalness: Very natural sounding speech
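The "real-time capable" claim can be checked on your own hardware by measuring the real-time factor (RTF): synthesis time divided by the duration of the generated audio. The sketch below is a generic timing helper, not a benchmark from this card; `real_time_factor` is a hypothetical name, and `synthesize` stands for any callable that returns a 1-D sequence of audio samples.

```python
import time


def real_time_factor(synthesize, text, sample_rate):
    """Return (rtf, audio_seconds) for one call to synthesize(text).

    An RTF below 1.0 means audio is produced faster than it plays back.
    """
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    audio_seconds = len(samples) / sample_rate
    return elapsed / audio_seconds, audio_seconds
```

With the lower-level API shown earlier, `synthesize` could be a small lambda wrapping `model.synthesize(...)["wav"]`, with `sample_rate` set to the rate used when saving the audio. A single call includes warm-up costs, so averaging over several sentences gives a fairer number.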
Limitations
- Optimized specifically for Hindi (Devanagari script)
- Performance may vary with Hinglish or English text
- Requires reference speaker audio for voice cloning
- Best results with GPU (CPU inference is slower)
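Because quality may drop on Hinglish or English input, it can be useful to flag mixed-script text before synthesis. The sketch below measures the share of characters in the Devanagari Unicode block (U+0900 to U+097F); the helper names and the 0.9 threshold are illustrative assumptions, not values derived from this model.

```python
def _is_devanagari(ch):
    """True for code points in the Devanagari block (letters, matras, danda)."""
    return "\u0900" <= ch <= "\u097f"


def devanagari_ratio(text):
    """Fraction of counted characters that are Devanagari.

    Counted characters are alphabetic characters plus anything in the
    Devanagari block; spaces, digits and ASCII punctuation are ignored.
    """
    counted = [ch for ch in text if ch.isalpha() or _is_devanagari(ch)]
    if not counted:
        return 0.0
    return sum(1 for ch in counted if _is_devanagari(ch)) / len(counted)


def looks_like_pure_hindi(text, threshold=0.9):
    """Heuristic check; the threshold is an arbitrary illustrative value."""
    return devanagari_ratio(text) >= threshold
```

Text that fails the check could be routed to the original XTTS v2 model or transliterated first, depending on the application.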
Examples
Text Examples (Best Results)
# Greetings
"नमस्ते, कैसे हैं आप?"
"शुभ प्रभात, आज का दिन शुभ हो।"
# Conversational
"आज मौसम बहुत अच्छा है।"
"क्या मैं आपकी मदद कर सकता हूं?"
# Formal
"मैं आपकी सहायता के लिए यहां हूं।"
"कृपया अपना प्रश्न पूछें।"
Use Cases
- 🎙️ Voice assistants in Hindi
- 📚 Audiobook generation
- 🎓 E-learning content
- 🤖 AI chatbots with voice
- ♿ Accessibility tools
- 📱 Mobile applications
- 🎬 Content creation
Citation
If you use this model in your research or application, please cite:
@misc{hindi-xtts-finetuned-2025,
title={Hindi Fine-tuned XTTS v2 Model},
author={Your Name},
year={2025},
publisher={Hugging Face},
howpublished={\url{https://huggingface.co/YOUR_USERNAME/hindi-tts-finetuned}}
}
License
This model is licensed under the Coqui Public Model License (CPML).
Key Points:
- ✅ Free for personal and research (non-commercial) use
- ❌ The CPML does not permit commercial use of the model or its outputs
- See the full license text for details
Acknowledgments
- Base Model: Coqui XTTS v2
- Dataset: IndicTTS
- Training: Fine-tuned using Coqui TTS framework
Model Card Authors
Created by: [Your Name]
Date: October 2025
Additional Information
Model Architecture
- Type: XTTS v2 (Transformer-based TTS)
- Components:
  - GPT-based autoregressive model that predicts audio tokens from text (fine-tuned)
  - Discrete VAE (DVAE) audio tokenizer
  - HiFi-GAN-based decoder (vocoder)
Intended Use
This model is intended for:
- Text-to-speech applications in Hindi
- Voice cloning with Hindi content
- Research in Hindi speech synthesis
- Educational and accessibility tools
Out-of-Scope Use
Not recommended for:
- Languages other than Hindi (use original XTTS v2)
- Real-time lip-sync (not optimized for this)
- Voice impersonation for malicious purposes
Contact
For questions, issues, or collaborations:
- Hugging Face: @YOUR_USERNAME
- Issues: Report here
Updates
- v1.0 (October 2025): Initial release with 3,430 training steps
Enjoy natural Hindi speech synthesis! 🎉🇮🇳
Model tree for viraj8899/hindi-tts-finetuned: base model coqui/XTTS-v2