Hindi TTS Fine-tuned XTTS v2 Model 🇮🇳

This is a fine-tuned XTTS v2 model specifically optimized for Hindi text-to-speech with excellent pronunciation quality.

Model Description

Base Model: Coqui XTTS v2
Language: Hindi (hi)
Fine-tuned on: IndicTTS Phase 3 Hindi Dataset
Training Steps: 3,430+
Sample Rate: 22,050 Hz
Voice: Female Hindi speaker
Model Type: Zero-shot voice cloning with fine-tuned Hindi pronunciation

Features

✅ Excellent Hindi pronunciation - Fine-tuned specifically for Hindi language
✅ Voice cloning - Clone any voice with just 3-10 seconds of audio
✅ Natural sounding - High-quality speech synthesis
✅ Fast inference - Real-time capable with GPU
✅ Production ready - Tested and optimized
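Since voice cloning is described as working from 3-10 seconds of audio, it can help to check a reference clip's length before passing it as speaker_wav. The sketch below is one possible way to do that using only Python's standard library. It assumes the clip is an uncompressed PCM WAV file; the helper names and the exact bounds are illustrative and not part of the TTS library.

```python
import wave

# Bounds taken from the "3-10 seconds" guidance; treat them as a rule of thumb.
MIN_SECONDS = 3.0
MAX_SECONDS = 10.0


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def is_reference_length_ok(path, min_seconds=MIN_SECONDS, max_seconds=MAX_SECONDS):
    """Check whether a reference clip falls inside the suggested length range."""
    return min_seconds <= wav_duration_seconds(path) <= max_seconds
```

A clip outside the range is not necessarily unusable; this is only a quick sanity check before synthesis.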

Usage

Quick Start with TTS Library

import torch
from TTS.api import TTS

# Initialize TTS with your fine-tuned model
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the model (replace with your downloaded path)
tts = TTS(model_path="path/to/model", config_path="path/to/config.json").to(device)

# Generate speech
tts.tts_to_file(
    text="नमस्ते, आज मौसम कैसा है?",
    file_path="output.wav",
    speaker_wav="path/to/reference_speaker.wav",  # Reference clip whose voice will be cloned
    language="hi"
)

Using with Coqui TTS

from TTS.tts.models.xtts import Xtts
from TTS.tts.configs.xtts_config import XttsConfig
import torch

# Load config
config = XttsConfig()
config.load_json("config.json")

# Load model
model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir=".")

# Move to GPU if available, otherwise stay on CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# Generate speech
outputs = model.synthesize(
    text="नमस्ते, मैं आपकी हिंदी टीटीएस मॉडल हूं।",
    config=config,
    speaker_wav="reference_speaker.wav",
    language="hi",
    temperature=0.65,
    repetition_penalty=10.0,
    top_p=0.85
)

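# Save audio
# XTTS v2 generates audio at config.audio.output_sample_rate (24 kHz by default),
# which differs from the 22.05 kHz input rate stored in config.audio.sample_rate.
import torchaudio
wav = torch.from_numpy(outputs["wav"]).unsqueeze(0)
torchaudio.save("output.wav", wav, config.audio.output_sample_rate)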

Command Line (if you have TTS installed)

tts --model_path . \
    --config_path config.json \
    --text "नमस्ते, कैसे हैं आप?" \
    --speaker_wav reference.wav \
    --language_idx hi \
    --out_path output.wav

Installation

# Install Coqui TTS (run in a shell)
pip install TTS torch torchaudio

# Download this model from the Hugging Face Hub (run in Python)
from huggingface_hub import snapshot_download
model_path = snapshot_download(repo_id="YOUR_USERNAME/hindi-tts-finetuned")

Training Details

Training Data

Dataset: IndicTTS Phase 3 Hindi
Speakers: 2 female Hindi speakers
Total Samples: 1,329 training samples + validation set
Audio Quality: 22.05 kHz, mono

Training Configuration

Base Model: XTTS v2 (multilingual)
Fine-tuning Method: GPT encoder training
Optimizer: Adam
Learning Rate: 1e-6
Batch Size: 1 (with gradient accumulation)
Training Steps: 3,430+
GPU: CUDA enabled
Precision: FP16 mixed precision
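To put the step count in context, the arithmetic below estimates how many passes over the training set these figures imply. It is a rough sketch, not a record of the actual run: it assumes each counted step consumes batch_size × grad_accum_steps samples, and the gradient accumulation factor is not stated in this card, so it is left as a parameter defaulting to 1.

```python
def samples_seen(steps, batch_size, grad_accum_steps=1):
    """Total samples consumed, assuming each step uses batch_size * grad_accum_steps."""
    return steps * batch_size * grad_accum_steps


def approx_epochs(steps, batch_size, dataset_size, grad_accum_steps=1):
    """Approximate number of passes over a dataset of dataset_size samples."""
    return samples_seen(steps, batch_size, grad_accum_steps) / dataset_size


# Figures from this card: 3,430 steps, batch size 1, 1,329 training samples.
# grad_accum_steps is unknown here; 1 is a placeholder assumption.
print(round(approx_epochs(3430, 1, 1329), 2))
```

With an accumulation factor of 1 this works out to roughly two and a half epochs; a larger factor scales the estimate proportionally.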

Model Checkpoints

This repository contains:

model.pth - Final trained model
best_model.pth - Best performing checkpoint
config.json - Model configuration
vocab.json - Vocabulary (if applicable)
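After downloading the repository, a quick check that the expected files are present can save a confusing load error later. The sketch below uses only the standard library. The helper name is hypothetical, and the set of required files is an assumption based on the list above (best_model.pth is treated as optional here).

```python
from pathlib import Path

# Files assumed to be needed for loading; adjust if your setup differs.
EXPECTED_FILES = ("model.pth", "config.json", "vocab.json")


def missing_model_files(model_dir, expected=EXPECTED_FILES):
    """Return the names of expected files that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in expected if not (root / name).is_file()]
```

An empty list means every expected file was found; otherwise the returned names show what still needs to be downloaded.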

Performance

Quality: ⭐⭐⭐⭐⭐ Excellent for pure Hindi text
Speed: Real-time capable with GPU
Pronunciation: Highly accurate for Hindi phonetics
Naturalness: Very natural sounding speech
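"Real-time capable" can be checked on your own hardware by measuring the real-time factor (RTF): synthesis time divided by the duration of the generated audio, where a value below 1.0 means faster than real time. The sketch below shows one way to compute it. The helper names are illustrative, and no measured numbers for this model are implied.

```python
import time


def real_time_factor(synthesis_seconds, num_samples, sample_rate):
    """Return synthesis time divided by the duration of the generated audio."""
    if num_samples <= 0 or sample_rate <= 0:
        raise ValueError("num_samples and sample_rate must be positive")
    audio_seconds = num_samples / sample_rate
    return synthesis_seconds / audio_seconds


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

For example, wrapping a synthesis call in timed(...) and passing the elapsed time, the length of the output waveform, and its sample rate to real_time_factor gives a number you can compare between GPU and CPU runs.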

Limitations

Optimized specifically for Hindi (Devanagari script)
Performance may vary with Hinglish or English text
Requires reference speaker audio for voice cloning
Best results with GPU (CPU inference is slower)
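Because the model is tuned for Devanagari input and may behave differently on Hinglish or English, it can be useful to screen text before synthesis. The sketch below is a simple heuristic, not part of the TTS library: it measures the share of script characters that fall in the Unicode Devanagari block (U+0900 to U+097F). The function names and the 0.9 threshold are assumptions chosen for illustration.

```python
DEVANAGARI_START = "\u0900"
DEVANAGARI_END = "\u097F"


def _is_devanagari(ch):
    """True if ch lies in the Unicode Devanagari block."""
    return DEVANAGARI_START <= ch <= DEVANAGARI_END


def devanagari_ratio(text):
    """Share of script characters (letters and Devanagari signs) that are Devanagari."""
    # Vowel signs are combining marks, so str.isalpha() alone would miss them.
    script_chars = [ch for ch in text if ch.isalpha() or _is_devanagari(ch)]
    if not script_chars:
        return 0.0
    return sum(1 for ch in script_chars if _is_devanagari(ch)) / len(script_chars)


def is_mostly_devanagari(text, threshold=0.9):
    """Heuristic check that text is predominantly Devanagari."""
    return devanagari_ratio(text) >= threshold
```

Text that fails the check could be routed to the original XTTS v2 model or transliterated first, depending on the application.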

Examples

Text Examples (Best Results)

# Greetings
"नमस्ते, कैसे हैं आप?"
"शुभ प्रभात, आज का दिन शुभ हो।"

# Conversational
"आज मौसम बहुत अच्छा है।"
"क्या मैं आपकी मदद कर सकता हूं?"

# Formal
"मैं आपकी सहायता के लिए यहां हूं।"
"कृपया अपना प्रश्न पूछें।"

Use Cases

🎙️ Voice assistants in Hindi
📚 Audiobook generation
🎓 E-learning content
🤖 AI chatbots with voice
♿ Accessibility tools
📱 Mobile applications
🎬 Content creation

Citation

If you use this model in your research or application, please cite:

@misc{hindi-xtts-finetuned-2025,
  title={Hindi Fine-tuned XTTS v2 Model},
  author={Your Name},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/YOUR_USERNAME/hindi-tts-finetuned}}
}

License

This model is licensed under the Coqui Public Model License (CPML).

Key Points:

✅ Free for personal and research use
❌ Commercial use is not covered: the CPML permits non-commercial use only
❌ Cannot be used to build competing TTS services
See full license for details

Acknowledgments

Base Model: Coqui XTTS v2
Dataset: IndicTTS
Training: Fine-tuned using Coqui TTS framework

Model Card Authors

Created by: [Your Name]
Date: October 2025

Additional Information

Model Architecture

Type: XTTS v2 (Transformer-based TTS)
Components:
- GPT-based text encoder (fine-tuned)
- Discrete VAE (DVAE) that encodes audio into discrete tokens
- HiFi-GAN-based decoder that produces the waveform

Intended Use

This model is intended for:

Text-to-speech applications in Hindi
Voice cloning with Hindi content
Research in Hindi speech synthesis
Educational and accessibility tools

Out-of-Scope Use

Not recommended for:

Languages other than Hindi (use original XTTS v2)
Real-time lip-sync (not optimized for this)
Voice impersonation for malicious purposes

Contact

For questions, issues, or collaborations:

Hugging Face: @YOUR_USERNAME
Issues: Report here

Updates

v1.0 (October 2025): Initial release with 3,430 training steps

Enjoy natural Hindi speech synthesis! 🎉🇮🇳
