Hindi TTS Fine-tuned XTTS v2 Model 🇮🇳

This is a fine-tuned XTTS v2 model specifically optimized for Hindi text-to-speech with excellent pronunciation quality.

Model Description

  • Base Model: Coqui XTTS v2
  • Language: Hindi (hi)
  • Fine-tuned on: IndicTTS Phase 3 Hindi Dataset
  • Training Steps: 3,430+
  • Sample Rate: 22,050 Hz training audio (XTTS v2 generates 24 kHz output)
  • Voice: Female Hindi speakers (two in the training data)
  • Model Type: Zero-shot voice cloning with fine-tuned Hindi pronunciation

Features

  • ✅ Excellent Hindi pronunciation - Fine-tuned specifically for Hindi language
  • ✅ Voice cloning - Clone any voice with just 3-10 seconds of audio
  • ✅ Natural sounding - High-quality speech synthesis
  • ✅ Fast inference - Real-time capable with GPU
  • ✅ Production ready - Tested and optimized
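
The 3-10 second guideline for reference audio can be checked before a clip is used for cloning. The sketch below is one possible pre-flight check, assuming the clip is an uncompressed PCM WAV file. The helper names `clip_duration_seconds` and `reference_clip_ok` are illustrative and not part of the TTS library.

```python
import wave


def clip_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def reference_clip_ok(path: str, min_s: float = 3.0, max_s: float = 10.0) -> bool:
    """Check that a reference clip falls inside the suggested 3-10 s window."""
    return min_s <= clip_duration_seconds(path) <= max_s
```

A clip outside the window is not necessarily unusable; the bounds only mirror the range quoted in the feature list.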

Usage

Quick Start with TTS Library

import torch
from TTS.api import TTS

# Initialize TTS with your fine-tuned model
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the model (replace with your downloaded path)
tts = TTS(model_path="path/to/model", config_path="path/to/config.json").to(device)

# Generate speech
tts.tts_to_file(
    text="नमस्ते, आज मौसम कैसा है?",
    file_path="output.wav",
    speaker_wav="path/to/reference_speaker.wav",  # Reference audio of the voice to clone (required, see Limitations)
    language="hi"
)

Using the XTTS Model Class Directly

from TTS.tts.models.xtts import Xtts
from TTS.tts.configs.xtts_config import XttsConfig
import torch

# Load config
config = XttsConfig()
config.load_json("config.json")

# Load model
model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir=".")

# Move to GPU
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# Generate speech
outputs = model.synthesize(
    text="नमस्ते, मैं आपकी हिंदी टीटीएस मॉडल हूं।",
    config=config,
    speaker_wav="reference_speaker.wav",
    language="hi",
    temperature=0.65,
    repetition_penalty=10.0,
    top_p=0.85
)

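# Save audio. XTTS v2 generates 24 kHz output, so use the config's output rate
# rather than config.audio.sample_rate (the 22.05 kHz input rate).
import torchaudio
wav = torch.from_numpy(outputs["wav"]).unsqueeze(0)
torchaudio.save("output.wav", wav, config.audio.output_sample_rate)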
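
For longer passages, such as audiobook chapters, one option is to synthesize sentence by sentence. The sketch below splits Hindi text after a danda (।), question mark, or exclamation mark. It assumes that shorter inputs are easier for the model to handle, which this card does not verify. The helper name `split_hindi_sentences` is hypothetical.

```python
import re


def split_hindi_sentences(text: str) -> list[str]:
    """Split text after a danda (।), question mark, or exclamation mark."""
    parts = re.split(r"(?<=[।?!])\s+", text.strip())
    # Drop empty pieces, e.g. when the input is an empty string.
    return [part for part in parts if part]
```

Each returned sentence can then be passed to `model.synthesize` in turn and the resulting arrays concatenated before saving.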

Command Line (if you have TTS installed)

tts --model_path . \
    --config_path config.json \
    --text "नमस्ते, कैसे हैं आप?" \
    --speaker_wav reference.wav \
    --language_idx hi \
    --out_path output.wav

Installation

# Install Coqui TTS (run in a shell)
pip install TTS torch torchaudio

# Download this model from the Hugging Face Hub (run in Python)
from huggingface_hub import snapshot_download
model_path = snapshot_download(repo_id="YOUR_USERNAME/hindi-tts-finetuned")

Training Details

Training Data

  • Dataset: IndicTTS Phase 3 Hindi
  • Speakers: 2 female Hindi speakers
  • Total Samples: 1,329 training samples + validation set
  • Audio Quality: 22.05 kHz, mono

Training Configuration

  • Base Model: XTTS v2 (multilingual)
  • Fine-tuning Method: GPT encoder training
  • Optimizer: Adam
  • Learning Rate: 1e-6
  • Batch Size: 1 (with gradient accumulation)
  • Training Steps: 3,430+
  • GPU: CUDA enabled
  • Precision: FP16 mixed precision
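
As a rough sanity check on these figures, the number of passes over the training set can be estimated from the step count. This is a sketch under two assumptions: a "step" is one optimizer update, and the gradient accumulation factor is unknown because the card does not state it. The function name and the placeholder value `grad_accum_steps=1` are illustrative only.

```python
def passes_over_dataset(
    steps: int, batch_size: int, grad_accum_steps: int, num_samples: int
) -> float:
    """Approximate number of passes over the training set."""
    samples_seen = steps * batch_size * grad_accum_steps
    return samples_seen / num_samples


# Steps, batch size, and sample count come from this card;
# grad_accum_steps=1 is a placeholder assumption, not a reported value.
print(round(passes_over_dataset(3430, 1, 1, 1329), 2))  # 2.58
```

A larger accumulation factor scales the estimate proportionally, so the real number of passes depends on a setting that is not listed here.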

Model Checkpoints

This repository contains:

  • model.pth - Final trained model
  • best_model.pth - Best performing checkpoint
  • config.json - Model configuration
  • vocab.json - Vocabulary (if applicable)
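
Before loading, it can help to confirm that a downloaded copy contains these files. The sketch below is one way to do that. The file names come from the list above, while the split into required and optional files is an assumption made for illustration, based on the card marking `vocab.json` "if applicable".

```python
from pathlib import Path

# File names taken from the list above. Treating vocab.json and
# best_model.pth as optional is an assumption, not a library requirement.
REQUIRED_FILES = ("model.pth", "config.json")
OPTIONAL_FILES = ("best_model.pth", "vocab.json")


def missing_files(model_dir: str, names=REQUIRED_FILES) -> list[str]:
    """Return the names from `names` that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in names if not (root / name).is_file()]
```

Calling `missing_files(model_path)` on the directory returned by `snapshot_download` gives an empty list when both required files are present.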

Performance

  • Quality: ⭐⭐⭐⭐⭐ Excellent for pure Hindi text
  • Speed: Real-time capable with GPU
  • Pronunciation: Highly accurate for Hindi phonetics
  • Naturalness: Very natural sounding speech

Limitations

  • Optimized specifically for Hindi (Devanagari script)
  • Performance may vary with Hinglish or English text
  • Requires reference speaker audio for voice cloning
  • Best results with GPU (CPU inference is slower)
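
Because results may vary on Hinglish or English input, an application could measure how much of a text is Devanagari before sending it to the model. The sketch below is a minimal heuristic. The function names are hypothetical, and any cutoff a caller applies (for example 0.9) would be a tuning choice, not a value from this card.

```python
def is_devanagari(ch: str) -> bool:
    """True if the character lies in the Devanagari block (U+0900 to U+097F)."""
    return "\u0900" <= ch <= "\u097F"


def devanagari_ratio(text: str) -> float:
    """Share of script characters that are Devanagari.

    Counts every character in the Devanagari block (letters, vowel signs,
    digits, dandas) against alphabetic characters from other scripts.
    Spaces, ASCII digits, and ASCII punctuation are ignored.
    """
    deva = sum(1 for ch in text if is_devanagari(ch))
    other = sum(1 for ch in text if ch.isalpha() and not is_devanagari(ch))
    total = deva + other
    return deva / total if total else 0.0
```

A pure Hindi sentence scores 1.0 and pure English text scores 0.0, while mixed Hinglish input falls in between.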

Examples

Text Examples (Best Results)

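# Greetings
"नमस्ते, कैसे हैं आप?"  # Hello, how are you?
"शुभ प्रभात, आज का दिन शुभ हो।"  # Good morning, may your day go well.

# Conversational
"आज मौसम बहुत अच्छा है।"  # The weather is very nice today.
"क्या मैं आपकी मदद कर सकता हूं?"  # Can I help you?

# Formal
"मैं आपकी सहायता के लिए यहां हूं।"  # I am here to assist you.
"कृपया अपना प्रश्न पूछें।"  # Please ask your question.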

Use Cases

  • 🎙️ Voice assistants in Hindi
  • 📚 Audiobook generation
  • 🎓 E-learning content
  • 🤖 AI chatbots with voice
  • ♿ Accessibility tools
  • 📱 Mobile applications
  • 🎬 Content creation

Citation

If you use this model in your research or application, please cite:

@misc{hindi-xtts-finetuned-2025,
  title={Hindi Fine-tuned XTTS v2 Model},
  author={Your Name},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/YOUR_USERNAME/hindi-tts-finetuned}}
}

License

This model is licensed under the Coqui Public Model License (CPML).

Key Points:

  • ✅ Free for personal and research use
  • ❌ Commercial use is not permitted; the CPML covers non-commercial use only
  • ❌ Cannot be used to build competing TTS services
  • See full license for details

Acknowledgments

Model Card Authors

Created by: [Your Name]
Date: October 2025

Additional Information

Model Architecture

  • Type: XTTS v2 (Transformer-based TTS)
  • Components:
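    • GPT-2-style autoregressive model that predicts audio tokens from text (fine-tuned)
    • DVAE (discrete VAE) that converts mel-spectrograms into audio tokens
    • HiFi-GAN decoder that turns the GPT outputs into a waveform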

Intended Use

This model is intended for:

  • Text-to-speech applications in Hindi
  • Voice cloning with Hindi content
  • Research in Hindi speech synthesis
  • Educational and accessibility tools

Out-of-Scope Use

Not recommended for:

  • Languages other than Hindi (use original XTTS v2)
  • Real-time lip-sync (not optimized for this)
  • Voice impersonation for malicious purposes

Contact

For questions, issues, or collaborations:

Updates

  • v1.0 (October 2025): Initial release with 3,430 training steps

Enjoy natural Hindi speech synthesis! 🎉🇮🇳


Model tree for viraj8899/hindi-tts-finetuned: fine-tuned from base model coqui/XTTS-v2